From kraemerf at de.ibm.com Tue Jan 3 16:12:26 2017 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Tue, 3 Jan 2017 17:12:26 +0100 Subject: [gpfsug-discuss] SAVE THE DATE - IBM Spectrum Scale (GPFS) Strategy Days 2017, Stuttgart/Ehningen, Germany In-Reply-To: References: Message-ID: Save the date - as there have been many requests for a German-language Spectrum Scale event, here is the next one. The #WhatsUp IBM Spectrum Scale Strategy Days expert days 2017 take place on 8-9 March 2017. Dear Sir or Madam, the conference team cordially invites you to take part in this free event on the IBM campus in Ehningen (near Stuttgart). The guiding idea of the expert days is both to explain new technical features and functions in detail and to exchange practical tips and experience from projects. Owing to strong demand and the topics at hand, this year's expert days will again run over two days, in order to do justice both to the complex new functions and to the exchange of experience between the talks among colleagues and the experts present. The event is aimed at everyone who wants to make better use of the capabilities of Spectrum Scale within a short time and/or wants to share the experience they have gained with Spectrum Scale. In addition to product updates, technical details and service offerings, the two-day expert days programme also covers future releases. The detailed agenda will appear on the registration page from mid-January 2017, but registration is already possible at: 1) Registration link for the Expert Days 2017 https://www.ibm.com/events/wwe/grp/grp312.nsf/Registration.xsp?openform&seminar=Z9AH7POE&locale=de_DE Start on 8 March 2017 at 10:00, end on 9 March around 16:00. Are you or your colleagues working with Spectrum Scale for the first time, or would you like to refresh your basic knowledge? For Spectrum Scale beginners we additionally offer a day on 7 March covering the fundamentals of Spectrum Scale and the Elastic Storage Server. 2) Registration link for the Beginners' Day 2017 https://www.ibm.com/events/wwe/grp/grp312.nsf/Registration.xsp?openform&seminar=3ACDRTOE&locale=de_DE Start on 7 March 2017 at 10:00, end around 17:00. AUDIENCE: Customers, IBM Business Partners and IBM employees with solid Spectrum Scale (GPFS) base knowledge. It is a workshop by experts for experts. Participation in the workshop is free of charge. The language is German. Venue: IBM Deutschland GmbH, IBM-Allee 1 (navigation system: Am Keltenwald 1), 71139 Ehningen (near Stuttgart). IBM Spectrum Scale (GPFS) is a proven, scalable, high-performance solution for data, object and file management that is used intensively in many industries worldwide. Spectrum Scale offers simplified data management and integrated information lifecycle tools that can manage multiple petabytes of data and billions of files. IBM Spectrum Scale Version 4, the software-defined storage system for cloud, big data, high performance computing and analytics, offers enhanced security, performance improvements through flash storage integration, and greater ease of use for globally operating companies that work with demanding, data-intensive applications.
The conference team looks forward to seeing you: - Heiko Lehmann, mailto:heiko.lehmann at de.ibm.com - Olaf Weiser, mailto:olaf.weiser at de.ibm.com - Ulf Troppens, mailto:troppens at de.ibm.com - Frank Kraemer, mailto:kraemerf at de.ibm.com - Goetz Mensel, mailto:goetz.mensel at de.ibm.com Appendix: Redbooks/Redpapers Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering http://www.redbooks.ibm.com/redpapers/pdfs/redp5411.pdf IBM Spectrum Scale Security http://www.redbooks.ibm.com/redpieces/pdfs/redp5426.pdf IBM Spectrum Archive Enterprise Edition V1.2.2: Installation and Configuration Guide http://www.redbooks.ibm.com/redpieces/pdfs/sg248333.pdf Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Am Weiher 24, 65451 Kelsterbach mailto:kraemerf at de.ibm.com voice: +49-(0)171-3043699 / +4970342741078 IBM Germany -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Tue Jan 3 20:27:17 2017 From: mweil at wustl.edu (Matt Weil) Date: Tue, 3 Jan 2017 14:27:17 -0600 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> Message-ID: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu> this follows the IP what ever node the ip lands on. the ganesha.nfsd process seems to stop working. any ideas? there is nothing helpful in the logs. time mount ces200:/vol/aggr14/temp403 /mnt/test mount.nfs: mount system call failed real 1m0.000s user 0m0.000s sys 0m0.010s From Valdis.Kletnieks at vt.edu Tue Jan 3 21:00:44 2017 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Tue, 03 Jan 2017 16:00:44 -0500 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu> Message-ID: <177090.1483477244@turing-police.cc.vt.edu> On Tue, 03 Jan 2017 14:27:17 -0600, Matt Weil said: > this follows the IP what ever node the ip lands on. the ganesha.nfsd > process seems to stop working. any ideas? there is nothing helpful in > the logs. Does it in fact "stop working", or are you just having a mount issue? Do already existing mounts work? Does 'ps' report the process running? Any log messages? > time mount ces200:/vol/aggr14/temp403 /mnt/test > mount.nfs: mount system call failed > > real 1m0.000s Check the obvious stuff first. Is temp403 exported to your test box? Does tcpdump/wireshark show the expected network activity? Does wireshark flag any issues? Is there a firewall issue (remember to check *both* ends :) -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From abeattie at au1.ibm.com Tue Jan 3 22:19:20 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Tue, 3 Jan 2017 22:19:20 +0000 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu> References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu>, <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov><28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu><4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov><0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov><5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu><5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu><45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> Message-ID: An HTML attachment was scrubbed... URL: From laurence at qsplace.co.uk Tue Jan 3 22:40:48 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Tue, 03 Jan 2017 22:40:48 +0000 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu>, <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov><28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu><4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov><0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov><5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu><5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu><45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> Message-ID: <0CEDE53A-B89F-4070-A681-49BC7B93D152@qsplace.co.uk> Andrew, You may have been stung by: 2.34 What considerations are there when running on SELinux? https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html?view=kc#selinux I've see this issue on a customer site myself. Matt, Could you increase the logging verbosity and check the logs further? As per http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.pdg.doc/bl1pdg_CESNFSserverlog.htm -- Lauz On 3 January 2017 22:19:20 GMT+00:00, Andrew Beattie wrote: >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Tue Jan 3 22:56:48 2017 From: Valdis.Kletnieks at vt.edu (Valdis Kletnieks) Date: Tue, 03 Jan 2017 17:56:48 -0500 Subject: [gpfsug-discuss] What is LTFS/EE now called, and what version should I be on? Message-ID: <186951.1483484208@turing-police.cc.vt.edu> So we have GPFS Advanced 4.2.1 installed, and the following RPMs: % rpm -qa 'ltfs*' | sort ltfsle-2.1.6.0-9706.x86_64 ltfsle-library-2.1.6.0-9706.x86_64 ltfsle-library-plus-2.1.6.0-9706.x86_64 ltfs-license-2.1.0-20130412_2702.x86_64 ltfs-mig-1.2.1.1-10232.x86_64 What release of "Spectrum Archive" does this correspond to, and what release do we need to be on if I upgrade GPFS to 4.2.2.1? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From janfrode at tanso.net Tue Jan 3 23:14:21 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 4 Jan 2017 00:14:21 +0100 Subject: [gpfsug-discuss] What is LTFS/EE now called, and what version should I be on? 
In-Reply-To: <186951.1483484208@turing-police.cc.vt.edu> References: <186951.1483484208@turing-police.cc.vt.edu> Message-ID: This looks like Spectrum Archive v1.2.1.0 (Build 10230). Newest version available on fixcentral is v1.2.2.0, but it doesn't support GPFS v4.2.2.x yet. -jf On Tue, Jan 3, 2017 at 11:56 PM, Valdis Kletnieks wrote: > So we have GPFS Advanced 4.2.1 installed, and the following RPMs: > > % rpm -qa 'ltfs*' | sort > ltfsle-2.1.6.0-9706.x86_64 > ltfsle-library-2.1.6.0-9706.x86_64 > ltfsle-library-plus-2.1.6.0-9706.x86_64 > ltfs-license-2.1.0-20130412_2702.x86_64 > ltfs-mig-1.2.1.1-10232.x86_64 > > What release of "Spectrum Archive" does this correspond to, > and what release do we need to be on if I upgrade GPFS to 4.2.2.1? > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Jan 4 01:21:34 2017 From: mweil at wustl.edu (Matt Weil) Date: Tue, 3 Jan 2017 19:21:34 -0600 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu> <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> Message-ID: nsds and ces nodes are RHEL 7.3 nfsv3 clients are old ubuntu lucid. we finally just removed the IP that seemed to... when moved to a ces node caused it to stop responding. it hung up a few more times but has been working fine now for the last few hours. maybe a bad client apple out there finally gave up ;-) PMR 50787 122 000 waiting on IBM. On 1/3/17 4:19 PM, Andrew Beattie wrote: > Matt > > What Operating system are you running? > > I have an open PMR at present with something very similar > when ever we publish an NFS export via the protocol nodes the nfs > service stops, although we have no issues publishing SMB exports. > > I"m waiting on some testing by the customer but L3 support have > indicated that they think there is a bug in the SElinux code, which is > causing this issue, and have suggested that we disable SElinux and try > again. > > My clients environment is currently deployed on Centos 7. > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > > ----- Original message ----- > From: Matt Weil > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: > Cc: > Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding > Date: Wed, Jan 4, 2017 6:27 AM > > this follows the IP what ever node the ip lands on. the ganesha.nfsd > process seems to stop working. any ideas? there is nothing > helpful in > the logs. 
> > time mount ces200:/vol/aggr14/temp403 /mnt/test > mount.nfs: mount system call failed > > real 1m0.000s > user 0m0.000s > sys 0m0.010s > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Jan 4 01:29:36 2017 From: mweil at wustl.edu (Matt Weil) Date: Tue, 3 Jan 2017 19:29:36 -0600 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: <0CEDE53A-B89F-4070-A681-49BC7B93D152@qsplace.co.uk> References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu> <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> <0CEDE53A-B89F-4070-A681-49BC7B93D152@qsplace.co.uk> Message-ID: On 1/3/17 4:40 PM, Laurence Horrocks-Barlow wrote: > Andrew, > > You may have been stung by: > > 2.34 What considerations are there when running on SELinux? > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html?view=kc#selinux se is disabled here. Also if you strace the parent ganesha.nfsd process it dies. Is that a bug? > > I've see this issue on a customer site myself. > > > Matt, > > Could you increase the logging verbosity and check the logs further? > As per > http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.pdg.doc/bl1pdg_CESNFSserverlog.htm yes bumped it to the max of 3 not much help. > > -- Lauz > > On 3 January 2017 22:19:20 GMT+00:00, Andrew Beattie > wrote: > > Matt > > What Operating system are you running? > > I have an open PMR at present with something very similar > when ever we publish an NFS export via the protocol nodes the nfs > service stops, although we have no issues publishing SMB exports. > > I"m waiting on some testing by the customer but L3 support have > indicated that they think there is a bug in the SElinux code, > which is causing this issue, and have suggested that we disable > SElinux and try again. > > My clients environment is currently deployed on Centos 7. > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > > ----- Original message ----- > From: Matt Weil > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: > Cc: > Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding > Date: Wed, Jan 4, 2017 6:27 AM > > this follows the IP what ever node the ip lands on. the > ganesha.nfsd > process seems to stop working. any ideas? there is nothing > helpful in > the logs. > > time mount ces200:/vol/aggr14/temp403 /mnt/test > mount.nfs: mount system call failed > > real 1m0.000s > user 0m0.000s > sys 0m0.010s > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. 
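[Editor's note: a minimal sketch of the basic checks discussed in this thread - the client-side export/RPC checks Valdis suggests and a look at the CES (Ganesha) NFS log level. The host name ces200 and the export path are taken from the example above; the ganesha.log path and the mmnfs "change" syntax for raising LOG_LEVEL are assumptions, so confirm them against the CES NFS logging documentation for your release.]

# Client side: is the export visible, and do the portmapper/mountd/nfsd RPC services answer?
showmount -e ces200
rpcinfo -p ces200
# Watch the hanging mount attempt on the wire (run on either end); no port filter so the
# separate NFSv3 MOUNT protocol traffic is captured as well.
tcpdump -i any -n host ces200
# CES node: current Ganesha (CES NFS) log level and live server log.
mmnfs configuration list | grep LOG_LEVEL
tail -f /var/log/ganesha.log    # assumed default log location
# Assumed syntax for raising the level before reproducing the hang:
# mmnfs configuration change LOG_LEVEL=DEBUG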
> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Wed Jan 4 02:16:54 2017 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Tue, 03 Jan 2017 21:16:54 -0500 Subject: [gpfsug-discuss] What is LTFS/EE now called, and what version should I be on? In-Reply-To: References: <186951.1483484208@turing-police.cc.vt.edu> Message-ID: <200291.1483496214@turing-police.cc.vt.edu> On Wed, 04 Jan 2017 00:14:21 +0100, Jan-Frode Myklebust said: > This looks like Spectrum Archive v1.2.1.0 (Build 10230). Newest version > available on fixcentral is v1.2.2.0, but it doesn't support GPFS v4.2.2.x > yet. That's what I was afraid of. OK, shelve that option, and call IBM for the efix. (The backstory: IBM announced a security issue in GPFS: http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009639&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E A security vulnerability has been identified in IBM Spectrum Scale (GPFS) that could allow a remote authenticated attacker to overflow a buffer and execute arbitrary code on the system with root privileges or cause the server to crash. This vulnerability is only applicable if: - file encryption is being used - the key management infrastructure has been compromised -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From rkomandu at in.ibm.com Wed Jan 4 07:17:25 2017 From: rkomandu at in.ibm.com (Ravi K Komanduri) Date: Wed, 4 Jan 2017 12:47:25 +0530 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu><28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu><4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov><0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov><5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu><5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu><45b19a50-bb70-1025-71ea-80a260623712@wustl.edu><0CEDE53A-B89F-4070-A681-49BC7B93D152@qsplace.co.uk> Message-ID: My two cents, Have the SELinux enabled on my RH7.3 cluster (where CES nodes are RH 7,3). GPFS latest version(4.2.2) is on the cluster. Non SELinux env, should mount w/o issues as well Tried mounting for 50 iters as V3 for 2 different mounts from 4 client nodes. Ran successfully. My client nodes are RH/SLES clients Could you elaborate further. With Regards, Ravi K Komanduri From: Matt Weil To: Date: 01/04/2017 07:00 AM Subject: Re: [gpfsug-discuss] CES nodes mount nfsv3 not responding Sent by: gpfsug-discuss-bounces at spectrumscale.org On 1/3/17 4:40 PM, Laurence Horrocks-Barlow wrote: Andrew, You may have been stung by: 2.34 What considerations are there when running on SELinux? https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html?view=kc#selinux se is disabled here. Also if you strace the parent ganesha.nfsd process it dies. Is that a bug? I've see this issue on a customer site myself. Matt, Could you increase the logging verbosity and check the logs further? As per http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.pdg.doc/bl1pdg_CESNFSserverlog.htm yes bumped it to the max of 3 not much help. 
-- Lauz On 3 January 2017 22:19:20 GMT+00:00, Andrew Beattie wrote: Matt What Operating system are you running? I have an open PMR at present with something very similar when ever we publish an NFS export via the protocol nodes the nfs service stops, although we have no issues publishing SMB exports. I"m waiting on some testing by the customer but L3 support have indicated that they think there is a bug in the SElinux code, which is causing this issue, and have suggested that we disable SElinux and try again. My clients environment is currently deployed on Centos 7. Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Matt Weil Sent by: gpfsug-discuss-bounces at spectrumscale.org To: Cc: Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding Date: Wed, Jan 4, 2017 6:27 AM this follows the IP what ever node the ip lands on. the ganesha.nfsd process seems to stop working. any ideas? there is nothing helpful in the logs. time mount ces200:/vol/aggr14/temp403 /mnt/test mount.nfs: mount system call failed real 1m0.000s user 0m0.000s sys 0m0.010s _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jan 4 09:06:29 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 4 Jan 2017 09:06:29 +0000 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: , Message-ID: Simon, Is this PMR still open or was the issue resolved? I'm very interested to know as 4.2.2 is on my roadmap. Thanks Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: 20 December 2016 17:14 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SMB issues Nope, just lots of messages with the same error, but different folders. I've opened a pmr with IBM and supplied the usual logs. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Christof Schmitt [christof.schmitt at us.ibm.com] Sent: 19 December 2016 17:31 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SMB issues >From this message, it does not look like a known problem. Are there other messages leading up to the one you mentioned? I would suggest reporting this through a PMR. Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 12/19/2016 08:37 AM Subject: [gpfsug-discuss] SMB issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. 
We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Wed Jan 4 10:20:30 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 4 Jan 2017 10:20:30 +0000 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: Message-ID: Its still open. I can say we are happily running 4.2.2, just not the SMB packages that go with it. So the GPFS part, I wouldn't have thought would be a problem to upgrade. Simon On 04/01/2017, 09:06, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A" wrote: >Simon, > >Is this PMR still open or was the issue resolved? I'm very interested to >know as 4.2.2 is on my roadmap. > >Thanks >Richard > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >Thompson (Research Computing - IT Services) >Sent: 20 December 2016 17:14 >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] SMB issues > > >Nope, just lots of messages with the same error, but different folders. > >I've opened a pmr with IBM and supplied the usual logs. > >Simon >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org >[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Christof Schmitt >[christof.schmitt at us.ibm.com] >Sent: 19 December 2016 17:31 >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] SMB issues > >From this message, it does not look like a known problem. Are there other >messages leading up to the one you mentioned? > >I would suggest reporting this through a PMR. > >Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ >christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) > > > >From: "Simon Thompson (Research Computing - IT Services)" > >To: "gpfsug-discuss at spectrumscale.org" > >Date: 12/19/2016 08:37 AM >Subject: [gpfsug-discuss] SMB issues >Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > >Hi All, > >We upgraded to 4.2.2.0 last week as well as to >gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. 
> >We've since been getting random users reporting that they get access >denied errors when trying to access folders. Some seem to work fine and >others not, but it seems to vary and change by user (for example this >morning, I could see all my folders fine, but later I could only see >some). From my Mac connecting to the SMB shares, I could connect fine to >the share, but couldn't list files in the folder (I guess this is what >users were seeing from Windows as access denied). > >In the log.smbd, we are seeing errors such as this: > >[2016/12/19 15:20:40.649580, 0] >../source3/lib/sysquotas.c:457(sys_get_quota) > sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! > > > >Reverting to the previous version of SMB we were running >(gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. > >Before I log a PMR, has anyone else seen this behaviour or have any >suggestions? > >Thanks > >Simon > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From laurence at qsplace.co.uk Wed Jan 4 17:13:50 2017 From: laurence at qsplace.co.uk (laurence at qsplace.co.uk) Date: Wed, 04 Jan 2017 17:13:50 +0000 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu> <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> <0CEDE53A-B89F-4070-A681-49BC7B93D152@qsplace.co.uk> Message-ID: Hi Matt, The only time I've seen strace "crash" ganesha is when having selinux enabled which ofc was related to selinux. Have you also changed NFS's logging level (also in the link given)? Check the current level with: mmnfs configuration list | grep LOG_LEVEL I find INFO or DEBUG enough to get just that little extra nugget of information you need, however if that's already at FULL_DEBUG and your still not finding anything helpful it might be time to log a PMR. --Lauz On 2017-01-04 01:29, Matt Weil wrote: > On 1/3/17 4:40 PM, Laurence Horrocks-Barlow wrote: > >> Andrew, >> >> You may have been stung by: >> >> 2.34 What considerations are there when running on SELinux? >> >> https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html?view=kc#selinux [1] > se is disabled here. > Also if you strace the parent ganesha.nfsd process it dies. Is that a bug? > >> I've see this issue on a customer site myself. >> >> Matt, >> >> Could you increase the logging verbosity and check the logs further? As per >> http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.pdg.doc/bl1pdg_CESNFSserverlog.htm [2] > yes bumped it to the max of 3 not much help. 
> > -- Lauz > > On 3 January 2017 22:19:20 GMT+00:00, Andrew Beattie wrote: > > Matt > > What Operating system are you running? > > I have an open PMR at present with something very similar > when ever we publish an NFS export via the protocol nodes the nfs service stops, although we have no issues publishing SMB exports. > > I"m waiting on some testing by the customer but L3 support have indicated that they think there is a bug in the SElinux code, which is causing this issue, and have suggested that we disable SElinux and try again. > > My clients environment is currently deployed on Centos 7. > > Andrew Beattie > Software Defined Storage - IT Specialist > > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > ----- Original message ----- > From: Matt Weil > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: > Cc: > Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding > Date: Wed, Jan 4, 2017 6:27 AM > > this follows the IP what ever node the ip lands on. the ganesha.nfsd > process seems to stop working. any ideas? there is nothing helpful in > the logs. > > time mount ces200:/vol/aggr14/temp403 /mnt/test > mount.nfs: mount system call failed > > real 1m0.000s > user 0m0.000s > sys 0m0.010s > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss [3] -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss [3] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss [3] Links: ------ [1] https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html?view=kc#selinux [2] http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.pdg.doc/bl1pdg_CESNFSserverlog.htm [3] http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Wed Jan 4 17:55:13 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Wed, 4 Jan 2017 12:55:13 -0500 Subject: [gpfsug-discuss] strange mmchnsd error? Message-ID: [root at cl001 ~]# cat chnsd_home_flh %nsd: nsd=r10f1e5 servers=cl008,cl001,cl002,cl003,cl004,cl005,cl006,cl007 %nsd: nsd=r10f6e5 servers=cl007,cl008,cl001,cl002,cl003,cl004,cl005,cl006 %nsd: nsd=r10f1e6 servers=cl006,cl007,cl008,cl001,cl002,cl003,cl004,cl005 %nsd: nsd=r10f6e6 servers=cl005,cl006,cl007,cl008,cl001,cl002,cl003,cl004 %nsd: nsd=r10f1e7 servers=cl004,cl005,cl006,cl007,cl008,cl001,cl002,cl003 %nsd: nsd=r10f6e7 servers=cl003,cl004,cl005,cl006,cl007,cl008,cl001,cl002 %nsd: nsd=r10f1e8 servers=cl002,cl003,cl004,cl005,cl006,cl007,cl008,cl001 %nsd: nsd=r10f6e8 servers=cl001,cl002,cl003,cl004,cl005,cl006,cl007,cl008 %nsd: nsd=r10f1e9 servers=cl008,cl001,cl002,cl003,cl004,cl005,cl006,cl007 %nsd: nsd=r10f6e9 servers=cl007,cl008,cl001,cl002,cl003,cl004,cl005,cl006 [root at cl001 ~]# mmchnsd -F chnsd_home_flh mmchnsd: Processing disk r10f6e5 mmchnsd: Processing disk r10f6e6 mmchnsd: Processing disk r10f6e7 mmchnsd: Processing disk r10f6e8 mmchnsd: Node cl005.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl006.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl007.cl.arc.internal returned ENODEV for disk r10f6e8. 
mmchnsd: Node cl008.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Error found while processing stanza %nsd: nsd=r10f6e8 servers=cl001,cl002,cl003,cl004,cl005,cl006,cl007,cl008 mmchnsd: Processing disk r10f1e9 mmchnsd: Processing disk r10f6e9 mmchnsd: Command failed. Examine previous error messages to determine cause. I comment out the r10f6e8 line and then it completes? I have some sort of fabric san issue: [root at cl005 ~]# for i in {1..8}; do ssh cl00$i lsscsi -s | grep 38xx | grep 1.97 | wc -l; done 80 80 80 80 68 72 70 72 but i'm suprised removing one line allows it to complete. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Jan 4 17:58:25 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 4 Jan 2017 17:58:25 +0000 Subject: [gpfsug-discuss] strange mmchnsd error? In-Reply-To: References: Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB064DCF61@CHI-EXCHANGEW1.w2k.jumptrading.com> ENODEV usually means that the disk device was not found on the server(s) in the server list. In this case c100[5-8] do not apparently have access to r10f6e8, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Wednesday, January 04, 2017 11:55 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] strange mmchnsd error? [root at cl001 ~]# cat chnsd_home_flh %nsd: nsd=r10f1e5 servers=cl008,cl001,cl002,cl003,cl004,cl005,cl006,cl007 %nsd: nsd=r10f6e5 servers=cl007,cl008,cl001,cl002,cl003,cl004,cl005,cl006 %nsd: nsd=r10f1e6 servers=cl006,cl007,cl008,cl001,cl002,cl003,cl004,cl005 %nsd: nsd=r10f6e6 servers=cl005,cl006,cl007,cl008,cl001,cl002,cl003,cl004 %nsd: nsd=r10f1e7 servers=cl004,cl005,cl006,cl007,cl008,cl001,cl002,cl003 %nsd: nsd=r10f6e7 servers=cl003,cl004,cl005,cl006,cl007,cl008,cl001,cl002 %nsd: nsd=r10f1e8 servers=cl002,cl003,cl004,cl005,cl006,cl007,cl008,cl001 %nsd: nsd=r10f6e8 servers=cl001,cl002,cl003,cl004,cl005,cl006,cl007,cl008 %nsd: nsd=r10f1e9 servers=cl008,cl001,cl002,cl003,cl004,cl005,cl006,cl007 %nsd: nsd=r10f6e9 servers=cl007,cl008,cl001,cl002,cl003,cl004,cl005,cl006 [root at cl001 ~]# mmchnsd -F chnsd_home_flh mmchnsd: Processing disk r10f6e5 mmchnsd: Processing disk r10f6e6 mmchnsd: Processing disk r10f6e7 mmchnsd: Processing disk r10f6e8 mmchnsd: Node cl005.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl006.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl007.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl008.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Error found while processing stanza %nsd: nsd=r10f6e8 servers=cl001,cl002,cl003,cl004,cl005,cl006,cl007,cl008 mmchnsd: Processing disk r10f1e9 mmchnsd: Processing disk r10f6e9 mmchnsd: Command failed. Examine previous error messages to determine cause. I comment out the r10f6e8 line and then it completes? I have some sort of fabric san issue: [root at cl005 ~]# for i in {1..8}; do ssh cl00$i lsscsi -s | grep 38xx | grep 1.97 | wc -l; done 80 80 80 80 68 72 70 72 but i'm suprised removing one line allows it to complete. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Jan 4 19:57:07 2017 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 4 Jan 2017 19:57:07 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server Message-ID: <76A8A489-C46E-441C-9C9A-0E515200F325@siriuscom.com> I?m getting stumped trying to test out TCT on a centos based 4.2.2.0 cluster and getting the following error when I?m trying to install the gpfs.tct.server rpm. rpm -ivh --force gpfs.tct.server-1.1.2_987.x86_64.rpm error: Failed dependencies: redhat-release-server >= 6.0 is needed by gpfs.tct.server-1-1.2.x86_64 I realize that Centos isn?t ?officially? supported but this is kind of lame to check for the redhat-release package instead of whatever library (ssl) or some such that is installed instead. Anyone able to do this or know a workaround? I did a quick search on the wiki and in previous posts on this list and didn?t see anything obvious. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jan 4 20:00:50 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 4 Jan 2017 20:00:50 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server Message-ID: Just add ??nodeps? to the rpm install line, it will go just fine. Been working just fine on my CentOS system using this method. rpm -ivh --nodeps gpfs.tct.server-1.1.2_987.x86_64.rpm Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Mark.Bush at siriuscom.com" Reply-To: gpfsug main discussion list Date: Wednesday, January 4, 2017 at 1:57 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] TCT and redhat-release-server I?m getting stumped trying to test out TCT on a centos based 4.2.2.0 cluster and getting the following error when I?m trying to install the gpfs.tct.server rpm. 
rpm -ivh --force gpfs.tct.server-1.1.2_987.x86_64.rpm error: Failed dependencies: redhat-release-server >= 6.0 is needed by gpfs.tct.server-1-1.2.x86_64 I realize that Centos isn?t ?officially? supported but this is kind of lame to check for the redhat-release package instead of whatever library (ssl) or some such that is installed instead. Anyone able to do this or know a workaround? I did a quick search on the wiki and in previous posts on this list and didn?t see anything obvious. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevindjo at us.ibm.com Wed Jan 4 20:04:23 2017 From: kevindjo at us.ibm.com (Kevin D Johnson) Date: Wed, 4 Jan 2017 20:04:23 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server In-Reply-To: <76A8A489-C46E-441C-9C9A-0E515200F325@siriuscom.com> References: <76A8A489-C46E-441C-9C9A-0E515200F325@siriuscom.com> Message-ID: An HTML attachment was scrubbed... URL: From orichards at pixitmedia.com Wed Jan 4 20:10:11 2017 From: orichards at pixitmedia.com (Orlando Richards) Date: Wed, 4 Jan 2017 20:10:11 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server In-Reply-To: References: <76A8A489-C46E-441C-9C9A-0E515200F325@siriuscom.com> Message-ID: This is an RPM dependency check, rather than checking anything about the system state (such as the contents of /etc/redhat-release). In the past, I've built a dummy rpm with no contents to work around these. I don't think you can do a "--force" on a yum install - so you can't "yum install gpfs.tct.server" unless you do something like that. Would be great to get it removed from the rpm dependencies if possible. On 04/01/2017 20:04, Kevin D Johnson wrote: > I believe it's checking /etc/redhat-release --- if you create that > file with the appropriate red hat version number (like /etc/issue for > CentOS), it should work. > > ----- Original message ----- > From: "Mark.Bush at siriuscom.com" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [gpfsug-discuss] TCT and redhat-release-server > Date: Wed, Jan 4, 2017 2:58 PM > > I?m getting stumped trying to test out TCT on a centos based > 4.2.2.0 cluster and getting the following error when I?m trying to > install the gpfs.tct.server rpm. > > rpm -ivh --force gpfs.tct.server-1.1.2_987.x86_64.rpm > > error: Failed dependencies: > > redhat-release-server >= 6.0 is needed by gpfs.tct.server-1-1.2.x86_64 > > I realize that Centos isn?t ?officially? supported but this is > kind of lame to check for the redhat-release package instead of > whatever library (ssl) or some such that is installed instead. 
> > Anyone able to do this or know a workaround? I did a quick search > on the wiki and in previous posts on this list and didn?t see > anything obvious. > > Mark > > This message (including any attachments) is intended only for the > use of the individual or entity to which it is addressed and may > contain information that is non-public, proprietary, privileged, > confidential, and exempt from disclosure under applicable law. If > you are not the intended recipient, you are hereby notified that > any use, dissemination, distribution, or copying of this > communication is strictly prohibited. This message may be viewed > by parties at Sirius Computer Solutions other than those named in > the message header. This message does not contain an official > representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions > immediately and (i) destroy this message if a facsimile or (ii) > delete this message immediately if this is an electronic > communication. Thank you. > > Sirius Computer Solutions > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Orlando Richards* VP Product Development, Pixit Media 07930742808|orichards at pixitmedia.com www.pixitmedia.com |Tw:@pixitmedia -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevindjo at us.ibm.com Wed Jan 4 20:15:19 2017 From: kevindjo at us.ibm.com (Kevin D Johnson) Date: Wed, 4 Jan 2017 20:15:19 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server In-Reply-To: References: , <76A8A489-C46E-441C-9C9A-0E515200F325@siriuscom.com> Message-ID: An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Jan 4 20:16:37 2017 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 4 Jan 2017 20:16:37 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server In-Reply-To: References: Message-ID: <3EBE8846-7757-4957-9F01-DE4CAE558106@siriuscom.com> Success! Thanks Robert. From: "Oesterlin, Robert" Reply-To: gpfsug main discussion list Date: Wednesday, January 4, 2017 at 2:00 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] TCT and redhat-release-server Just add ??nodeps? to the rpm install line, it will go just fine. Been working just fine on my CentOS system using this method. 
rpm -ivh --nodeps gpfs.tct.server-1.1.2_987.x86_64.rpm Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Mark.Bush at siriuscom.com" Reply-To: gpfsug main discussion list Date: Wednesday, January 4, 2017 at 1:57 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] TCT and redhat-release-server I?m getting stumped trying to test out TCT on a centos based 4.2.2.0 cluster and getting the following error when I?m trying to install the gpfs.tct.server rpm. rpm -ivh --force gpfs.tct.server-1.1.2_987.x86_64.rpm error: Failed dependencies: redhat-release-server >= 6.0 is needed by gpfs.tct.server-1-1.2.x86_64 I realize that Centos isn?t ?officially? supported but this is kind of lame to check for the redhat-release package instead of whatever library (ssl) or some such that is installed instead. Anyone able to do this or know a workaround? I did a quick search on the wiki and in previous posts on this list and didn?t see anything obvious. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Thu Jan 5 20:00:36 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Thu, 5 Jan 2017 15:00:36 -0500 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? Message-ID: I have one quorum node down and attempting to add a nsd to a fs: [root at cl005 ~]# mmadddisk home -F add_1_flh_home -v no |& tee /root/adddisk_flh_home.out Verifying file system configuration information ... The following disks of home will be formatted on node cl003: r10f1e5: size 1879610 MB Extending Allocation Map Checking Allocation Map for storage pool fc_ssd400G 55 % complete on Thu Jan 5 14:43:31 2017 Lost connection to file system daemon. mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: File system home has some disks that are in a non-ready state. mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Had to use -v no (this failed once before). 
Anyhow I next see: [root at cl002 ~]# mmgetstate -aL Node number Node name Quorum Nodes up Total nodes GPFS state Remarks ------------------------------------------------------------------------------------ 1 cl001 0 0 8 down quorum node 2 cl002 5 6 8 active quorum node 3 cl003 5 0 8 arbitrating quorum node 4 cl004 5 6 8 active quorum node 5 cl005 5 6 8 active quorum node 6 cl006 5 6 8 active quorum node 7 cl007 5 6 8 active quorum node 8 cl008 5 6 8 active quorum node [root at cl002 ~]# mmlsdisk home disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ r10f1e5 nsd 512 1001 No Yes allocmap add up fc_ssd400G r6d2e8 nsd 512 1001 No Yes ready up fc_8T r6d3e8 nsd 512 1001 No Yes ready up fc_8T Do all quorum node have to be up and participating to do these admin type operations? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Jan 5 20:06:18 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 5 Jan 2017 20:06:18 +0000 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? In-Reply-To: References: Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> There may be an issue with one of the other NSDs in the file system according to the ?mmadddisk: File system home has some disks that are in a non-ready state.? message in our output. Best to check the status of the NSDs in the file system using the `mmlsdisk home` and if any disks are not ?up? then run the `mmchdisk home start -a` command after confirming that all nsdservers can see the disks. I typically use `mmdsh -N nsdnodes tspreparedisk ?s | dshbak ?c` for that. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Thursday, January 05, 2017 2:01 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] nsd not adding with one quorum node down? I have one quorum node down and attempting to add a nsd to a fs: [root at cl005 ~]# mmadddisk home -F add_1_flh_home -v no |& tee /root/adddisk_flh_home.out Verifying file system configuration information ... The following disks of home will be formatted on node cl003: r10f1e5: size 1879610 MB Extending Allocation Map Checking Allocation Map for storage pool fc_ssd400G 55 % complete on Thu Jan 5 14:43:31 2017 Lost connection to file system daemon. mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: File system home has some disks that are in a non-ready state. mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Had to use -v no (this failed once before). 
Anyhow I next see: [root at cl002 ~]# mmgetstate -aL Node number Node name Quorum Nodes up Total nodes GPFS state Remarks ------------------------------------------------------------------------------------ 1 cl001 0 0 8 down quorum node 2 cl002 5 6 8 active quorum node 3 cl003 5 0 8 arbitrating quorum node 4 cl004 5 6 8 active quorum node 5 cl005 5 6 8 active quorum node 6 cl006 5 6 8 active quorum node 7 cl007 5 6 8 active quorum node 8 cl008 5 6 8 active quorum node [root at cl002 ~]# mmlsdisk home disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ r10f1e5 nsd 512 1001 No Yes allocmap add up fc_ssd400G r6d2e8 nsd 512 1001 No Yes ready up fc_8T r6d3e8 nsd 512 1001 No Yes ready up fc_8T Do all quorum node have to be up and participating to do these admin type operations? ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Thu Jan 5 20:13:28 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Thu, 5 Jan 2017 15:13:28 -0500 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: Bryan: Have you ever attempted to do this knowing that one quorum server is down? *all* nsdservers will not see the nsd about to be added? How about temporarily removing quorum from a nsd server...? Thanks On Thu, Jan 5, 2017 at 3:06 PM, Bryan Banister wrote: > There may be an issue with one of the other NSDs in the file system > according to the ?mmadddisk: File system home has some disks that are in > a non-ready state.? message in our output. Best to check the status of > the NSDs in the file system using the `mmlsdisk home` and if any disks are > not ?up? then run the `mmchdisk home start -a` command after confirming > that all nsdservers can see the disks. I typically use `mmdsh -N nsdnodes > tspreparedisk ?s | dshbak ?c` for that. > > > > Hope that helps, > > -Bryan > > > > *From:* gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss- > bounces at spectrumscale.org] *On Behalf Of *J. Eric Wonderley > *Sent:* Thursday, January 05, 2017 2:01 PM > *To:* gpfsug main discussion list > *Subject:* [gpfsug-discuss] nsd not adding with one quorum node down? > > > > I have one quorum node down and attempting to add a nsd to a fs: > [root at cl005 ~]# mmadddisk home -F add_1_flh_home -v no |& tee > /root/adddisk_flh_home.out > Verifying file system configuration information ... 
> > The following disks of home will be formatted on node cl003: > r10f1e5: size 1879610 MB > Extending Allocation Map > Checking Allocation Map for storage pool fc_ssd400G > 55 % complete on Thu Jan 5 14:43:31 2017 > Lost connection to file system daemon. > mmadddisk: tsadddisk failed. > Verifying file system configuration information ... > mmadddisk: File system home has some disks that are in a non-ready state. > mmadddisk: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. > mmadddisk: Command failed. Examine previous error messages to determine > cause. > > Had to use -v no (this failed once before). Anyhow I next see: > [root at cl002 ~]# mmgetstate -aL > > Node number Node name Quorum Nodes up Total nodes GPFS state > Remarks > ------------------------------------------------------------ > ------------------------ > 1 cl001 0 0 8 down > quorum node > 2 cl002 5 6 8 active > quorum node > 3 cl003 5 0 8 arbitrating > quorum node > 4 cl004 5 6 8 active > quorum node > 5 cl005 5 6 8 active > quorum node > 6 cl006 5 6 8 active > quorum node > 7 cl007 5 6 8 active > quorum node > 8 cl008 5 6 8 active > quorum node > [root at cl002 ~]# mmlsdisk home > disk driver sector failure holds > holds storage > name type size group metadata data status > availability pool > ------------ -------- ------ ----------- -------- ----- ------------- > ------------ ------------ > r10f1e5 nsd 512 1001 No Yes allocmap add > up fc_ssd400G > r6d2e8 nsd 512 1001 No Yes ready > up fc_8T > r6d3e8 nsd 512 1001 No Yes ready > up fc_8T > > Do all quorum node have to be up and participating to do these admin type > operations? > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Jan 5 20:27:24 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 5 Jan 2017 20:27:24 +0000 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF328@CHI-EXCHANGEW1.w2k.jumptrading.com> Removing the quorum designation is an option. However I believe the file system manager must be assigned to the file system in order for the mmadddisk to work. 
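(A rough sketch of those checks - untested, and assuming the file system is "home" as in the output above; adjust the device name to your environment:

    mmlsmgr home              # is a file system manager currently assigned?
    mmgetstate -aL            # are all quorum nodes "active", with none stuck "arbitrating"?
    mmlsdisk home -e          # list only the disks that are not up/ready
    mmchdisk home start -a    # try to bring any non-ready disks back up before re-running mmadddisk

These are the stock commands already mentioned in this thread; the -e flag simply filters the mmlsdisk output down to problem disks.)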
If the file system manager is not assigned (mmlsmgr to check) or continuously is reassigned to nodes but that fails (check /var/adm/ras/mmfs.log.latest on all nodes) or is blocked from being assigned due to the apparent node recovery in the cluster indicated by the one node in the ?arbitrating? state, then the mmadddisk will not succeed. -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Thursday, January 05, 2017 2:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] nsd not adding with one quorum node down? Bryan: Have you ever attempted to do this knowing that one quorum server is down? *all* nsdservers will not see the nsd about to be added? How about temporarily removing quorum from a nsd server...? Thanks On Thu, Jan 5, 2017 at 3:06 PM, Bryan Banister > wrote: There may be an issue with one of the other NSDs in the file system according to the ?mmadddisk: File system home has some disks that are in a non-ready state.? message in our output. Best to check the status of the NSDs in the file system using the `mmlsdisk home` and if any disks are not ?up? then run the `mmchdisk home start -a` command after confirming that all nsdservers can see the disks. I typically use `mmdsh -N nsdnodes tspreparedisk ?s | dshbak ?c` for that. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Thursday, January 05, 2017 2:01 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] nsd not adding with one quorum node down? I have one quorum node down and attempting to add a nsd to a fs: [root at cl005 ~]# mmadddisk home -F add_1_flh_home -v no |& tee /root/adddisk_flh_home.out Verifying file system configuration information ... The following disks of home will be formatted on node cl003: r10f1e5: size 1879610 MB Extending Allocation Map Checking Allocation Map for storage pool fc_ssd400G 55 % complete on Thu Jan 5 14:43:31 2017 Lost connection to file system daemon. mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: File system home has some disks that are in a non-ready state. mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Had to use -v no (this failed once before). Anyhow I next see: [root at cl002 ~]# mmgetstate -aL Node number Node name Quorum Nodes up Total nodes GPFS state Remarks ------------------------------------------------------------------------------------ 1 cl001 0 0 8 down quorum node 2 cl002 5 6 8 active quorum node 3 cl003 5 0 8 arbitrating quorum node 4 cl004 5 6 8 active quorum node 5 cl005 5 6 8 active quorum node 6 cl006 5 6 8 active quorum node 7 cl007 5 6 8 active quorum node 8 cl008 5 6 8 active quorum node [root at cl002 ~]# mmlsdisk home disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ r10f1e5 nsd 512 1001 No Yes allocmap add up fc_ssd400G r6d2e8 nsd 512 1001 No Yes ready up fc_8T r6d3e8 nsd 512 1001 No Yes ready up fc_8T Do all quorum node have to be up and participating to do these admin type operations? 
________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Jan 5 20:44:33 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 5 Jan 2017 20:44:33 +0000 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF328@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB064DF328@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF398@CHI-EXCHANGEW1.w2k.jumptrading.com> Looking at this further, the output says the ?The following disks of home will be formatted on node cl003:? however that node is the node in ?arbitrating? state, so I don?t see how that would work, -B From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan Banister Sent: Thursday, January 05, 2017 2:27 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] nsd not adding with one quorum node down? Removing the quorum designation is an option. However I believe the file system manager must be assigned to the file system in order for the mmadddisk to work. If the file system manager is not assigned (mmlsmgr to check) or continuously is reassigned to nodes but that fails (check /var/adm/ras/mmfs.log.latest on all nodes) or is blocked from being assigned due to the apparent node recovery in the cluster indicated by the one node in the ?arbitrating? state, then the mmadddisk will not succeed. -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. 
Eric Wonderley Sent: Thursday, January 05, 2017 2:13 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] nsd not adding with one quorum node down? Bryan: Have you ever attempted to do this knowing that one quorum server is down? *all* nsdservers will not see the nsd about to be added? How about temporarily removing quorum from a nsd server...? Thanks On Thu, Jan 5, 2017 at 3:06 PM, Bryan Banister > wrote: There may be an issue with one of the other NSDs in the file system according to the ?mmadddisk: File system home has some disks that are in a non-ready state.? message in our output. Best to check the status of the NSDs in the file system using the `mmlsdisk home` and if any disks are not ?up? then run the `mmchdisk home start -a` command after confirming that all nsdservers can see the disks. I typically use `mmdsh -N nsdnodes tspreparedisk ?s | dshbak ?c` for that. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Thursday, January 05, 2017 2:01 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] nsd not adding with one quorum node down? I have one quorum node down and attempting to add a nsd to a fs: [root at cl005 ~]# mmadddisk home -F add_1_flh_home -v no |& tee /root/adddisk_flh_home.out Verifying file system configuration information ... The following disks of home will be formatted on node cl003: r10f1e5: size 1879610 MB Extending Allocation Map Checking Allocation Map for storage pool fc_ssd400G 55 % complete on Thu Jan 5 14:43:31 2017 Lost connection to file system daemon. mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: File system home has some disks that are in a non-ready state. mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Had to use -v no (this failed once before). Anyhow I next see: [root at cl002 ~]# mmgetstate -aL Node number Node name Quorum Nodes up Total nodes GPFS state Remarks ------------------------------------------------------------------------------------ 1 cl001 0 0 8 down quorum node 2 cl002 5 6 8 active quorum node 3 cl003 5 0 8 arbitrating quorum node 4 cl004 5 6 8 active quorum node 5 cl005 5 6 8 active quorum node 6 cl006 5 6 8 active quorum node 7 cl007 5 6 8 active quorum node 8 cl008 5 6 8 active quorum node [root at cl002 ~]# mmlsdisk home disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ r10f1e5 nsd 512 1001 No Yes allocmap add up fc_ssd400G r6d2e8 nsd 512 1001 No Yes ready up fc_8T r6d3e8 nsd 512 1001 No Yes ready up fc_8T Do all quorum node have to be up and participating to do these admin type operations? ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Thu Jan 5 21:38:39 2017 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Thu, 05 Jan 2017 16:38:39 -0500 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF398@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB064DF328@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB064DF398@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <28063.1483652319@turing-police.cc.vt.edu> On Thu, 05 Jan 2017 20:44:33 +0000, Bryan Banister said: > Looking at this further, the output says the ???The following disks of home > will be formatted on node cl003:??? however that node is the node in > ???arbitrating??? state, so I don???t see how that would work, The bigger question: If it was in "arbitrating", why was it selected as the node to do the formatting? -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Thu Jan 5 21:53:17 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 05 Jan 2017 16:53:17 -0500 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? Message-ID: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> Does anyone know of a functional standard alone tool to systematically and recursively find and replicate ACLs that works well with GPFS? * We're currently using rsync, which will replicate permissions fine, however it leaves the ACL's behind. The --perms option for rsync is blind to ACLs. * The native linux trick below works well with ext4 after an rsync, but makes a mess on GPFS. % getfacl -R /path/to/source > /root/perms.ac % setfacl --restore=/root/perms.acl * The native GPFS mmgetacl/mmputacl pair does not have a built-in recursive option. Any ideas? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 5 22:01:18 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 5 Jan 2017 22:01:18 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> Message-ID: Hi Jaime, IBM developed a patch for rsync that can replicate ACL?s ? we?ve used it and it works great ? can?t remember where we downloaded it from, though. Maybe someone else on the list who *isn?t* having a senior moment can point you to it? Kevin > On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: > > Does anyone know of a functional standard alone tool to systematically and recursively find and replicate ACLs that works well with GPFS? > > * We're currently using rsync, which will replicate permissions fine, however it leaves the ACL's behind. The --perms option for rsync is blind to ACLs. > > * The native linux trick below works well with ext4 after an rsync, but makes a mess on GPFS. > % getfacl -R /path/to/source > /root/perms.ac > % setfacl --restore=/root/perms.acl > > * The native GPFS mmgetacl/mmputacl pair does not have a built-in recursive option. > > Any ideas? > > Thanks > Jaime > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From laurence at qsplace.co.uk Thu Jan 5 22:03:53 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Thu, 5 Jan 2017 22:03:53 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> Message-ID: <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk> Are you talking about the GPFSUG github? 
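(While waiting for the patched rsync mentioned above, a very rough way to script the recursion around the stock mmgetacl/mmputacl pair that Jaime mentions - untested, slow on large trees, and it assumes the destination ("/path/to/dest" is just a placeholder here) already contains an identical tree, e.g. from a plain rsync, with sane filenames and no symlinks:

    cd /path/to/source
    find . -print | while read -r f; do
        mmgetacl -o /tmp/acl.tmp "$f" && mmputacl -i /tmp/acl.tmp "/path/to/dest/$f"
    done

mmgetacl -o writes an ACL to a file and mmputacl -i applies it, so this just walks the source tree and replays each ACL onto the matching path on the destination.)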
https://github.com/gpfsug/gpfsug-tools The patched rsync there I believe was done by Orlando. -- Lauz On 05/01/2017 22:01, Buterbaugh, Kevin L wrote: > Hi Jaime, > > IBM developed a patch for rsync that can replicate ACL?s ? we?ve used it and it works great ? can?t remember where we downloaded it from, though. Maybe someone else on the list who *isn?t* having a senior moment can point you to it? > > Kevin > >> On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: >> >> Does anyone know of a functional standard alone tool to systematically and recursively find and replicate ACLs that works well with GPFS? >> >> * We're currently using rsync, which will replicate permissions fine, however it leaves the ACL's behind. The --perms option for rsync is blind to ACLs. >> >> * The native linux trick below works well with ext4 after an rsync, but makes a mess on GPFS. >> % getfacl -R /path/to/source > /root/perms.ac >> % setfacl --restore=/root/perms.acl >> >> * The native GPFS mmgetacl/mmputacl pair does not have a built-in recursive option. >> >> Any ideas? >> >> Thanks >> Jaime >> >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of Toronto. >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From robbyb at us.ibm.com Thu Jan 5 22:18:08 2017 From: robbyb at us.ibm.com (Rob Basham) Date: Thu, 5 Jan 2017 22:18:08 +0000 Subject: [gpfsug-discuss] TCT and CentOS Message-ID: An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Thu Jan 5 22:42:28 2017 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Thu, 05 Jan 2017 17:42:28 -0500 Subject: [gpfsug-discuss] TCT and CentOS In-Reply-To: References: Message-ID: <32702.1483656148@turing-police.cc.vt.edu> On Thu, 05 Jan 2017 22:18:08 +0000, "Rob Basham" said: > By way of introduction, I am TCT architect across all of IBM's storage > products, including Spectrum Scale. There have been queries as to whether or > not CentOS is supported with TCT Server on Spectrum Scale. It is not currently > supported and should not be used as a TCT Server. Is that a "we haven't qualified it and you're on your own" not supported, or "there be known dragons" not supported? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From gmcpheeters at anl.gov Thu Jan 5 23:34:04 2017 From: gmcpheeters at anl.gov (McPheeters, Gordon) Date: Thu, 5 Jan 2017 23:34:04 +0000 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? 
In-Reply-To: <28063.1483652319@turing-police.cc.vt.edu> References: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB064DF328@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB064DF398@CHI-EXCHANGEW1.w2k.jumptrading.com> <28063.1483652319@turing-police.cc.vt.edu> Message-ID: You might want to check the gpfs logs on the node cl003. Often the message "Lost connection to file system daemon.? means that the daemon asserted while it was doing something... hence the lost connection. If you are checking the state and seeing it in arbitrating mode immed after the command fails that also makes sense as it?s now re-joining the cluster. If you aren?t watching carefully you can miss these events due to way mmfsd will resume the old mounts, hence you check the node with ?df? and see the file system is still mounted, then assume all is well, but in fact mmfsd has died and restarted. Gordon McPheeters ALCF Storage (630) 252-6430 gmcpheeters at anl.gov On Jan 5, 2017, at 3:38 PM, Valdis.Kletnieks at vt.edu wrote: On Thu, 05 Jan 2017 20:44:33 +0000, Bryan Banister said: Looking at this further, the output says the ?The following disks of home will be formatted on node cl003:? however that node is the node in ?arbitrating? state, so I don?t see how that would work, The bigger question: If it was in "arbitrating", why was it selected as the node to do the formatting? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From robbyb at us.ibm.com Fri Jan 6 00:28:47 2017 From: robbyb at us.ibm.com (Rob Basham) Date: Fri, 6 Jan 2017 00:28:47 +0000 Subject: [gpfsug-discuss] TCT and CentOS Message-ID: An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Fri Jan 6 02:16:04 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 05 Jan 2017 21:16:04 -0500 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk> References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk> Message-ID: <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> Great guys!!! Just what I was looking for. Everyone is always so helpful on this forum. Thanks a lot. Jaime Quoting "Laurence Horrocks-Barlow" : > Are you talking about the GPFSUG github? > > https://github.com/gpfsug/gpfsug-tools > > The patched rsync there I believe was done by Orlando. > > -- Lauz > > > On 05/01/2017 22:01, Buterbaugh, Kevin L wrote: >> Hi Jaime, >> >> IBM developed a patch for rsync that can replicate ACL?s ? we?ve >> used it and it works great ? can?t remember where we downloaded it >> from, though. Maybe someone else on the list who *isn?t* having a >> senior moment can point you to it? >> >> Kevin >> >>> On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: >>> >>> Does anyone know of a functional standard alone tool to >>> systematically and recursively find and replicate ACLs that works >>> well with GPFS? >>> >>> * We're currently using rsync, which will replicate permissions >>> fine, however it leaves the ACL's behind. The --perms option for >>> rsync is blind to ACLs. >>> >>> * The native linux trick below works well with ext4 after an >>> rsync, but makes a mess on GPFS. 
>>> % getfacl -R /path/to/source > /root/perms.ac >>> % setfacl --restore=/root/perms.acl >>> >>> * The native GPFS mmgetacl/mmputacl pair does not have a built-in >>> recursive option. >>> >>> Any ideas? >>> >>> Thanks >>> Jaime >>> >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University >>> of Toronto. >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From S.J.Thompson at bham.ac.uk Fri Jan 6 07:17:46 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 6 Jan 2017 07:17:46 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk>, <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> Message-ID: Just a cautionary note, it doesn't work with symlinks as it fails to get the acl and so doesn't copy the symlink. So you may want to run a traditional rsync after just to get all your symlinks on place. (having been using this over the Christmas period to merge some filesets with acls...) Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jaime Pinto [pinto at scinet.utoronto.ca] Sent: 06 January 2017 02:16 To: gpfsug main discussion list; Laurence Horrocks-Barlow Cc: support at scinet.utoronto.ca Subject: Re: [gpfsug-discuss] replicating ACLs across GPFS's? Great guys!!! Just what I was looking for. Everyone is always so helpful on this forum. Thanks a lot. Jaime Quoting "Laurence Horrocks-Barlow" : > Are you talking about the GPFSUG github? > > https://github.com/gpfsug/gpfsug-tools > > The patched rsync there I believe was done by Orlando. > > -- Lauz > > > On 05/01/2017 22:01, Buterbaugh, Kevin L wrote: >> Hi Jaime, >> >> IBM developed a patch for rsync that can replicate ACL?s ? we?ve >> used it and it works great ? can?t remember where we downloaded it >> from, though. Maybe someone else on the list who *isn?t* having a >> senior moment can point you to it? >> >> Kevin >> >>> On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: >>> >>> Does anyone know of a functional standard alone tool to >>> systematically and recursively find and replicate ACLs that works >>> well with GPFS? >>> >>> * We're currently using rsync, which will replicate permissions >>> fine, however it leaves the ACL's behind. The --perms option for >>> rsync is blind to ACLs. 
>>> >>> * The native linux trick below works well with ext4 after an >>> rsync, but makes a mess on GPFS. >>> % getfacl -R /path/to/source > /root/perms.ac >>> % setfacl --restore=/root/perms.acl >>> >>> * The native GPFS mmgetacl/mmputacl pair does not have a built-in >>> recursive option. >>> >>> Any ideas? >>> >>> Thanks >>> Jaime >>> >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University >>> of Toronto. >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jtucker at pixitmedia.com Fri Jan 6 08:29:53 2017 From: jtucker at pixitmedia.com (Jez Tucker) Date: Fri, 6 Jan 2017 08:29:53 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> Message-ID: <4a934973-691c-977a-1d19-81102ecb3d37@pixitmedia.com> Hi, Here: https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync For those of you with Pixit Media / ArcaStream support, just install our maintained ap-rsync which has this patch and additional fixes for other 'fun' things that break between GPFS and rsync. If anyone wants to contribute to the git repo wave your arms. Jez On 05/01/17 22:01, Buterbaugh, Kevin L wrote: > Hi Jaime, > > IBM developed a patch for rsync that can replicate ACL?s ? we?ve used it and it works great ? can?t remember where we downloaded it from, though. Maybe someone else on the list who *isn?t* having a senior moment can point you to it? > > Kevin > >> On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: >> >> Does anyone know of a functional standard alone tool to systematically and recursively find and replicate ACLs that works well with GPFS? >> >> * We're currently using rsync, which will replicate permissions fine, however it leaves the ACL's behind. The --perms option for rsync is blind to ACLs. >> >> * The native linux trick below works well with ext4 after an rsync, but makes a mess on GPFS. >> % getfacl -R /path/to/source > /root/perms.ac >> % setfacl --restore=/root/perms.acl >> >> * The native GPFS mmgetacl/mmputacl pair does not have a built-in recursive option. >> >> Any ideas? >> >> Thanks >> Jaime >> >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. 
(MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of Toronto. >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Fri Jan 6 08:31:16 2017 From: jtucker at pixitmedia.com (Jez Tucker) Date: Fri, 6 Jan 2017 08:31:16 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk> <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> Message-ID: <6928a73b-a8fa-4255-813a-0ddd6c9579f7@pixitmedia.com> Some of the 'fun things' being such as that very issue. On 06/01/17 07:17, Simon Thompson (Research Computing - IT Services) wrote: > Just a cautionary note, it doesn't work with symlinks as it fails to get the acl and so doesn't copy the symlink. > > So you may want to run a traditional rsync after just to get all your symlinks on place. (having been using this over the Christmas period to merge some filesets with acls...) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jaime Pinto [pinto at scinet.utoronto.ca] > Sent: 06 January 2017 02:16 > To: gpfsug main discussion list; Laurence Horrocks-Barlow > Cc: support at scinet.utoronto.ca > Subject: Re: [gpfsug-discuss] replicating ACLs across GPFS's? > > Great guys!!! > Just what I was looking for. > Everyone is always so helpful on this forum. > Thanks a lot. > Jaime > > Quoting "Laurence Horrocks-Barlow" : > >> Are you talking about the GPFSUG github? >> >> https://github.com/gpfsug/gpfsug-tools >> >> The patched rsync there I believe was done by Orlando. >> >> -- Lauz >> >> >> On 05/01/2017 22:01, Buterbaugh, Kevin L wrote: >>> Hi Jaime, >>> >>> IBM developed a patch for rsync that can replicate ACL?s ? we?ve >>> used it and it works great ? can?t remember where we downloaded it >>> from, though. Maybe someone else on the list who *isn?t* having a >>> senior moment can point you to it? >>> >>> Kevin >>> >>>> On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: >>>> >>>> Does anyone know of a functional standard alone tool to >>>> systematically and recursively find and replicate ACLs that works >>>> well with GPFS? 
>>>> >>>> * We're currently using rsync, which will replicate permissions >>>> fine, however it leaves the ACL's behind. The --perms option for >>>> rsync is blind to ACLs. >>>> >>>> * The native linux trick below works well with ext4 after an >>>> rsync, but makes a mess on GPFS. >>>> % getfacl -R /path/to/source > /root/perms.ac >>>> % setfacl --restore=/root/perms.acl >>>> >>>> * The native GPFS mmgetacl/mmputacl pair does not have a built-in >>>> recursive option. >>>> >>>> Any ideas? >>>> >>>> Thanks >>>> Jaime >>>> >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University >>>> of Toronto. >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From orichards at pixitmedia.com Fri Jan 6 08:50:43 2017 From: orichards at pixitmedia.com (Orlando Richards) Date: Fri, 6 Jan 2017 08:50:43 +0000 Subject: [gpfsug-discuss] (Re)Introduction Message-ID: Hi folks, Since I've re-joined this list with my new identity, I thought I'd ping over a brief re-intro email. Some of you will know me from my past life working for the University of Edinburgh, but in November last year I joined the team at Pixit Media / ArcaStream. 
For those I've not met before - I've been working with GPFS since 2007 in a University environment, initially as an HPC storage engine but quickly realised the benefits that GPFS could offer as a general file/NAS storage platform as well, and developed its use in the University of Edinburgh (and for the national UKRDF service) in that vein. These days I'm spending a lot of my time looking at the deployment, operations and support processes around GPFS - which means I get to play with all sorts of hip and trendy buzzwords :) -- *Orlando Richards* VP Product Development, Pixit Media 07930742808|orichards at pixitmedia.com www.pixitmedia.com |Tw:@pixitmedia -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From orichards at pixitmedia.com Fri Jan 6 08:51:19 2017 From: orichards at pixitmedia.com (Orlando Richards) Date: Fri, 6 Jan 2017 08:51:19 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk> <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> Message-ID: Glad to see it's still doing good work out there! :) On 06/01/2017 02:16, Jaime Pinto wrote: > Great guys!!! > Just what I was looking for. > Everyone is always so helpful on this forum. > Thanks a lot. > Jaime > > Quoting "Laurence Horrocks-Barlow" : > >> Are you talking about the GPFSUG github? >> >> https://github.com/gpfsug/gpfsug-tools >> >> The patched rsync there I believe was done by Orlando. >> >> -- Lauz >> >> >> On 05/01/2017 22:01, Buterbaugh, Kevin L wrote: >>> Hi Jaime, >>> >>> IBM developed a patch for rsync that can replicate ACL?s ? we?ve >>> used it and it works great ? can?t remember where we downloaded it >>> from, though. Maybe someone else on the list who *isn?t* having a >>> senior moment can point you to it? >>> >>> Kevin >>> >>>> On Jan 5, 2017, at 3:53 PM, Jaime Pinto >>>> wrote: >>>> >>>> Does anyone know of a functional standard alone tool to >>>> systematically and recursively find and replicate ACLs that works >>>> well with GPFS? >>>> >>>> * We're currently using rsync, which will replicate permissions >>>> fine, however it leaves the ACL's behind. The --perms option for >>>> rsync is blind to ACLs. >>>> >>>> * The native linux trick below works well with ext4 after an >>>> rsync, but makes a mess on GPFS. >>>> % getfacl -R /path/to/source > /root/perms.ac >>>> % setfacl --restore=/root/perms.acl >>>> >>>> * The native GPFS mmgetacl/mmputacl pair does not have a built-in >>>> recursive option. >>>> >>>> Any ideas? >>>> >>>> Thanks >>>> Jaime >>>> >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. 
(MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University >>>> of Toronto. >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Orlando Richards* VP Product Development, Pixit Media 07930742808|orichards at pixitmedia.com www.pixitmedia.com |Tw:@pixitmedia -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From erich at uw.edu Fri Jan 6 19:07:22 2017 From: erich at uw.edu (Eric Horst) Date: Fri, 6 Jan 2017 11:07:22 -0800 Subject: [gpfsug-discuss] undo fileset inode allocation Message-ID: Greetings all, I've been setting up and migrating to a new 225TB filesystem on 4.2.1. Separate data and metadata disks. There are about 20 independent filesets as second level directories which have all the files. One of the independent filesets hit its inode limit of 28M. Without carefully checking my work I accidentally changed the limit to 3.2B inodes instead of 32M inodes. This ran for 15 minutes and when it was done I see mmdf shows that I had 0% metadata space free. There was previously 72% free. Thinking about it I reasoned that as independent filesets I might get that metadata space back if I unlinked and deleted that fileset. After doing so I find I have metadata 11% free. A far cry from the 72% I used to have. Are there other options for undoing this mistake? Or should I not worry that I'm at 11% and assume that whatever was preallocated will be productively used over the life of this filesystem? Thanks, -Eric -------------- next part -------------- An HTML attachment was scrubbed... 
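(For anyone in a similar spot, two stock commands can help quantify how much metadata has gone into preallocated inodes - treat this as a hedged sketch, since exact behaviour varies by release, and "home" is just a placeholder device name:

    mmlsfileset home -L       # per-fileset inode spaces, with maximum and allocated inode counts
    mmdf home                 # overall metadata and inode usage for the file system

Allocated inodes generally cannot be shrunk for a live fileset, which is why deleting the runaway fileset only returned part of the space; definitive guidance is really one for IBM support, as the next reply suggests.)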
URL: From bbanister at jumptrading.com Fri Jan 6 20:08:17 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 6 Jan 2017 20:08:17 +0000 Subject: [gpfsug-discuss] undo fileset inode allocation In-Reply-To: References: Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB064E1624@CHI-EXCHANGEW1.w2k.jumptrading.com> Honestly this sounds like you may be in a very dangerous situation and would HIGHLY recommend opening a PMR immediately to get direct, authoritative instruction from IBM, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Eric Horst Sent: Friday, January 06, 2017 1:07 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] undo fileset inode allocation Greetings all, I've been setting up and migrating to a new 225TB filesystem on 4.2.1. Separate data and metadata disks. There are about 20 independent filesets as second level directories which have all the files. One of the independent filesets hit its inode limit of 28M. Without carefully checking my work I accidentally changed the limit to 3.2B inodes instead of 32M inodes. This ran for 15 minutes and when it was done I see mmdf shows that I had 0% metadata space free. There was previously 72% free. Thinking about it I reasoned that as independent filesets I might get that metadata space back if I unlinked and deleted that fileset. After doing so I find I have metadata 11% free. A far cry from the 72% I used to have. Are there other options for undoing this mistake? Or should I not worry that I'm at 11% and assume that whatever was preallocated will be productively used over the life of this filesystem? Thanks, -Eric ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Tomlinson at awe.co.uk Mon Jan 9 15:09:43 2017 From: Paul.Tomlinson at awe.co.uk (Paul.Tomlinson at awe.co.uk) Date: Mon, 9 Jan 2017 15:09:43 +0000 Subject: [gpfsug-discuss] AFM Migration Issue Message-ID: <201701091501.v09F1i5A015912@msw1.awe.co.uk> Hi All, We have just completed the first data move from our old cluster to the new one using AFM Local Update as per the guide, however we have noticed that all date stamps on the directories have the date they were created on(e.g. 9th Jan 2017) , not the date from the old system (e.g. 14th April 2007), whereas all the files have the correct dates. Has anyone else seen this issue as we now have to convert all the directory dates to their original dates ! The information in this email and in any attachment(s) is commercial in confidence. 
If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR From janfrode at tanso.net Mon Jan 9 15:29:45 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 09 Jan 2017 15:29:45 +0000 Subject: [gpfsug-discuss] AFM Migration Issue In-Reply-To: <201701091501.v09F1i5A015912@msw1.awe.co.uk> References: <201701091501.v09F1i5A015912@msw1.awe.co.uk> Message-ID: Untested, and I have no idea if it will work on the number of files and directories you have, but maybe you can fix it by rsyncing just the directories? rsync -av --dry-run --include='*/' --exclude='*' source/ destination/ -jf man. 9. jan. 2017 kl. 16.09 skrev : > Hi All, > > We have just completed the first data move from our old cluster to the new > one using AFM Local Update as per the guide, however we have noticed that > all date stamps on the directories have the date they were created on(e.g. > 9th Jan 2017) , not the date from the old system (e.g. 14th April 2007), > whereas all the files have the correct dates. > > Has anyone else seen this issue as we now have to convert all the > directory dates to their original dates ! > > > > > The information in this email and in any attachment(s) is > commercial in confidence. If you are not the named addressee(s) > or > if you receive this email in error then any distribution, copying or > use of this communication or the information in it is strictly > prohibited. Please notify us immediately by email at > admin.internet(at)awe.co.uk, and then delete this message from > your computer. While attachments are virus checked, AWE plc > does not accept any liability in respect of any virus which is not > detected. > > AWE Plc > Registered in England and Wales > Registration No 02763902 > AWE, Aldermaston, Reading, RG7 4PR > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jan 9 15:48:43 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 9 Jan 2017 15:48:43 +0000 Subject: [gpfsug-discuss] AFM Migration Issue In-Reply-To: <201701091501.v09F1i5A015912@msw1.awe.co.uk> References: <201701091501.v09F1i5A015912@msw1.awe.co.uk> Message-ID: Interesting, I'm currently doing similar but currently am only using read-only to premigrate the filesets, The directory file stamps don't agree with the original but neither are they all marked when they were migrated. So there is something very weird going on..... (We're planning to switch them to Local Update when we move the users over to them) We're using a mmapplypolicy on our old gpfs cluster to get the files to migrate, and have noticed that you need a RULE EXTERNAL LIST ESCAPE '%/' line otherwise files with % in the filenames don't get migrated and through errors. I'm trying to work out if empty directories or those containing only empty directories get migrated correctly as you can't list them in the mmafmctl prefetch statement. 
(If you try (using DIRECTORIES_PLUS) they through errors) I am very interested in the solution to this issue. Peter Childs Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Paul.Tomlinson at awe.co.uk Sent: Monday, January 9, 2017 3:09:43 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] AFM Migration Issue Hi All, We have just completed the first data move from our old cluster to the new one using AFM Local Update as per the guide, however we have noticed that all date stamps on the directories have the date they were created on(e.g. 9th Jan 2017) , not the date from the old system (e.g. 14th April 2007), whereas all the files have the correct dates. Has anyone else seen this issue as we now have to convert all the directory dates to their original dates ! The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Paul.Tomlinson at awe.co.uk Mon Jan 9 16:00:04 2017 From: Paul.Tomlinson at awe.co.uk (Paul.Tomlinson at awe.co.uk) Date: Mon, 9 Jan 2017 16:00:04 +0000 Subject: [gpfsug-discuss] AFM Migration Issue In-Reply-To: References: <201701091501.v09F1i5A015912@msw1.awe.co.uk> Message-ID: <201701091552.v09Fq4kj012315@msw1.awe.co.uk> Hi, We have already come across the issues you have seen below, and worked around them. If you run the pre-fetch with just the --meta-data-only, then all the date stamps are correct for the dirs., as soon as you run --list-only all the directory times change to now. We have tried rsync but this did not appear to work. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: 09 January 2017 15:49 To: gpfsug-discuss at spectrumscale.org Subject: EXTERNAL: Re: [gpfsug-discuss] AFM Migration Issue Interesting, I'm currently doing similar but currently am only using read-only to premigrate the filesets, The directory file stamps don't agree with the original but neither are they all marked when they were migrated. So there is something very weird going on..... (We're planning to switch them to Local Update when we move the users over to them) We're using a mmapplypolicy on our old gpfs cluster to get the files to migrate, and have noticed that you need a RULE EXTERNAL LIST ESCAPE '%/' line otherwise files with % in the filenames don't get migrated and through errors. I'm trying to work out if empty directories or those containing only empty directories get migrated correctly as you can't list them in the mmafmctl prefetch statement. (If you try (using DIRECTORIES_PLUS) they through errors) I am very interested in the solution to this issue. 
Peter Childs Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Paul.Tomlinson at awe.co.uk Sent: Monday, January 9, 2017 3:09:43 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] AFM Migration Issue Hi All, We have just completed the first data move from our old cluster to the new one using AFM Local Update as per the guide, however we have noticed that all date stamps on the directories have the date they were created on(e.g. 9th Jan 2017) , not the date from the old system (e.g. 14th April 2007), whereas all the files have the correct dates. Has anyone else seen this issue as we now have to convert all the directory dates to their original dates ! The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR From YARD at il.ibm.com Mon Jan 9 19:12:08 2017 From: YARD at il.ibm.com (Yaron Daniel) Date: Mon, 9 Jan 2017 21:12:08 +0200 Subject: [gpfsug-discuss] AFM Migration Issue In-Reply-To: References: <201701091501.v09F1i5A015912@msw1.awe.co.uk> Message-ID: Hi Do u have nfsv4 acl's ? Try to ask from IBM support to get Sonas rsync in order to migrate the data. Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Jan-Frode Myklebust To: gpfsug main discussion list Date: 01/09/2017 05:30 PM Subject: Re: [gpfsug-discuss] AFM Migration Issue Sent by: gpfsug-discuss-bounces at spectrumscale.org Untested, and I have no idea if it will work on the number of files and directories you have, but maybe you can fix it by rsyncing just the directories? rsync -av --dry-run --include='*/' --exclude='*' source/ destination/ -jf man. 9. jan. 2017 kl. 
16.09 skrev : Hi All, We have just completed the first data move from our old cluster to the new one using AFM Local Update as per the guide, however we have noticed that all date stamps on the directories have the date they were created on(e.g. 9th Jan 2017) , not the date from the old system (e.g. 14th April 2007), whereas all the files have the correct dates. Has anyone else seen this issue as we now have to convert all the directory dates to their original dates ! The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From mimarsh2 at vt.edu Mon Jan 9 20:16:55 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Mon, 9 Jan 2017 15:16:55 -0500 Subject: [gpfsug-discuss] replication and no failure groups Message-ID: All, If I have a filesystem with replication set to 2 and 1 failure group: 1) I assume replication won't actually happen, correct? 2) Will this impact performance i.e cut write performance in half even though it really only keeps 1 copy? End goal - I would like a single storage pool within the filesystem to be replicated without affecting the performance of all other pools(which only have a single failure group) Thanks, Brian Marshall VT - ARC -------------- next part -------------- An HTML attachment was scrubbed... URL: From YARD at il.ibm.com Mon Jan 9 20:34:29 2017 From: YARD at il.ibm.com (Yaron Daniel) Date: Mon, 9 Jan 2017 22:34:29 +0200 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: Hi 1) Yes in case u have only 1 Failure group - replication will not work. 2) Do you have 2 Storage Systems ? When using GPFS replication write stay the same - but read can be double - since it read from 2 Storage systems Hope this help - what do you try to achive , can you share your env setup ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Brian Marshall To: gpfsug main discussion list Date: 01/09/2017 10:17 PM Subject: [gpfsug-discuss] replication and no failure groups Sent by: gpfsug-discuss-bounces at spectrumscale.org All, If I have a filesystem with replication set to 2 and 1 failure group: 1) I assume replication won't actually happen, correct? 
2) Will this impact performance i.e cut write performance in half even though it really only keeps 1 copy? End goal - I would like a single storage pool within the filesystem to be replicated without affecting the performance of all other pools(which only have a single failure group) Thanks, Brian Marshall VT - ARC_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From eric.wonderley at vt.edu Mon Jan 9 20:47:12 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Mon, 9 Jan 2017 15:47:12 -0500 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: Hi Yaron: This is the filesystem: [root at cl005 net]# mmlsdisk work disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ nsd_a_7 nsd 512 -1 No Yes ready up system nsd_b_7 nsd 512 -1 No Yes ready up system nsd_c_7 nsd 512 -1 No Yes ready up system nsd_d_7 nsd 512 -1 No Yes ready up system nsd_a_8 nsd 512 -1 No Yes ready up system nsd_b_8 nsd 512 -1 No Yes ready up system nsd_c_8 nsd 512 -1 No Yes ready up system nsd_d_8 nsd 512 -1 No Yes ready up system nsd_a_9 nsd 512 -1 No Yes ready up system nsd_b_9 nsd 512 -1 No Yes ready up system nsd_c_9 nsd 512 -1 No Yes ready up system nsd_d_9 nsd 512 -1 No Yes ready up system nsd_a_10 nsd 512 -1 No Yes ready up system nsd_b_10 nsd 512 -1 No Yes ready up system nsd_c_10 nsd 512 -1 No Yes ready up system nsd_d_10 nsd 512 -1 No Yes ready up system nsd_a_11 nsd 512 -1 No Yes ready up system nsd_b_11 nsd 512 -1 No Yes ready up system nsd_c_11 nsd 512 -1 No Yes ready up system nsd_d_11 nsd 512 -1 No Yes ready up system nsd_a_12 nsd 512 -1 No Yes ready up system nsd_b_12 nsd 512 -1 No Yes ready up system nsd_c_12 nsd 512 -1 No Yes ready up system nsd_d_12 nsd 512 -1 No Yes ready up system work_md_pf1_1 nsd 512 200 Yes No ready up system jbf1z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf2z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf3z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf4z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf5z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf6z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf7z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf8z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf1z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf2z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf3z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf4z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf5z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf6z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf7z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf8z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf1z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf2z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf3z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf4z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf5z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf6z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf7z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf8z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf1z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf2z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf3z4 nsd 4096 2034 No Yes ready up 
sas_ssd4T jbf4z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf5z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf6z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf7z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf8z4 nsd 4096 2034 No Yes ready up sas_ssd4T work_md_pf1_2 nsd 512 200 Yes No ready up system work_md_pf1_3 nsd 512 200 Yes No ready up system work_md_pf1_4 nsd 512 200 Yes No ready up system work_md_pf2_5 nsd 512 199 Yes No ready up system work_md_pf2_6 nsd 512 199 Yes No ready up system work_md_pf2_7 nsd 512 199 Yes No ready up system work_md_pf2_8 nsd 512 199 Yes No ready up system [root at cl005 net]# mmlsfs work -R -r -M -m -K flag value description ------------------- ------------------------ ----------------------------------- -R 2 Maximum number of data replicas -r 2 Default number of data replicas -M 2 Maximum number of metadata replicas -m 2 Default number of metadata replicas -K whenpossible Strict replica allocation option On Mon, Jan 9, 2017 at 3:34 PM, Yaron Daniel wrote: > Hi > > 1) Yes in case u have only 1 Failure group - replication will not work. > > 2) Do you have 2 Storage Systems ? When using GPFS replication write stay > the same - but read can be double - since it read from 2 Storage systems > > Hope this help - what do you try to achive , can you share your env setup ? > > > > Regards > > > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Server, **Storage and Data Services* > *- > Team Leader* Petach Tiqva, 49527 > *Global Technology Services* Israel > Phone: +972-3-916-5672 <+972%203-916-5672> > Fax: +972-3-916-5672 <+972%203-916-5672> > Mobile: +972-52-8395593 <+972%2052-839-5593> > e-mail: yard at il.ibm.com > *IBM Israel* > > > > > > > > From: Brian Marshall > To: gpfsug main discussion list > Date: 01/09/2017 10:17 PM > Subject: [gpfsug-discuss] replication and no failure groups > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > All, > > If I have a filesystem with replication set to 2 and 1 failure group: > > 1) I assume replication won't actually happen, correct? > > 2) Will this impact performance i.e cut write performance in half even > though it really only keeps 1 copy? > > End goal - I would like a single storage pool within the filesystem to be > replicated without affecting the performance of all other pools(which only > have a single failure group) > > Thanks, > Brian Marshall > VT - ARC_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From YARD at il.ibm.com Mon Jan 9 20:53:38 2017 From: YARD at il.ibm.com (Yaron Daniel) Date: Mon, 9 Jan 2017 20:53:38 +0000 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: Hi So - do u able to have GPFS replication for the MD Failure Groups ? I can see that u have 3 Failure Groups for Data -1, 2012,2034 , how many Storage Subsystems you have ? 
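As an illustrative aside (values below are invented, not taken from the listing above): a failure group is just a per-NSD attribute, so in mmcrnsd stanza form the intent of "one copy per storage subsystem" might look roughly like the sketch below, while failureGroup=-1 declares that a disk has no point of failure in common with any other disk:

### hypothetical NSD stanzas - one failure group per storage subsystem
%nsd: device=/dev/mapper/lun_a nsd=data_a servers=nsdsrv1,nsdsrv2 usage=dataOnly failureGroup=2012 pool=sas_ssd4T
%nsd: device=/dev/mapper/lun_b nsd=data_b servers=nsdsrv2,nsdsrv1 usage=dataOnly failureGroup=2034 pool=sas_ssd4T
### after moving existing disks between failure groups, something like
### 'mmrestripefs work -R' is typically needed to re-establish replicas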
Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "J. Eric Wonderley" To: gpfsug main discussion list Date: 01/09/2017 10:48 PM Subject: Re: [gpfsug-discuss] replication and no failure groups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Yaron: This is the filesystem: [root at cl005 net]# mmlsdisk work disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ nsd_a_7 nsd 512 -1 No Yes ready up system nsd_b_7 nsd 512 -1 No Yes ready up system nsd_c_7 nsd 512 -1 No Yes ready up system nsd_d_7 nsd 512 -1 No Yes ready up system nsd_a_8 nsd 512 -1 No Yes ready up system nsd_b_8 nsd 512 -1 No Yes ready up system nsd_c_8 nsd 512 -1 No Yes ready up system nsd_d_8 nsd 512 -1 No Yes ready up system nsd_a_9 nsd 512 -1 No Yes ready up system nsd_b_9 nsd 512 -1 No Yes ready up system nsd_c_9 nsd 512 -1 No Yes ready up system nsd_d_9 nsd 512 -1 No Yes ready up system nsd_a_10 nsd 512 -1 No Yes ready up system nsd_b_10 nsd 512 -1 No Yes ready up system nsd_c_10 nsd 512 -1 No Yes ready up system nsd_d_10 nsd 512 -1 No Yes ready up system nsd_a_11 nsd 512 -1 No Yes ready up system nsd_b_11 nsd 512 -1 No Yes ready up system nsd_c_11 nsd 512 -1 No Yes ready up system nsd_d_11 nsd 512 -1 No Yes ready up system nsd_a_12 nsd 512 -1 No Yes ready up system nsd_b_12 nsd 512 -1 No Yes ready up system nsd_c_12 nsd 512 -1 No Yes ready up system nsd_d_12 nsd 512 -1 No Yes ready up system work_md_pf1_1 nsd 512 200 Yes No ready up system jbf1z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf2z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf3z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf4z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf5z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf6z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf7z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf8z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf1z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf2z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf3z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf4z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf5z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf6z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf7z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf8z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf1z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf2z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf3z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf4z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf5z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf6z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf7z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf8z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf1z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf2z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf3z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf4z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf5z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf6z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf7z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf8z4 nsd 4096 2034 No Yes ready up sas_ssd4T work_md_pf1_2 nsd 512 200 Yes No ready up system work_md_pf1_3 nsd 512 200 Yes No ready up system work_md_pf1_4 nsd 512 200 Yes No ready up system work_md_pf2_5 nsd 512 199 Yes No ready up system work_md_pf2_6 nsd 512 199 Yes No ready up system work_md_pf2_7 
nsd 512 199 Yes No ready up system work_md_pf2_8 nsd 512 199 Yes No ready up system [root at cl005 net]# mmlsfs work -R -r -M -m -K flag value description ------------------- ------------------------ ----------------------------------- -R 2 Maximum number of data replicas -r 2 Default number of data replicas -M 2 Maximum number of metadata replicas -m 2 Default number of metadata replicas -K whenpossible Strict replica allocation option On Mon, Jan 9, 2017 at 3:34 PM, Yaron Daniel wrote: Hi 1) Yes in case u have only 1 Failure group - replication will not work. 2) Do you have 2 Storage Systems ? When using GPFS replication write stay the same - but read can be double - since it read from 2 Storage systems Hope this help - what do you try to achive , can you share your env setup ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services- Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Brian Marshall To: gpfsug main discussion list Date: 01/09/2017 10:17 PM Subject: [gpfsug-discuss] replication and no failure groups Sent by: gpfsug-discuss-bounces at spectrumscale.org All, If I have a filesystem with replication set to 2 and 1 failure group: 1) I assume replication won't actually happen, correct? 2) Will this impact performance i.e cut write performance in half even though it really only keeps 1 copy? End goal - I would like a single storage pool within the filesystem to be replicated without affecting the performance of all other pools(which only have a single failure group) Thanks, Brian Marshall VT - ARC_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From eric.wonderley at vt.edu Mon Jan 9 21:01:14 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Mon, 9 Jan 2017 16:01:14 -0500 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: Hi Yuran: We have 5...4x md3860fs and 1x if150. the if150 requires data replicas=2 to get the ha and protection they recommend. we have it presented in a fileset that appears in a users work area. On Mon, Jan 9, 2017 at 3:53 PM, Yaron Daniel wrote: > Hi > > So - do u able to have GPFS replication for the MD Failure Groups ? > > I can see that u have 3 Failure Groups for Data -1, 2012,2034 , how many > Storage Subsystems you have ? 
> > > > > Regards > > > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Server, **Storage and Data Services* > *- > Team Leader* Petach Tiqva, 49527 > *Global Technology Services* Israel > Phone: +972-3-916-5672 <+972%203-916-5672> > Fax: +972-3-916-5672 <+972%203-916-5672> > Mobile: +972-52-8395593 <+972%2052-839-5593> > e-mail: yard at il.ibm.com > *IBM Israel* > > > > > > > > From: "J. Eric Wonderley" > To: gpfsug main discussion list > Date: 01/09/2017 10:48 PM > Subject: Re: [gpfsug-discuss] replication and no failure groups > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Yaron: > > This is the filesystem: > > [root at cl005 net]# mmlsdisk work > disk driver sector failure holds > holds storage > name type size group metadata data status > availability pool > ------------ -------- ------ ----------- -------- ----- ------------- > ------------ ------------ > nsd_a_7 nsd 512 -1 No Yes ready > up system > nsd_b_7 nsd 512 -1 No Yes ready > up system > nsd_c_7 nsd 512 -1 No Yes ready > up system > nsd_d_7 nsd 512 -1 No Yes ready > up system > nsd_a_8 nsd 512 -1 No Yes ready > up system > nsd_b_8 nsd 512 -1 No Yes ready > up system > nsd_c_8 nsd 512 -1 No Yes ready > up system > nsd_d_8 nsd 512 -1 No Yes ready > up system > nsd_a_9 nsd 512 -1 No Yes ready > up system > nsd_b_9 nsd 512 -1 No Yes ready > up system > nsd_c_9 nsd 512 -1 No Yes ready > up system > nsd_d_9 nsd 512 -1 No Yes ready > up system > nsd_a_10 nsd 512 -1 No Yes ready > up system > nsd_b_10 nsd 512 -1 No Yes ready > up system > nsd_c_10 nsd 512 -1 No Yes ready > up system > nsd_d_10 nsd 512 -1 No Yes ready > up system > nsd_a_11 nsd 512 -1 No Yes ready > up system > nsd_b_11 nsd 512 -1 No Yes ready > up system > nsd_c_11 nsd 512 -1 No Yes ready > up system > nsd_d_11 nsd 512 -1 No Yes ready > up system > nsd_a_12 nsd 512 -1 No Yes ready > up system > nsd_b_12 nsd 512 -1 No Yes ready > up system > nsd_c_12 nsd 512 -1 No Yes ready > up system > nsd_d_12 nsd 512 -1 No Yes ready > up system > work_md_pf1_1 nsd 512 200 Yes No ready > up system > jbf1z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf2z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf3z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf4z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf5z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf6z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf7z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf8z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf1z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf2z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf3z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf4z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf5z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf6z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf7z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf8z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf1z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf2z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf3z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf4z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf5z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf6z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf7z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf8z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf1z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf2z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf3z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf4z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > 
jbf5z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf6z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf7z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf8z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > work_md_pf1_2 nsd 512 200 Yes No ready > up system > work_md_pf1_3 nsd 512 200 Yes No ready > up system > work_md_pf1_4 nsd 512 200 Yes No ready > up system > work_md_pf2_5 nsd 512 199 Yes No ready > up system > work_md_pf2_6 nsd 512 199 Yes No ready > up system > work_md_pf2_7 nsd 512 199 Yes No ready > up system > work_md_pf2_8 nsd 512 199 Yes No ready > up system > [root at cl005 net]# mmlsfs work -R -r -M -m -K > flag value description > ------------------- ------------------------ ------------------------------ > ----- > -R 2 Maximum number of data > replicas > -r 2 Default number of data > replicas > -M 2 Maximum number of metadata > replicas > -m 2 Default number of metadata > replicas > -K whenpossible Strict replica allocation > option > > > On Mon, Jan 9, 2017 at 3:34 PM, Yaron Daniel <*YARD at il.ibm.com* > > wrote: > Hi > > 1) Yes in case u have only 1 Failure group - replication will not work. > > 2) Do you have 2 Storage Systems ? When using GPFS replication write stay > the same - but read can be double - since it read from 2 Storage systems > > Hope this help - what do you try to achive , can you share your env setup ? > > > Regards > > > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Server, **Storage and Data Services* > *- > Team Leader* Petach Tiqva, 49527 > *Global Technology Services* Israel > Phone: *+972-3-916-5672* <+972%203-916-5672> > Fax: *+972-3-916-5672* <+972%203-916-5672> > Mobile: *+972-52-8395593* <+972%2052-839-5593> > e-mail: *yard at il.ibm.com* > *IBM Israel* > > > > > > > > From: Brian Marshall <*mimarsh2 at vt.edu* > > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Date: 01/09/2017 10:17 PM > Subject: [gpfsug-discuss] replication and no failure groups > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > > ------------------------------ > > > > > All, > > If I have a filesystem with replication set to 2 and 1 failure group: > > 1) I assume replication won't actually happen, correct? > > 2) Will this impact performance i.e cut write performance in half even > though it really only keeps 1 copy? > > End goal - I would like a single storage pool within the filesystem to be > replicated without affecting the performance of all other pools(which only > have a single failure group) > > Thanks, > Brian Marshall > VT - ARC_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From janfrode at tanso.net Mon Jan 9 22:24:45 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 09 Jan 2017 22:24:45 +0000 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: Yaron, doesn't "-1" make each of these disk an independent failure group? >From 'man mmcrnsd': "The default is -1, which indicates this disk has no point of failure in common with any other disk." -jf man. 9. jan. 2017 kl. 21.54 skrev Yaron Daniel : > Hi > > So - do u able to have GPFS replication > > for the MD Failure Groups ? > > I can see that u have 3 Failure Groups > > for Data -1, 2012,2034 , how many Storage Subsystems you have ? > > > > > Regards > > > > ------------------------------ > > > > > > *YaronDaniel* 94 > > Em Ha'Moshavot Rd > > > *Server,* > > *Storageand Data Services* > *- > Team Leader* > > Petach > > Tiqva, 49527 > > > *GlobalTechnology Services* Israel > Phone: +972-3-916-5672 > Fax: +972-3-916-5672 > > > Mobile: +972-52-8395593 > > > e-mail: yard at il.ibm.com > > > > > *IBMIsrael* > > > > > > > > > > From: > > "J. Eric Wonderley" > > > > > To: > > gpfsug main discussion > > list > > Date: > > 01/09/2017 10:48 PM > Subject: > > Re: [gpfsug-discuss] > > replication and no failure groups > Sent by: > > gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Yaron: > > This is the filesystem: > > [root at cl005 net]# mmlsdisk work > disk driver > > sector failure holds holds > > storage > name type > > size group metadata data status > > availability pool > ------------ -------- ------ ----------- -------- ----- ------------- > ------------ > > ------------ > nsd_a_7 nsd > > 512 -1 No > > Yes ready up > > system > nsd_b_7 nsd > > 512 -1 No > > Yes ready up > > system > nsd_c_7 nsd > > 512 -1 No > > Yes ready up > > system > nsd_d_7 nsd > > 512 -1 No > > Yes ready up > > system > nsd_a_8 nsd > > 512 -1 No > > Yes ready up > > system > nsd_b_8 nsd > > 512 -1 No > > Yes ready up > > system > nsd_c_8 nsd > > 512 -1 No > > Yes ready up > > system > nsd_d_8 nsd > > 512 -1 No > > Yes ready up > > system > nsd_a_9 nsd > > 512 -1 No > > Yes ready up > > system > nsd_b_9 nsd > > 512 -1 No > > Yes ready up > > system > nsd_c_9 nsd > > 512 -1 No > > Yes ready up > > system > nsd_d_9 nsd > > 512 -1 No > > Yes ready up > > system > nsd_a_10 nsd > > 512 -1 No > > Yes ready up > > system > nsd_b_10 nsd > > 512 -1 No > > Yes ready up > > system > nsd_c_10 nsd > > 512 -1 No > > Yes ready up > > system > nsd_d_10 nsd > > 512 -1 No > > Yes ready up > > system > nsd_a_11 nsd > > 512 -1 No > > Yes ready up > > system > nsd_b_11 nsd > > 512 -1 No > > Yes ready up > > system > nsd_c_11 nsd > > 512 -1 No > > Yes ready up > > system > nsd_d_11 nsd > > 512 -1 No > > Yes ready up > > system > nsd_a_12 nsd > > 512 -1 No > > Yes ready up > > system > nsd_b_12 nsd > > 512 -1 No > > Yes ready up > > system > nsd_c_12 nsd > > 512 -1 No > > Yes ready up > > system > nsd_d_12 nsd > > 512 -1 No > > Yes ready up > > system > work_md_pf1_1 nsd 512 > > 200 Yes No ready > > up system > > > jbf1z1 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf2z1 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf3z1 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf4z1 nsd > > 4096 2012 No > > Yes ready up 
> > sas_ssd4T > jbf5z1 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf6z1 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf7z1 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf8z1 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf1z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf2z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf3z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf4z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf5z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf6z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf7z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf8z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf1z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf2z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf3z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf4z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf5z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf6z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf7z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf8z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf1z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf2z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf3z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf4z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf5z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf6z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf7z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf8z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > work_md_pf1_2 nsd 512 > > 200 Yes No ready > > up system > > > work_md_pf1_3 nsd 512 > > 200 Yes No ready > > up system > > > work_md_pf1_4 nsd 512 > > 200 Yes No ready > > up system > > > work_md_pf2_5 nsd 512 > > 199 Yes No ready > > up system > > > work_md_pf2_6 nsd 512 > > 199 Yes No ready > > up system > > > work_md_pf2_7 nsd 512 > > 199 Yes No ready > > up system > > > work_md_pf2_8 nsd 512 > > 199 Yes No ready > > up system > > > [root at cl005 net]# mmlsfs work -R -r -M -m -K > flag > > value > > description > ------------------- ------------------------ > ----------------------------------- > -R > > 2 > > Maximum number of data replicas > -r > > 2 > > Default number of data replicas > -M > > 2 > > Maximum number of metadata replicas > -m > > 2 > > Default number of metadata replicas > -K > > whenpossible > > Strict replica allocation option > > > On Mon, Jan 9, 2017 at 3:34 PM, Yaron Daniel <*YARD at il.ibm.com* > > > > wrote: > Hi > > 1) Yes in case u have only 1 Failure group - replication will not work. > > 2) Do you have 2 Storage Systems ? When using GPFS replication write > > stay the same - but read can be double - since it read from 2 Storage > systems > > Hope this help - what do you try to achive , can you share your env setup > > ? 
> > > Regards > > > > ------------------------------ > > > > > > *YaronDaniel* 94 > > Em Ha'Moshavot Rd > > > *Server,* > > *Storageand Data Services* > > > *-Team Leader* Petach > > Tiqva, 49527 > > > *GlobalTechnology Services* Israel > Phone: *+972-3-916-5672* <+972%203-916-5672> > Fax: *+972-3-916-5672* <+972%203-916-5672> > > > Mobile: *+972-52-8395593* <+972%2052-839-5593> > > > e-mail: *yard at il.ibm.com* > > > > > *IBMIsrael* > > > > > > > > > > From: Brian > > Marshall <*mimarsh2 at vt.edu* > > To: gpfsug > > main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Date: 01/09/2017 > > 10:17 PM > Subject: [gpfsug-discuss] > > replication and no failure groups > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > > ------------------------------ > > > > > All, > > If I have a filesystem with replication set to 2 and 1 failure group: > > 1) I assume replication won't actually happen, correct? > > 2) Will this impact performance i.e cut write performance in half even > > though it really only keeps 1 copy? > > End goal - I would like a single storage pool within the filesystem to > > be replicated without affecting the performance of all other pools(which > > only have a single failure group) > > Thanks, > Brian Marshall > VT - ARC_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From vpuvvada at in.ibm.com Tue Jan 10 08:44:19 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 10 Jan 2017 14:14:19 +0530 Subject: [gpfsug-discuss] AFM Migration Issue In-Reply-To: <201701091552.v09Fq4kj012315@msw1.awe.co.uk> References: <201701091501.v09F1i5A015912@msw1.awe.co.uk> <201701091552.v09Fq4kj012315@msw1.awe.co.uk> Message-ID: AFM cannot keep directory mtime in sync. Directory mtime changes during readdir when files are created inside it after initial lookup. This is a known limitation today. ~Venkat (vpuvvada at in.ibm.com) From: To: Date: 01/09/2017 09:30 PM Subject: Re: [gpfsug-discuss] AFM Migration Issue Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We have already come across the issues you have seen below, and worked around them. If you run the pre-fetch with just the --meta-data-only, then all the date stamps are correct for the dirs., as soon as you run --list-only all the directory times change to now. We have tried rsync but this did not appear to work. 
-----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: 09 January 2017 15:49 To: gpfsug-discuss at spectrumscale.org Subject: EXTERNAL: Re: [gpfsug-discuss] AFM Migration Issue Interesting, I'm currently doing similar but currently am only using read-only to premigrate the filesets, The directory file stamps don't agree with the original but neither are they all marked when they were migrated. So there is something very weird going on..... (We're planning to switch them to Local Update when we move the users over to them) We're using a mmapplypolicy on our old gpfs cluster to get the files to migrate, and have noticed that you need a RULE EXTERNAL LIST ESCAPE '%/' line otherwise files with % in the filenames don't get migrated and through errors. I'm trying to work out if empty directories or those containing only empty directories get migrated correctly as you can't list them in the mmafmctl prefetch statement. (If you try (using DIRECTORIES_PLUS) they through errors) I am very interested in the solution to this issue. Peter Childs Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Paul.Tomlinson at awe.co.uk Sent: Monday, January 9, 2017 3:09:43 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] AFM Migration Issue Hi All, We have just completed the first data move from our old cluster to the new one using AFM Local Update as per the guide, however we have noticed that all date stamps on the directories have the date they were created on(e.g. 9th Jan 2017) , not the date from the old system (e.g. 14th April 2007), whereas all the files have the correct dates. Has anyone else seen this issue as we now have to convert all the directory dates to their original dates ! The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. 
AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Tue Jan 10 13:24:33 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 10 Jan 2017 08:24:33 -0500 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: That`s the answer. We hadn`t read deep enough and just assumed that -1 meant default failure group or no failure groups at all. Thanks, Brian On Mon, Jan 9, 2017 at 5:24 PM, Jan-Frode Myklebust wrote: > Yaron, doesn't "-1" make each of these disk an independent failure group? > > From 'man mmcrnsd': > > "The default is -1, which indicates this disk has no point of failure in > common with any other disk." > > > -jf > > > man. 9. jan. 2017 kl. 21.54 skrev Yaron Daniel : > >> Hi >> >> So - do u able to have GPFS replication >> >> for the MD Failure Groups ? >> >> I can see that u have 3 Failure Groups >> >> for Data -1, 2012,2034 , how many Storage Subsystems you have ? >> >> >> >> >> Regards >> >> >> >> ------------------------------ >> >> >> >> >> >> *YaronDaniel* 94 >> >> Em Ha'Moshavot Rd >> >> >> *Server,* >> >> *Storageand Data Services* >> *- >> Team Leader* >> >> Petach >> >> Tiqva, 49527 >> >> >> *GlobalTechnology Services* Israel >> Phone: +972-3-916-5672 <+972%203-916-5672> >> Fax: +972-3-916-5672 <+972%203-916-5672> >> >> >> Mobile: +972-52-8395593 <+972%2052-839-5593> >> >> >> e-mail: yard at il.ibm.com >> >> >> >> >> *IBMIsrael* >> >> >> >> >> >> >> >> >> >> From: >> >> "J. Eric Wonderley" >> >> >> >> >> To: >> >> gpfsug main discussion >> >> list >> >> Date: >> >> 01/09/2017 10:48 PM >> Subject: >> >> Re: [gpfsug-discuss] >> >> replication and no failure groups >> Sent by: >> >> gpfsug-discuss-bounces at spectrumscale.org >> ------------------------------ >> >> >> >> Hi Yaron: >> >> This is the filesystem: >> >> [root at cl005 net]# mmlsdisk work >> disk driver >> >> sector failure holds holds >> >> storage >> name type >> >> size group metadata data status >> >> availability pool >> ------------ -------- ------ ----------- -------- ----- ------------- >> ------------ >> >> ------------ >> nsd_a_7 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_b_7 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_c_7 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_d_7 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_a_8 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_b_8 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_c_8 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_d_8 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_a_9 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_b_9 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_c_9 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_d_9 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_a_10 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_b_10 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_c_10 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_d_10 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_a_11 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_b_11 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_c_11 nsd 
>> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_d_11 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_a_12 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_b_12 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_c_12 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_d_12 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> work_md_pf1_1 nsd 512 >> >> 200 Yes No ready >> >> up system >> >> >> jbf1z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf2z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf3z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf4z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf5z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf6z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf7z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf8z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf1z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf2z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf3z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf4z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf5z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf6z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf7z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf8z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf1z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf2z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf3z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf4z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf5z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf6z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf7z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf8z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf1z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf2z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf3z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf4z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf5z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf6z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf7z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf8z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> work_md_pf1_2 nsd 512 >> >> 200 Yes No ready >> >> up system >> >> >> work_md_pf1_3 nsd 512 >> >> 200 Yes No ready >> >> up system >> >> >> work_md_pf1_4 nsd 512 >> >> 200 Yes No ready >> >> up system >> >> >> work_md_pf2_5 nsd 512 >> >> 199 Yes No ready >> >> up system >> >> >> work_md_pf2_6 nsd 512 >> >> 199 Yes No ready >> >> up system >> >> >> work_md_pf2_7 nsd 512 >> >> 199 Yes No ready >> >> up system >> >> >> work_md_pf2_8 nsd 512 >> >> 199 Yes No ready >> >> up system >> >> >> [root at cl005 net]# mmlsfs work -R -r -M -m -K >> flag >> >> value >> >> description >> ------------------- ------------------------ >> ----------------------------------- >> -R >> >> 2 >> >> Maximum number of data replicas >> -r >> >> 2 >> >> Default number of data replicas >> -M >> >> 2 >> >> Maximum number of metadata replicas >> -m >> >> 2 >> >> Default number of metadata replicas >> -K >> >> whenpossible >> >> Strict replica allocation 
option >> >> >> On Mon, Jan 9, 2017 at 3:34 PM, Yaron Daniel <*YARD at il.ibm.com* >> > >> >> wrote: >> Hi >> >> 1) Yes in case u have only 1 Failure group - replication will not work. >> >> 2) Do you have 2 Storage Systems ? When using GPFS replication write >> >> stay the same - but read can be double - since it read from 2 Storage >> systems >> >> Hope this help - what do you try to achive , can you share your env setup >> >> ? >> >> >> Regards >> >> >> >> ------------------------------ >> >> >> >> >> >> *YaronDaniel* 94 >> >> Em Ha'Moshavot Rd >> >> >> *Server,* >> >> *Storageand Data Services* >> >> >> *-Team Leader* Petach >> >> Tiqva, 49527 >> >> >> *GlobalTechnology Services* Israel >> Phone: *+972-3-916-5672* <+972%203-916-5672> >> Fax: *+972-3-916-5672* <+972%203-916-5672> >> >> >> Mobile: *+972-52-8395593* <+972%2052-839-5593> >> >> >> e-mail: *yard at il.ibm.com* >> >> >> >> >> *IBMIsrael* >> >> >> >> >> >> >> >> >> >> From: Brian >> >> Marshall <*mimarsh2 at vt.edu* > >> To: gpfsug >> >> main discussion list <*gpfsug-discuss at spectrumscale.org* >> > >> Date: 01/09/2017 >> >> 10:17 PM >> Subject: [gpfsug-discuss] >> >> replication and no failure groups >> Sent by: *gpfsug-discuss-bounces at spectrumscale.org* >> >> >> ------------------------------ >> >> >> >> >> All, >> >> If I have a filesystem with replication set to 2 and 1 failure group: >> >> 1) I assume replication won't actually happen, correct? >> >> 2) Will this impact performance i.e cut write performance in half even >> >> though it really only keeps 1 copy? >> >> End goal - I would like a single storage pool within the filesystem to >> >> be replicated without affecting the performance of all other pools(which >> >> only have a single failure group) >> >> Thanks, >> Brian Marshall >> VT - ARC_______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at *spectrumscale.org* >> *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at *spectrumscale.org* >> *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> _______________________________________________ >> >> gpfsug-discuss mailing list >> >> gpfsug-discuss at spectrumscale.org >> >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Tue Jan 10 17:59:52 2017 From: mweil at wustl.edu (Matt Weil) Date: Tue, 10 Jan 2017 11:59:52 -0600 Subject: [gpfsug-discuss] CES nodes Hyper threads or no Message-ID: <5376d22b-abdc-7ead-5ea8-ae9da3073c4f@wustl.edu> All, I typically turn Hyper threading off on storage nodes. So I did on our CES nodes as well. Now they are running at a load of over 100 and have 25% cpu idle. With two 8 cores I am now wondering if hyper threading would help or did we just under size them :-(. These are nfs v3 servers only with lroc enabled. Load average: 156.13 160.40 158.97 any opinions on if it would help. 
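(As an aside, a quick generic-Linux way to confirm what the scheduler currently sees - these are standard RHEL commands, nothing CES-specific, and toggling hyper-threading itself is normally a BIOS/UEFI setting:)

### Thread(s) per core = 1 means SMT/hyper-threading is currently off
lscpu | grep -E 'Thread\(s\) per core|^CPU\(s\)'
### a single CPU number per core in the sibling list also means SMT is off
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list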
Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From jtolson at us.ibm.com Tue Jan 10 20:17:01 2017 From: jtolson at us.ibm.com (John T Olson) Date: Tue, 10 Jan 2017 13:17:01 -0700 Subject: [gpfsug-discuss] Updated whitepaper published In-Reply-To: References: Message-ID: An updated white paper has been published which shows integration of the Varonis UNIX agent in Spectrum Scale for audit logging. This version of the paper is updated to include test results from new capabilities provided in Spectrum Scale version 4.2.2.1. Here is a link to the paper: https://www.ibm.com/developerworks/community/wikis/form/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/f0cc9b82-a133-41b4-83fe-3f560e95b35a/attachment/0ab62645-e0ab-4377-81e7-abd11879bb75/media/Spectrum_Scale_Varonis_Audit_Logging.pdf Thanks, John John T. Olson, Ph.D., MI.C., K.EY. Master Inventor, Software Defined Storage 957/9032-1 Tucson, AZ, 85744 (520) 799-5185, tie 321-5185 (FAX: 520-799-4237) Email: jtolson at us.ibm.com "Do or do not. There is no try." - Yoda Olson's Razor: Any situation that we, as humans, can encounter in life can be modeled by either an episode of The Simpsons or Seinfeld. -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jan 11 09:27:06 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 11 Jan 2017 09:27:06 +0000 Subject: [gpfsug-discuss] CES log files Message-ID: Which files do I need to look in to determine what's happening with CES... supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Wed Jan 11 09:54:39 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 11 Jan 2017 09:54:39 +0000 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Jan 11 11:21:00 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 11 Jan 2017 12:21:00 +0100 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: I also struggle with where to look for CES log files.. but maybe the new "mmprotocoltrace" command can be useful? # mmprotocoltrace start smb ### reproduce problem # mmprotocoltrace stop smb Check log files it has collected. -jf On Wed, Jan 11, 2017 at 10:27 AM, Sobey, Richard A wrote: > Which files do I need to look in to determine what?s happening with CES? > supposing for example a load of domain controllers were shut down and CES > had no clue how to handle this and stopped working until the DCs were > switched back on again. > > > > Mmfs.log.latest said everything was fine btw. 
> > > > Thanks > > Richard > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jan 11 13:59:30 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 11 Jan 2017 13:59:30 +0000 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: Thanks. Some of the nodes would just say 'failed' or 'degraded' with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond and the winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF - I'll look at the protocol tracing next time this happens. It's a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... CES DEGRADED ads_down ... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" > Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what's happening with CES... supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jan 11 14:29:39 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 11 Jan 2017 14:29:39 +0000 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: What did the smb log claim on the nodes? Should be in /var/adm/ras, for example if SMB failed, then I could see that CES would mark the node as degraded. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 11 January 2017 at 13:59 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] CES log files Thanks. Some of the nodes would just say 'failed' or 'degraded' with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond and the winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF - I'll look at the protocol tracing next time this happens.
It's a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... CES DEGRADED ads_down ... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" > Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what's happening with CES... supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Jan 11 14:39:13 2017 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 11 Jan 2017 14:39:13 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster Message-ID: We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are connected via Infiniband (FDR14). At the time of implementation of ESS, we were instructed to enable RDMA in addition to IPoIB. Previously we only ran IPoIB on our GPFS3.5 cluster. Ever since the implementation (sometime back in July of 2016) we see a lot of compute nodes being ejected.
What usually precedes the ejection are following messages: Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_WR_FLUSH_ERR index 1 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2 Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_WR_FLUSH_ERR index 400 Even our ESS IO server sometimes ends up being ejected (case in point - yesterday morning): Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 vendor_err 135 Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 3001 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 vendor_err 135 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2671 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 vendor_err 135 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2495 Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 vendor_err 135 Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 3077 Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease renewal is overdue. Pinging to check if it is alive I've had multiple PMRs open for this issue, and I am told that our ESS needs code level upgrades in order to fix this issue. Looking at the errors, I think the issue is Infiniband related, and I am wondering if anyone on this list has seen similar issues? Thanks for your help in advance. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Wed Jan 11 15:03:13 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 11 Jan 2017 16:03:13 +0100 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From janfrode at tanso.net Wed Jan 11 15:10:03 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 11 Jan 2017 16:10:03 +0100 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: My first guess would also be rdmaSend, which the gssClientConfig.sh enables by default, but isn't scalable to large clusters. It fits with your error message: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20RDMA%20Tuning - """For GPFS version 3.5.0.11 and later, IB error IBV_WC_RNR_RETRY_EXC_ERR may occur if the cluster is too large when verbsRdmaSend is enabled Idf these errors are observed in the mmfs log, disable verbsRdmaSend on all nodes.. Additionally, out of memory errors may occur if verbsRdmaSend is enabled on very large clusters. If out of memory errors are observed, disabled verbsRdmaSend on all nodes in the cluster.""" Otherwise it would be nice if you could post your mmlsconfig to see if something else sticks out.. -jf On Wed, Jan 11, 2017 at 4:03 PM, Olaf Weiser wrote: > most likely, there's smth wrong with your IB fabric ... > you say, you run ~ 700 nodes ? ... > Are you running with *verbsRdmaSend*enabled ? ,if so, please consider to > disable - and discuss this within the PMR > another issue, you may check is - Are you running the IPoIB in connected > mode or datagram ... but as I said, please discuss this within the PMR .. > there are to much dependencies to discuss this here .. > > > cheers > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 <+49%20170%205794466> > E-Mail: olaf.weiser at de.ibm.com > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: Damir Krstic > To: gpfsug main discussion list > Date: 01/11/2017 03:39 PM > Subject: [gpfsug-discuss] nodes being ejected out of the cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our > storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are > connected via Infiniband (FDR14). At the time of implementation of ESS, we > were instructed to enable RDMA in addition to IPoIB. Previously we only ran > IPoIB on our GPFS3.5 cluster. > > Every since the implementation (sometime back in July of 2016) we see a > lot of compute nodes being ejected. 
What usually precedes the ejection are > following messages: > > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 1 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 400 > > Even our ESS IO server sometimes ends up being ejected (case in point - > yesterday morning): > > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3001 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2671 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2495 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3077 > Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease > renewal is overdue. Pinging to check if it is alive > > I've had multiple PMRs open for this issue, and I am told that our ESS > needs code level upgrades in order to fix this issue. Looking at the > errors, I think the issue is Infiniband related, and I am wondering if > anyone on this list has seen similar issues? > > Thanks for your help in advance. 
> > Damir_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Jan 11 15:15:52 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Wed, 11 Jan 2017 15:15:52 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster References: [gpfsug-discuss] nodes being ejected out of the cluster Message-ID: <5F910253243E6A47B81A9A2EB424BBA101E91A4A@NDMSMBX404.ndc.nasa.gov> The RDMA errors I think are secondary to what's going on with either your IPoIB or Ethernet fabrics that's causing I assume IPoIB communication breakdowns and expulsions. We've had entire IB fabrics go offline and if the nodes werent depending on it for daemon communication nobody got expelled. Do you have a subnet defined for your IPoIB network or are your nodes daemon interfaces already set to their IPoIB interface? Have you checked your SM logs? From: Damir Krstic Sent: 1/11/17, 9:39 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] nodes being ejected out of the cluster We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are connected via Infiniband (FDR14). At the time of implementation of ESS, we were instructed to enable RDMA in addition to IPoIB. Previously we only ran IPoIB on our GPFS3.5 cluster. Every since the implementation (sometime back in July of 2016) we see a lot of compute nodes being ejected. 
What usually precedes the ejection are following messages: Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_WR_FLUSH_ERR index 1 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2 Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_WR_FLUSH_ERR index 400 Even our ESS IO server sometimes ends up being ejected (case in point - yesterday morning): Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 vendor_err 135 Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 3001 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 vendor_err 135 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2671 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 vendor_err 135 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2495 Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 vendor_err 135 Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 3077 Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease renewal is overdue. Pinging to check if it is alive I've had multiple PMRs open for this issue, and I am told that our ESS needs code level upgrades in order to fix this issue. Looking at the errors, I think the issue is Infiniband related, and I am wondering if anyone on this list has seen similar issues? Thanks for your help in advance. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Jan 11 15:16:09 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 11 Jan 2017 15:16:09 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: Message-ID: An HTML attachment was scrubbed... 
URL: From syi at ca.ibm.com Wed Jan 11 17:30:08 2017 From: syi at ca.ibm.com (Yi Sun) Date: Wed, 11 Jan 2017 12:30:08 -0500 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: Sometime increasing CES debug level to get more info, e.g. "mmces log level 3". Here are two public wiki links (probably you already know). https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Protocol%20Node%20-%20Tuning%20and%20Analysis https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Protocols%20Problem%20Determination Yi. gpfsug-discuss-bounces at spectrumscale.org wrote on 01/11/2017 07:00:06 AM: > From: gpfsug-discuss-request at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Date: 01/11/2017 07:00 AM > Subject: gpfsug-discuss Digest, Vol 60, Issue 26 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: CES log files (Jan-Frode Myklebust) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 11 Jan 2017 12:21:00 +0100 > From: Jan-Frode Myklebust > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] CES log files > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > I also struggle with where to look for CES log files.. but maybe the new > "mmprotocoltrace" command can be useful? > > # mmprotocoltrace start smb > ### reproduce problem > # mmprotocoltrace stop smb > > Check log files it has collected. > > > -jf > > > On Wed, Jan 11, 2017 at 10:27 AM, Sobey, Richard A > wrote: > > > Which files do I need to look in to determine what?s happening with CES? > > supposing for example a load of domain controllers were shut down and CES > > had no clue how to handle this and stopped working until the DCs were > > switched back on again. > > > > > > > > Mmfs.log.latest said everything was fine btw. > > > > > > > > Thanks > > > > Richard > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: 20170111/4ea25ddf/attachment-0001.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 60, Issue 26 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Jan 11 17:53:50 2017 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 11 Jan 2017 17:53:50 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: Thanks for all the suggestions. 
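(Since one of the suggestions was about MLNX_OFED levels, a quick way to compare what the ESS nodes and the clients are actually running, using the standard Mellanox/InfiniBand tools that ship with the OFED stack; nothing here is specific to this cluster:

    ofed_info -s      # installed MLNX_OFED release
    ibstat            # HCA type, firmware level, port state and rate
    ibv_devinfo       # verbs view of the same adapters

That makes the "which code level am I on" question concrete before the upgrade week.)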
Here is our mmlsconfig file. We just purchased another GL6. During the installation of the new GL6 IBM will upgrade our existing GL6 up to the latest code levels. This will happen during the week of 23rd of Jan. I am skeptical that the upgrade is going to fix the issue. On our IO servers we are running in connected mode (please note that IB interfaces are bonded) [root at gssio1 ~]# cat /sys/class/net/ib0/mode connected [root at gssio1 ~]# cat /sys/class/net/ib1/mode connected [root at gssio1 ~]# cat /sys/class/net/ib2/mode connected [root at gssio1 ~]# cat /sys/class/net/ib3/mode connected [root at gssio2 ~]# cat /sys/class/net/ib0/mode connected [root at gssio2 ~]# cat /sys/class/net/ib1/mode connected [root at gssio2 ~]# cat /sys/class/net/ib2/mode connected [root at gssio2 ~]# cat /sys/class/net/ib3/mode connected Our login nodes are also running connected mode as well. However, all of our compute nodes are running in datagram: [root at mgt ~]# psh compute cat /sys/class/net/ib0/mode qnode0758: datagram qnode0763: datagram qnode0760: datagram qnode0772: datagram qnode0773: datagram ....etc. Here is our mmlsconfig: [root at gssio1 ~]# mmlsconfig Configuration data for cluster ess-qstorage.it.northwestern.edu: ---------------------------------------------------------------- clusterName ess-qstorage.it.northwestern.edu clusterId 17746506346828356609 dmapiFileHandleSize 32 minReleaseLevel 4.2.0.1 ccrEnabled yes cipherList AUTHONLY [gss_ppc64] nsdRAIDBufferPoolSizePct 80 maxBufferDescs 2m prefetchPct 5 nsdRAIDTracks 128k nsdRAIDSmallBufferSize 256k nsdMaxWorkerThreads 3k nsdMinWorkerThreads 3k nsdRAIDSmallThreadRatio 2 nsdRAIDThreadsPerQueue 16 nsdRAIDEventLogToConsole all nsdRAIDFastWriteFSDataLimit 256k nsdRAIDFastWriteFSMetadataLimit 1M nsdRAIDReconstructAggressiveness 1 nsdRAIDFlusherBuffersLowWatermarkPct 20 nsdRAIDFlusherBuffersLimitPct 80 nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDFlusherTracksLimitPct 80 nsdRAIDFlusherFWLogHighWatermarkMB 1000 nsdRAIDFlusherFWLogLimitMB 5000 nsdRAIDFlusherThreadsLowWatermark 1 nsdRAIDFlusherThreadsHighWatermark 512 nsdRAIDBlockDeviceMaxSectorsKB 8192 nsdRAIDBlockDeviceNrRequests 32 nsdRAIDBlockDeviceQueueDepth 16 nsdRAIDBlockDeviceScheduler deadline nsdRAIDMaxTransientStale2FT 1 nsdRAIDMaxTransientStale3FT 1 nsdMultiQueue 512 syncWorkerThreads 256 nsdInlineWriteMax 32k maxGeneralThreads 1280 maxReceiverThreads 128 nspdQueues 64 [common] maxblocksize 16m [ems1-fdr,compute,gss_ppc64] numaMemoryInterleave yes [gss_ppc64] maxFilesToCache 12k [ems1-fdr,compute] maxFilesToCache 128k [ems1-fdr,compute,gss_ppc64] flushedDataTarget 1024 flushedInodeTarget 1024 maxFileCleaners 1024 maxBufferCleaners 1024 logBufferCount 20 logWrapAmountPct 2 logWrapThreads 128 maxAllocRegionsPerNode 32 maxBackgroundDeletionThreads 16 maxInodeDeallocPrefetch 128 [gss_ppc64] maxMBpS 16000 [ems1-fdr,compute] maxMBpS 10000 [ems1-fdr,compute,gss_ppc64] worker1Threads 1024 worker3Threads 32 [gss_ppc64] ioHistorySize 64k [ems1-fdr,compute] ioHistorySize 4k [gss_ppc64] verbsRdmaMinBytes 16k [ems1-fdr,compute] verbsRdmaMinBytes 32k [ems1-fdr,compute,gss_ppc64] verbsRdmaSend yes [gss_ppc64] verbsRdmasPerConnection 16 [ems1-fdr,compute] verbsRdmasPerConnection 256 [gss_ppc64] verbsRdmasPerNode 3200 [ems1-fdr,compute] verbsRdmasPerNode 1024 [ems1-fdr,compute,gss_ppc64] verbsSendBufferMemoryMB 1024 verbsRdmasPerNodeOptimize yes verbsRdmaUseMultiCqThreads yes [ems1-fdr,compute] ignorePrefetchLUNCount yes [gss_ppc64] scatterBufferSize 256K [ems1-fdr,compute] scatterBufferSize 256k 
syncIntervalStrict yes [ems1-fdr,compute,gss_ppc64] nsdClientCksumTypeLocal ck64 nsdClientCksumTypeRemote ck64 [gss_ppc64] pagepool 72856M [ems1-fdr] pagepool 17544M [compute] pagepool 4g [ems1-fdr,qsched03-ib0,quser10-fdr,compute,gss_ppc64] verbsRdma enable [gss_ppc64] verbsPorts mlx5_0/1 mlx5_0/2 mlx5_1/1 mlx5_1/2 [ems1-fdr] verbsPorts mlx5_0/1 mlx5_0/2 [qsched03-ib0,quser10-fdr,compute] verbsPorts mlx4_0/1 [common] autoload no [ems1-fdr,compute,gss_ppc64] maxStatCache 0 [common] envVar MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1 deadlockOverloadThreshold 0 deadlockDetectionThreshold 0 adminMode central File systems in cluster ess-qstorage.it.northwestern.edu: --------------------------------------------------------- /dev/home /dev/hpc /dev/projects /dev/tthome On Wed, Jan 11, 2017 at 9:16 AM Luis Bolinches wrote: > In addition to what Olaf has said > > ESS upgrades include mellanox modules upgrades in the ESS nodes. In fact, > on those noes you should do not update those solo (unless support says so > in your PMR), so if that's been the recommendation, I suggest you look at > it. > > Changelog on ESS 4.0.4 (no idea what ESS level you are running) > > > c) Support of MLNX_OFED_LINUX-3.2-2.0.0.1 > - Updated from MLNX_OFED_LINUX-3.1-1.0.6.1 (ESS 4.0, 4.0.1, 4.0.2) > - Updated from MLNX_OFED_LINUX-3.1-1.0.0.2 (ESS 3.5.x) > - Updated from MLNX_OFED_LINUX-2.4-1.0.2 (ESS 3.0.x) > - Support for PCIe3 LP 2-port 100 Gb EDR InfiniBand adapter x16 (FC EC3E) > - Requires System FW level FW840.20 (SV840_104) > - No changes from ESS 4.0.3 > > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > > Luis Bolinches > Lab Services > http://www-03.ibm.com/systems/services/labservices/ > > IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland > Phone: +358 503112585 <+358%2050%203112585> > > "If you continually give you will continually have." Anonymous > > > > ----- Original message ----- > From: "Olaf Weiser" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > > Cc: > Subject: Re: [gpfsug-discuss] nodes being ejected out of the cluster > Date: Wed, Jan 11, 2017 5:03 PM > > most likely, there's smth wrong with your IB fabric ... > you say, you run ~ 700 nodes ? ... > Are you running with *verbsRdmaSend*enabled ? ,if so, please consider to > disable - and discuss this within the PMR > another issue, you may check is - Are you running the IPoIB in connected > mode or datagram ... but as I said, please discuss this within the PMR .. > there are to much dependencies to discuss this here .. > > > cheers > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 <+49%20170%205794466> > E-Mail: olaf.weiser at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. 
DE 99369940 > > > > From: Damir Krstic > To: gpfsug main discussion list > Date: 01/11/2017 03:39 PM > Subject: [gpfsug-discuss] nodes being ejected out of the cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our > storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are > connected via Infiniband (FDR14). At the time of implementation of ESS, we > were instructed to enable RDMA in addition to IPoIB. Previously we only ran > IPoIB on our GPFS3.5 cluster. > > Every since the implementation (sometime back in July of 2016) we see a > lot of compute nodes being ejected. What usually precedes the ejection are > following messages: > > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 1 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 400 > > Even our ESS IO server sometimes ends up being ejected (case in point - > yesterday morning): > > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3001 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2671 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2495 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3077 > Jan 10 11:24:11 
gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease > renewal is overdue. Pinging to check if it is alive > > I've had multiple PMRs open for this issue, and I am told that our ESS > needs code level upgrades in order to fix this issue. Looking at the > errors, I think the issue is Infiniband related, and I am wondering if > anyone on this list has seen similar issues? > > Thanks for your help in advance. > > Damir_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Jan 11 18:38:30 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 11 Jan 2017 18:38:30 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: And there you have: [ems1-fdr,compute,gss_ppc64] verbsRdmaSend yes Try turning this off. -jf ons. 11. jan. 2017 kl. 18.54 skrev Damir Krstic : > Thanks for all the suggestions. Here is our mmlsconfig file. We just > purchased another GL6. During the installation of the new GL6 IBM will > upgrade our existing GL6 up to the latest code levels. This will happen > during the week of 23rd of Jan. > > I am skeptical that the upgrade is going to fix the issue. > > On our IO servers we are running in connected mode (please note that IB > interfaces are bonded) > > [root at gssio1 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib3/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib3/mode > > connected > > Our login nodes are also running connected mode as well. > > However, all of our compute nodes are running in datagram: > > [root at mgt ~]# psh compute cat /sys/class/net/ib0/mode > > qnode0758: datagram > > qnode0763: datagram > > qnode0760: datagram > > qnode0772: datagram > > qnode0773: datagram > ....etc. 
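(A quick way to see where verbsRdmaSend is set and what each daemon is actually running with; both commands exist in 4.2, the grep pattern is just illustrative:

    mmlsconfig verbsRdmaSend                  # configured value, per node class
    mmdiag --config | grep -i verbsRdmaSend   # value the local daemon is using right now

mmdiag --config reports the in-memory value, so after any change it will only move once the daemon on that node has been restarted.)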
> > Here is our mmlsconfig: > > [root at gssio1 ~]# mmlsconfig > > Configuration data for cluster ess-qstorage.it.northwestern.edu: > > ---------------------------------------------------------------- > > clusterName ess-qstorage.it.northwestern.edu > > clusterId 17746506346828356609 > > dmapiFileHandleSize 32 > > minReleaseLevel 4.2.0.1 > > ccrEnabled yes > > cipherList AUTHONLY > > [gss_ppc64] > > nsdRAIDBufferPoolSizePct 80 > > maxBufferDescs 2m > > prefetchPct 5 > > nsdRAIDTracks 128k > > nsdRAIDSmallBufferSize 256k > > nsdMaxWorkerThreads 3k > > nsdMinWorkerThreads 3k > > nsdRAIDSmallThreadRatio 2 > > nsdRAIDThreadsPerQueue 16 > > nsdRAIDEventLogToConsole all > > nsdRAIDFastWriteFSDataLimit 256k > > nsdRAIDFastWriteFSMetadataLimit 1M > > nsdRAIDReconstructAggressiveness 1 > > nsdRAIDFlusherBuffersLowWatermarkPct 20 > > nsdRAIDFlusherBuffersLimitPct 80 > > nsdRAIDFlusherTracksLowWatermarkPct 20 > > nsdRAIDFlusherTracksLimitPct 80 > > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > > nsdRAIDFlusherFWLogLimitMB 5000 > > nsdRAIDFlusherThreadsLowWatermark 1 > > nsdRAIDFlusherThreadsHighWatermark 512 > > nsdRAIDBlockDeviceMaxSectorsKB 8192 > > nsdRAIDBlockDeviceNrRequests 32 > > nsdRAIDBlockDeviceQueueDepth 16 > > nsdRAIDBlockDeviceScheduler deadline > > nsdRAIDMaxTransientStale2FT 1 > > nsdRAIDMaxTransientStale3FT 1 > > nsdMultiQueue 512 > > syncWorkerThreads 256 > > nsdInlineWriteMax 32k > > maxGeneralThreads 1280 > > maxReceiverThreads 128 > > nspdQueues 64 > > [common] > > maxblocksize 16m > > [ems1-fdr,compute,gss_ppc64] > > numaMemoryInterleave yes > > [gss_ppc64] > > maxFilesToCache 12k > > [ems1-fdr,compute] > > maxFilesToCache 128k > > [ems1-fdr,compute,gss_ppc64] > > flushedDataTarget 1024 > > flushedInodeTarget 1024 > > maxFileCleaners 1024 > > maxBufferCleaners 1024 > > logBufferCount 20 > > logWrapAmountPct 2 > > logWrapThreads 128 > > maxAllocRegionsPerNode 32 > > maxBackgroundDeletionThreads 16 > > maxInodeDeallocPrefetch 128 > > [gss_ppc64] > > maxMBpS 16000 > > [ems1-fdr,compute] > > maxMBpS 10000 > > [ems1-fdr,compute,gss_ppc64] > > worker1Threads 1024 > > worker3Threads 32 > > [gss_ppc64] > > ioHistorySize 64k > > [ems1-fdr,compute] > > ioHistorySize 4k > > [gss_ppc64] > > verbsRdmaMinBytes 16k > > [ems1-fdr,compute] > > verbsRdmaMinBytes 32k > > [ems1-fdr,compute,gss_ppc64] > > verbsRdmaSend yes > > [gss_ppc64] > > verbsRdmasPerConnection 16 > > [ems1-fdr,compute] > > verbsRdmasPerConnection 256 > > [gss_ppc64] > > verbsRdmasPerNode 3200 > > [ems1-fdr,compute] > > verbsRdmasPerNode 1024 > > [ems1-fdr,compute,gss_ppc64] > > verbsSendBufferMemoryMB 1024 > > verbsRdmasPerNodeOptimize yes > > verbsRdmaUseMultiCqThreads yes > > [ems1-fdr,compute] > > ignorePrefetchLUNCount yes > > [gss_ppc64] > > scatterBufferSize 256K > > [ems1-fdr,compute] > > scatterBufferSize 256k > > syncIntervalStrict yes > > [ems1-fdr,compute,gss_ppc64] > > nsdClientCksumTypeLocal ck64 > > nsdClientCksumTypeRemote ck64 > > [gss_ppc64] > > pagepool 72856M > > [ems1-fdr] > > pagepool 17544M > > [compute] > > pagepool 4g > > [ems1-fdr,qsched03-ib0,quser10-fdr,compute,gss_ppc64] > > verbsRdma enable > > [gss_ppc64] > > verbsPorts mlx5_0/1 mlx5_0/2 mlx5_1/1 mlx5_1/2 > > [ems1-fdr] > > verbsPorts mlx5_0/1 mlx5_0/2 > > [qsched03-ib0,quser10-fdr,compute] > > verbsPorts mlx4_0/1 > > [common] > > autoload no > > [ems1-fdr,compute,gss_ppc64] > > maxStatCache 0 > > [common] > > envVar MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1 > > deadlockOverloadThreshold 0 > > deadlockDetectionThreshold 0 > > adminMode 
central > > > File systems in cluster ess-qstorage.it.northwestern.edu: > > --------------------------------------------------------- > > /dev/home > > /dev/hpc > > /dev/projects > > /dev/tthome > > On Wed, Jan 11, 2017 at 9:16 AM Luis Bolinches > wrote: > > In addition to what Olaf has said > > ESS upgrades include mellanox modules upgrades in the ESS nodes. In fact, > on those noes you should do not update those solo (unless support says so > in your PMR), so if that's been the recommendation, I suggest you look at > it. > > Changelog on ESS 4.0.4 (no idea what ESS level you are running) > > > c) Support of MLNX_OFED_LINUX-3.2-2.0.0.1 > - Updated from MLNX_OFED_LINUX-3.1-1.0.6.1 (ESS 4.0, 4.0.1, 4.0.2) > - Updated from MLNX_OFED_LINUX-3.1-1.0.0.2 (ESS 3.5.x) > - Updated from MLNX_OFED_LINUX-2.4-1.0.2 (ESS 3.0.x) > - Support for PCIe3 LP 2-port 100 Gb EDR InfiniBand adapter x16 (FC EC3E) > - Requires System FW level FW840.20 (SV840_104) > - No changes from ESS 4.0.3 > > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > > Luis Bolinches > Lab Services > http://www-03.ibm.com/systems/services/labservices/ > > IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland > Phone: +358 503112585 <+358%2050%203112585> > > "If you continually give you will continually have." Anonymous > > > > ----- Original message ----- > From: "Olaf Weiser" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > > Cc: > Subject: Re: [gpfsug-discuss] nodes being ejected out of the cluster > Date: Wed, Jan 11, 2017 5:03 PM > > most likely, there's smth wrong with your IB fabric ... > you say, you run ~ 700 nodes ? ... > Are you running with *verbsRdmaSend*enabled ? ,if so, please consider to > disable - and discuss this within the PMR > another issue, you may check is - Are you running the IPoIB in connected > mode or datagram ... but as I said, please discuss this within the PMR .. > there are to much dependencies to discuss this here .. > > > cheers > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 <+49%20170%205794466> > E-Mail: olaf.weiser at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: Damir Krstic > To: gpfsug main discussion list > Date: 01/11/2017 03:39 PM > Subject: [gpfsug-discuss] nodes being ejected out of the cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our > storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are > connected via Infiniband (FDR14). At the time of implementation of ESS, we > were instructed to enable RDMA in addition to IPoIB. Previously we only ran > IPoIB on our GPFS3.5 cluster. 
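(On the "something wrong with the fabric" theory, a first pass with the standard infiniband-diags tools, run from any node on the fabric; none of the commands below are cluster-specific:

    sminfo            # which subnet manager is active, its state and priority
    ibqueryerrors     # ports with non-zero error counters across the fabric
    iblinkinfo        # link width/speed per port; look for Down or degraded links

plus the subnet manager's own log on whichever node or switch is running it.)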
> > Every since the implementation (sometime back in July of 2016) we see a > lot of compute nodes being ejected. What usually precedes the ejection are > following messages: > > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 1 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 400 > > Even our ESS IO server sometimes ends up being ejected (case in point - > yesterday morning): > > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3001 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2671 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2495 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3077 > Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease > renewal is overdue. Pinging to check if it is alive > > I've had multiple PMRs open for this issue, and I am told that our ESS > needs code level upgrades in order to fix this issue. Looking at the > errors, I think the issue is Infiniband related, and I am wondering if > anyone on this list has seen similar issues? > > Thanks for your help in advance. 
> > Damir_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Jan 11 19:22:31 2017 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 11 Jan 2017 19:22:31 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: Can this be done live? Meaning can GPFS remain up when I turn this off? Thanks, Damir On Wed, Jan 11, 2017 at 12:38 PM Jan-Frode Myklebust wrote: > And there you have: > > [ems1-fdr,compute,gss_ppc64] > verbsRdmaSend yes > > Try turning this off. > > > -jf > ons. 11. jan. 2017 kl. 18.54 skrev Damir Krstic : > > Thanks for all the suggestions. Here is our mmlsconfig file. We just > purchased another GL6. During the installation of the new GL6 IBM will > upgrade our existing GL6 up to the latest code levels. This will happen > during the week of 23rd of Jan. > > I am skeptical that the upgrade is going to fix the issue. > > On our IO servers we are running in connected mode (please note that IB > interfaces are bonded) > > [root at gssio1 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib3/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib3/mode > > connected > > Our login nodes are also running connected mode as well. > > However, all of our compute nodes are running in datagram: > > [root at mgt ~]# psh compute cat /sys/class/net/ib0/mode > > qnode0758: datagram > > qnode0763: datagram > > qnode0760: datagram > > qnode0772: datagram > > qnode0773: datagram > ....etc. 
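(On the connected-versus-datagram point: if the intent is for the compute nodes to match the servers, the mode is normally set per interface at the OS level; the file path and options below are the usual RHEL convention, not something taken from this cluster:

    cat /sys/class/net/ib0/mode      # current mode, datagram or connected
    # /etc/sysconfig/network-scripts/ifcfg-ib0
    CONNECTED_MODE=yes
    MTU=65520

Connected mode allows the 65520-byte IPoIB MTU, while datagram mode is limited by the fabric MTU (typically 2044 or 4092), so the practical difference is mostly the MTU the IPoIB traffic ends up using.)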
> > Here is our mmlsconfig: > > [root at gssio1 ~]# mmlsconfig > > Configuration data for cluster ess-qstorage.it.northwestern.edu: > > ---------------------------------------------------------------- > > clusterName ess-qstorage.it.northwestern.edu > > clusterId 17746506346828356609 > > dmapiFileHandleSize 32 > > minReleaseLevel 4.2.0.1 > > ccrEnabled yes > > cipherList AUTHONLY > > [gss_ppc64] > > nsdRAIDBufferPoolSizePct 80 > > maxBufferDescs 2m > > prefetchPct 5 > > nsdRAIDTracks 128k > > nsdRAIDSmallBufferSize 256k > > nsdMaxWorkerThreads 3k > > nsdMinWorkerThreads 3k > > nsdRAIDSmallThreadRatio 2 > > nsdRAIDThreadsPerQueue 16 > > nsdRAIDEventLogToConsole all > > nsdRAIDFastWriteFSDataLimit 256k > > nsdRAIDFastWriteFSMetadataLimit 1M > > nsdRAIDReconstructAggressiveness 1 > > nsdRAIDFlusherBuffersLowWatermarkPct 20 > > nsdRAIDFlusherBuffersLimitPct 80 > > nsdRAIDFlusherTracksLowWatermarkPct 20 > > nsdRAIDFlusherTracksLimitPct 80 > > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > > nsdRAIDFlusherFWLogLimitMB 5000 > > nsdRAIDFlusherThreadsLowWatermark 1 > > nsdRAIDFlusherThreadsHighWatermark 512 > > nsdRAIDBlockDeviceMaxSectorsKB 8192 > > nsdRAIDBlockDeviceNrRequests 32 > > nsdRAIDBlockDeviceQueueDepth 16 > > nsdRAIDBlockDeviceScheduler deadline > > nsdRAIDMaxTransientStale2FT 1 > > nsdRAIDMaxTransientStale3FT 1 > > nsdMultiQueue 512 > > syncWorkerThreads 256 > > nsdInlineWriteMax 32k > > maxGeneralThreads 1280 > > maxReceiverThreads 128 > > nspdQueues 64 > > [common] > > maxblocksize 16m > > [ems1-fdr,compute,gss_ppc64] > > numaMemoryInterleave yes > > [gss_ppc64] > > maxFilesToCache 12k > > [ems1-fdr,compute] > > maxFilesToCache 128k > > [ems1-fdr,compute,gss_ppc64] > > flushedDataTarget 1024 > > flushedInodeTarget 1024 > > maxFileCleaners 1024 > > maxBufferCleaners 1024 > > logBufferCount 20 > > logWrapAmountPct 2 > > logWrapThreads 128 > > maxAllocRegionsPerNode 32 > > maxBackgroundDeletionThreads 16 > > maxInodeDeallocPrefetch 128 > > [gss_ppc64] > > maxMBpS 16000 > > [ems1-fdr,compute] > > maxMBpS 10000 > > [ems1-fdr,compute,gss_ppc64] > > worker1Threads 1024 > > worker3Threads 32 > > [gss_ppc64] > > ioHistorySize 64k > > [ems1-fdr,compute] > > ioHistorySize 4k > > [gss_ppc64] > > verbsRdmaMinBytes 16k > > [ems1-fdr,compute] > > verbsRdmaMinBytes 32k > > [ems1-fdr,compute,gss_ppc64] > > verbsRdmaSend yes > > [gss_ppc64] > > verbsRdmasPerConnection 16 > > [ems1-fdr,compute] > > verbsRdmasPerConnection 256 > > [gss_ppc64] > > verbsRdmasPerNode 3200 > > [ems1-fdr,compute] > > verbsRdmasPerNode 1024 > > [ems1-fdr,compute,gss_ppc64] > > verbsSendBufferMemoryMB 1024 > > verbsRdmasPerNodeOptimize yes > > verbsRdmaUseMultiCqThreads yes > > [ems1-fdr,compute] > > ignorePrefetchLUNCount yes > > [gss_ppc64] > > scatterBufferSize 256K > > [ems1-fdr,compute] > > scatterBufferSize 256k > > syncIntervalStrict yes > > [ems1-fdr,compute,gss_ppc64] > > nsdClientCksumTypeLocal ck64 > > nsdClientCksumTypeRemote ck64 > > [gss_ppc64] > > pagepool 72856M > > [ems1-fdr] > > pagepool 17544M > > [compute] > > pagepool 4g > > [ems1-fdr,qsched03-ib0,quser10-fdr,compute,gss_ppc64] > > verbsRdma enable > > [gss_ppc64] > > verbsPorts mlx5_0/1 mlx5_0/2 mlx5_1/1 mlx5_1/2 > > [ems1-fdr] > > verbsPorts mlx5_0/1 mlx5_0/2 > > [qsched03-ib0,quser10-fdr,compute] > > verbsPorts mlx4_0/1 > > [common] > > autoload no > > [ems1-fdr,compute,gss_ppc64] > > maxStatCache 0 > > [common] > > envVar MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1 > > deadlockOverloadThreshold 0 > > deadlockDetectionThreshold 0 > > adminMode 
central > > > File systems in cluster ess-qstorage.it.northwestern.edu: > > --------------------------------------------------------- > > /dev/home > > /dev/hpc > > /dev/projects > > /dev/tthome > > On Wed, Jan 11, 2017 at 9:16 AM Luis Bolinches > wrote: > > In addition to what Olaf has said > > ESS upgrades include mellanox modules upgrades in the ESS nodes. In fact, > on those noes you should do not update those solo (unless support says so > in your PMR), so if that's been the recommendation, I suggest you look at > it. > > Changelog on ESS 4.0.4 (no idea what ESS level you are running) > > > c) Support of MLNX_OFED_LINUX-3.2-2.0.0.1 > - Updated from MLNX_OFED_LINUX-3.1-1.0.6.1 (ESS 4.0, 4.0.1, 4.0.2) > - Updated from MLNX_OFED_LINUX-3.1-1.0.0.2 (ESS 3.5.x) > - Updated from MLNX_OFED_LINUX-2.4-1.0.2 (ESS 3.0.x) > - Support for PCIe3 LP 2-port 100 Gb EDR InfiniBand adapter x16 (FC EC3E) > - Requires System FW level FW840.20 (SV840_104) > - No changes from ESS 4.0.3 > > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > > Luis Bolinches > Lab Services > http://www-03.ibm.com/systems/services/labservices/ > > IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland > Phone: +358 503112585 <+358%2050%203112585> > > "If you continually give you will continually have." Anonymous > > > > ----- Original message ----- > From: "Olaf Weiser" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > > Cc: > Subject: Re: [gpfsug-discuss] nodes being ejected out of the cluster > Date: Wed, Jan 11, 2017 5:03 PM > > most likely, there's smth wrong with your IB fabric ... > you say, you run ~ 700 nodes ? ... > Are you running with *verbsRdmaSend*enabled ? ,if so, please consider to > disable - and discuss this within the PMR > another issue, you may check is - Are you running the IPoIB in connected > mode or datagram ... but as I said, please discuss this within the PMR .. > there are to much dependencies to discuss this here .. > > > cheers > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 <+49%20170%205794466> > E-Mail: olaf.weiser at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: Damir Krstic > To: gpfsug main discussion list > Date: 01/11/2017 03:39 PM > Subject: [gpfsug-discuss] nodes being ejected out of the cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our > storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are > connected via Infiniband (FDR14). At the time of implementation of ESS, we > were instructed to enable RDMA in addition to IPoIB. Previously we only ran > IPoIB on our GPFS3.5 cluster. 
> > Every since the implementation (sometime back in July of 2016) we see a > lot of compute nodes being ejected. What usually precedes the ejection are > following messages: > > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 1 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 400 > > Even our ESS IO server sometimes ends up being ejected (case in point - > yesterday morning): > > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3001 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2671 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2495 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3077 > Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease > renewal is overdue. Pinging to check if it is alive > > I've had multiple PMRs open for this issue, and I am told that our ESS > needs code level upgrades in order to fix this issue. Looking at the > errors, I think the issue is Infiniband related, and I am wondering if > anyone on this list has seen similar issues? > > Thanks for your help in advance. 
> > Damir_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Jan 11 19:46:00 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 11 Jan 2017 19:46:00 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: Don't think you can change it without reloading gpfs. Also it should be turned off for all nodes.. So it's a big change, unfortunately.. -jf ons. 11. jan. 2017 kl. 20.22 skrev Damir Krstic : > Can this be done live? Meaning can GPFS remain up when I turn this off? > > Thanks, > Damir > > On Wed, Jan 11, 2017 at 12:38 PM Jan-Frode Myklebust > wrote: > > And there you have: > > [ems1-fdr,compute,gss_ppc64] > verbsRdmaSend yes > > Try turning this off. > > > -jf > ons. 11. jan. 2017 kl. 18.54 skrev Damir Krstic : > > Thanks for all the suggestions. Here is our mmlsconfig file. We just > purchased another GL6. During the installation of the new GL6 IBM will > upgrade our existing GL6 up to the latest code levels. This will happen > during the week of 23rd of Jan. > > I am skeptical that the upgrade is going to fix the issue. > > On our IO servers we are running in connected mode (please note that IB > interfaces are bonded) > > [root at gssio1 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib3/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib3/mode > > connected > > Our login nodes are also running connected mode as well. > > However, all of our compute nodes are running in datagram: > > [root at mgt ~]# psh compute cat /sys/class/net/ib0/mode > > qnode0758: datagram > > qnode0763: datagram > > qnode0760: datagram > > qnode0772: datagram > > qnode0773: datagram > ....etc. 
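For anyone following along, a change like the one being discussed (turning verbsRdmaSend off everywhere) would normally be staged rather than attempted live. A minimal sketch, assuming a maintenance window and the node classes shown in the config below, and bearing in mind Jan-Frode's point that the setting does not take effect until GPFS is restarted:

    # turn the option off cluster-wide so all nodes agree
    mmchconfig verbsRdmaSend=no

    # then recycle GPFS in manageable batches, e.g. per node class
    mmshutdown -N compute
    mmstartup -N compute

The exact batching, and whether the ESS I/O nodes are done one at a time, is the sort of detail worth agreeing in the PMR before touching anything.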
> > Here is our mmlsconfig: > > [root at gssio1 ~]# mmlsconfig > > Configuration data for cluster ess-qstorage.it.northwestern.edu: > > ---------------------------------------------------------------- > > clusterName ess-qstorage.it.northwestern.edu > > clusterId 17746506346828356609 > > dmapiFileHandleSize 32 > > minReleaseLevel 4.2.0.1 > > ccrEnabled yes > > cipherList AUTHONLY > > [gss_ppc64] > > nsdRAIDBufferPoolSizePct 80 > > maxBufferDescs 2m > > prefetchPct 5 > > nsdRAIDTracks 128k > > nsdRAIDSmallBufferSize 256k > > nsdMaxWorkerThreads 3k > > nsdMinWorkerThreads 3k > > nsdRAIDSmallThreadRatio 2 > > nsdRAIDThreadsPerQueue 16 > > nsdRAIDEventLogToConsole all > > nsdRAIDFastWriteFSDataLimit 256k > > nsdRAIDFastWriteFSMetadataLimit 1M > > nsdRAIDReconstructAggressiveness 1 > > nsdRAIDFlusherBuffersLowWatermarkPct 20 > > nsdRAIDFlusherBuffersLimitPct 80 > > nsdRAIDFlusherTracksLowWatermarkPct 20 > > nsdRAIDFlusherTracksLimitPct 80 > > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > > nsdRAIDFlusherFWLogLimitMB 5000 > > nsdRAIDFlusherThreadsLowWatermark 1 > > nsdRAIDFlusherThreadsHighWatermark 512 > > nsdRAIDBlockDeviceMaxSectorsKB 8192 > > nsdRAIDBlockDeviceNrRequests 32 > > nsdRAIDBlockDeviceQueueDepth 16 > > nsdRAIDBlockDeviceScheduler deadline > > nsdRAIDMaxTransientStale2FT 1 > > nsdRAIDMaxTransientStale3FT 1 > > nsdMultiQueue 512 > > syncWorkerThreads 256 > > nsdInlineWriteMax 32k > > maxGeneralThreads 1280 > > maxReceiverThreads 128 > > nspdQueues 64 > > [common] > > maxblocksize 16m > > [ems1-fdr,compute,gss_ppc64] > > numaMemoryInterleave yes > > [gss_ppc64] > > maxFilesToCache 12k > > [ems1-fdr,compute] > > maxFilesToCache 128k > > [ems1-fdr,compute,gss_ppc64] > > flushedDataTarget 1024 > > flushedInodeTarget 1024 > > maxFileCleaners 1024 > > maxBufferCleaners 1024 > > logBufferCount 20 > > logWrapAmountPct 2 > > logWrapThreads 128 > > maxAllocRegionsPerNode 32 > > maxBackgroundDeletionThreads 16 > > maxInodeDeallocPrefetch 128 > > [gss_ppc64] > > maxMBpS 16000 > > [ems1-fdr,compute] > > maxMBpS 10000 > > [ems1-fdr,compute,gss_ppc64] > > worker1Threads 1024 > > worker3Threads 32 > > [gss_ppc64] > > ioHistorySize 64k > > [ems1-fdr,compute] > > ioHistorySize 4k > > [gss_ppc64] > > verbsRdmaMinBytes 16k > > [ems1-fdr,compute] > > verbsRdmaMinBytes 32k > > [ems1-fdr,compute,gss_ppc64] > > verbsRdmaSend yes > > [gss_ppc64] > > verbsRdmasPerConnection 16 > > [ems1-fdr,compute] > > verbsRdmasPerConnection 256 > > [gss_ppc64] > > verbsRdmasPerNode 3200 > > [ems1-fdr,compute] > > verbsRdmasPerNode 1024 > > [ems1-fdr,compute,gss_ppc64] > > verbsSendBufferMemoryMB 1024 > > verbsRdmasPerNodeOptimize yes > > verbsRdmaUseMultiCqThreads yes > > [ems1-fdr,compute] > > ignorePrefetchLUNCount yes > > [gss_ppc64] > > scatterBufferSize 256K > > [ems1-fdr,compute] > > scatterBufferSize 256k > > syncIntervalStrict yes > > [ems1-fdr,compute,gss_ppc64] > > nsdClientCksumTypeLocal ck64 > > nsdClientCksumTypeRemote ck64 > > [gss_ppc64] > > pagepool 72856M > > [ems1-fdr] > > pagepool 17544M > > [compute] > > pagepool 4g > > [ems1-fdr,qsched03-ib0,quser10-fdr,compute,gss_ppc64] > > verbsRdma enable > > [gss_ppc64] > > verbsPorts mlx5_0/1 mlx5_0/2 mlx5_1/1 mlx5_1/2 > > [ems1-fdr] > > verbsPorts mlx5_0/1 mlx5_0/2 > > [qsched03-ib0,quser10-fdr,compute] > > verbsPorts mlx4_0/1 > > [common] > > autoload no > > [ems1-fdr,compute,gss_ppc64] > > maxStatCache 0 > > [common] > > envVar MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1 > > deadlockOverloadThreshold 0 > > deadlockDetectionThreshold 0 > > adminMode 
central > > > File systems in cluster ess-qstorage.it.northwestern.edu: > > --------------------------------------------------------- > > /dev/home > > /dev/hpc > > /dev/projects > > /dev/tthome
> > Damir_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Wed Jan 11 22:33:24 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 11 Jan 2017 15:33:24 -0700 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: A winbindd process taking up 100% could be caused by the problem documented in https://bugzilla.samba.org/show_bug.cgi?id=12105 Capturing a brief strace of the affected process and reporting that through a PMR would be helpful to debug this problem and provide a fix. To answer the wider question: Log files are kept in /var/adm/ras/. In case more detailed traces are required, use the mmprotocoltrace command. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/11/2017 07:00 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... CES DEGRADED ads_down ... 
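As a rough illustration of that kind of check on 4.2.x protocol nodes (commands assumed to be available once CES is deployed):

    # per-node component health, including the CES-related entries
    mmhealth node show

    # CES state across all protocol nodes
    mmces state show -a

The underlying logs for the individual services live under /var/adm/ras/ on each CES node, which is usually the next place to look when one of these reports degraded.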
Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From r.sobey at imperial.ac.uk Thu Jan 12 09:51:12 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 12 Jan 2017 09:51:12 +0000 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: Thanks Christof. Would this patch have made it in to CES/GPFS 4.2.1-2.. from what you say probably not? This whole incident was caused by a scheduled and extremely rare shutdown of our main datacentre for electrical testing. It's not something that's likely to happen again if at all so reproducing it will be nigh on impossible. Food for thought though! Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 11 January 2017 22:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES log files A winbindd process taking up 100% could be caused by the problem documented in https://bugzilla.samba.org/show_bug.cgi?id=12105 Capturing a brief strace of the affected process and reporting that through a PMR would be helpful to debug this problem and provide a fix. To answer the wider question: Log files are kept in /var/adm/ras/. In case more detailed traces are required, use the mmprotocoltrace command. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/11/2017 07:00 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... 
CES DEGRADED ads_down ... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From aleciarm at us.ibm.com Thu Jan 12 14:54:12 2017 From: aleciarm at us.ibm.com (Alecia A Ramsay) Date: Thu, 12 Jan 2017 09:54:12 -0500 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: The Spectrum Scale Knowledge Center does have a topic on collecting CES log files. This might be helpful (4.2.2 version): http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1pdg_ces_monitor_admin.htm Alecia A. Ramsay, PMP? Program Manager, New Technology Introduction IBM Systems - Storage aleciarm at us.ibm.com work: 919-435-6494; mobile: 651-260-4928 https://www-01.ibm.com/marketing/iwm/iwmdocs/web/cc/earlyprograms/systems.shtml From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/12/2017 04:51 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Christof. Would this patch have made it in to CES/GPFS 4.2.1-2.. from what you say probably not? This whole incident was caused by a scheduled and extremely rare shutdown of our main datacentre for electrical testing. It's not something that's likely to happen again if at all so reproducing it will be nigh on impossible. Food for thought though! Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 11 January 2017 22:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES log files A winbindd process taking up 100% could be caused by the problem documented in https://bugzilla.samba.org/show_bug.cgi?id=12105 Capturing a brief strace of the affected process and reporting that through a PMR would be helpful to debug this problem and provide a fix. To answer the wider question: Log files are kept in /var/adm/ras/. In case more detailed traces are required, use the mmprotocoltrace command. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/11/2017 07:00 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. 
Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... CES DEGRADED ads_down ... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From christof.schmitt at us.ibm.com Thu Jan 12 18:06:48 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Thu, 12 Jan 2017 11:06:48 -0700 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: It looks like the patch for the mentioned bugzilla is in 4.2.2.0, but not in 4.2.1.2. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/12/2017 02:51 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Christof. Would this patch have made it in to CES/GPFS 4.2.1-2.. from what you say probably not? This whole incident was caused by a scheduled and extremely rare shutdown of our main datacentre for electrical testing. It's not something that's likely to happen again if at all so reproducing it will be nigh on impossible. Food for thought though! 
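For completeness, the brief strace capture Christof suggests for the spinning winbindd could be gathered with something along these lines (the PID placeholder is whatever top shows at 100%; -f follows the child processes winbindd forks):

    # capture ~30 seconds of syscall activity for the PMR
    timeout 30 strace -f -tt -p <pid-of-busy-winbindd> -o /tmp/winbindd.strace

    # or lean on the built-in tracing instead
    mmprotocoltrace start smb
    # ...reproduce the problem, then...
    mmprotocoltrace stop smb

Either artifact attached to the PMR should be enough for development to match it against the samba bug referenced above.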
Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 11 January 2017 22:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES log files A winbindd process taking up 100% could be caused by the problem documented in https://bugzilla.samba.org/show_bug.cgi?id=12105 Capturing a brief strace of the affected process and reporting that through a PMR would be helpful to debug this problem and provide a fix. To answer the wider question: Log files are kept in /var/adm/ras/. In case more detailed traces are required, use the mmprotocoltrace command. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/11/2017 07:00 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... CES DEGRADED ads_down ... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. 
Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mimarsh2 at vt.edu Fri Jan 13 19:50:10 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Fri, 13 Jan 2017 14:50:10 -0500 Subject: [gpfsug-discuss] Authorized Key Messages Message-ID: All, I just saw this message start popping up constantly on one our NSD Servers. [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist CCR Auth is disabled on all the NSD Servers. What other features/checks would look for the ccr keys? Thanks, Brian Marshall Virginia Tech -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Jan 13 20:14:03 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 13 Jan 2017 15:14:03 -0500 Subject: [gpfsug-discuss] Authorized Key Messages In-Reply-To: References: Message-ID: Brian, This seems to match a problem which was fixed in 4.1.1.7 and 4.2.0.0. Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Brian Marshall To: gpfsug main discussion list Date: 01/13/2017 02:50 PM Subject: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org All, I just saw this message start popping up constantly on one our NSD Servers. [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist CCR Auth is disabled on all the NSD Servers. What other features/checks would look for the ccr keys? Thanks, Brian Marshall Virginia Tech_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Fri Jan 13 20:19:25 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Fri, 13 Jan 2017 15:19:25 -0500 Subject: [gpfsug-discuss] Authorized Key Messages In-Reply-To: References: Message-ID: We are running 4.2.1 (there may be some point fixes we don't have) Any report of it being in this version? Brian On Fri, Jan 13, 2017 at 3:14 PM, Felipe Knop wrote: > Brian, > > This seems to match a problem which was fixed in 4.1.1.7 and 4.2.0.0. > > Regards, > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > > From: Brian Marshall > To: gpfsug main discussion list > Date: 01/13/2017 02:50 PM > Subject: [gpfsug-discuss] Authorized Key Messages > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > All, > > I just saw this message start popping up constantly on one our NSD Servers. > > [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist > > CCR Auth is disabled on all the NSD Servers. > > What other features/checks would look for the ccr keys? 
> > Thanks, > Brian Marshall > Virginia Tech_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Jan 13 22:58:02 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 13 Jan 2017 17:58:02 -0500 Subject: [gpfsug-discuss] Authorized Key Messages In-Reply-To: References: Message-ID: Brian, I had to check again whether the fix in question was in 4.2.0.0 (as opposed to a newer mod release), but confirmed that it seems to be. So this could be a new or different problem than the one I was thinking about. Researching a bit further, I found another potential match (internal defect number 981469), but that should be fixed in 4.2.1 as well. I have not seen recent reports of this problem. Perhaps this could be pursued via a PMR. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Brian Marshall To: gpfsug main discussion list Date: 01/13/2017 03:21 PM Subject: Re: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org We are running 4.2.1 (there may be some point fixes we don't have) Any report of it being in this version? Brian On Fri, Jan 13, 2017 at 3:14 PM, Felipe Knop wrote: Brian, This seems to match a problem which was fixed in 4.1.1.7 and 4.2.0.0. Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Brian Marshall To: gpfsug main discussion list Date: 01/13/2017 02:50 PM Subject: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org All, I just saw this message start popping up constantly on one our NSD Servers. [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist CCR Auth is disabled on all the NSD Servers. What other features/checks would look for the ccr keys? Thanks, Brian Marshall Virginia Tech_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Fri Jan 13 23:30:05 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 13 Jan 2017 18:30:05 -0500 Subject: [gpfsug-discuss] Authorized Key Messages In-Reply-To: References: Message-ID: Our intent was to have ccr turned off since all nodes are quorum in the server cluster: Considering this: [root at cl001 ~]# mmfsadm dump config | grep -i ccr ! ccrEnabled 0 ccrMaxChallengeCheckRetries 4 ccr : 0 (cluster configuration repository) ccr : 1 (cluster configuration repository) Will this disable ccr? 
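One way to double-check what the cluster is really using, rather than reading the daemon dump, is simply (a sketch; the wording of the output varies a little between releases):

    mmlscluster
    # with CCR in use the header reports the repository type as CCR;
    # with CCR disabled it lists the primary/secondary configuration servers instead

If a switch is ever wanted, mmchcluster --ccr-enable and mmchcluster --ccr-disable are the knobs; as far as I recall the disable direction needs primary/secondary configuration servers named and GPFS down, so it is best planned rather than toggled ad hoc.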
On Fri, Jan 13, 2017 at 5:58 PM, Felipe Knop wrote: > Brian, > > I had to check again whether the fix in question was in 4.2.0.0 (as > opposed to a newer mod release), but confirmed that it seems to be. So > this could be a new or different problem than the one I was thinking about. > > Researching a bit further, I found another potential match (internal > defect number 981469), but that should be fixed in 4.2.1 as well. I have > not seen recent reports of this problem. > > Perhaps this could be pursued via a PMR. > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > > From: Brian Marshall > To: gpfsug main discussion list > Date: 01/13/2017 03:21 PM > Subject: Re: [gpfsug-discuss] Authorized Key Messages > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running 4.2.1 (there may be some point fixes we don't have) > > Any report of it being in this version? > > Brian > > On Fri, Jan 13, 2017 at 3:14 PM, Felipe Knop <*knop at us.ibm.com* > > wrote: > Brian, > > This seems to match a problem which was fixed in 4.1.1.7 and 4.2.0.0. > > Regards, > > Felipe > > ---- > Felipe Knop *knop at us.ibm.com* > > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > *(845) 433-9314* <(845)%20433-9314> T/L 293-9314 > > > > > > From: Brian Marshall <*mimarsh2 at vt.edu* > > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Date: 01/13/2017 02:50 PM > Subject: [gpfsug-discuss] Authorized Key Messages > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > ------------------------------ > > > > > All, > > I just saw this message start popping up constantly on one our NSD Servers. > > [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist > > CCR Auth is disabled on all the NSD Servers. > > What other features/checks would look for the ccr keys? > > Thanks, > Brian Marshall > Virginia Tech_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Jan 13 23:48:37 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 13 Jan 2017 18:48:37 -0500 Subject: [gpfsug-discuss] Authorized Key Messages In-Reply-To: References: Message-ID: "! ccrEnabled 0" does indicate that CCR is disabled on the (server) cluster. In fact, instances of this '/var/mmfs/ssl/authorized_ccr_keys' does not exist message have been seen in clusters where CCR was disabled. It's just somewhat puzzling that the error message is appears in 4.2.1 . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "J. 
Eric Wonderley" To: gpfsug main discussion list Date: 01/13/2017 06:30 PM Subject: Re: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org Our intent was to have ccr turned off since all nodes are quorum in the server cluster: Considering this: [root at cl001 ~]# mmfsadm dump config | grep -i ccr ! ccrEnabled 0 ccrMaxChallengeCheckRetries 4 ccr : 0 (cluster configuration repository) ccr : 1 (cluster configuration repository) Will this disable ccr? On Fri, Jan 13, 2017 at 5:58 PM, Felipe Knop wrote: Brian, I had to check again whether the fix in question was in 4.2.0.0 (as opposed to a newer mod release), but confirmed that it seems to be. So this could be a new or different problem than the one I was thinking about. Researching a bit further, I found another potential match (internal defect number 981469), but that should be fixed in 4.2.1 as well. I have not seen recent reports of this problem. Perhaps this could be pursued via a PMR. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Brian Marshall To: gpfsug main discussion list Date: 01/13/2017 03:21 PM Subject: Re: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org We are running 4.2.1 (there may be some point fixes we don't have) Any report of it being in this version? Brian On Fri, Jan 13, 2017 at 3:14 PM, Felipe Knop wrote: Brian, This seems to match a problem which was fixed in 4.1.1.7 and 4.2.0.0. Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Brian Marshall To: gpfsug main discussion list Date: 01/13/2017 02:50 PM Subject: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org All, I just saw this message start popping up constantly on one our NSD Servers. [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist CCR Auth is disabled on all the NSD Servers. What other features/checks would look for the ccr keys? Thanks, Brian Marshall Virginia Tech_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Sun Jan 15 21:18:31 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Sun, 15 Jan 2017 21:18:31 +0000 Subject: [gpfsug-discuss] GUI "maintenance mode" Message-ID: Is there a way, perhaps through the CLI, to set a node in maintenance mode so the GUI alerting doesn't flag it up as being down? Pretty sure the option isn't available through the GUI's GUI if you'll pardon the expression. 
Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Tue Jan 17 21:50:53 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 17 Jan 2017 16:50:53 -0500 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM Message-ID: UG, I have a GPFS filesystem. I have a OpenStack private cloud. What is the best way for Nova Compute VMs to have access to data inside the GPFS filesystem? 1)Should VMs mount GPFS directly with a GPFS client? 2) Should the hypervisor mount GPFS and share to nova computes? 3) Should I create GPFS protocol servers that allow nova computes to mount of NFS? All advice is welcome. Best, Brian Marshall Virginia Tech -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Tue Jan 17 21:16:20 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Tue, 17 Jan 2017 16:16:20 -0500 Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs Message-ID: I have messages like these frequent my logs: Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 vendor_err 136 Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error IBV_WC_REM_ACCESS_ERR index 23 Any ideas on cause..? -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Jan 18 00:47:04 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Tue, 17 Jan 2017 19:47:04 -0500 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: References: Message-ID: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> I think the 1st option creates the challenges both with security (e.g. do you fully trust the users of your VMs not to do bad things as root either maliciously or accidentally? how do you ensure userids are properly mapped inside the guest?) and logistically (as VMs come and go how do you automate adding them/removing them to/from the GPFS cluster). I think the 2nd option is ideal perhaps using something like 9p (http://www.linux-kvm.org/page/9p_virtio) to export filesystems from the hypervisor to the guest. I'm not sure how you would integrate this with Nova and I've heard from others that there are stability issues, but I can't comment first hand. Another option might be to NFS/CIFS export the filesystems from the hypervisor to the guests via the 169.254.169.254 metadata address although I don't know how feasible that may or may not be. The advantage to using the metadata address is it should scale well and it should take the pain out of a guest mapping an IP address to its local hypervisor using an external method. Perhaps number 3 is the best way to go, especially (arguably only) if you use kerberized NFS or SMB. That way you don't have to trust anything about the guest and you theoretically should get decent performance. I'm really curious what other folks have done on this front. -Aaron On 1/17/17 4:50 PM, Brian Marshall wrote: > UG, > > I have a GPFS filesystem. > > I have a OpenStack private cloud. > > What is the best way for Nova Compute VMs to have access to data inside > the GPFS filesystem? > > 1)Should VMs mount GPFS directly with a GPFS client? > 2) Should the hypervisor mount GPFS and share to nova computes? > 3) Should I create GPFS protocol servers that allow nova computes to > mount of NFS? > > All advice is welcome. 
> > > Best, > Brian Marshall > Virginia Tech > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From malone12 at illinois.edu Wed Jan 18 03:05:15 2017 From: malone12 at illinois.edu (Maloney, John Daniel) Date: Wed, 18 Jan 2017 03:05:15 +0000 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> References: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> Message-ID: <6CADE9B2-3691-4F44-B241-DABA02385B42@illinois.edu> I agree with Aaron on option 1, trusting users to do nothing malicious would be quite a stretch for most people?s use cases. Even if they do, if their user?s credentials getting stolen, and then used by someone else it could be a real issue as the hacker wouldn?t have to get lucky and find a VM with an un-patched root escalation vulnerability. Security aside, you?ll probably want to make sure your VMs have an external IP that is able to be reached by the GPFS cluster. We found having GPFS route through the Openstack NAT to be possible, but tricky (though this was an older version of Openstack?could be better now?). Using the external IP may be the natural way for most folks, but wanted to point it out none-the-less. We haven?t done much in regards to option 2, we?ve done work using native clients on the hypervisors to provide cinder/glance storage, but not to share other data into the VM?s. Currently use option 3 to export group?s project directories to their VMs using the CES protocol nodes. It?s getting the job done right now (have close to 100 VMs mounting from it). I would definitely recommend giving your maxFilesToCache and maxStatCache parameters a big bump from defaults on the export nodes if you weren?t planning to already (set mine at 1,000,000 on each of those). We saw that become a point of contention with our user?s workloads. That change was implemented fairly recently and so far, so good. Aaron?s point about logistics from his answer to option 1 is relevant here too, especially if you have high VM turnover rate where IP addresses are recycled and different projects are getting exported. You?ll want to keep track of VM?s and exports to prevent a new VM from picking up an old IP that has access on an export it isn?t supposed to because it hasn?t been flushed out. In our situation there are 30-40 projects, all names of them known to users who ls the project directory, wouldn?t take much for them to spin up a new VM and give them all a try. I agree this is a really interesting topic, there?s a lot of ways to come at this so hopefully more folks chime in on what they?re doing. Best, J.D. Maloney Storage Engineer | Storage Enabling Technologies Group National Center for Supercomputing Applications (NCSA) On 1/17/17, 6:47 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Aaron Knister" wrote: I think the 1st option creates the challenges both with security (e.g. do you fully trust the users of your VMs not to do bad things as root either maliciously or accidentally? how do you ensure userids are properly mapped inside the guest?) and logistically (as VMs come and go how do you automate adding them/removing them to/from the GPFS cluster). 
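To make the cache bump J.D. mentions concrete, it would typically be restricted to the protocol/export nodes rather than applied cluster-wide, roughly as follows (assuming the built-in cesNodes class exists at this code level, and noting that both values only take effect after GPFS is restarted on those nodes):

    mmchconfig maxFilesToCache=1000000,maxStatCache=1000000 -N cesNodes

    # confirm what is now configured
    mmlsconfig maxFilesToCache
    mmlsconfig maxStatCache

Worth keeping an eye on memory use on those nodes afterwards, since the file and stat caches are not free.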
I think the 2nd option is ideal perhaps using something like 9p (http://www.linux-kvm.org/page/9p_virtio) to export filesystems from the hypervisor to the guest. I'm not sure how you would integrate this with Nova and I've heard from others that there are stability issues, but I can't comment first hand. Another option might be to NFS/CIFS export the filesystems from the hypervisor to the guests via the 169.254.169.254 metadata address although I don't know how feasible that may or may not be. The advantage to using the metadata address is it should scale well and it should take the pain out of a guest mapping an IP address to its local hypervisor using an external method. Perhaps number 3 is the best way to go, especially (arguably only) if you use kerberized NFS or SMB. That way you don't have to trust anything about the guest and you theoretically should get decent performance. I'm really curious what other folks have done on this front. -Aaron On 1/17/17 4:50 PM, Brian Marshall wrote: > UG, > > I have a GPFS filesystem. > > I have a OpenStack private cloud. > > What is the best way for Nova Compute VMs to have access to data inside > the GPFS filesystem? > > 1)Should VMs mount GPFS directly with a GPFS client? > 2) Should the hypervisor mount GPFS and share to nova computes? > 3) Should I create GPFS protocol servers that allow nova computes to > mount of NFS? > > All advice is welcome. > > > Best, > Brian Marshall > Virginia Tech > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Wed Jan 18 08:46:53 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 18 Jan 2017 08:46:53 +0000 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> References: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> Message-ID: >Another option might be to NFS/CIFS export the >filesystems from the hypervisor to the guests via the 169.254.169.254 >metadata address although I don't know how feasible that may or may not Doesn't the metadata IP site on the network nodes though and not the hypervisor? We currently have created interfaces on out net nodes attached to the appropriate VLAN/VXLAN and then run CES on top of that. The problem with this is if you have the same subnet existing in two networks, then you have a problem. I had some discussion with some of the IBM guys about the possibility of using a different CES protocol group and running multiple ganesha servers (maybe a container attached to the net?) so you could then have different NFS configs on different ganesha instances with CES managing a floating IP that could exist multiple times. There were some potential issues in the way the CES HA bits work though with this approach. 
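On the 9p/virtio route Aaron mentions earlier in the thread, the guest side is usually nothing more than mounting the export tag defined in the guest's libvirt <filesystem> passthrough stanza, e.g. (the tag and mount point here are made up for illustration):

    mount -t 9p -o trans=virtio,version=9p2000.L,rw gpfs_projects /mnt/projects

Whether that stands up under real I/O load is exactly the stability question raised above, so it would want benchmarking before anyone commits to it.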
Simon From S.J.Thompson at bham.ac.uk Wed Jan 18 08:59:48 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 18 Jan 2017 08:59:48 +0000 Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs In-Reply-To: References: Message-ID: I'd be inclined to look at something like: ibqueryerrors -s PortXmitWait,LinkDownedCounter,PortXmitDiscards,PortRcvRemotePhysicalErrors -c And see if you have a high number of symbol errors, might be a cable needs replugging or replacing. Simon From: > on behalf of "J. Eric Wonderley" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 17 January 2017 at 21:16 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs I have messages like these frequent my logs: Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 vendor_err 136 Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error IBV_WC_REM_ACCESS_ERR index 23 Any ideas on cause..? -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Jan 18 15:22:51 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 18 Jan 2017 10:22:51 -0500 Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs In-Reply-To: References: Message-ID: <061a15b7-f5e9-7c16-2e79-3236665a9368@nasa.gov> I'm curious about this too. We see these messages sometimes when things have gone horribly wrong but also sometimes during recovery events. Here's a recent one: loremds20 (manager/nsd node): Mon Jan 16 14:19:02.048 2017: [E] VERBS RDMA rdma read error IBV_WC_REM_ACCESS_ERR to 10.101.11.6 (lorej006) on mlx5_0 port 1 fabnum 3 vendor_err 136 Mon Jan 16 14:19:02.049 2017: [E] VERBS RDMA closed connection to 10.101.11.6 (lorej006) on mlx5_0 port 1 fabnum 3 due to RDMA read error IBV_WC_REM_ACCESS_ERR index 11 lorej006 (client): Mon Jan 16 14:19:01.990 2017: [N] VERBS RDMA closed connection to 10.101.53.18 (loremds18) on mlx5_0 port 1 fabnum 3 index 2 Mon Jan 16 14:19:01.995 2017: [N] VERBS RDMA closed connection to 10.101.53.19 (loremds19) on mlx5_0 port 1 fabnum 3 index 0 Mon Jan 16 14:19:01.997 2017: [I] Recovering nodes: 10.101.53.18 10.101.53.19 Mon Jan 16 14:19:02.047 2017: [W] VERBS RDMA async event IBV_EVENT_QP_ACCESS_ERR on mlx5_0 qp 0x7fffe550f1c8. Mon Jan 16 14:19:02.051 2017: [E] VERBS RDMA closed connection to 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 error 733 index 1 Mon Jan 16 14:19:02.071 2017: [I] Recovered 2 nodes for file system tnb32. Mon Jan 16 14:19:02.140 2017: [I] VERBS RDMA connecting to 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 index 0 Mon Jan 16 14:19:02.160 2017: [I] VERBS RDMA connected to 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 sl 0 index 0 I had just shut down loremds18 and loremds19 so there was certainly recovery taking place and during that time is when the error seems to have occurred. I looked up the meaning of IBV_WC_REM_ACCESS_ERR here (http://www.rdmamojo.com/2013/02/15/ibv_poll_cq/) and see this: IBV_WC_REM_ACCESS_ERR (10) - Remote Access Error: a protection error occurred on a remote data buffer to be read by an RDMA Read, written by an RDMA Write or accessed by an atomic operation. This error is reported only on RDMA operations or atomic operations. Relevant for RC QPs. 
my take on it during recovery it seems like one end of the connection more or less hanging up on the other end (e.g. Connection reset by peer /ECONNRESET). But like I said at the start, we also see this when there something has gone awfully wrong. -Aaron On 1/18/17 3:59 AM, Simon Thompson (Research Computing - IT Services) wrote: > I'd be inclined to look at something like: > > ibqueryerrors -s > PortXmitWait,LinkDownedCounter,PortXmitDiscards,PortRcvRemotePhysicalErrors > -c > > And see if you have a high number of symbol errors, might be a cable > needs replugging or replacing. > > Simon > > From: > on behalf of "J. Eric > Wonderley" > > Reply-To: "gpfsug-discuss at spectrumscale.org > " > > > Date: Tuesday, 17 January 2017 at 21:16 > To: "gpfsug-discuss at spectrumscale.org > " > > > Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs > > I have messages like these frequent my logs: > Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error > IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 > vendor_err 136 > Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to > 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error > IBV_WC_REM_ACCESS_ERR index 23 > > Any ideas on cause..? > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jan 18 15:56:16 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 18 Jan 2017 15:56:16 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Message-ID: Hi All, We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.b.mills at nasa.gov Wed Jan 18 16:10:51 2017 From: jonathan.b.mills at nasa.gov (Jonathan Mills) Date: Wed, 18 Jan 2017 11:10:51 -0500 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: References: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> Message-ID: <8d41b8c8-eb84-3d1c-eec2-d26f1816108b@nasa.gov> On 1/18/17 3:46 AM, Simon Thompson (Research Computing - IT Services) wrote: > >> Another option might be to NFS/CIFS export the >> filesystems from the hypervisor to the guests via the 169.254.169.254 >> metadata address although I don't know how feasible that may or may not > > Doesn't the metadata IP site on the network nodes though and not the > hypervisor? Not when Neutron is in DVR mode. It is intercepted at the hypervisor and redirected to the neutron-ns-metadata-proxy. See below: [root at gpcc003 ~]# ip netns exec qrouter-bc4aa217-5128-4eec-b9af-67923dae319a iptables -t nat -nvL neutron-l3-agent-PREROUTING Chain neutron-l3-agent-PREROUTING (1 references) pkts bytes target prot opt in out source destination 19 1140 REDIRECT tcp -- qr-+ * 0.0.0.0/0 169.254.169.254 tcp dpt:80 redir ports 9697 281 12650 DNAT all -- rfp-bc4aa217-5 * 0.0.0.0/0 169.154.180.32 to:10.0.4.22 [root at gpcc003 ~]# ip netns exec qrouter-bc4aa217-5128-4eec-b9af-67923dae319a netstat -tulpn |grep 9697 tcp 0 0 0.0.0.0:9697 0.0.0.0:* LISTEN 28130/python2 [root at gpcc003 ~]# ps aux |grep 28130 neutron 28130 0.0 0.0 286508 41364 ? S Jan04 0:02 /usr/bin/python2 /bin/neutron-ns-metadata-proxy --pid_file=/var/lib/neutron/external/pids/bc4aa217-5128-4eec-b9af-67923dae319a.pid --metadata_proxy_socket=/var/lib/neutron/metadata_proxy --router_id=bc4aa217-5128-4eec-b9af-67923dae319a --state_path=/var/lib/neutron --metadata_port=9697 --metadata_proxy_user=989 --metadata_proxy_group=986 --verbose --log-file=neutron-ns-metadata-proxy-bc4aa217-5128-4eec-b9af-67923dae319a.log --log-dir=/var/log/neutron root 31220 0.0 0.0 112652 972 pts/1 S+ 11:08 0:00 grep --color=auto 28130 > > We currently have created interfaces on out net nodes attached to the > appropriate VLAN/VXLAN and then run CES on top of that. > > The problem with this is if you have the same subnet existing in two > networks, then you have a problem. > > I had some discussion with some of the IBM guys about the possibility of > using a different CES protocol group and running multiple ganesha servers > (maybe a container attached to the net?) so you could then have different > NFS configs on different ganesha instances with CES managing a floating IP > that could exist multiple times. > > There were some potential issues in the way the CES HA bits work though > with this approach. > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Jonathan Mills / jonathan.mills at nasa.gov NASA GSFC / NCCS HPC (606.2) Bldg 28, Rm. S230 / c. 252-412-5710 From mimarsh2 at vt.edu Wed Jan 18 16:22:12 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Wed, 18 Jan 2017 11:22:12 -0500 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: References: Message-ID: To answer some more questions: What sort of workload will your Nova VM's be running? This is largely TBD but we anticipate webapps and other non-batch ways of interacting with and post processing data that has been computed on HPC batch systems. 
For example a user might host a website that allows users to view pieces of a large data set and do some processing in private cloud or kick off larger jobs on HPC clusters How many VM's are you running? This work is still in the design / build phase. We have 48 servers slated for the project. At max maybe 500 VMs; again this is a pretty wild estimate. This is a new service we are looking to provide What is your Network interconnect between the Scale Storage cluster and the Nova Compute cluster Each nova node has a dual 10gigE connection to switches that uplink to our core 40 gigE switches were NSD Servers are directly connectly. The information so far has been awesome. Thanks everyone. I am definitely leaning towards option #3 of creating protocol servers. Are there any design/build white papers targetting the virutalization use case? Thanks, Brian On Tue, Jan 17, 2017 at 5:55 PM, Andrew Beattie wrote: > HI Brian, > > > Couple of questions for you: > > What sort of workload will your Nova VM's be running? > How many VM's are you running? > What is your Network interconnect between the Scale Storage cluster and > the Nova Compute cluster > > I have cc'd Jake Carrol from University of Queensland in on the email as I > know they have done some basic performance testing using Scale to provide > storage to Openstack. > One of the issues that they found was the Openstack network translation > was a performance limiting factor. > > I think from memory the best performance scenario they had was, when they > installed the scale client locally into the virtual machines > > > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > > ----- Original message ----- > From: Brian Marshall > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM > Date: Wed, Jan 18, 2017 7:51 AM > > UG, > > I have a GPFS filesystem. > > I have a OpenStack private cloud. > > What is the best way for Nova Compute VMs to have access to data inside > the GPFS filesystem? > > 1)Should VMs mount GPFS directly with a GPFS client? > 2) Should the hypervisor mount GPFS and share to nova computes? > 3) Should I create GPFS protocol servers that allow nova computes to mount > of NFS? > > All advice is welcome. > > > Best, > Brian Marshall > Virginia Tech > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Wed Jan 18 16:58:24 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Wed, 18 Jan 2017 11:58:24 -0500 Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs In-Reply-To: <061a15b7-f5e9-7c16-2e79-3236665a9368@nasa.gov> References: <061a15b7-f5e9-7c16-2e79-3236665a9368@nasa.gov> Message-ID: As background, we recently upgraded GPFS from 4.2.0 to 4.2.1 and updated the Mellanox OFED on our compute cluster to allow it to move from CentOS 7.1 to 7.2 We do some transient warnings from the Mellanox switch gear about various port counters that we are tracking down with them. Jobs and filesystem seem stable, but the logs are concerning. On Wed, Jan 18, 2017 at 10:22 AM, Aaron Knister wrote: > I'm curious about this too. We see these messages sometimes when things > have gone horribly wrong but also sometimes during recovery events. 
Here's > a recent one: > > loremds20 (manager/nsd node): > Mon Jan 16 14:19:02.048 2017: [E] VERBS RDMA rdma read error > IBV_WC_REM_ACCESS_ERR to 10.101.11.6 (lorej006) on mlx5_0 port 1 fabnum 3 > vendor_err 136 > Mon Jan 16 14:19:02.049 2017: [E] VERBS RDMA closed connection to > 10.101.11.6 (lorej006) on mlx5_0 port 1 fabnum 3 due to RDMA read error > IBV_WC_REM_ACCESS_ERR index 11 > > lorej006 (client): > Mon Jan 16 14:19:01.990 2017: [N] VERBS RDMA closed connection to > 10.101.53.18 (loremds18) on mlx5_0 port 1 fabnum 3 index 2 > Mon Jan 16 14:19:01.995 2017: [N] VERBS RDMA closed connection to > 10.101.53.19 (loremds19) on mlx5_0 port 1 fabnum 3 index 0 > Mon Jan 16 14:19:01.997 2017: [I] Recovering nodes: 10.101.53.18 > 10.101.53.19 > Mon Jan 16 14:19:02.047 2017: [W] VERBS RDMA async event > IBV_EVENT_QP_ACCESS_ERR on mlx5_0 qp 0x7fffe550f1c8. > Mon Jan 16 14:19:02.051 2017: [E] VERBS RDMA closed connection to > 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 error 733 index 1 > Mon Jan 16 14:19:02.071 2017: [I] Recovered 2 nodes for file system tnb32. > Mon Jan 16 14:19:02.140 2017: [I] VERBS RDMA connecting to 10.101.53.20 > (loremds20) on mlx5_0 port 1 fabnum 3 index 0 > Mon Jan 16 14:19:02.160 2017: [I] VERBS RDMA connected to 10.101.53.20 > (loremds20) on mlx5_0 port 1 fabnum 3 sl 0 index 0 > > I had just shut down loremds18 and loremds19 so there was certainly > recovery taking place and during that time is when the error seems to have > occurred. > > I looked up the meaning of IBV_WC_REM_ACCESS_ERR here ( > http://www.rdmamojo.com/2013/02/15/ibv_poll_cq/) and see this: > > IBV_WC_REM_ACCESS_ERR (10) - Remote Access Error: a protection error > occurred on a remote data buffer to be read by an RDMA Read, written by an > RDMA Write or accessed by an atomic operation. This error is reported only > on RDMA operations or atomic operations. Relevant for RC QPs. > > my take on it during recovery it seems like one end of the connection more > or less hanging up on the other end (e.g. Connection reset by peer > /ECONNRESET). > > But like I said at the start, we also see this when there something has > gone awfully wrong. > > -Aaron > > On 1/18/17 3:59 AM, Simon Thompson (Research Computing - IT Services) > wrote: > >> I'd be inclined to look at something like: >> >> ibqueryerrors -s >> PortXmitWait,LinkDownedCounter,PortXmitDiscards,PortRcvRemot >> ePhysicalErrors >> -c >> >> And see if you have a high number of symbol errors, might be a cable >> needs replugging or replacing. >> >> Simon >> >> From: > > on behalf of "J. Eric >> Wonderley" > >> Reply-To: "gpfsug-discuss at spectrumscale.org >> " >> > mscale.org>> >> Date: Tuesday, 17 January 2017 at 21:16 >> To: "gpfsug-discuss at spectrumscale.org >> " >> > mscale.org>> >> Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs >> >> I have messages like these frequent my logs: >> Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error >> IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 >> vendor_err 136 >> Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to >> 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error >> IBV_WC_REM_ACCESS_ERR index 23 >> >> Any ideas on cause..? 
>> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From veb2005 at med.cornell.edu Wed Jan 18 22:54:10 2017 From: veb2005 at med.cornell.edu (Vanessa Borcherding) Date: Wed, 18 Jan 2017 17:54:10 -0500 Subject: [gpfsug-discuss] Issue with X forwarding Message-ID: Hi All, We've got a new-ish 4.1.1.0 Advanced cluster and we've run into a strange problem: users who have their home directory on the GPFS filesystem cannot do X11 forwarding. They get the following error: "/usr/bin/xauth: error in locking authority file /home/user/.Xauthority" The file ~/.Xauthority is there and also a new one ~/.Xauthority-c. Similarly, "xauth -b" also fails: Attempting to break locks on authority file /home/user/.Xauthority xauth: error in locking authority file /home/user/.Xauthority This behavior happens regardless of the client involved, and happens across multiple OS/kernel versions, whether GPFS is mounted natively or via NFS export. For any given host, if the user's home directory is moved to another NFS-exported location, X forwarding works correctly. Has anyone seen this before, or have any idea as to where it's coming from? Thanks, Vanessa -- * * * * * Vanessa Borcherding Director, Scientific Computing Technology Manager - Applied Bioinformatics Core Dept. of Physiology and Biophysics Institute for Computational Biomedicine Weill Cornell Medical College (212) 746-6281 - office (917) 861-9777 - cell * * * * * -------------- next part -------------- An HTML attachment was scrubbed... URL: From farid.chabane at ymail.com Thu Jan 19 06:00:54 2017 From: farid.chabane at ymail.com (FC) Date: Thu, 19 Jan 2017 06:00:54 +0000 (UTC) Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 References: <51281598.14159900.1484805654772.ref@mail.yahoo.com> Message-ID: <51281598.14159900.1484805654772@mail.yahoo.com> Hi all, We are facing performance issues with some of our applications due to the GPFS system monitoring (mmsysmon) on CentOS 7.2. Bad performance (an increase in iteration time) is seen every 30s, exactly matching the occurrence frequency of mmsysmon; the default monitor interval is set to 30s in /var/mmfs/mmsysmon/mmsysmonitor.conf. Shutting down GPFS with mmshutdown doesn't stop this process; we stopped it with the command mmsysmoncontrol and we get a stable iteration time. What are the impacts of disabling this process, other than losing access to the mmhealth commands? Do you have an idea of a proper way to disable it for good, without doing it in rc.local or increasing the monitoring interval in the configuration file? Thanks, Farid -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu Jan 19 08:45:20 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 19 Jan 2017 09:45:20 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: An HTML attachment was scrubbed...
URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 19 15:46:55 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 19 Jan 2017 15:46:55 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: Hi Olaf, The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? Thanks... Kevin On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: have you checked, where th fsmgr runs as you have nodes with different code levels mmlsmgr From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 01/18/2017 04:57 PM Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu Jan 19 16:05:41 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 19 Jan 2017 17:05:41 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 19 16:25:20 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 19 Jan 2017 16:25:20 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> Hi Olaf, We will continue upgrading clients in a rolling fashion, but with ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t consider figuring out why this is happening a waste of time and therefore having systems on both versions is a good thing. While I would prefer not to paste actual group names and GIDs into this public forum, I can assure you that on every 4.2.1.1 system that I have tried this on: 1. mmrepquota reports mostly GIDs, only a few group names 2. /etc/nsswitch.conf says to look at files first 3. the GID is in /etc/group 4. length of group name doesn?t matter I have a support contract with IBM, so I can open a PMR if necessary. I just thought someone on the list might have an idea as to what is happening or be able to point out the obvious explanation that I?m missing. ;-) Thanks? Kevin On Jan 19, 2017, at 10:05 AM, Olaf Weiser > wrote: unfortunately , I don't own a cluster right now, which has 4.2.2 to double check... SpectrumScale should resolve the GID into a name, if it find the name somewhere... but in your case.. I would say.. before we waste to much time in a version-mismatch issue.. finish the rolling migration, especially RHEL .. and then we continue meanwhile -I'll try to find a way for me here to setup up an 4.2.2. cluster cheers From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 01/19/2017 04:48 PM Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? Thanks... Kevin On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: have you checked, where th fsmgr runs as you have nodes with different code levels mmlsmgr From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 01/18/2017 04:57 PM Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. 
However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Thu Jan 19 16:36:42 2017 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Thu, 19 Jan 2017 17:36:42 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> Message-ID: <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> Just leting know, I see the same problem with 4.2.2.1 version. mmrepquota resolves only some of group names. On Thu, Jan 19, 2017 at 04:25:20PM +0000, Buterbaugh, Kevin L wrote: > Hi Olaf, > > We will continue upgrading clients in a rolling fashion, but with ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t consider figuring out why this is happening a waste of time and therefore having systems on both versions is a good thing. > > While I would prefer not to paste actual group names and GIDs into this public forum, I can assure you that on every 4.2.1.1 system that I have tried this on: > > 1. mmrepquota reports mostly GIDs, only a few group names > 2. /etc/nsswitch.conf says to look at files first > 3. the GID is in /etc/group > 4. length of group name doesn?t matter > > I have a support contract with IBM, so I can open a PMR if necessary. I just thought someone on the list might have an idea as to what is happening or be able to point out the obvious explanation that I?m missing. ;-) > > Thanks? > > Kevin > > On Jan 19, 2017, at 10:05 AM, Olaf Weiser > wrote: > > unfortunately , I don't own a cluster right now, which has 4.2.2 to double check... SpectrumScale should resolve the GID into a name, if it find the name somewhere... > > but in your case.. I would say.. before we waste to much time in a version-mismatch issue.. finish the rolling migration, especially RHEL .. and then we continue > meanwhile -I'll try to find a way for me here to setup up an 4.2.2. 
cluster > cheers > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/19/2017 04:48 PM > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi Olaf, > > The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. > > In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. > > Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? > > Thanks... > > Kevin > > On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: > > have you checked, where th fsmgr runs as you have nodes with different code levels > > mmlsmgr > > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/18/2017 04:57 PM > Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi All, > > We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. > > From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. > > However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). > > I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) > > I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? > > Kevin > > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek From peserocka at gmail.com Thu Jan 19 17:07:55 2017 From: peserocka at gmail.com (Peter Serocka) Date: Fri, 20 Jan 2017 01:07:55 +0800 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: <7D8E5B3D-6BA9-4362-984D-6A74448FA7BC@gmail.com> Any caching in effect? Like nscd which is configured separately in /etc/nscd.conf Any insights from strace?ing mmrepquota? For example, when a plain ls -l doesn?t look groups up in /etc/group but queries from nscd instead, strace output has something like: connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0 sendto(4, "\2\0\0\0\f\0\0\0\6\0\0\0group\0", 18, MSG_NOSIGNAL, NULL, 0) = 18 ? Peter > On 2017 Jan 19 Thu, at 23:46, Buterbaugh, Kevin L wrote: > > Hi Olaf, > > The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. > > In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. > > Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? > > Thanks... > > Kevin > >> On Jan 19, 2017, at 2:45 AM, Olaf Weiser wrote: >> >> have you checked, where th fsmgr runs as you have nodes with different code levels >> >> mmlsmgr >> >> >> >> >> From: "Buterbaugh, Kevin L" >> To: gpfsug main discussion list >> Date: 01/18/2017 04:57 PM >> Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Hi All, >> >> We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. >> >> From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. >> >> However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? 
It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). >> >> I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) >> >> I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? >> >> Kevin > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From olaf.weiser at de.ibm.com Thu Jan 19 17:16:27 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 19 Jan 2017 18:16:27 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> Message-ID: An HTML attachment was scrubbed... URL: From MDIETZ at de.ibm.com Thu Jan 19 18:07:32 2017 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Thu, 19 Jan 2017 19:07:32 +0100 Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 In-Reply-To: <51281598.14159900.1484805654772@mail.yahoo.com> References: <51281598.14159900.1484805654772.ref@mail.yahoo.com> <51281598.14159900.1484805654772@mail.yahoo.com> Message-ID: Hi Farid, there is no official way for disabling the system health monitoring because other components rely on it (e.g. GUI, CES, Install Toolkit,..) If you are fine with the consequences you can just delete the mmsysmonitor.conf, which will prevent the monitor from starting. During our testing we did not see a significant performance impact caused by the monitoring. In 4.2.2 some component monitors (e.g. disk) have been further improved to reduce polling and use notifications instead. Nevertheless, I would like to better understand what the issue is. What kind of workload do you run ? Do you see spikes in CPU usage every 30 seconds ? Is it the same on all cluster nodes or just on some of them ? Could you send us the output of "mmhealth node show -v" to see which monitors are active. It might make sense to open a PMR to get this issue fixed. Thanks. Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale - Release Lead Architect (4.2.X Release) System Health and Problem Determination Architect IBM Certified Software Engineer ---------------------------------------------------------------------------------------------------------- IBM Deutschland Hechtsheimer Str. 
2 55131 Mainz Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ---------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: FC To: "gpfsug-discuss at spectrumscale.org" Date: 01/19/2017 07:06 AM Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We are facing performance issues with some of our applications due to the GPFS system monitoring (mmsysmon) on CentOS 7.2. Bad performances (increase of iteration time) are seen every 30s exactly as the occurence frequency of mmsysmon ; the default monitor interval set to 30s in /var/mmfs/mmsysmon/mmsysmonitor.conf Shutting down GPFS with mmshutdown doesnt stop this process, we stopped it with the command mmsysmoncontrol and we get a stable iteration time. What are the impacts of disabling this process except losing access to mmhealth commands ? Do you have an idea of a proper way to disable it for good without doing it in rc.local or increasing the monitoring interval in the configuration file ? Thanks, Farid _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jan 19 18:21:18 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 19 Jan 2017 18:21:18 +0000 Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 In-Reply-To: References: <51281598.14159900.1484805654772.ref@mail.yahoo.com> <51281598.14159900.1484805654772@mail.yahoo.com>, Message-ID: On some of our nodes we were regularly seeing procees hung timeouts in dmesg from a python process, which I vaguely thought was related to the monitoring process (though we have other python bits from openstack running on these boxes). These are all running 4.2.2.0 code Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Mathias Dietz [MDIETZ at de.ibm.com] Sent: 19 January 2017 18:07 To: FC; gpfsug main discussion list Subject: Re: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 Hi Farid, there is no official way for disabling the system health monitoring because other components rely on it (e.g. GUI, CES, Install Toolkit,..) If you are fine with the consequences you can just delete the mmsysmonitor.conf, which will prevent the monitor from starting. During our testing we did not see a significant performance impact caused by the monitoring. In 4.2.2 some component monitors (e.g. disk) have been further improved to reduce polling and use notifications instead. Nevertheless, I would like to better understand what the issue is. What kind of workload do you run ? Do you see spikes in CPU usage every 30 seconds ? Is it the same on all cluster nodes or just on some of them ? Could you send us the output of "mmhealth node show -v" to see which monitors are active. It might make sense to open a PMR to get this issue fixed. Thanks. 
Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale - Release Lead Architect (4.2.X Release) System Health and Problem Determination Architect IBM Certified Software Engineer ---------------------------------------------------------------------------------------------------------- IBM Deutschland Hechtsheimer Str. 2 55131 Mainz Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ---------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: FC To: "gpfsug-discuss at spectrumscale.org" Date: 01/19/2017 07:06 AM Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, We are facing performance issues with some of our applications due to the GPFS system monitoring (mmsysmon) on CentOS 7.2. Bad performances (increase of iteration time) are seen every 30s exactly as the occurence frequency of mmsysmon ; the default monitor interval set to 30s in /var/mmfs/mmsysmon/mmsysmonitor.conf Shutting down GPFS with mmshutdown doesnt stop this process, we stopped it with the command mmsysmoncontrol and we get a stable iteration time. What are the impacts of disabling this process except losing access to mmhealth commands ? Do you have an idea of a proper way to disable it for good without doing it in rc.local or increasing the monitoring interval in the configuration file ? Thanks, Farid _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Greg.Lehmann at csiro.au Thu Jan 19 21:22:40 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Thu, 19 Jan 2017 21:22:40 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz>, Message-ID: <1484860960203.43563@csiro.au> It's not something to do with the value of the GID, like being less or greater than some number? ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Friday, 20 January 2017 3:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x in my eyes.. that's the hint .. not to wait until all 700 clients 'll have been updated .. before open PMR .. ;-) ... From: Lukas Hejtmanek To: gpfsug main discussion list Date: 01/19/2017 05:37 PM Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Just leting know, I see the same problem with 4.2.2.1 version. mmrepquota resolves only some of group names. On Thu, Jan 19, 2017 at 04:25:20PM +0000, Buterbaugh, Kevin L wrote: > Hi Olaf, > > We will continue upgrading clients in a rolling fashion, but with ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t consider figuring out why this is happening a waste of time and therefore having systems on both versions is a good thing. 
> > While I would prefer not to paste actual group names and GIDs into this public forum, I can assure you that on every 4.2.1.1 system that I have tried this on: > > 1. mmrepquota reports mostly GIDs, only a few group names > 2. /etc/nsswitch.conf says to look at files first > 3. the GID is in /etc/group > 4. length of group name doesn?t matter > > I have a support contract with IBM, so I can open a PMR if necessary. I just thought someone on the list might have an idea as to what is happening or be able to point out the obvious explanation that I?m missing. ;-) > > Thanks? > > Kevin > > On Jan 19, 2017, at 10:05 AM, Olaf Weiser > wrote: > > unfortunately , I don't own a cluster right now, which has 4.2.2 to double check... SpectrumScale should resolve the GID into a name, if it find the name somewhere... > > but in your case.. I would say.. before we waste to much time in a version-mismatch issue.. finish the rolling migration, especially RHEL .. and then we continue > meanwhile -I'll try to find a way for me here to setup up an 4.2.2. cluster > cheers > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/19/2017 04:48 PM > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi Olaf, > > The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. > > In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. > > Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? > > Thanks... > > Kevin > > On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: > > have you checked, where th fsmgr runs as you have nodes with different code levels > > mmlsmgr > > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/18/2017 04:57 PM > Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi All, > > We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. > > From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. > > However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). 
> > I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) > > I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? > > Kevin > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 19 21:51:07 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 19 Jan 2017 21:51:07 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <1484860960203.43563@csiro.au> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> Message-ID: <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> Hi All, Let me try to answer some questions that have been raised by various list members? 1. I am not using nscd. 2. getent group with either a GID or a group name resolves GID?s / names that are being printed as GIDs by mmrepquota 3. The GID?s in question are all in a normal range ? i.e. some group names that are being printed by mmrepquota have GIDs ?close? to others that are being printed as GID?s 4. strace?ing mmrepquota doesn?t show anything relating to nscd or anything that jumps out at me Here?s another point ? I am 95% sure that I have a client that was running 4.2.1.1 and mmrepquota displayed the group names ? I then upgraded GPFS on it ? no other changes ? and now it?s mostly GID?s. I?m not 100% sure because output scrolled out of my terminal buffer. Thanks to all for the suggestions ? please feel free to keep them coming. To any of the GPFS team on this mailing list, at least one other person has reported the same behavior ? is this a known bug? Kevin On Jan 19, 2017, at 3:22 PM, Greg.Lehmann at csiro.au wrote: It's not something to do with the value of the GID, like being less or greater than some number? ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Olaf Weiser > Sent: Friday, 20 January 2017 3:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x in my eyes.. that's the hint .. not to wait until all 700 clients 'll have been updated .. before open PMR .. ;-) ... 
From: Lukas Hejtmanek > To: gpfsug main discussion list > Date: 01/19/2017 05:37 PM Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Just leting know, I see the same problem with 4.2.2.1 version. mmrepquota resolves only some of group names. On Thu, Jan 19, 2017 at 04:25:20PM +0000, Buterbaugh, Kevin L wrote: > Hi Olaf, > > We will continue upgrading clients in a rolling fashion, but with ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t consider figuring out why this is happening a waste of time and therefore having systems on both versions is a good thing. > > While I would prefer not to paste actual group names and GIDs into this public forum, I can assure you that on every 4.2.1.1 system that I have tried this on: > > 1. mmrepquota reports mostly GIDs, only a few group names > 2. /etc/nsswitch.conf says to look at files first > 3. the GID is in /etc/group > 4. length of group name doesn?t matter > > I have a support contract with IBM, so I can open a PMR if necessary. I just thought someone on the list might have an idea as to what is happening or be able to point out the obvious explanation that I?m missing. ;-) > > Thanks? > > Kevin > > On Jan 19, 2017, at 10:05 AM, Olaf Weiser > wrote: > > unfortunately , I don't own a cluster right now, which has 4.2.2 to double check... SpectrumScale should resolve the GID into a name, if it find the name somewhere... > > but in your case.. I would say.. before we waste to much time in a version-mismatch issue.. finish the rolling migration, especially RHEL .. and then we continue > meanwhile -I'll try to find a way for me here to setup up an 4.2.2. cluster > cheers > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/19/2017 04:48 PM > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi Olaf, > > The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. > > In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. > > Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? > > Thanks... > > Kevin > > On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: > > have you checked, where th fsmgr runs as you have nodes with different code levels > > mmlsmgr > > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/18/2017 04:57 PM > Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi All, > > We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. 
> > From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. > > However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). > > I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) > > I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? > > Kevin > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at uni-mainz.de Fri Jan 20 08:41:26 2017 From: martin at uni-mainz.de (Christoph Martin) Date: Fri, 20 Jan 2017 09:41:26 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> Message-ID: Hi, I have a system with two servers with GPFS 4.2.1.2 on SLES 12.1 and some clients with GPFS 4.2.2.1 on SLES 11 and Centos 7. mmrepquota shows on all systems group names. I still have to upgrade the servers to 4.2.2.1. Christoph -- ============================================================================ Christoph Martin, Leiter Unix-Systeme Zentrum f?r Datenverarbeitung, Uni-Mainz, Germany Anselm Franz von Bentzel-Weg 12, 55128 Mainz Telefon: +49(6131)3926337 Instant-Messaging: Jabber: martin at uni-mainz.de (Siehe http://www.zdv.uni-mainz.de/4010.php) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: martin.vcf Type: text/x-vcard Size: 421 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL: From Achim.Rehor at de.ibm.com Fri Jan 20 09:01:12 2017 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Fri, 20 Jan 2017 10:01:12 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu><20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From farid.chabane at ymail.com Fri Jan 20 09:02:32 2017 From: farid.chabane at ymail.com (FC) Date: Fri, 20 Jan 2017 09:02:32 +0000 (UTC) Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 In-Reply-To: References: <51281598.14159900.1484805654772.ref@mail.yahoo.com> <51281598.14159900.1484805654772@mail.yahoo.com> Message-ID: <1898813661.15589480.1484902952833@mail.yahoo.com> Hi Mathias, It's OK when we remove the configuration file, the process doesn't start. The problem occurs mainly with our compute nodes (all of them), and we don't use the GUI or CES. Indeed, I confirm we don't see a performance impact with Linpack running on more than a hundred nodes; it appears especially when there is a lot of communication, which is the case for our applications. Our high-speed network is based on the Intel OmniPath fabric. We are seeing irregular iteration times every 30 sec. By enabling HyperThreading the issue is partly hidden, but it is still there. By using fewer cores per node (26 instead of 28) we don't see this behavior, as if the mmsysmon process needs a core to itself. I agree with you, it might be a good idea to open a PMR... Please find below the output of mmhealth node show --verbose

Node status:   HEALTHY
Component      Status    Reasons
-------------------------------------------------------------------
GPFS           HEALTHY   -
NETWORK        HEALTHY   -
  ib0          HEALTHY   -
FILESYSTEM     HEALTHY   -
  gpfs1        HEALTHY   -
  gpfs2        HEALTHY   -
DISK           HEALTHY   -

Thanks Farid On Thursday 19 January 2017 at 19:21, Simon Thompson (Research Computing - IT Services) wrote: On some of our nodes we were regularly seeing process hung timeouts in dmesg from a python process, which I vaguely thought was related to the monitoring process (though we have other python bits from openstack running on these boxes). These are all running 4.2.2.0 code Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Mathias Dietz [MDIETZ at de.ibm.com] Sent: 19 January 2017 18:07 To: FC; gpfsug main discussion list Subject: Re: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 Hi Farid, there is no official way for disabling the system health monitoring because other components rely on it (e.g. GUI, CES, Install Toolkit, ...).
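(If disabling the monitor feels too drastic, a rough sketch of the other route Farid mentions - raising the poll interval - is below; the key name inside mmsysmonitor.conf and the mmsysmoncontrol subcommands are assumptions to verify against your release before relying on them.)

CONF=/var/mmfs/mmsysmon/mmsysmonitor.conf               # path from Farid's note
grep -in interval "$CONF"                               # find the current poll interval (default 30s)
cp -p "$CONF" "$CONF.bak.$(date +%Y%m%d)"               # keep a copy before touching it
sed -i 's/^\(monitorinterval *= *\)30$/\1120/' "$CONF"  # assumed key name; e.g. 30s -> 120s
mmsysmoncontrol restart                                 # assumed subcommand; Farid used this tool to stop the monitor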
If you are fine with the consequences you can just delete the mmsysmonitor.conf, which will prevent the monitor from starting. During our testing we did not see a significant performance impact caused by the monitoring. In 4.2.2 some component monitors (e.g. disk) have been further improved to reduce polling and use notifications instead. Nevertheless, I would like to better understand what the issue is. What kind of workload do you run? Do you see spikes in CPU usage every 30 seconds? Is it the same on all cluster nodes or just on some of them? Could you send us the output of "mmhealth node show -v" to see which monitors are active. It might make sense to open a PMR to get this issue fixed. Thanks. Mit freundlichen Grüßen / Kind regards Mathias Dietz Spectrum Scale - Release Lead Architect (4.2.X Release) System Health and Problem Determination Architect IBM Certified Software Engineer ---------------------------------------------------------------------------------------------------------- IBM Deutschland Hechtsheimer Str. 2 55131 Mainz Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ---------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: FC To: "gpfsug-discuss at spectrumscale.org" Date: 01/19/2017 07:06 AM Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, We are facing performance issues with some of our applications due to the GPFS system monitoring (mmsysmon) on CentOS 7.2. Bad performance (an increase in iteration time) is seen every 30s, exactly matching the polling frequency of mmsysmon; the default monitor interval is set to 30s in /var/mmfs/mmsysmon/mmsysmonitor.conf. Shutting down GPFS with mmshutdown doesn't stop this process; we stopped it with the mmsysmoncontrol command and we get a stable iteration time. What are the impacts of disabling this process except losing access to mmhealth commands? Do you have an idea of a proper way to disable it for good without doing it in rc.local or increasing the monitoring interval in the configuration file? Thanks, Farid _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From st.graf at fz-juelich.de Fri Jan 20 09:45:04 2017 From: st.graf at fz-juelich.de (Stephan Graf) Date: Fri, 20 Jan 2017 10:45:04 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> Message-ID: Good morning Mr. Rehor! I just had a look. On the node where we have the mmlsquota -g problem, I also see with mmrepquota -g that some groups are printed only as numeric IDs. I am happy to open a PMR for this.
Kind regards, Stephan Graf On 01/20/17 10:01, Achim Rehor wrote: fully agreed, there are PMRs open on "mmlsquota -g failes : no such group" where the handling of group names vs. ids is being tracked. A PMR on mmrepquota and a slightly different facet of a similar problem might give more and faster insight and a solution. Mit freundlichen Grüßen / Kind regards Achim Rehor ________________________________ Software Technical Support Specialist AIX/ Emea HPC Support [cid:part1.A7833F18.D0EA2498 at fz-juelich.de] IBM Certified Advanced Technical Expert - Power Systems with AIX TSCC Software Service, Dept. 7922 Global Technology Services ________________________________ Phone: +49-7034-274-7862 IBM Deutschland E-Mail: Achim.Rehor at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany ________________________________ IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Geschäftsführung: Martina Koederitz (Vorsitzende), Reinhard Reschke, Dieter Scholz, Gregor Pillen, Ivo Koerner, Christian Noll Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 WEEE-Reg.-Nr. DE 99369940 From: Olaf Weiser/Germany/IBM at IBMDE To: gpfsug main discussion list Date: 01/19/2017 06:17 PM Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ in my eyes.. that's the hint .. not to wait until all 700 clients 'll have been updated .. before open PMR .. ;-) ... From: Lukas Hejtmanek To: gpfsug main discussion list Date: 01/19/2017 05:37 PM Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Just letting you know, I see the same problem with the 4.2.2.1 version. mmrepquota resolves only some of the group names. On Thu, Jan 19, 2017 at 04:25:20PM +0000, Buterbaugh, Kevin L wrote: > Hi Olaf, > > We will continue upgrading clients in a rolling fashion, but with ~700 of them, it'll be a few weeks. And to me that's good - I don't consider figuring out why this is happening a waste of time, and therefore having systems on both versions is a good thing. > > While I would prefer not to paste actual group names and GIDs into this public forum, I can assure you that on every 4.2.1.1 system that I have tried this on: > > 1. mmrepquota reports mostly GIDs, only a few group names > 2. /etc/nsswitch.conf says to look at files first > 3. the GID is in /etc/group > 4. length of group name doesn't matter > > I have a support contract with IBM, so I can open a PMR if necessary. I just thought someone on the list might have an idea as to what is happening or be able to point out the obvious explanation that I'm missing. ;-) > > Thanks... > > Kevin > > On Jan 19, 2017, at 10:05 AM, Olaf Weiser > wrote: > > unfortunately, I don't own a cluster right now which has 4.2.2 to double check... SpectrumScale should resolve the GID into a name, if it finds the name somewhere... > > but in your case.. I would say.. before we waste too much time in a version-mismatch issue.. finish the rolling migration, especially RHEL .. and then we continue > meanwhile - I'll try to find a way for me here to set up a 4.2.2
cluster > cheers > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/19/2017 04:48 PM > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi Olaf, > > The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. > > In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. > > Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? > > Thanks... > > Kevin > > On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: > > have you checked, where th fsmgr runs as you have nodes with different code levels > > mmlsmgr > > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/18/2017 04:57 PM > Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi All, > > We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. > > From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. > > However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). > > I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) > > I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? > > Kevin > > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Stephan Graf Juelich Supercomputing Centre Institute for Advanced Simulation Forschungszentrum Juelich GmbH 52425 Juelich, Germany Phone: +49-2461-61-6578 Fax: +49-2461-61-6656 E-mail: st.graf at fz-juelich.de WWW: http://www.fz-juelich.de/jsc/ ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From st.graf at fz-juelich.de Fri Jan 20 10:22:09 2017 From: st.graf at fz-juelich.de (Stephan Graf) Date: Fri, 20 Jan 2017 11:22:09 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> Message-ID: Sorry for the mail. I just can tell, that we are facing the same issue: We run GPFS 4.1.1.11 & 4.2.1.2 In both versions the mmlsquota -g fails. I also tried the mmrepquota -g command on GPFS 4.2.1.2, and some groups are displayed only numerical. Stephan On 01/20/17 09:41, Christoph Martin wrote: Hi, I have a system with two servers with GPFS 4.2.1.2 on SLES 12.1 and some clients with GPFS 4.2.2.1 on SLES 11 and Centos 7. mmrepquota shows on all systems group names. I still have to upgrade the servers to 4.2.2.1. 
Christoph _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Stephan Graf Juelich Supercomputing Centre Institute for Advanced Simulation Forschungszentrum Juelich GmbH 52425 Juelich, Germany Phone: +49-2461-61-6578 Fax: +49-2461-61-6656 E-mail: st.graf at fz-juelich.de WWW: http://www.fz-juelich.de/jsc/ ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Fri Jan 20 10:54:37 2017 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Fri, 20 Jan 2017 11:54:37 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu><20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From duersch at us.ibm.com Fri Jan 20 14:14:23 2017 From: duersch at us.ibm.com (Steve Duersch) Date: Fri, 20 Jan 2017 09:14:23 -0500 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: Kevin, Please go ahead and open a PMR. Cursorily, we don't know of an obvious known bug. Thank you. Steve Duersch Spectrum Scale 845-433-7902 IBM Poughkeepsie, New York gpfsug-discuss-bounces at spectrumscale.org wrote on 01/19/2017 04:52:02 PM: > From: gpfsug-discuss-request at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Date: 01/19/2017 04:52 PM > Subject: gpfsug-discuss Digest, Vol 60, Issue 47 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. 
Re: mmrepquota and group names in GPFS 4.2.2.x > (Buterbaugh, Kevin L) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 19 Jan 2017 21:51:07 +0000 > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS > 4.2.2.x > Message-ID: <31F584FD-A926-4D86-B365-63EA244DEE45 at vanderbilt.edu> > Content-Type: text/plain; charset="utf-8" > > Hi All, > > Let me try to answer some questions that have been raised by various > list members? > > 1. I am not using nscd. > 2. getent group with either a GID or a group name resolves GID?s / > names that are being printed as GIDs by mmrepquota > 3. The GID?s in question are all in a normal range ? i.e. some > group names that are being printed by mmrepquota have GIDs ?close? > to others that are being printed as GID?s > 4. strace?ing mmrepquota doesn?t show anything relating to nscd or > anything that jumps out at me > > Here?s another point ? I am 95% sure that I have a client that was > running 4.2.1.1 and mmrepquota displayed the group names ? I then > upgraded GPFS on it ? no other changes ? and now it?s mostly GID?s. > I?m not 100% sure because output scrolled out of my terminal buffer. > > Thanks to all for the suggestions ? please feel free to keep them > coming. To any of the GPFS team on this mailing list, at least one > other person has reported the same behavior ? is this a known bug? > > Kevin > > On Jan 19, 2017, at 3:22 PM, Greg.Lehmann at csiro.au< > mailto:Greg.Lehmann at csiro.au> wrote: > > > It's not something to do with the value of the GID, like being less > or greater than some number? > > ________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org discuss-bounces at spectrumscale.org> mailto:gpfsug-discuss-bounces at spectrumscale.org>> on behalf of Olaf > Weiser > > Sent: Friday, 20 January 2017 3:16 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > > in my eyes.. that's the hint .. not to wait until all 700 clients > 'll have been updated .. before open PMR .. ;-) ... > > > > From: Lukas Hejtmanek >> > To: gpfsug main discussion list mailto:gpfsug-discuss at spectrumscale.org>> > Date: 01/19/2017 05:37 PM > Subject: Re: [gpfsug-discuss] mmrepquota and group names in > GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org< > mailto:gpfsug-discuss-bounces at spectrumscale.org> > ________________________________ > > > > Just leting know, I see the same problem with 4.2.2.1 version. mmrepquota > resolves only some of group names. > > On Thu, Jan 19, 2017 at 04:25:20PM +0000, Buterbaugh, Kevin L wrote: > > Hi Olaf, > > > > We will continue upgrading clients in a rolling fashion, but with > ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t > consider figuring out why this is happening a waste of time and > therefore having systems on both versions is a good thing. > > > > While I would prefer not to paste actual group names and GIDs into > this public forum, I can assure you that on every 4.2.1.1 system > that I have tried this on: > > > > 1. mmrepquota reports mostly GIDs, only a few group names > > 2. /etc/nsswitch.conf says to look at files first > > 3. the GID is in /etc/group > > 4. length of group name doesn?t matter > > > > I have a support contract with IBM, so I can open a PMR if > necessary. 
I just thought someone on the list might have an idea as > to what is happening or be able to point out the obvious explanation > that I?m missing. ;-) > > > > Thanks? > > > > Kevin > > > > On Jan 19, 2017, at 10:05 AM, Olaf Weiser mailto:olaf.weiser at de.ibm.com>> wrote: > > > > unfortunately , I don't own a cluster right now, which has 4.2.2 > to double check... SpectrumScale should resolve the GID into a name, > if it find the name somewhere... > > > > but in your case.. I would say.. before we waste to much time in a > version-mismatch issue.. finish the rolling migration, especially > RHEL .. and then we continue > > meanwhile -I'll try to find a way for me here to setup up an 4.2.2. cluster > > cheers > > > > > > > > From: "Buterbaugh, Kevin L" mailto:Kevin.Buterbaugh at Vanderbilt.Edu> >> > > To: gpfsug main discussion list mailto:gpfsug-discuss at spectrumscale.org> discuss at spectrumscale.org>> > > Date: 01/19/2017 04:48 PM > > Subject: Re: [gpfsug-discuss] mmrepquota and group names in > GPFS 4.2.2.x > > Sent by: gpfsug-discuss-bounces at spectrumscale.org< > mailto:gpfsug-discuss-bounces at spectrumscale.org> discuss-bounces at spectrumscale.org> > > ________________________________ > > > > > > > > Hi Olaf, > > > > The filesystem manager runs on one of our servers, all of which > are upgraded to 4.2.2.x. > > > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf > has ?files? listed first for /etc/group. > > > > In addition to a mixture of GPFS versions, we also have a mixture > of OS versions (RHEL 6/7). AFAIK tell with all of my testing / > experimenting the only factor that seems to change the behavior of > mmrepquota in regards to GIDs versus group names is the GPFS version. > > > > Other ideas, anyone? Is anyone else in a similar situation and > can test whether they see similar behavior? > > > > Thanks... > > > > Kevin > > > > On Jan 19, 2017, at 2:45 AM, Olaf Weiser mailto:olaf.weiser at de.ibm.com>> wrote: > > > > have you checked, where th fsmgr runs as you have nodes with > different code levels > > > > mmlsmgr > > > > > > > > > > From: "Buterbaugh, Kevin L" mailto:Kevin.Buterbaugh at Vanderbilt.Edu> >> > > To: gpfsug main discussion list mailto:gpfsug-discuss at spectrumscale.org> discuss at spectrumscale.org>> > > Date: 01/18/2017 04:57 PM > > Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > > Sent by: gpfsug-discuss-bounces at spectrumscale.org< > mailto:gpfsug-discuss-bounces at spectrumscale.org> discuss-bounces at spectrumscale.org> > > ________________________________ > > > > > > > > Hi All, > > > > We recently upgraded our cluster (well, the servers are all > upgraded; the clients are still in progress) from GPFS 4.2.1.1 to > GPFS 4.2.2.1 and there appears to be a change in how mmrepquota > handles group names in its? output. I?m trying to get a handle on > it, because it is messing with some of my scripts and - more > importantly - because I don?t understand the behavior. > > > > From one of my clients which is still running GPFS 4.2.1.1 I can > run an ?mmrepquota -g ? and if the group exists in /etc/group > the group name is displayed. Of course, if the group doesn?t exist > in /etc/group, the GID is displayed. Makes sense. > > > > However, on my servers which have been upgraded to GPFS 4.2.2.1 > most - but not all - of the time I see GID numbers instead of group > names. My question is, what is the criteria GPFS 4.2.2.x is using > to decide when to display a GID instead of a group name? 
It?s > apparently *not* the length of the name of the group, because I have > output in front of me where a 13 character long group name is > displayed but a 7 character long group name is *not* displayed - > its? GID is instead (and yes, both exist in /etc/group). > > > > I know that sample output would be useful to illustrate this, but > I do not want to post group names or GIDs to a public mailing list ? > if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) > > > > I am in the process of updating scripts to use ?mmrepquota -gn > ? and then looking up the group name myself, but I want to try > to understand this. Thanks? > > > > Kevin > > > > > > ? > > Kevin Buterbaugh - Senior System Administrator > > Vanderbilt University - Advanced Computing Center for Research andEducation > > Kevin.Buterbaugh at vanderbilt.edu< > mailto:Kevin.Buterbaugh at vanderbilt.edu>- (615)875-9633 > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org< > http://spectrumscale.org> > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org< > http://spectrumscale.org> > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > Luk?? Hejtm?nek > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: 20170119/8e599938/attachment.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 60, Issue 47 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jan 20 14:33:23 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 20 Jan 2017 14:33:23 +0000 Subject: [gpfsug-discuss] Weird log message Message-ID: So today I was just trying to collect a gpfs.snap to log a ticket, and part way through the log collection it said: Month '12' out of range 0..11 at /usr/lpp/mmfs/bin/mmlogsort line 114. This is a cluster running 4.2.2.0 It carried on anyway so hardly worth me logging a ticket, but just in case someone want to pick it up internally in IBM ...? 
Simon From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jan 20 15:09:06 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 20 Jan 2017 15:09:06 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <791bb4d1-eb22-5ba5-9fcd-d7553aeebdc0@psu.edu> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> <791bb4d1-eb22-5ba5-9fcd-d7553aeebdc0@psu.edu> Message-ID: <566473A3-D5F1-4508-84AE-AE4B892C25B8@vanderbilt.edu> Hi Phil, Nope - that was the very first thought I had but on a 4.2.2.1 node I have a 13 character group name displaying and a resolvable 7 character long group name being displayed as its? GID? Kevin > On Jan 20, 2017, at 9:06 AM, Phil Pishioneri wrote: > > On 1/19/17 4:51 PM, Buterbaugh, Kevin L wrote: >> Hi All, >> >> Let me try to answer some questions that have been raised by various list members? >> >> 1. I am not using nscd. >> 2. getent group with either a GID or a group name resolves GID?s / names that are being printed as GIDs by mmrepquota >> 3. The GID?s in question are all in a normal range ? i.e. some group names that are being printed by mmrepquota have GIDs ?close? to others that are being printed as GID?s >> 4. strace?ing mmrepquota doesn?t show anything relating to nscd or anything that jumps out at me >> > > Anything unique about the lengths of the names of the affected groups? (i.e., all a certain value, all greater than some value, etc.) > > -Phil From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jan 20 15:10:05 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 20 Jan 2017 15:10:05 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: <8F3B6E42-6B37-48DF-8870-0CC5F293DCF7@vanderbilt.edu> Steve, I just opened a PMR - thanks? Kevin On Jan 20, 2017, at 8:14 AM, Steve Duersch > wrote: Kevin, Please go ahead and open a PMR. Cursorily, we don't know of an obvious known bug. Thank you. Steve Duersch Spectrum Scale 845-433-7902 IBM Poughkeepsie, New York ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Jan 20 15:32:17 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Fri, 20 Jan 2017 10:32:17 -0500 Subject: [gpfsug-discuss] Path to NSD lost when host_sas_address changed on port Message-ID: <5DDBFF8D-8927-42A7-8A81-3F0D167DDAAC@brown.edu> We have most of our GPFS NSD storage set up as pairs of RAID boxes served by failover pairs of servers. Most of it is FibreChannel, but the newest four boxes and servers are using dual port SAS controllers. Just this week, we had one server lose one out of the paths to one of the raid boxes. Took a while to realize what happened, but apparently the port2 ID changed from 51866da05cf7b001 to 51866da05cf7b002 on the fly, without rebooting. Port1 is still 51866da05cf7b000, which is the card ID (host_add). We?re running gpfs 4.2.2.1 on RHEL7.2 on these hosts. Has anyone else seen this kind of behavior? 
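(One way to catch the flip in the act is to snapshot the HBA port addresses on a schedule and diff successive snapshots; a rough sketch follows. The sysfs attribute paths are assumptions that depend on the HBA driver, so cross-check them against the lsscsi -Ht view used later in this thread.)

LOG=/var/log/sas_addr_watch.log                      # hypothetical log file
{
  date
  lsscsi -Ht                                         # hosts and their SAS addresses
  grep -H . /sys/class/scsi_host/host*/host_sas_address 2>/dev/null   # attribute named in the subject; driver dependent
  grep -H . /sys/class/sas_phy/*/sas_address 2>/dev/null              # per-phy view, if the SAS transport class exposes it
} >> "$LOG"
# run from cron every few minutes; diff-ing successive snapshots shows exactly when port2 changes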
First noticed these messages, 3 hours 13 minutes after boot: Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd The multipath daemon was sending lots of log messages like: Jan 10 13:49:22 storage043 multipathd: mpathw: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:64 1] Jan 10 13:49:22 storage043 multipathd: mpathaa: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:96 1] Jan 10 13:49:22 storage043 multipathd: mpathx: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:128 1] Currently worked around problem by including 00 01 and 02 for all 8 SAS cards when mapping LUN/volume to host groups. Thanks, ? ddj Dave Johnson Brown University CCV -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jan 20 15:43:56 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 20 Jan 2017 15:43:56 +0000 Subject: [gpfsug-discuss] SOBAR questions Message-ID: We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon From eric.wonderley at vt.edu Fri Jan 20 16:14:09 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 20 Jan 2017 11:14:09 -0500 Subject: [gpfsug-discuss] Path to NSD lost when host_sas_address changed on port In-Reply-To: <5DDBFF8D-8927-42A7-8A81-3F0D167DDAAC@brown.edu> References: <5DDBFF8D-8927-42A7-8A81-3F0D167DDAAC@brown.edu> Message-ID: Maybe multipath is not seeing all of the wwns? 
multipath -v3 | grep ^51855 look ok? For some unknown reason multipath does not see our sandisk array...we have to add them to the end of /etc/multipath/wwids file On Fri, Jan 20, 2017 at 10:32 AM, David D. Johnson wrote: > We have most of our GPFS NSD storage set up as pairs of RAID boxes served > by failover pairs of servers. > Most of it is FibreChannel, but the newest four boxes and servers are > using dual port SAS controllers. > Just this week, we had one server lose one out of the paths to one of the > raid boxes. Took a while > to realize what happened, but apparently the port2 ID changed from > 51866da05cf7b001 to > 51866da05cf7b002 on the fly, without rebooting. Port1 is still > 51866da05cf7b000, which is the card ID (host_add). > > We?re running gpfs 4.2.2.1 on RHEL7.2 on these hosts. > > Has anyone else seen this kind of behavior? > First noticed these messages, 3 hours 13 minutes after boot: > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > > The multipath daemon was sending lots of log messages like: > Jan 10 13:49:22 storage043 multipathd: mpathw: load table [0 4642340864 > multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 > 1 8:64 1] > Jan 10 13:49:22 storage043 multipathd: mpathaa: load table [0 4642340864 > multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 > 1 8:96 1] > Jan 10 13:49:22 storage043 multipathd: mpathx: load table [0 4642340864 > multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 > 1 8:128 1] > > Currently worked around problem by including 00 01 and 02 for all 8 SAS > cards when mapping LUN/volume to host groups. > > Thanks, > ? ddj > Dave Johnson > Brown University CCV > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Jan 20 16:27:30 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Fri, 20 Jan 2017 11:27:30 -0500 Subject: [gpfsug-discuss] Path to NSD lost when host_sas_address changed on port In-Reply-To: References: <5DDBFF8D-8927-42A7-8A81-3F0D167DDAAC@brown.edu> Message-ID: Actually, we can see all the Volume LUN WWNs such as 3600a098000a11f990000022457cf5091 1:0:0:0 sdb 8:16 14 undef ready DELL 3600a098000a0b4ea000001fd57cf50b2 1:0:0:1 sdc 8:32 9 undef ready DELL 3600a098000a11f990000024457cf576f 1:0:0:10 sdl 8:176 14 undef ready DELL (45 lines, 11 LUNs from each controller, each showing up twice, plus the boot volume) My problem involves the ID of the server's host adapter as seen by the 60 drive RAID box. 
[root at storage043 scsi]# lsscsi -Ht [0] megaraid_sas [1] mpt3sas sas:0x51866da05f388a00 [2] ahci sata: [3] ahci sata: [4] ahci sata: [5] ahci sata: [6] ahci sata: [7] ahci sata: [8] ahci sata: [9] ahci sata: [10] ahci sata: [11] ahci sata: [12] mpt3sas sas:0x51866da05cf7b000 Each card [1] and [12] is a dual port card. The address of the second port is not consistent. ? ddj > On Jan 20, 2017, at 11:14 AM, J. Eric Wonderley wrote: > > > Maybe multipath is not seeing all of the wwns? > > multipath -v3 | grep ^51855 look ok? > > For some unknown reason multipath does not see our sandisk array...we have to add them to the end of /etc/multipath/wwids file > > > On Fri, Jan 20, 2017 at 10:32 AM, David D. Johnson > wrote: > We have most of our GPFS NSD storage set up as pairs of RAID boxes served by failover pairs of servers. > Most of it is FibreChannel, but the newest four boxes and servers are using dual port SAS controllers. > Just this week, we had one server lose one out of the paths to one of the raid boxes. Took a while > to realize what happened, but apparently the port2 ID changed from 51866da05cf7b001 to > 51866da05cf7b002 on the fly, without rebooting. Port1 is still 51866da05cf7b000, which is the card ID (host_add). > > We?re running gpfs 4.2.2.1 on RHEL7.2 on these hosts. > > Has anyone else seen this kind of behavior? > First noticed these messages, 3 hours 13 minutes after boot: > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > > The multipath daemon was sending lots of log messages like: > Jan 10 13:49:22 storage043 multipathd: mpathw: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:64 1] > Jan 10 13:49:22 storage043 multipathd: mpathaa: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:96 1] > Jan 10 13:49:22 storage043 multipathd: mpathx: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:128 1] > > Currently worked around problem by including 00 01 and 02 for all 8 SAS cards when mapping LUN/volume to host groups. > > Thanks, > ? ddj > Dave Johnson > Brown University CCV > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From duersch at us.ibm.com Fri Jan 20 16:54:12 2017 From: duersch at us.ibm.com (Steve Duersch) Date: Fri, 20 Jan 2017 11:54:12 -0500 Subject: [gpfsug-discuss] Weird log message In-Reply-To: References: Message-ID: This is a known bug. It is fixed in 4.2.2.1. It does not impact any of the gathering of information. It impacts the sorting of the logs, but all the logs will be there. 
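(For anyone wanting to double-check on their own nodes: a quick, informal way to confirm the installed level and that a snap taken on 4.2.2.0 still captured the logs despite the warning. The tarball path below is a placeholder for whatever gpfs.snap printed at the end of its run.)

rpm -q gpfs.base                                     # installed package level, e.g. 4.2.2-1
mmdiag --version                                     # the daemon's view of the running build
SNAP=/tmp/gpfs.snapOut/your_snap_archive.tar.gz      # placeholder: use the path gpfs.snap printed
tar -tzf "$SNAP" | grep -c 'mmfs\.log'               # count the mmfs log files that made it into the archive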
Steve Duersch Spectrum Scale 845-433-7902 IBM Poughkeepsie, New York > > Message: 1 > Date: Fri, 20 Jan 2017 14:33:23 +0000 > From: "Simon Thompson (Research Computing - IT Services)" > > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Weird log message > Message-ID: > Content-Type: text/plain; charset="us-ascii" > > > So today I was just trying to collect a gpfs.snap to log a ticket, and > part way through the log collection it said: > > Month '12' out of range 0..11 at /usr/lpp/mmfs/bin/mmlogsort line 114. > > This is a cluster running 4.2.2.0 > > It carried on anyway so hardly worth me logging a ticket, but just in case > someone want to pick it up internally in IBM ...? > > Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Jan 20 16:57:56 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 20 Jan 2017 11:57:56 -0500 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: I worked on some aspects of SOBAR, but without studying and testing the commands - I'm not in a position right now to give simple definitive answers - having said that.... Generally your questions are reasonable and the answer is: "Yes it should be possible to do that, but you might be going a bit beyond the design point.., so you'll need to try it out on a (smaller) test system with some smaller tedst files. Point by point. 1. If SOBAR is unable to restore a particular file, perhaps because the premigration did not complete -- you should only lose that particular file, and otherwise "keep going". 2. I think SOBAR helps you build a similar file system to the original, including block sizes. So you'd have to go in and tweak the file system creation step(s). I think this is reasonable... If you hit a problem... IMO that would be a fair APAR. 3. Similar to 2. From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 01/20/2017 10:44 AM Subject: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? 
This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaurang.tapase at in.ibm.com Fri Jan 20 18:04:45 2017 From: gaurang.tapase at in.ibm.com (Gaurang Tapase) Date: Fri, 20 Jan 2017 23:34:45 +0530 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: References: Message-ID: Hi Brian, For option #3, you can use GPFS Manila (OpenStack shared file system service) driver for exporting data from protocol servers to the OpenStack VMs. It was updated to support CES in the Newton release. A new feature of bringing existing filesets under Manila management has also been added recently. Thanks, Gaurang ------------------------------------------------------------------------ Gaurang S Tapase Spectrum Scale & OpenStack IBM India Storage Lab, Pune (India) Email : gaurang.tapase at in.ibm.com Phone : +91-20-42025699 (W), +91-9860082042(Cell) ------------------------------------------------------------------------- From: Brian Marshall To: gpfsug main discussion list Date: 01/18/2017 09:52 PM Subject: Re: [gpfsug-discuss] Mounting GPFS data on OpenStack VM Sent by: gpfsug-discuss-bounces at spectrumscale.org To answer some more questions: What sort of workload will your Nova VM's be running? This is largely TBD but we anticipate webapps and other non-batch ways of interacting with and post processing data that has been computed on HPC batch systems. For example a user might host a website that allows users to view pieces of a large data set and do some processing in private cloud or kick off larger jobs on HPC clusters How many VM's are you running? This work is still in the design / build phase. We have 48 servers slated for the project. At max maybe 500 VMs; again this is a pretty wild estimate. This is a new service we are looking to provide What is your Network interconnect between the Scale Storage cluster and the Nova Compute cluster Each nova node has a dual 10gigE connection to switches that uplink to our core 40 gigE switches were NSD Servers are directly connectly. The information so far has been awesome. Thanks everyone. I am definitely leaning towards option #3 of creating protocol servers. Are there any design/build white papers targetting the virutalization use case? Thanks, Brian On Tue, Jan 17, 2017 at 5:55 PM, Andrew Beattie wrote: HI Brian, Couple of questions for you: What sort of workload will your Nova VM's be running? How many VM's are you running? What is your Network interconnect between the Scale Storage cluster and the Nova Compute cluster I have cc'd Jake Carrol from University of Queensland in on the email as I know they have done some basic performance testing using Scale to provide storage to Openstack. One of the issues that they found was the Openstack network translation was a performance limiting factor. 
I think from memory the best performance scenario they had was, when they installed the scale client locally into the virtual machines Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Brian Marshall Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM Date: Wed, Jan 18, 2017 7:51 AM UG, I have a GPFS filesystem. I have a OpenStack private cloud. What is the best way for Nova Compute VMs to have access to data inside the GPFS filesystem? 1)Should VMs mount GPFS directly with a GPFS client? 2) Should the hypervisor mount GPFS and share to nova computes? 3) Should I create GPFS protocol servers that allow nova computes to mount of NFS? All advice is welcome. Best, Brian Marshall Virginia Tech _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Fri Jan 20 18:22:11 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Fri, 20 Jan 2017 13:22:11 -0500 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: References: Message-ID: Perfect. Thanks for the advice. Further: this might be a basic question: Are their design guides for building CES protocl servers? Brian On Fri, Jan 20, 2017 at 1:04 PM, Gaurang Tapase wrote: > Hi Brian, > > For option #3, you can use GPFS Manila (OpenStack shared file system > service) driver for exporting data from protocol servers to the OpenStack > VMs. > It was updated to support CES in the Newton release. > > A new feature of bringing existing filesets under Manila management has > also been added recently. > > Thanks, > Gaurang > ------------------------------------------------------------------------ > Gaurang S Tapase > Spectrum Scale & OpenStack > IBM India Storage Lab, Pune (India) > Email : gaurang.tapase at in.ibm.com > Phone : +91-20-42025699 <+91%2020%204202%205699> (W), +91-9860082042 > <+91%2098600%2082042>(Cell) > ------------------------------------------------------------------------- > > > > From: Brian Marshall > To: gpfsug main discussion list > Date: 01/18/2017 09:52 PM > Subject: Re: [gpfsug-discuss] Mounting GPFS data on OpenStack VM > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > To answer some more questions: > > What sort of workload will your Nova VM's be running? > This is largely TBD but we anticipate webapps and other non-batch ways of > interacting with and post processing data that has been computed on HPC > batch systems. For example a user might host a website that allows users > to view pieces of a large data set and do some processing in private cloud > or kick off larger jobs on HPC clusters > > How many VM's are you running? > This work is still in the design / build phase. We have 48 servers slated > for the project. At max maybe 500 VMs; again this is a pretty wild > estimate. 
This is a new service we are looking to provide > > What is your Network interconnect between the Scale Storage cluster and > the Nova Compute cluster > Each nova node has a dual 10gigE connection to switches that uplink to our > core 40 gigE switches were NSD Servers are directly connectly. > > The information so far has been awesome. Thanks everyone. I am > definitely leaning towards option #3 of creating protocol servers. Are > there any design/build white papers targetting the virutalization use case? > > Thanks, > Brian > > On Tue, Jan 17, 2017 at 5:55 PM, Andrew Beattie <*abeattie at au1.ibm.com* > > wrote: > HI Brian, > > > Couple of questions for you: > > What sort of workload will your Nova VM's be running? > How many VM's are you running? > What is your Network interconnect between the Scale Storage cluster and > the Nova Compute cluster > > I have cc'd Jake Carrol from University of Queensland in on the email as I > know they have done some basic performance testing using Scale to provide > storage to Openstack. > One of the issues that they found was the Openstack network translation > was a performance limiting factor. > > I think from memory the best performance scenario they had was, when they > installed the scale client locally into the virtual machines > > > *Andrew Beattie* > *Software Defined Storage - IT Specialist* > *Phone: *614-2133-7927 > *E-mail: **abeattie at au1.ibm.com* > > > ----- Original message ----- > From: Brian Marshall <*mimarsh2 at vt.edu* > > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Cc: > Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM > Date: Wed, Jan 18, 2017 7:51 AM > > UG, > > I have a GPFS filesystem. > > I have a OpenStack private cloud. > > What is the best way for Nova Compute VMs to have access to data inside > the GPFS filesystem? > > 1)Should VMs mount GPFS directly with a GPFS client? > 2) Should the hypervisor mount GPFS and share to nova computes? > 3) Should I create GPFS protocol servers that allow nova computes to mount > of NFS? > > All advice is welcome. > > > Best, > Brian Marshall > Virginia Tech > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ulmer at ulmer.org Fri Jan 20 22:23:07 2017 From: ulmer at ulmer.org (Stephen Ulmer) Date: Fri, 20 Jan 2017 17:23:07 -0500 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <566473A3-D5F1-4508-84AE-AE4B892C25B8@vanderbilt.edu> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> <791bb4d1-eb22-5ba5-9fcd-d7553aeebdc0@psu.edu> <566473A3-D5F1-4508-84AE-AE4B892C25B8@vanderbilt.edu> Message-ID: <3D2CE694-2A3A-4B5E-8078-238A09681BE8@ulmer.org> My list of questions that might or might not be thought provoking: How about the relative position of the items in the /etc/group file? Are all of the failures later in the file than all of the successes? Do any groups have group passwords (parsing error due to ?different" line format)? Is the /etc/group sorted by either GID or group name (not normally required, but it would be interesting to see if it changed the problem)? Is the set that is translated versus not translated consistent or do they change? (Across all axes of comparison by {node, command invocation, et al.}) Are the not translated groups more or less likely to be the default group of the owning UID? Can you translate the GID other ways? Like with ls? (I think this was in the original problem description, but I don?t remember the answer.) What is you just turn of nscd? -- Stephen > On Jan 20, 2017, at 10:09 AM, Buterbaugh, Kevin L > wrote: > > Hi Phil, > > Nope - that was the very first thought I had but on a 4.2.2.1 node I have a 13 character group name displaying and a resolvable 7 character long group name being displayed as its? GID? > > Kevin > >> On Jan 20, 2017, at 9:06 AM, Phil Pishioneri > wrote: >> >> On 1/19/17 4:51 PM, Buterbaugh, Kevin L wrote: >>> Hi All, >>> >>> Let me try to answer some questions that have been raised by various list members? >>> >>> 1. I am not using nscd. >>> 2. getent group with either a GID or a group name resolves GID?s / names that are being printed as GIDs by mmrepquota >>> 3. The GID?s in question are all in a normal range ? i.e. some group names that are being printed by mmrepquota have GIDs ?close? to others that are being printed as GID?s >>> 4. strace?ing mmrepquota doesn?t show anything relating to nscd or anything that jumps out at me >>> >> >> Anything unique about the lengths of the names of the affected groups? (i.e., all a certain value, all greater than some value, etc.) >> >> -Phil > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From leslie.james.elliott at gmail.com Fri Jan 20 22:37:15 2017 From: leslie.james.elliott at gmail.com (leslie elliott) Date: Sat, 21 Jan 2017 08:37:15 +1000 Subject: [gpfsug-discuss] CES permissions Message-ID: Hi we have an existing configuration with a home - cache relationship on linked clusters, we are running CES on the cache cluster. When data is copied to an SMB share the the afm target for the cache is marked dirty and the replication back to the home cluster stops. both clusters are running 4.2.1 We have seen this behaviour whether the acls on the home cluster file system are nfsv4 only or posix and nfsv4 the cache cluster is nfsv4 only so that we can use CES on it for SMB. 
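(A first step when the cache goes Dirty like this is usually to capture the AFM state and queue for the fileset; a rough sketch is below, with the device and fileset names as placeholders and the exact mmafmctl flag syntax worth double-checking against the 4.2.1 documentation.)

mmafmctl cachefs getstate -j smbfileset              # placeholder device and fileset names
mmafmctl cachefs getstate                            # or all AFM filesets on the cache file system
# the state column (Active, Dirty, Unmounted, ...), the gateway node and the queue length
# show whether replication back to home is actually stuck or just backed up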
We are using uid remapping between the cache and the home can anyone suggest why the cache is marked dirty and how we can get around this issue the other thing we would like to do is force group and posix file permissions via samba but these are not supported options in the CES installation of samba any help is appreciated leslie Leslie Elliott, Infrastructure Support Specialist Information Technology Services, The University of Queensland -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Jan 23 01:10:14 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 22 Jan 2017 20:10:14 -0500 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? Message-ID: This is going to sound like a ridiculous request, but, is there a way to cause a filesystem to panic everywhere in one "swell foop"? I'm assuming the answer will come with an appropriate disclaimer of "don't ever do this, we don't support it, it might eat your data, summon cthulu, etc.". I swear I've seen the fs manager initiate this type of operation before. I can seem to do it on a per-node basis with "mmfsadm test panic " but if I do that over all 1k nodes in my test cluster at once it results in about 45 minutes of almost total deadlock while each panic is processed by the fs manager. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From abeattie at au1.ibm.com Mon Jan 23 01:16:58 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 23 Jan 2017 01:16:58 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Mon Jan 23 01:23:34 2017 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Sun, 22 Jan 2017 20:23:34 -0500 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: Message-ID: <142910.1485134614@turing-police.cc.vt.edu> On Sun, 22 Jan 2017 20:10:14 -0500, Aaron Knister said: > This is going to sound like a ridiculous request, but, is there a way to > cause a filesystem to panic everywhere in one "swell foop"? (...) > I can seem to do it on a per-node basis with "mmfsadm test panic > " but if I do that over all 1k nodes in my test cluster at > once it results in about 45 minutes of almost total deadlock while each > panic is processed by the fs manager. Sounds like you've already found the upper bound for panicking all at once. :) What exactly are you trying to do here? Force-dismount all over the cluster due to some urgent external condition (UPS fail, whatever)? And how much do you care about file system metadata consistency and/or pending data writes? (Be prepared to Think Outside The Box - the *fastest* way may be to use a controllable power strip in the rack and cut power to your fiber channel switches, isolating the storage *real* fast....) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Mon Jan 23 01:31:06 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 22 Jan 2017 20:31:06 -0500 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? 
In-Reply-To: References: Message-ID: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> I was afraid someone would ask :) One possible use would be testing how monitoring reacts to and/or corrects stale filesystems. The use in my case is there's an issue we see quite often where a filesystem won't unmount when trying to shut down gpfs. Linux insists its still busy despite every process being killed on the node just about except init. It's a real pain because it complicates maintenance, requiring a reboot of some nodes prior to patching for example. I dug into it and it appears as though when this happens the filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm trying to debug it further but I need to actually be able to make the condition happen a few more times to debug it. A stripegroup panic isn't a surefire way but it's the only way I've found so far to trigger this behavior somewhat on demand. One way I've found to trigger a mass stripegroup panic is to induce what I call a "301 error": loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted by the system with return code 301 reason code 0 loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument and tickle a known race condition between nodes being expelled from the cluster and a manager node joining the cluster. When this happens it seems to cause a mass stripe group panic that's over in a few minutes. The trick there is that it doesn't happen every time I go through the exercise and when it does there's no guarantee the filesystem that panics is the one in use. If it's not an fs in use then it doesn't help me reproduce the error condition. I was trying to use the "mmfsadm test panic" command to try a more direct approach. Hope that helps shed some light. -Aaron On 1/22/17 8:16 PM, Andrew Beattie wrote: > Out of curiosity -- why would you want to? > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > > ----- Original message ----- > From: Aaron Knister > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? > Date: Mon, Jan 23, 2017 11:11 AM > > This is going to sound like a ridiculous request, but, is there a way to > cause a filesystem to panic everywhere in one "swell foop"? I'm assuming > the answer will come with an appropriate disclaimer of "don't ever do > this, we don't support it, it might eat your data, summon cthulu, etc.". > I swear I've seen the fs manager initiate this type of operation before. > > I can seem to do it on a per-node basis with "mmfsadm test panic > " but if I do that over all 1k nodes in my test cluster at > once it results in about 45 minutes of almost total deadlock while each > panic is processed by the fs manager. 
> > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at us.ibm.com Mon Jan 23 04:12:02 2017 From: oehmes at us.ibm.com (Sven Oehme) Date: Mon, 23 Jan 2017 04:12:02 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: What version of Scale/ GPFS code is this cluster on ? ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Aaron Knister To: Date: 01/23/2017 01:31 AM Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? Sent by: gpfsug-discuss-bounces at spectrumscale.org I was afraid someone would ask :) One possible use would be testing how monitoring reacts to and/or corrects stale filesystems. The use in my case is there's an issue we see quite often where a filesystem won't unmount when trying to shut down gpfs. Linux insists its still busy despite every process being killed on the node just about except init. It's a real pain because it complicates maintenance, requiring a reboot of some nodes prior to patching for example. I dug into it and it appears as though when this happens the filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm trying to debug it further but I need to actually be able to make the condition happen a few more times to debug it. A stripegroup panic isn't a surefire way but it's the only way I've found so far to trigger this behavior somewhat on demand. One way I've found to trigger a mass stripegroup panic is to induce what I call a "301 error": loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted by the system with return code 301 reason code 0 loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument and tickle a known race condition between nodes being expelled from the cluster and a manager node joining the cluster. When this happens it seems to cause a mass stripe group panic that's over in a few minutes. The trick there is that it doesn't happen every time I go through the exercise and when it does there's no guarantee the filesystem that panics is the one in use. If it's not an fs in use then it doesn't help me reproduce the error condition. I was trying to use the "mmfsadm test panic" command to try a more direct approach. Hope that helps shed some light. -Aaron On 1/22/17 8:16 PM, Andrew Beattie wrote: > Out of curiosity -- why would you want to? > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > > ----- Original message ----- > From: Aaron Knister > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? 
> Date: Mon, Jan 23, 2017 11:11 AM > > This is going to sound like a ridiculous request, but, is there a way to > cause a filesystem to panic everywhere in one "swell foop"? I'm assuming > the answer will come with an appropriate disclaimer of "don't ever do > this, we don't support it, it might eat your data, summon cthulu, etc.". > I swear I've seen the fs manager initiate this type of operation before. > > I can seem to do it on a per-node basis with "mmfsadm test panic > " but if I do that over all 1k nodes in my test cluster at > once it results in about 45 minutes of almost total deadlock while each > panic is processed by the fs manager. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Mon Jan 23 04:22:38 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 22 Jan 2017 23:22:38 -0500 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: It's at 4.1.1.10. On 1/22/17 11:12 PM, Sven Oehme wrote: > What version of Scale/ GPFS code is this cluster on ? > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > Inactive hide details for Aaron Knister ---01/23/2017 01:31:29 AM---I > was afraid someone would ask :) One possible use would beAaron Knister > ---01/23/2017 01:31:29 AM---I was afraid someone would ask :) One > possible use would be testing how monitoring reacts to and/or > > From: Aaron Knister > To: > Date: 01/23/2017 01:31 AM > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > I was afraid someone would ask :) > > One possible use would be testing how monitoring reacts to and/or > corrects stale filesystems. > > The use in my case is there's an issue we see quite often where a > filesystem won't unmount when trying to shut down gpfs. Linux insists > its still busy despite every process being killed on the node just about > except init. It's a real pain because it complicates maintenance, > requiring a reboot of some nodes prior to patching for example. > > I dug into it and it appears as though when this happens the > filesystem's mnt_count is ridiculously high (300,000+ in one case). 
I'm > trying to debug it further but I need to actually be able to make the > condition happen a few more times to debug it. A stripegroup panic isn't > a surefire way but it's the only way I've found so far to trigger this > behavior somewhat on demand. > > One way I've found to trigger a mass stripegroup panic is to induce what > I call a "301 error": > > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted > by the system with return code 301 reason code 0 > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument > > and tickle a known race condition between nodes being expelled from the > cluster and a manager node joining the cluster. When this happens it > seems to cause a mass stripe group panic that's over in a few minutes. > The trick there is that it doesn't happen every time I go through the > exercise and when it does there's no guarantee the filesystem that > panics is the one in use. If it's not an fs in use then it doesn't help > me reproduce the error condition. I was trying to use the "mmfsadm test > panic" command to try a more direct approach. > > Hope that helps shed some light. > > -Aaron > > On 1/22/17 8:16 PM, Andrew Beattie wrote: >> Out of curiosity -- why would you want to? >> Andrew Beattie >> Software Defined Storage - IT Specialist >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com >> >> >> >> ----- Original message ----- >> From: Aaron Knister >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug main discussion list >> Cc: >> Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? >> Date: Mon, Jan 23, 2017 11:11 AM >> >> This is going to sound like a ridiculous request, but, is there a way to >> cause a filesystem to panic everywhere in one "swell foop"? I'm assuming >> the answer will come with an appropriate disclaimer of "don't ever do >> this, we don't support it, it might eat your data, summon cthulu, etc.". >> I swear I've seen the fs manager initiate this type of operation before. >> >> I can seem to do it on a per-node basis with "mmfsadm test panic >> " but if I do that over all 1k nodes in my test cluster at >> once it results in about 45 minutes of almost total deadlock while each >> panic is processed by the fs manager. >> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Mon Jan 23 05:03:43 2017 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 23 Jan 2017 05:03:43 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? 
In-Reply-To: References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: Then i would suggest to move up to at least 4.2.1.LATEST , there is a high chance your problem might already be fixed. i see 2 potential area that got significant improvements , Token Manager recovery and Log Recovery, both are in latest 4.2.1 code enabled : 2 significant improvements on Token Recovery in 4.2.1 : 1. Extendible hashing for token hash table. This speeds up token lookup and thereby reduce tcMutex hold times for configurations with a large ratio of clients to token servers. 2. Cleaning up tokens held by failed nodes was making multiple passes over the whole token table, one for each failed node. The loops are now inverted, so it makes a single pass over the able, and for each token found, does cleanup for all failed nodes. there are multiple smaller enhancements beyond 4.2.1 but thats the minimum level you want to be. i have seen token recovery of 10's of minutes similar to what you described going down to a minute with this change. on Log Recovery - in case of an unclean unmount/shutdown of a node prior 4.2.1 the Filesystem manager would only recover one Log file at a time, using a single thread, with 4.2.1 this is now done with multiple threads and multiple log files in parallel . Sven On Mon, Jan 23, 2017 at 4:22 AM Aaron Knister wrote: > It's at 4.1.1.10. > > On 1/22/17 11:12 PM, Sven Oehme wrote: > > What version of Scale/ GPFS code is this cluster on ? > > > > ------------------------------------------ > > Sven Oehme > > Scalable Storage Research > > email: oehmes at us.ibm.com > > Phone: +1 (408) 824-8904 <(408)%20824-8904> > > IBM Almaden Research Lab > > ------------------------------------------ > > > > Inactive hide details for Aaron Knister ---01/23/2017 01:31:29 AM---I > > was afraid someone would ask :) One possible use would beAaron Knister > > ---01/23/2017 01:31:29 AM---I was afraid someone would ask :) One > > possible use would be testing how monitoring reacts to and/or > > > > From: Aaron Knister > > To: > > Date: 01/23/2017 01:31 AM > > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > ------------------------------------------------------------------------ > > > > > > > > I was afraid someone would ask :) > > > > One possible use would be testing how monitoring reacts to and/or > > corrects stale filesystems. > > > > The use in my case is there's an issue we see quite often where a > > filesystem won't unmount when trying to shut down gpfs. Linux insists > > its still busy despite every process being killed on the node just about > > except init. It's a real pain because it complicates maintenance, > > requiring a reboot of some nodes prior to patching for example. > > > > I dug into it and it appears as though when this happens the > > filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm > > trying to debug it further but I need to actually be able to make the > > condition happen a few more times to debug it. A stripegroup panic isn't > > a surefire way but it's the only way I've found so far to trigger this > > behavior somewhat on demand. 
> > > > One way I've found to trigger a mass stripegroup panic is to induce what > > I call a "301 error": > > > > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted > > by the system with return code 301 reason code 0 > > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument > > > > and tickle a known race condition between nodes being expelled from the > > cluster and a manager node joining the cluster. When this happens it > > seems to cause a mass stripe group panic that's over in a few minutes. > > The trick there is that it doesn't happen every time I go through the > > exercise and when it does there's no guarantee the filesystem that > > panics is the one in use. If it's not an fs in use then it doesn't help > > me reproduce the error condition. I was trying to use the "mmfsadm test > > panic" command to try a more direct approach. > > > > Hope that helps shed some light. > > > > -Aaron > > > > On 1/22/17 8:16 PM, Andrew Beattie wrote: > >> Out of curiosity -- why would you want to? > >> Andrew Beattie > >> Software Defined Storage - IT Specialist > >> Phone: 614-2133-7927 > >> E-mail: abeattie at au1.ibm.com > >> > >> > >> > >> ----- Original message ----- > >> From: Aaron Knister > >> Sent by: gpfsug-discuss-bounces at spectrumscale.org > >> To: gpfsug main discussion list > >> Cc: > >> Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? > >> Date: Mon, Jan 23, 2017 11:11 AM > >> > >> This is going to sound like a ridiculous request, but, is there a > way to > >> cause a filesystem to panic everywhere in one "swell foop"? I'm > assuming > >> the answer will come with an appropriate disclaimer of "don't ever > do > >> this, we don't support it, it might eat your data, summon cthulu, > etc.". > >> I swear I've seen the fs manager initiate this type of operation > before. > >> > >> I can seem to do it on a per-node basis with "mmfsadm test panic > > >> " but if I do that over all 1k nodes in my test cluster > at > >> once it results in about 45 minutes of almost total deadlock while > each > >> panic is processed by the fs manager. > >> > >> -Aaron > >> > >> -- > >> Aaron Knister > >> NASA Center for Climate Simulation (Code 606.2) > >> Goddard Space Flight Center > >> (301) 286-2776 > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at gmail.com Mon Jan 23 05:27:53 2017 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 23 Jan 2017 05:27:53 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: Aaron, hold a bit with the upgrade , i just got word that while 4.2.1+ most likely addresses the issues i mentioned, there was a defect in the initial release of the parallel log recovery code. i will get the exact minimum version you need to deploy and send another update to this thread. sven On Mon, Jan 23, 2017 at 5:03 AM Sven Oehme wrote: > Then i would suggest to move up to at least 4.2.1.LATEST , there is a high > chance your problem might already be fixed. > > i see 2 potential area that got significant improvements , Token Manager > recovery and Log Recovery, both are in latest 4.2.1 code enabled : > > 2 significant improvements on Token Recovery in 4.2.1 : > > 1. Extendible hashing for token hash table. This speeds up token lookup > and thereby reduce tcMutex hold times for configurations with a large ratio > of clients to token servers. > 2. Cleaning up tokens held by failed nodes was making multiple passes > over the whole token table, one for each failed node. The loops are now > inverted, so it makes a single pass over the able, and for each token > found, does cleanup for all failed nodes. > > there are multiple smaller enhancements beyond 4.2.1 but thats the minimum > level you want to be. i have seen token recovery of 10's of minutes similar > to what you described going down to a minute with this change. > > on Log Recovery - in case of an unclean unmount/shutdown of a node prior > 4.2.1 the Filesystem manager would only recover one Log file at a time, > using a single thread, with 4.2.1 this is now done with multiple threads > and multiple log files in parallel . > > Sven > > On Mon, Jan 23, 2017 at 4:22 AM Aaron Knister > wrote: > > It's at 4.1.1.10. > > On 1/22/17 11:12 PM, Sven Oehme wrote: > > What version of Scale/ GPFS code is this cluster on ? > > > > ------------------------------------------ > > Sven Oehme > > Scalable Storage Research > > email: oehmes at us.ibm.com > > Phone: +1 (408) 824-8904 <(408)%20824-8904> > > IBM Almaden Research Lab > > ------------------------------------------ > > > > Inactive hide details for Aaron Knister ---01/23/2017 01:31:29 AM---I > > was afraid someone would ask :) One possible use would beAaron Knister > > ---01/23/2017 01:31:29 AM---I was afraid someone would ask :) One > > possible use would be testing how monitoring reacts to and/or > > > > From: Aaron Knister > > To: > > Date: 01/23/2017 01:31 AM > > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > ------------------------------------------------------------------------ > > > > > > > > I was afraid someone would ask :) > > > > One possible use would be testing how monitoring reacts to and/or > > corrects stale filesystems. > > > > The use in my case is there's an issue we see quite often where a > > filesystem won't unmount when trying to shut down gpfs. Linux insists > > its still busy despite every process being killed on the node just about > > except init. It's a real pain because it complicates maintenance, > > requiring a reboot of some nodes prior to patching for example. 
> > > > I dug into it and it appears as though when this happens the > > filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm > > trying to debug it further but I need to actually be able to make the > > condition happen a few more times to debug it. A stripegroup panic isn't > > a surefire way but it's the only way I've found so far to trigger this > > behavior somewhat on demand. > > > > One way I've found to trigger a mass stripegroup panic is to induce what > > I call a "301 error": > > > > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted > > by the system with return code 301 reason code 0 > > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument > > > > and tickle a known race condition between nodes being expelled from the > > cluster and a manager node joining the cluster. When this happens it > > seems to cause a mass stripe group panic that's over in a few minutes. > > The trick there is that it doesn't happen every time I go through the > > exercise and when it does there's no guarantee the filesystem that > > panics is the one in use. If it's not an fs in use then it doesn't help > > me reproduce the error condition. I was trying to use the "mmfsadm test > > panic" command to try a more direct approach. > > > > Hope that helps shed some light. > > > > -Aaron > > > > On 1/22/17 8:16 PM, Andrew Beattie wrote: > >> Out of curiosity -- why would you want to? > >> Andrew Beattie > >> Software Defined Storage - IT Specialist > >> Phone: 614-2133-7927 > >> E-mail: abeattie at au1.ibm.com > >> > >> > >> > >> ----- Original message ----- > >> From: Aaron Knister > >> Sent by: gpfsug-discuss-bounces at spectrumscale.org > >> To: gpfsug main discussion list > >> Cc: > >> Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? > >> Date: Mon, Jan 23, 2017 11:11 AM > >> > >> This is going to sound like a ridiculous request, but, is there a > way to > >> cause a filesystem to panic everywhere in one "swell foop"? I'm > assuming > >> the answer will come with an appropriate disclaimer of "don't ever > do > >> this, we don't support it, it might eat your data, summon cthulu, > etc.". > >> I swear I've seen the fs manager initiate this type of operation > before. > >> > >> I can seem to do it on a per-node basis with "mmfsadm test panic > > >> " but if I do that over all 1k nodes in my test cluster > at > >> once it results in about 45 minutes of almost total deadlock while > each > >> panic is processed by the fs manager. 
> >> > >> -Aaron > >> > >> -- > >> Aaron Knister > >> NASA Center for Climate Simulation (Code 606.2) > >> Goddard Space Flight Center > >> (301) 286-2776 > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Mon Jan 23 05:40:25 2017 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Mon, 23 Jan 2017 05:40:25 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: I?ve also done the ?panic stripe group everywhere? trick on a test cluster for a large FPO filesystem solution. With FPO it?s not very hard to get a filesystem to become unmountable due to missing disks. Sometimes the best answer, especially in a scratch use-case, may be to throw the filesystem away and start again empty so that research can resume (even though there will be work loss and repeated effort for some). But the stuck mounts problem can make this a long-lived problem. In my case, I just repeatedly panic any nodes which continue to mount the filesystem and try mmdelfs until it works (usually takes a few attempts). In this case, I really don?t want/need the filesystem to be recovered. I just want the cluster to forget about it as quickly as possible. So far, in testing, the panic/destroy times aren?t bad, but I don?t have heavy user workloads running against it yet. It would be interesting to know if there were any shortcuts to skip SG manager reassignment and recovery attempts. Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sven Oehme Sent: Monday, January 23, 2017 12:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? Aaron, hold a bit with the upgrade , i just got word that while 4.2.1+ most likely addresses the issues i mentioned, there was a defect in the initial release of the parallel log recovery code. i will get the exact minimum version you need to deploy and send another update to this thread. sven On Mon, Jan 23, 2017 at 5:03 AM Sven Oehme > wrote: Then i would suggest to move up to at least 4.2.1.LATEST , there is a high chance your problem might already be fixed. 
i see 2 potential area that got significant improvements , Token Manager recovery and Log Recovery, both are in latest 4.2.1 code enabled : 2 significant improvements on Token Recovery in 4.2.1 : 1. Extendible hashing for token hash table. This speeds up token lookup and thereby reduce tcMutex hold times for configurations with a large ratio of clients to token servers. 2. Cleaning up tokens held by failed nodes was making multiple passes over the whole token table, one for each failed node. The loops are now inverted, so it makes a single pass over the able, and for each token found, does cleanup for all failed nodes. there are multiple smaller enhancements beyond 4.2.1 but thats the minimum level you want to be. i have seen token recovery of 10's of minutes similar to what you described going down to a minute with this change. on Log Recovery - in case of an unclean unmount/shutdown of a node prior 4.2.1 the Filesystem manager would only recover one Log file at a time, using a single thread, with 4.2.1 this is now done with multiple threads and multiple log files in parallel . Sven On Mon, Jan 23, 2017 at 4:22 AM Aaron Knister > wrote: It's at 4.1.1.10. On 1/22/17 11:12 PM, Sven Oehme wrote: > What version of Scale/ GPFS code is this cluster on ? > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > Inactive hide details for Aaron Knister ---01/23/2017 01:31:29 AM---I > was afraid someone would ask :) One possible use would beAaron Knister > ---01/23/2017 01:31:29 AM---I was afraid someone would ask :) One > possible use would be testing how monitoring reacts to and/or > > From: Aaron Knister > > To: > > Date: 01/23/2017 01:31 AM > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > I was afraid someone would ask :) > > One possible use would be testing how monitoring reacts to and/or > corrects stale filesystems. > > The use in my case is there's an issue we see quite often where a > filesystem won't unmount when trying to shut down gpfs. Linux insists > its still busy despite every process being killed on the node just about > except init. It's a real pain because it complicates maintenance, > requiring a reboot of some nodes prior to patching for example. > > I dug into it and it appears as though when this happens the > filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm > trying to debug it further but I need to actually be able to make the > condition happen a few more times to debug it. A stripegroup panic isn't > a surefire way but it's the only way I've found so far to trigger this > behavior somewhat on demand. > > One way I've found to trigger a mass stripegroup panic is to induce what > I call a "301 error": > > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted > by the system with return code 301 reason code 0 > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument > > and tickle a known race condition between nodes being expelled from the > cluster and a manager node joining the cluster. When this happens it > seems to cause a mass stripe group panic that's over in a few minutes. 
> The trick there is that it doesn't happen every time I go through the > exercise and when it does there's no guarantee the filesystem that > panics is the one in use. If it's not an fs in use then it doesn't help > me reproduce the error condition. I was trying to use the "mmfsadm test > panic" command to try a more direct approach. > > Hope that helps shed some light. > > -Aaron > > On 1/22/17 8:16 PM, Andrew Beattie wrote: >> Out of curiosity -- why would you want to? >> Andrew Beattie >> Software Defined Storage - IT Specialist >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com > >> >> >> >> ----- Original message ----- >> From: Aaron Knister > >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug main discussion list > >> Cc: >> Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? >> Date: Mon, Jan 23, 2017 11:11 AM >> >> This is going to sound like a ridiculous request, but, is there a way to >> cause a filesystem to panic everywhere in one "swell foop"? I'm assuming >> the answer will come with an appropriate disclaimer of "don't ever do >> this, we don't support it, it might eat your data, summon cthulu, etc.". >> I swear I've seen the fs manager initiate this type of operation before. >> >> I can seem to do it on a per-node basis with "mmfsadm test panic >> " but if I do that over all 1k nodes in my test cluster at >> once it results in about 45 minutes of almost total deadlock while each >> panic is processed by the fs manager. >> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jan 23 10:17:03 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 23 Jan 2017 10:17:03 +0000 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: Hi Mark, Thanks. I get that using it to move to a new FS version is probably beyond design. But equally, I could easily see that having to support implementing the latest FS version is a strong requirement. I.e. In a DR situation say three years down the line, it would be a new FS of (say) 5.1.1, we wouldn't want to have to go back and find 4.1.1 code, nor would we necessarily be able to even run that version (as kernels and OSes move forward). 
That?s sorta also the situation where you don't want to suddenly have to run back to IBM support because your DR solution suddenly doesn't work like it says on the tin ;-) I can test 1 and 2 relatively easily, but 3 is a bit more difficult for us to test out as the FS we want to use SOBAR on is 4.2 already. Simon From: > on behalf of Marc A Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 20 January 2017 at 16:57 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SOBAR questions I worked on some aspects of SOBAR, but without studying and testing the commands - I'm not in a position right now to give simple definitive answers - having said that.... Generally your questions are reasonable and the answer is: "Yes it should be possible to do that, but you might be going a bit beyond the design point.., so you'll need to try it out on a (smaller) test system with some smaller tedst files. Point by point. 1. If SOBAR is unable to restore a particular file, perhaps because the premigration did not complete -- you should only lose that particular file, and otherwise "keep going". 2. I think SOBAR helps you build a similar file system to the original, including block sizes. So you'd have to go in and tweak the file system creation step(s). I think this is reasonable... If you hit a problem... IMO that would be a fair APAR. 3. Similar to 2. From: "Simon Thompson (Research Computing - IT Services)" > To: "gpfsug-discuss at spectrumscale.org" > Date: 01/20/2017 10:44 AM Subject: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
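One relatively cheap way to rehearse the format-version side of question 3 on a test cluster is to build a scratch file system at a deliberately older on-disk format and walk it forward. The device names, stanza file, block size and version string below are made up, and the valid --version values are those listed for mmcrfs in the release being tested.

  # what the production file system reports today
  mmlsfs gpfs01 -V

  # scratch file system created at an older format, then checked and
  # upgraded in place to the latest format the installed code supports
  mmcrfs testfs -F /tmp/testfs.stanza -B 1M --version 4.1.1.0
  mmlsfs testfs -V
  mmchfs testfs -V full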
URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jan 23 15:32:41 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 23 Jan 2017 15:32:41 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <3D2CE694-2A3A-4B5E-8078-238A09681BE8@ulmer.org> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> <791bb4d1-eb22-5ba5-9fcd-d7553aeebdc0@psu.edu> <566473A3-D5F1-4508-84AE-AE4B892C25B8@vanderbilt.edu> <3D2CE694-2A3A-4B5E-8078-238A09681BE8@ulmer.org> Message-ID: <031A80F6-B00B-4AF9-963B-98E61BC537B4@vanderbilt.edu> Hi All, Stephens? very first question below has led me to figure out what the problem is ? we have one group in /etc/group that has dozens and dozens of members ? any group above that in /etc/group gets printed as a name by mmrepquota; any group below it gets printed as a GID. Wasn?t there an identical bug in mmlsquota a while back? I will update the PMR I have open with IBM. Thanks to all who took the time to respond with suggestions. Kevin On Jan 20, 2017, at 4:23 PM, Stephen Ulmer > wrote: My list of questions that might or might not be thought provoking: How about the relative position of the items in the /etc/group file? Are all of the failures later in the file than all of the successes? Do any groups have group passwords (parsing error due to ?different" line format)? Is the /etc/group sorted by either GID or group name (not normally required, but it would be interesting to see if it changed the problem)? Is the set that is translated versus not translated consistent or do they change? (Across all axes of comparison by {node, command invocation, et al.}) Are the not translated groups more or less likely to be the default group of the owning UID? Can you translate the GID other ways? Like with ls? (I think this was in the original problem description, but I don?t remember the answer.) What is you just turn of nscd? -- Stephen On Jan 20, 2017, at 10:09 AM, Buterbaugh, Kevin L > wrote: Hi Phil, Nope - that was the very first thought I had but on a 4.2.2.1 node I have a 13 character group name displaying and a resolvable 7 character long group name being displayed as its? GID? Kevin On Jan 20, 2017, at 9:06 AM, Phil Pishioneri > wrote: On 1/19/17 4:51 PM, Buterbaugh, Kevin L wrote: Hi All, Let me try to answer some questions that have been raised by various list members? 1. I am not using nscd. 2. getent group with either a GID or a group name resolves GID?s / names that are being printed as GIDs by mmrepquota 3. The GID?s in question are all in a normal range ? i.e. some group names that are being printed by mmrepquota have GIDs ?close? to others that are being printed as GID?s 4. strace?ing mmrepquota doesn?t show anything relating to nscd or anything that jumps out at me Anything unique about the lengths of the names of the affected groups? (i.e., all a certain value, all greater than some value, etc.) -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
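For anyone wanting to reproduce or rule this out on their own nodes, a quick check along the lines of Stephen's first question; the grouping of output is only meant to surface an oversized entry, "broken_group" is a placeholder, and any buffer-size explanation remains a guess rather than a documented limit.

  # rank group entries by raw line length and show where they sit in the file;
  # one entry with dozens and dozens of members will stand out at the top
  awk -F: '{ printf "%7d  line %-5d  %s (gid %s)\n", length($0), NR, $1, $3 }' /etc/group | sort -rn | head

  # then confirm the groups mmrepquota prints numerically all appear *after*
  # that oversized entry, while name resolution itself still works
  grep -n '^broken_group:' /etc/group
  getent group broken_group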
URL: From makaplan at us.ibm.com Mon Jan 23 15:35:41 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 23 Jan 2017 10:35:41 -0500 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: Regarding back level file systems and testing... 1. Did you know that the mmcrfs command supports --version which allows you to create a back level file system? 2. If your concern is restoring from a SOBAR backup that was made a long while ago with an old version of GPFS/sobar... I'd say that should work... BUT I don't know for sure AND I'd caution that AFAIK (someone may correct me) Sobar is not intended for long term archiving of file systems. Personally ( IBM hat off ;-) ), for that I'd choose a standard, vendor-neutral archival format that is likely to be supported in the future.... My current understanding: Spectrum Scal SOBAR is for "disaster recovery" or "migrate/upgrade entire file system" -- where presumably you do Sobar backups on a regular schedule... and/or do one just before you begin an upgrade or migration to new hardware. --marc From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 01/23/2017 05:17 AM Subject: Re: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Mark, Thanks. I get that using it to move to a new FS version is probably beyond design. But equally, I could easily see that having to support implementing the latest FS version is a strong requirement. I.e. In a DR situation say three years down the line, it would be a new FS of (say) 5.1.1, we wouldn't want to have to go back and find 4.1.1 code, nor would we necessarily be able to even run that version (as kernels and OSes move forward). That?s sorta also the situation where you don't want to suddenly have to run back to IBM support because your DR solution suddenly doesn't work like it says on the tin ;-) I can test 1 and 2 relatively easily, but 3 is a bit more difficult for us to test out as the FS we want to use SOBAR on is 4.2 already. Simon From: on behalf of Marc A Kaplan Reply-To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: Friday, 20 January 2017 at 16:57 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] SOBAR questions I worked on some aspects of SOBAR, but without studying and testing the commands - I'm not in a position right now to give simple definitive answers - having said that.... Generally your questions are reasonable and the answer is: "Yes it should be possible to do that, but you might be going a bit beyond the design point.., so you'll need to try it out on a (smaller) test system with some smaller tedst files. Point by point. 1. If SOBAR is unable to restore a particular file, perhaps because the premigration did not complete -- you should only lose that particular file, and otherwise "keep going". 2. I think SOBAR helps you build a similar file system to the original, including block sizes. So you'd have to go in and tweak the file system creation step(s). I think this is reasonable... If you hit a problem... IMO that would be a fair APAR. 3. Similar to 2. 
From: "Simon Thompson (Research Computing - IT Services)" < S.J.Thompson at bham.ac.uk> To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: 01/20/2017 10:44 AM Subject: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Mon Jan 23 22:04:25 2017 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 23 Jan 2017 22:04:25 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: Hi, you either need to request access to GPFS 4.2.1.0 efix16 via your PMR or need to upgrade to 4.2.2.1 both contain the fixes required. Sven On Mon, Jan 23, 2017 at 6:27 AM Sven Oehme wrote: > Aaron, > > hold a bit with the upgrade , i just got word that while 4.2.1+ most > likely addresses the issues i mentioned, there was a defect in the initial > release of the parallel log recovery code. i will get the exact minimum > version you need to deploy and send another update to this thread. > > sven > > On Mon, Jan 23, 2017 at 5:03 AM Sven Oehme wrote: > > Then i would suggest to move up to at least 4.2.1.LATEST , there is a high > chance your problem might already be fixed. > > i see 2 potential area that got significant improvements , Token Manager > recovery and Log Recovery, both are in latest 4.2.1 code enabled : > > 2 significant improvements on Token Recovery in 4.2.1 : > > 1. Extendible hashing for token hash table. This speeds up token lookup > and thereby reduce tcMutex hold times for configurations with a large ratio > of clients to token servers. > 2. 
Cleaning up tokens held by failed nodes was making multiple passes > over the whole token table, one for each failed node. The loops are now > inverted, so it makes a single pass over the able, and for each token > found, does cleanup for all failed nodes. > > there are multiple smaller enhancements beyond 4.2.1 but thats the minimum > level you want to be. i have seen token recovery of 10's of minutes similar > to what you described going down to a minute with this change. > > on Log Recovery - in case of an unclean unmount/shutdown of a node prior > 4.2.1 the Filesystem manager would only recover one Log file at a time, > using a single thread, with 4.2.1 this is now done with multiple threads > and multiple log files in parallel . > > Sven > > On Mon, Jan 23, 2017 at 4:22 AM Aaron Knister > wrote: > > It's at 4.1.1.10. > > On 1/22/17 11:12 PM, Sven Oehme wrote: > > What version of Scale/ GPFS code is this cluster on ? > > > > ------------------------------------------ > > Sven Oehme > > Scalable Storage Research > > email: oehmes at us.ibm.com > > Phone: +1 (408) 824-8904 <(408)%20824-8904> > > IBM Almaden Research Lab > > ------------------------------------------ > > > > Inactive hide details for Aaron Knister ---01/23/2017 01:31:29 AM---I > > was afraid someone would ask :) One possible use would beAaron Knister > > ---01/23/2017 01:31:29 AM---I was afraid someone would ask :) One > > possible use would be testing how monitoring reacts to and/or > > > > From: Aaron Knister > > To: > > Date: 01/23/2017 01:31 AM > > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > ------------------------------------------------------------------------ > > > > > > > > I was afraid someone would ask :) > > > > One possible use would be testing how monitoring reacts to and/or > > corrects stale filesystems. > > > > The use in my case is there's an issue we see quite often where a > > filesystem won't unmount when trying to shut down gpfs. Linux insists > > its still busy despite every process being killed on the node just about > > except init. It's a real pain because it complicates maintenance, > > requiring a reboot of some nodes prior to patching for example. > > > > I dug into it and it appears as though when this happens the > > filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm > > trying to debug it further but I need to actually be able to make the > > condition happen a few more times to debug it. A stripegroup panic isn't > > a surefire way but it's the only way I've found so far to trigger this > > behavior somewhat on demand. > > > > One way I've found to trigger a mass stripegroup panic is to induce what > > I call a "301 error": > > > > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted > > by the system with return code 301 reason code 0 > > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument > > > > and tickle a known race condition between nodes being expelled from the > > cluster and a manager node joining the cluster. When this happens it > > seems to cause a mass stripe group panic that's over in a few minutes. > > The trick there is that it doesn't happen every time I go through the > > exercise and when it does there's no guarantee the filesystem that > > panics is the one in use. If it's not an fs in use then it doesn't help > > me reproduce the error condition. I was trying to use the "mmfsadm test > > panic" command to try a more direct approach. 
> > > > Hope that helps shed some light. > > > > -Aaron > > > > On 1/22/17 8:16 PM, Andrew Beattie wrote: > >> Out of curiosity -- why would you want to? > >> Andrew Beattie > >> Software Defined Storage - IT Specialist > >> Phone: 614-2133-7927 > >> E-mail: abeattie at au1.ibm.com > >> > >> > >> > >> ----- Original message ----- > >> From: Aaron Knister > >> Sent by: gpfsug-discuss-bounces at spectrumscale.org > >> To: gpfsug main discussion list > >> Cc: > >> Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? > >> Date: Mon, Jan 23, 2017 11:11 AM > >> > >> This is going to sound like a ridiculous request, but, is there a > way to > >> cause a filesystem to panic everywhere in one "swell foop"? I'm > assuming > >> the answer will come with an appropriate disclaimer of "don't ever > do > >> this, we don't support it, it might eat your data, summon cthulu, > etc.". > >> I swear I've seen the fs manager initiate this type of operation > before. > >> > >> I can seem to do it on a per-node basis with "mmfsadm test panic > > >> " but if I do that over all 1k nodes in my test cluster > at > >> once it results in about 45 minutes of almost total deadlock while > each > >> panic is processed by the fs manager. > >> > >> -Aaron > >> > >> -- > >> Aaron Knister > >> NASA Center for Climate Simulation (Code 606.2) > >> Goddard Space Flight Center > >> (301) 286-2776 > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Jan 24 10:00:42 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 24 Jan 2017 10:00:42 +0000 Subject: [gpfsug-discuss] Manager nodes Message-ID: We are looking at moving manager processes off our NSD nodes and on to dedicated quorum/manager nodes. Are there some broad recommended hardware specs for the function of these nodes. I assume they benefit from having high memory (for some value of high, probably a function of number of clients, files, expected open files?, and probably completely incalculable, so some empirical evidence may be useful here?) (I'm going to ignore the docs that say you should have twice as much swap as RAM!) What about cores, do they benefit from high core counts or high clock rates? 
For example would I benefit more form a high core count, low clock speed, or going for higher clock speeds and reducing core count? Or is memory bandwidth more important for manager nodes? Connectivity, does token management run over IB or only over Ethernet/admin network? I.e. Should I bother adding IB cards, or just have fast Ethernet on them (my clients/NSDs all have IB). I'm looking for some hints on what I would most benefit in investing in vs keeping to budget. Thanks Simon From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jan 24 15:18:09 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 24 Jan 2017 15:18:09 +0000 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: References: Message-ID: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu> Hi Simon, FWIW, we have two servers dedicated to cluster and filesystem management functions (and 8 NSD servers). I guess you would describe our cluster as small to medium sized ? ~700 nodes and a little over 1 PB of storage. Our two managers have 2 quad core (3 GHz) CPU?s and 64 GB RAM. They?ve got 10 GbE, but we don?t use IB anywhere. We have an 8 Gb FC SAN and we do have them connected in to the SAN so that they don?t have to ask the NSD servers to do any I/O for them. I do collect statistics on all the servers and plunk them into an RRDtool database. Looking at the last 30 days the load average on the two managers is in the 5-10 range. Memory utilization seems to be almost entirely dependent on how parameters like the pagepool are set on them. HTHAL? Kevin > On Jan 24, 2017, at 4:00 AM, Simon Thompson (Research Computing - IT Services) wrote: > > We are looking at moving manager processes off our NSD nodes and on to > dedicated quorum/manager nodes. > > Are there some broad recommended hardware specs for the function of these > nodes. > > I assume they benefit from having high memory (for some value of high, > probably a function of number of clients, files, expected open files?, and > probably completely incalculable, so some empirical evidence may be useful > here?) (I'm going to ignore the docs that say you should have twice as > much swap as RAM!) > > What about cores, do they benefit from high core counts or high clock > rates? For example would I benefit more form a high core count, low clock > speed, or going for higher clock speeds and reducing core count? Or is > memory bandwidth more important for manager nodes? > > Connectivity, does token management run over IB or only over > Ethernet/admin network? I.e. Should I bother adding IB cards, or just have > fast Ethernet on them (my clients/NSDs all have IB). > > I'm looking for some hints on what I would most benefit in investing in vs > keeping to budget. > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From janfrode at tanso.net Tue Jan 24 15:51:05 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 24 Jan 2017 15:51:05 +0000 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu> References: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu> Message-ID: Just some datapoints, in hope that it helps.. I've seen metadata performance improvements by turning down hyperthreading from 8/core to 4/core on Power8. Also it helped distributing the token managers over multiple nodes (6+) instead of fewer. I would expect this to flow over IP, not IB. -jf tir. 24. 
jan. 2017 kl. 16.18 skrev Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu>: Hi Simon, FWIW, we have two servers dedicated to cluster and filesystem management functions (and 8 NSD servers). I guess you would describe our cluster as small to medium sized ? ~700 nodes and a little over 1 PB of storage. Our two managers have 2 quad core (3 GHz) CPU?s and 64 GB RAM. They?ve got 10 GbE, but we don?t use IB anywhere. We have an 8 Gb FC SAN and we do have them connected in to the SAN so that they don?t have to ask the NSD servers to do any I/O for them. I do collect statistics on all the servers and plunk them into an RRDtool database. Looking at the last 30 days the load average on the two managers is in the 5-10 range. Memory utilization seems to be almost entirely dependent on how parameters like the pagepool are set on them. HTHAL? Kevin > On Jan 24, 2017, at 4:00 AM, Simon Thompson (Research Computing - IT Services) wrote: > > We are looking at moving manager processes off our NSD nodes and on to > dedicated quorum/manager nodes. > > Are there some broad recommended hardware specs for the function of these > nodes. > > I assume they benefit from having high memory (for some value of high, > probably a function of number of clients, files, expected open files?, and > probably completely incalculable, so some empirical evidence may be useful > here?) (I'm going to ignore the docs that say you should have twice as > much swap as RAM!) > > What about cores, do they benefit from high core counts or high clock > rates? For example would I benefit more form a high core count, low clock > speed, or going for higher clock speeds and reducing core count? Or is > memory bandwidth more important for manager nodes? > > Connectivity, does token management run over IB or only over > Ethernet/admin network? I.e. Should I bother adding IB cards, or just have > fast Ethernet on them (my clients/NSDs all have IB). > > I'm looking for some hints on what I would most benefit in investing in vs > keeping to budget. > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Jan 24 16:34:16 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 24 Jan 2017 16:34:16 +0000 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: References: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu>, Message-ID: Thanks both. I was thinking of adding 4 (we have a storage cluster over two DC's, so was planning to put two in each and use them as quorum nodes as well plus one floating VM to guarantee only one sitr is quorate in the event of someone cutting a fibre...) We pretty much start at 128GB ram and go from there, so this sounds fine. Would be good if someone could comment on if token traffic goes via IB or Ethernet, maybe I can save myself a few EDR cards... 
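Two things are cheap to check before deciding on the EDR cards: what the cluster currently uses for daemon-to-daemon traffic, and where the manager roles sit today. A rough sketch; the node names mgr01 through mgr04 are placeholders, not real hosts from this thread.

# is RDMA enabled at all, and is it used for daemon traffic or only for NSD data transfers?
mmlsconfig verbsRdma
mmlsconfig verbsRdmaSend

# which nodes currently hold the cluster manager and file system manager roles?
mmlsmgr

# designating dedicated manager/quorum nodes once the hardware is in place
mmchnode --manager --quorum -N mgr01,mgr02,mgr03,mgr04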
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode Myklebust [janfrode at tanso.net] Sent: 24 January 2017 15:51 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Manager nodes Just some datapoints, in hope that it helps.. I've seen metadata performance improvements by turning down hyperthreading from 8/core to 4/core on Power8. Also it helped distributing the token managers over multiple nodes (6+) instead of fewer. I would expect this to flow over IP, not IB. -jf tir. 24. jan. 2017 kl. 16.18 skrev Buterbaugh, Kevin L >: Hi Simon, FWIW, we have two servers dedicated to cluster and filesystem management functions (and 8 NSD servers). I guess you would describe our cluster as small to medium sized ? ~700 nodes and a little over 1 PB of storage. Our two managers have 2 quad core (3 GHz) CPU?s and 64 GB RAM. They?ve got 10 GbE, but we don?t use IB anywhere. We have an 8 Gb FC SAN and we do have them connected in to the SAN so that they don?t have to ask the NSD servers to do any I/O for them. I do collect statistics on all the servers and plunk them into an RRDtool database. Looking at the last 30 days the load average on the two managers is in the 5-10 range. Memory utilization seems to be almost entirely dependent on how parameters like the pagepool are set on them. HTHAL? Kevin > On Jan 24, 2017, at 4:00 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > We are looking at moving manager processes off our NSD nodes and on to > dedicated quorum/manager nodes. > > Are there some broad recommended hardware specs for the function of these > nodes. > > I assume they benefit from having high memory (for some value of high, > probably a function of number of clients, files, expected open files?, and > probably completely incalculable, so some empirical evidence may be useful > here?) (I'm going to ignore the docs that say you should have twice as > much swap as RAM!) > > What about cores, do they benefit from high core counts or high clock > rates? For example would I benefit more form a high core count, low clock > speed, or going for higher clock speeds and reducing core count? Or is > memory bandwidth more important for manager nodes? > > Connectivity, does token management run over IB or only over > Ethernet/admin network? I.e. Should I bother adding IB cards, or just have > fast Ethernet on them (my clients/NSDs all have IB). > > I'm looking for some hints on what I would most benefit in investing in vs > keeping to budget. > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Tue Jan 24 16:53:24 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 24 Jan 2017 16:53:24 +0000 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: References: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu>, Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB06544544@CHI-EXCHANGEW1.w2k.jumptrading.com> It goes over IP, and that could be IPoIB if you have the daemon interface or subnets configured that way, but it will go over native IB VERBS if you have rdmaVerbsSend enabled (not recommended for large clusters). 
verbsRdmaSend Enables or disables the use of InfiniBand RDMA rather than TCP for most GPFS daemon-to-daemon communication. When disabled, only data transfers between an NSD client and NSD server are eligible for RDMA. Valid values are enable or disable. The default value is disable. The verbsRdma option must be enabled for verbsRdmaSend to have any effect. HTH, -B -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: Tuesday, January 24, 2017 10:34 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Manager nodes Thanks both. I was thinking of adding 4 (we have a storage cluster over two DC's, so was planning to put two in each and use them as quorum nodes as well plus one floating VM to guarantee only one sitr is quorate in the event of someone cutting a fibre...) We pretty much start at 128GB ram and go from there, so this sounds fine. Would be good if someone could comment on if token traffic goes via IB or Ethernet, maybe I can save myself a few EDR cards... Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode Myklebust [janfrode at tanso.net] Sent: 24 January 2017 15:51 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Manager nodes Just some datapoints, in hope that it helps.. I've seen metadata performance improvements by turning down hyperthreading from 8/core to 4/core on Power8. Also it helped distributing the token managers over multiple nodes (6+) instead of fewer. I would expect this to flow over IP, not IB. -jf tir. 24. jan. 2017 kl. 16.18 skrev Buterbaugh, Kevin L >: Hi Simon, FWIW, we have two servers dedicated to cluster and filesystem management functions (and 8 NSD servers). I guess you would describe our cluster as small to medium sized ... ~700 nodes and a little over 1 PB of storage. Our two managers have 2 quad core (3 GHz) CPU's and 64 GB RAM. They've got 10 GbE, but we don't use IB anywhere. We have an 8 Gb FC SAN and we do have them connected in to the SAN so that they don't have to ask the NSD servers to do any I/O for them. I do collect statistics on all the servers and plunk them into an RRDtool database. Looking at the last 30 days the load average on the two managers is in the 5-10 range. Memory utilization seems to be almost entirely dependent on how parameters like the pagepool are set on them. HTHAL... Kevin > On Jan 24, 2017, at 4:00 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > We are looking at moving manager processes off our NSD nodes and on to > dedicated quorum/manager nodes. > > Are there some broad recommended hardware specs for the function of these > nodes. > > I assume they benefit from having high memory (for some value of high, > probably a function of number of clients, files, expected open files?, and > probably completely incalculable, so some empirical evidence may be useful > here?) (I'm going to ignore the docs that say you should have twice as > much swap as RAM!) > > What about cores, do they benefit from high core counts or high clock > rates? For example would I benefit more form a high core count, low clock > speed, or going for higher clock speeds and reducing core count? Or is > memory bandwidth more important for manager nodes? > > Connectivity, does token management run over IB or only over > Ethernet/admin network? I.e. 
Should I bother adding IB cards, or just have > fast Ethernet on them (my clients/NSDs all have IB). > > I'm looking for some hints on what I would most benefit in investing in vs > keeping to budget. > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From UWEFALKE at de.ibm.com Tue Jan 24 17:36:22 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 24 Jan 2017 18:36:22 +0100 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu> References: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu> Message-ID: Hi, Kevin, I'd look for more cores on the expense of clock speed. You send data over routes involving much higher latencies than your CPU-memory combination has even in the slowest available clock rate, but GPFS with its multi-threaded appoach is surely happy if it can start a few more threads. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 01/24/2017 04:18 PM Subject: Re: [gpfsug-discuss] Manager nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Simon, FWIW, we have two servers dedicated to cluster and filesystem management functions (and 8 NSD servers). I guess you would describe our cluster as small to medium sized ? ~700 nodes and a little over 1 PB of storage. Our two managers have 2 quad core (3 GHz) CPU?s and 64 GB RAM. They?ve got 10 GbE, but we don?t use IB anywhere. 
We have an 8 Gb FC SAN and we do have them connected in to the SAN so that they don?t have to ask the NSD servers to do any I/O for them. I do collect statistics on all the servers and plunk them into an RRDtool database. Looking at the last 30 days the load average on the two managers is in the 5-10 range. Memory utilization seems to be almost entirely dependent on how parameters like the pagepool are set on them. HTHAL? Kevin > On Jan 24, 2017, at 4:00 AM, Simon Thompson (Research Computing - IT Services) wrote: > > We are looking at moving manager processes off our NSD nodes and on to > dedicated quorum/manager nodes. > > Are there some broad recommended hardware specs for the function of these > nodes. > > I assume they benefit from having high memory (for some value of high, > probably a function of number of clients, files, expected open files?, and > probably completely incalculable, so some empirical evidence may be useful > here?) (I'm going to ignore the docs that say you should have twice as > much swap as RAM!) > > What about cores, do they benefit from high core counts or high clock > rates? For example would I benefit more form a high core count, low clock > speed, or going for higher clock speeds and reducing core count? Or is > memory bandwidth more important for manager nodes? > > Connectivity, does token management run over IB or only over > Ethernet/admin network? I.e. Should I bother adding IB cards, or just have > fast Ethernet on them (my clients/NSDs all have IB). > > I'm looking for some hints on what I would most benefit in investing in vs > keeping to budget. > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Tue Jan 24 19:48:02 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 24 Jan 2017 19:48:02 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes Message-ID: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
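A few commands that show the current CES address assignment state are useful alongside the troubleshooting steps that follow; this is 4.2.x syntax and the exact options may differ slightly by release.

mmlscluster --ces      # CES nodes and the addresses each one currently holds
mmces address list     # configured CES IPs, their policy and current assignment
mmces node list        # per-node CES state (a suspended node will not take IPs)
mmces service list     # confirms NFS is enabled and running where expected
mmhealth node show     # component health on the local protocol node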
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. From Achim.Rehor at de.ibm.com Wed Jan 25 08:58:58 2017 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Wed, 25 Jan 2017 09:58:58 +0100 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06544544@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu>, <21BC488F0AEA2245B2C3E83FC0B33DBB06544544@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Wed Jan 25 11:30:00 2017 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 25 Jan 2017 12:30:00 +0100 Subject: [gpfsug-discuss] snapshots Message-ID: <20170125113000.lwvzpekzjsjvghx5@ics.muni.cz> Hello, is there a way to get number of inodes consumed by a particular snapshot? I have a fileset with separate inodespace: Filesets in file system 'vol1': Name Status Path InodeSpace MaxInodes AllocInodes UsedInodes export Linked /gpfs/vol1/export 1 300000256 300000256 157515747 and it reports no space left on device. It seems that inodes consumed by fileset snapshots are not accounted under usedinodes. So can I somehow check how many inodes are consumed by snapshots? The 'no space left on device' IS caused by exhausted inodes, I can store more data into existing files and if I increase the inode limit, I can create new files. -- Luk?? 
Hejtm?nek From r.sobey at imperial.ac.uk Wed Jan 25 16:08:27 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 25 Jan 2017 16:08:27 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors Message-ID: Hoping someone can show me what should be obvious. I've got an LROC device configured but I want to see stats for it in the GUI: 1) On the CES node itself I've modified ZIMonSensors.cfg and under the GPFSLROC section changed it to 10: { name = "GPFSLROC" period = 10 }, 2) On the CES node restarted pmsensors. 3) On the collector node restarted pmcollector. But I can't find anywhere in the GUI that lets me look at anything LROC related. Anyone got this working? Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Jan 25 20:25:19 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 25 Jan 2017 20:25:19 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors In-Reply-To: References: Message-ID: Richard, there are no exposures of LROC counters in the Scale GUI. you need to use the grafana bridge to get graphs or the command line tools to query the data in text format. Sven On Wed, Jan 25, 2017 at 5:08 PM Sobey, Richard A wrote: > Hoping someone can show me what should be obvious. I?ve got an LROC device > configured but I want to see stats for it in the GUI: > > > > 1) On the CES node itself I?ve modified ZIMonSensors.cfg and under > the GPFSLROC section changed it to 10: > > > > { > > name = "GPFSLROC" > > period = 10 > > }, > > > > 2) On the CES node restarted pmsensors. > > 3) On the collector node restarted pmcollector. > > > > But I can?t find anywhere in the GUI that lets me look at anything LROC > related. > > > > Anyone got this working? > > > > Cheers > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jan 25 20:45:05 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 25 Jan 2017 20:45:05 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors Message-ID: <0CDC969E-7CB9-4B4E-9AAA-1BF9193BF7E2@nuance.com> For the Zimon ?GPFSLROC?, what metrics can Grafana query, I don?t see them documented or exposed anywhere: http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adv_listofmetricsPMT.htm Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Wednesday, January 25, 2017 at 2:25 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] LROC Zimon sensors Richard, there are no exposures of LROC counters in the Scale GUI. you need to use the grafana bridge to get graphs or the command line tools to query the data in text format. Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Jan 25 20:50:28 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 25 Jan 2017 14:50:28 -0600 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: References: Message-ID: Hello all, We are having an issue where the LROC on a CES node gets overrun 100% utilized. Processes then start to backup waiting for the LROC to return data. Any way to have the GPFS client go direct if LROC gets to busy? 
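One way to see whether the LROC device itself is the bottleneck, rather than the network or the pagepool, is to compare the daemon's own LROC cache statistics with raw device utilisation; a sketch, using the nvme device name that appears later in this thread.

# LROC cache statistics kept by mmfsd on the node that owns the device
mmdiag --lroc

# raw device utilisation and queue depth alongside it
iostat -dx 1 nvme0n1
# or: collectl -sD --dskfilt=nvme0n1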
Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From oehmes at gmail.com Wed Jan 25 21:00:03 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 25 Jan 2017 21:00:03 +0000 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: References: Message-ID: Matt, the assumption was that the remote devices are slower than LROC. there is some attempts in the code to not schedule more than a maximum numbers of outstanding i/os to the LROC device, but this doesn't help in all cases and is depending on what kernel level parameters for the device are set. the best way is to reduce the max size of data to be cached into lroc. sven On Wed, Jan 25, 2017 at 9:50 PM Matt Weil wrote: > Hello all, > > We are having an issue where the LROC on a CES node gets overrun 100% > utilized. Processes then start to backup waiting for the LROC to > return data. Any way to have the GPFS client go direct if LROC gets to > busy? > > Thanks > Matt > > ________________________________ > The materials in this message are private and may contain Protected > Healthcare Information or other information of a sensitive nature. If you > are not the intended recipient, be advised that any unauthorized use, > disclosure, copying or the taking of any action in reliance on the contents > of this information is strictly prohibited. If you have received this email > in error, please immediately notify the sender via telephone or return mail. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jan 25 21:01:11 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 25 Jan 2017 21:01:11 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors In-Reply-To: References: , Message-ID: Ok Sven thanks, looks like I'll be checking out grafana. Richard ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: 25 January 2017 20:25 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] LROC Zimon sensors Richard, there are no exposures of LROC counters in the Scale GUI. you need to use the grafana bridge to get graphs or the command line tools to query the data in text format. Sven On Wed, Jan 25, 2017 at 5:08 PM Sobey, Richard A > wrote: Hoping someone can show me what should be obvious. I've got an LROC device configured but I want to see stats for it in the GUI: 1) On the CES node itself I've modified ZIMonSensors.cfg and under the GPFSLROC section changed it to 10: { name = "GPFSLROC" period = 10 }, 2) On the CES node restarted pmsensors. 3) On the collector node restarted pmcollector. But I can't find anywhere in the GUI that lets me look at anything LROC related. Anyone got this working? 
Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Jan 25 21:06:12 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 25 Jan 2017 21:06:12 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors In-Reply-To: <0CDC969E-7CB9-4B4E-9AAA-1BF9193BF7E2@nuance.com> References: <0CDC969E-7CB9-4B4E-9AAA-1BF9193BF7E2@nuance.com> Message-ID: Hi, i guess thats a docu gap, i will send a email trying to get this fixed. here is the list of sensors : [image: pasted1] i hope most of them are self explaining given the others are documented , if not let me know and i clarify . sven On Wed, Jan 25, 2017 at 9:45 PM Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > For the Zimon ?GPFSLROC?, what metrics can Grafana query, I don?t see them > documented or exposed anywhere: > > > > > http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adv_listofmetricsPMT.htm > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > > > *From: * on behalf of Sven > Oehme > *Reply-To: *gpfsug main discussion list > *Date: *Wednesday, January 25, 2017 at 2:25 PM > *To: *"gpfsug-discuss at spectrumscale.org" > > *Subject: *[EXTERNAL] Re: [gpfsug-discuss] LROC Zimon sensors > > > > Richard, > > > > there are no exposures of LROC counters in the Scale GUI. you need to use > the grafana bridge to get graphs or the command line tools to query the > data in text format. > > > > Sven > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pasted1 Type: image/png Size: 283191 bytes Desc: not available URL: From oehmes at gmail.com Wed Jan 25 21:08:02 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 25 Jan 2017 21:08:02 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors In-Reply-To: References: Message-ID: start here : https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/IBM%20Spectrum%20Scale%20Performance%20Monitoring%20Bridge On Wed, Jan 25, 2017 at 10:01 PM Sobey, Richard A wrote: > Ok Sven thanks, looks like I'll be checking out grafana. > > > Richard > > > ------------------------------ > *From:* gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Sven Oehme < > oehmes at gmail.com> > *Sent:* 25 January 2017 20:25 > *To:* gpfsug-discuss at spectrumscale.org > *Subject:* Re: [gpfsug-discuss] LROC Zimon sensors > > Richard, > > there are no exposures of LROC counters in the Scale GUI. you need to use > the grafana bridge to get graphs or the command line tools to query the > data in text format. > > Sven > > > On Wed, Jan 25, 2017 at 5:08 PM Sobey, Richard A > wrote: > > Hoping someone can show me what should be obvious. I?ve got an LROC device > configured but I want to see stats for it in the GUI: > > > > 1) On the CES node itself I?ve modified ZIMonSensors.cfg and under > the GPFSLROC section changed it to 10: > > > > { > > name = "GPFSLROC" > > period = 10 > > }, > > > > 2) On the CES node restarted pmsensors. 
> > 3) On the collector node restarted pmcollector. > > > > But I can?t find anywhere in the GUI that lets me look at anything LROC > related. > > > > Anyone got this working? > > > > Cheers > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Jan 25 21:20:21 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 25 Jan 2017 15:20:21 -0600 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: References: Message-ID: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> On 1/25/17 3:00 PM, Sven Oehme wrote: Matt, the assumption was that the remote devices are slower than LROC. there is some attempts in the code to not schedule more than a maximum numbers of outstanding i/os to the LROC device, but this doesn't help in all cases and is depending on what kernel level parameters for the device are set. the best way is to reduce the max size of data to be cached into lroc. I just turned LROC file caching completely off. most if not all of the IO is metadata. Which is what I wanted to keep fast. It is amazing once you drop the latency the IO's go up way more than they ever where before. I guess we will need another nvme. sven On Wed, Jan 25, 2017 at 9:50 PM Matt Weil > wrote: Hello all, We are having an issue where the LROC on a CES node gets overrun 100% utilized. Processes then start to backup waiting for the LROC to return data. Any way to have the GPFS client go direct if LROC gets to busy? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... 
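On "we will need another nvme": an additional local cache device is normally defined as an NSD with usage=localCache on the node that owns it. The stanza below is only a sketch from memory, with placeholder device, NSD and node names, so check the LROC section of the administration guide before running anything like it.

# lroc.stanza on the CES node that owns the new device -- names are placeholders
%nsd:
  device=/dev/nvme1n1
  nsd=ces1_lroc2
  servers=ces1
  usage=localCache

mmcrnsd -F lroc.stanza
# the daemon picks up the new cache device after GPFS is restarted on that node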
URL: From oehmes at gmail.com Wed Jan 25 21:29:50 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 25 Jan 2017 21:29:50 +0000 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> References: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> Message-ID: have you tried to just leave lrocInodes and lrocDirectories on and turn data off ? also did you increase maxstatcache so LROC actually has some compact objects to use ? if you send value for maxfilestocache,maxfilestocache,workerthreads and available memory of the node i can provide a start point. On Wed, Jan 25, 2017 at 10:20 PM Matt Weil wrote: > > > On 1/25/17 3:00 PM, Sven Oehme wrote: > > Matt, > > the assumption was that the remote devices are slower than LROC. there is > some attempts in the code to not schedule more than a maximum numbers of > outstanding i/os to the LROC device, but this doesn't help in all cases and > is depending on what kernel level parameters for the device are set. the > best way is to reduce the max size of data to be cached into lroc. > > I just turned LROC file caching completely off. most if not all of the IO > is metadata. Which is what I wanted to keep fast. It is amazing once you > drop the latency the IO's go up way more than they ever where before. I > guess we will need another nvme. > > > sven > > > On Wed, Jan 25, 2017 at 9:50 PM Matt Weil wrote: > > Hello all, > > We are having an issue where the LROC on a CES node gets overrun 100% > utilized. Processes then start to backup waiting for the LROC to > return data. Any way to have the GPFS client go direct if LROC gets to > busy? > > Thanks > Matt > > ________________________________ > The materials in this message are private and may contain Protected > Healthcare Information or other information of a sensitive nature. If you > are not the intended recipient, be advised that any unauthorized use, > disclosure, copying or the taking of any action in reliance on the contents > of this information is strictly prohibited. If you have received this email > in error, please immediately notify the sender via telephone or return mail. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > The materials in this message are private and may contain Protected > Healthcare Information or other information of a sensitive nature. If you > are not the intended recipient, be advised that any unauthorized use, > disclosure, copying or the taking of any action in reliance on the contents > of this information is strictly prohibited. If you have received this email > in error, please immediately notify the sender via telephone or return mail. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
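Putting that suggestion into commands looks roughly like the sketch below; the values are only examples, and maxStatCache/maxFilesToCache changes take effect after the daemon is restarted on the affected nodes.

# cache only inodes and directory blocks in LROC, keep file data out
mmchconfig lrocData=no,lrocInodes=yes,lrocDirectories=yes -N ces1,ces2,ces3

# give the daemon more compact stat cache objects that LROC can hold
mmchconfig maxStatCache=250000 -N ces1,ces2,ces3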
URL: From mweil at wustl.edu Wed Jan 25 21:51:43 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 25 Jan 2017 15:51:43 -0600 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: References: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> Message-ID: [ces1,ces2,ces3] maxStatCache 80000 worker1Threads 2000 maxFilesToCache 500000 pagepool 100G maxStatCache 80000 lrocData no 378G system memory. On 1/25/17 3:29 PM, Sven Oehme wrote: have you tried to just leave lrocInodes and lrocDirectories on and turn data off ? yes data I just turned off also did you increase maxstatcache so LROC actually has some compact objects to use ? if you send value for maxfilestocache,maxfilestocache,workerthreads and available memory of the node i can provide a start point. On Wed, Jan 25, 2017 at 10:20 PM Matt Weil > wrote: On 1/25/17 3:00 PM, Sven Oehme wrote: Matt, the assumption was that the remote devices are slower than LROC. there is some attempts in the code to not schedule more than a maximum numbers of outstanding i/os to the LROC device, but this doesn't help in all cases and is depending on what kernel level parameters for the device are set. the best way is to reduce the max size of data to be cached into lroc. I just turned LROC file caching completely off. most if not all of the IO is metadata. Which is what I wanted to keep fast. It is amazing once you drop the latency the IO's go up way more than they ever where before. I guess we will need another nvme. sven On Wed, Jan 25, 2017 at 9:50 PM Matt Weil > wrote: Hello all, We are having an issue where the LROC on a CES node gets overrun 100% utilized. Processes then start to backup waiting for the LROC to return data. Any way to have the GPFS client go direct if LROC gets to busy? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Thu Jan 26 15:37:54 2017 From: mweil at wustl.edu (Matt Weil) Date: Thu, 26 Jan 2017 09:37:54 -0600 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: References: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> Message-ID: <55747bf9-b4c1-4523-8d8b-94e8f35f22f9@wustl.edu> 100% utilized are bursts above 200,000 IO's. Any way to tell ganesha.nfsd to cache more? On 1/25/17 3:51 PM, Matt Weil wrote: [ces1,ces2,ces3] maxStatCache 80000 worker1Threads 2000 maxFilesToCache 500000 pagepool 100G maxStatCache 80000 lrocData no 378G system memory. On 1/25/17 3:29 PM, Sven Oehme wrote: have you tried to just leave lrocInodes and lrocDirectories on and turn data off ? yes data I just turned off also did you increase maxstatcache so LROC actually has some compact objects to use ? if you send value for maxfilestocache,maxfilestocache,workerthreads and available memory of the node i can provide a start point. On Wed, Jan 25, 2017 at 10:20 PM Matt Weil > wrote: On 1/25/17 3:00 PM, Sven Oehme wrote: Matt, the assumption was that the remote devices are slower than LROC. there is some attempts in the code to not schedule more than a maximum numbers of outstanding i/os to the LROC device, but this doesn't help in all cases and is depending on what kernel level parameters for the device are set. the best way is to reduce the max size of data to be cached into lroc. I just turned LROC file caching completely off. most if not all of the IO is metadata. Which is what I wanted to keep fast. It is amazing once you drop the latency the IO's go up way more than they ever where before. I guess we will need another nvme. sven On Wed, Jan 25, 2017 at 9:50 PM Matt Weil > wrote: Hello all, We are having an issue where the LROC on a CES node gets overrun 100% utilized. Processes then start to backup waiting for the LROC to return data. Any way to have the GPFS client go direct if LROC gets to busy? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. 
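The [ces1,ces2,ces3] stanza quoted above lists maxStatCache twice, so it is worth confirming which values the daemons are actually running with. Something like the line below works; mmdsh and mmdiag ship with GPFS, and the egrep pattern is just an example.

mmdsh -N ces1,ces2,ces3 "/usr/lpp/mmfs/bin/mmdiag --config | egrep 'maxFilesToCache|maxStatCache|pagepool|lroc'"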
If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 26 17:15:56 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 26 Jan 2017 17:15:56 +0000 Subject: [gpfsug-discuss] mmlsquota output question Message-ID: <73AC6907-90BD-447F-9F72-4B7CBBFE2321@vanderbilt.edu> Hi All, We had 3 local GPFS filesystems on our cluster ? let?s call them gpfs0, gpfs1, and gpfs2. gpfs0 is for project space (i.e. groups can buy quota in 1 TB increments there). gpfs1 is scratch and gpfs2 is home. We are combining gpfs0 and gpfs1 into one new filesystem (gpfs3) ? we?re doing this for multiple reasons that aren?t really pertinent to my question here, but suffice it to say I have discussed our plan with some of IBM?s GPFS people and they agree that it?s the thing for us to do. gpfs3 will have a scratch fileset with no fileset quota, but user and group quotas (just like the gpfs1 filesystem currently has). We will also move all the filesets from gpfs0 over to gpfs3 - those use fileset quotas only - no user or group quotas. I have created the new gpfs3 filesystem, the scratch fileset within it, and one of the project filesets coming over from gpfs0. I?ve also moved my scratch directory to the gpfs3 scratch fileset. 
When I run mmlsquota I see (please note, I?ve changed names of things to protect the guilty): kevin at gateway: mmlsquota -u kevin --block-size auto Block Limits | File Limits Filesystem type blocks quota limit in_doubt grace | files quota limit in_doubt grace Remarks gpfs0 USR no limits Block Limits | File Limits Filesystem type blocks quota limit in_doubt grace | files quota limit in_doubt grace Remarks gpfs1 USR 2.008G 50G 200G 0 none | 3 100000 1000000 0 none Block Limits | File Limits Filesystem type blocks quota limit in_doubt grace | files quota limit in_doubt grace Remarks gpfs2 USR 11.69G 25G 35G 0 none | 8453 100000 200000 0 none Block Limits | File Limits Filesystem Fileset type blocks quota limit in_doubt grace | files quota limit in_doubt grace Remarks gpfs3 root USR no limits gpfs3 scratch USR 31.04G 50G 200G 0 none | 2134 200000 1000000 0 none gpfs3 fakegroup USR no limits kevin at gateway: My question is this ? why am I seeing the ?root? and ?fakegroup? filesets listed in the output for gpfs3? They don?t show up for gpfs0 and the also exist there. Is it possibly because there are no user quotas whatsoever for gpfs0 and there are user quotas on the gpfs3:scratch fileset? If so, that still doesn?t make sense as to why mmlsquota would think it needs to show the filesets within that filesystem that don?t have user quotas. In fact, we don?t *want* that to happen, as we have certain groups that deal with various types of restricted data and we?d prefer that their existence not be advertised to everyone on the cluster. Oh, we?re still in the process of upgrading clients on our cluster, but this output is from a client running 4.2.2.1, in case that matters. Thanks all... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Thu Jan 26 20:20:00 2017 From: mweil at wustl.edu (Matt Weil) Date: Thu, 26 Jan 2017 14:20:00 -0600 Subject: [gpfsug-discuss] LROC nvme small IO size 4 k In-Reply-To: <55747bf9-b4c1-4523-8d8b-94e8f35f22f9@wustl.edu> References: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> <55747bf9-b4c1-4523-8d8b-94e8f35f22f9@wustl.edu> Message-ID: I still see small 4k IO's going to the nvme device after changing the max_sectors_kb. Writes did increase from 64 to 512. Is that a nvme limitation. > [root at ces1 system]# cat /sys/block/nvme0n1/queue/read_ahead_kb > 8192 > [root at ces1 system]# cat /sys/block/nvme0n1/queue/nr_requests > 512 > [root at ces1 system]# cat /sys/block/nvme0n1/queue/max_sectors_kb > 8192 > [root at ces1 system]# collectl -sD --dskfilt=nvme0n1 > waiting for 1 second sample... > > # DISK STATISTICS (/sec) > # > <---------reads---------><---------writes---------><--------averages--------> > Pct > #Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize > QLen Wait SvcTim Util > nvme0n1 47187 0 11K 4 30238 0 59 512 > 6 8 0 0 34 > nvme0n1 61730 0 15K 4 14321 0 28 512 > 4 9 0 0 45 ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. 
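Before concluding it is an NVMe limitation, it is worth checking what the device actually advertises, since max_sectors_kb is capped by max_hw_sectors_kb; and the 4 KiB reads may simply be the natural size of the cached objects (inodes and directory blocks) rather than a device limit. A quick look at the queue limits:

for f in logical_block_size physical_block_size max_hw_sectors_kb max_sectors_kb nr_requests; do
    printf '%s: ' "$f"; cat /sys/block/nvme0n1/queue/$f
done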
If you have received this email in error, please immediately notify the sender via telephone or return mail. From Robert.Oesterlin at nuance.com Fri Jan 27 00:57:05 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 27 Jan 2017 00:57:05 +0000 Subject: [gpfsug-discuss] Waiter identification help - Quota related Message-ID: OK, I have a sick cluster, and it seems to be tied up with quota related RPCs like this. Any help in narrowing down what the issue is? Waiting 3.8729 sec since 19:54:09, monitored, thread 32786 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.3158 sec since 19:54:08, monitored, thread 32771 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.3173 sec since 19:54:08, monitored, thread 35829 Msg handler quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.4619 sec since 19:54:08, monitored, thread 9694 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.4967 sec since 19:54:08, monitored, thread 32357 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.6885 sec since 19:54:08, monitored, thread 32305 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.7123 sec since 19:54:08, monitored, thread 32261 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.7932 sec since 19:54:08, monitored, thread 53409 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.2954 sec since 19:54:07, monitored, thread 32905 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3058 sec since 19:54:07, monitored, thread 32573 Msg handler quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3207 sec since 19:54:07, monitored, thread 32397 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3274 sec since 19:54:07, monitored, thread 32897 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3343 sec since 19:54:07, monitored, thread 32691 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3347 sec since 19:54:07, monitored, thread 32364 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3348 sec since 19:54:07, monitored, thread 32522 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Jan 27 01:26:49 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 26 Jan 2017 20:26:49 -0500 Subject: [gpfsug-discuss] Waiter identification help - Quota related In-Reply-To: References: Message-ID: <49f984fc-4881-60fd-88a0-29701ce4ea73@nasa.gov> This might be a stretch but do you happen to have a user/fileset/group over it's hard quota or soft quota + grace period? We've had this really upset our cluster before. 
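A quick way to check for that condition is to look for quota entries whose grace column shows anything other than "none". A rough sketch, assuming the affected filesystem is called gpfs0 (adjust the device name; the grep is only a crude filter and the header lines will still come through):

# List user, group and fileset quota entries and keep only the rows that
# are over a soft limit (healthy rows show "none" in both grace columns):
/usr/lpp/mmfs/bin/mmrepquota -u -g -j gpfs0 | grep -vw none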
At least with 3.5 each op that's done against an over quota user/group/fileset results in at least one rpc from the fs manager to every node in the cluster. Are those waiters from an fs manager node? If so perhaps briefly fire up tracing (/usr/lpp/mmfs/bin/mmtrace start) let it run for ~10 seconds then stop it (/usr/lpp/mmfs/bin/mmtrace stop) then grep for "TRACE_QUOTA" out of the resulting trcrpt file. If you see a bunch of lines that contain: TRACE_QUOTA: qu.server revoke reply type that might be what's going on. You can also see the behavior if you look at the output of mmdiag --network on your fs manager nodes and see a bunch of RPC's with all of your cluster node listed as the recipients. Can't recall what the RPC is called that you're looking for, though. Hope that helps! -Aaron On 1/26/17 7:57 PM, Oesterlin, Robert wrote: > OK, I have a sick cluster, and it seems to be tied up with quota related > RPCs like this. Any help in narrowing down what the issue is? > > > > Waiting 3.8729 sec since 19:54:09, monitored, thread 32786 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.3158 sec since 19:54:08, monitored, thread 32771 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.3173 sec since 19:54:08, monitored, thread 35829 Msg handler > quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.4619 sec since 19:54:08, monitored, thread 9694 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.4967 sec since 19:54:08, monitored, thread 32357 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.6885 sec since 19:54:08, monitored, thread 32305 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.7123 sec since 19:54:08, monitored, thread 32261 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.7932 sec since 19:54:08, monitored, thread 53409 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.2954 sec since 19:54:07, monitored, thread 32905 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3058 sec since 19:54:07, monitored, thread 32573 Msg handler > quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3207 sec since 19:54:07, monitored, thread 32397 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3274 sec since 19:54:07, monitored, thread 32897 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3343 sec since 19:54:07, monitored, thread 32691 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3347 sec since 19:54:07, monitored, thread 32364 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3348 sec since 19:54:07, monitored, thread 32522 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > > > Bob Oesterlin > Sr 
Principal Storage Engineer, Nuance > 507-269-0413 > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From r.sobey at imperial.ac.uk Fri Jan 27 11:12:25 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 27 Jan 2017 11:12:25 +0000 Subject: [gpfsug-discuss] Nodeclasses question Message-ID: All, Can it be clarified whether specifying "-N ces" (for example, I have a custom nodeclass called ces containing CES nodes of course) will then apply changes to future nodes that join the same nodeclass? For example, "mmchconfig maxFilesToCache=100000 -N ces" will give existing nodes that new config. I then add a 5th node to the nodeclass. Will it inherit the cache value or will I need to set it again? Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jan 27 12:43:40 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 27 Jan 2017 12:43:40 +0000 Subject: [gpfsug-discuss] ?spam? Nodeclasses question Message-ID: I think this depends on you FS min version. We had some issues where ours was still set to 3.5 I think even though we have 4.x clients. The nodeclasses in mmlsconfig were expanded to individual nodes. But adding a node to a node class would apply the config to the node, though I'd expect you to have to stop/restart GPFS on the node and not expect it to work like "mmchconfig -I" Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 27 January 2017 at 11:12 To: "gpfsug-discuss at spectrumscale.org" > Subject: ?spam? [gpfsug-discuss] Nodeclasses question All, Can it be clarified whether specifying ?-N ces? (for example, I have a custom nodeclass called ces containing CES nodes of course) will then apply changes to future nodes that join the same nodeclass? For example, ?mmchconfig maxFilesToCache=100000 ?N ces? will give existing nodes that new config. I then add a 5th node to the nodeclass. Will it inherit the cache value or will I need to set it again? Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From gil at us.ibm.com Fri Jan 27 13:08:06 2017 From: gil at us.ibm.com (Gil Sharon) Date: Fri, 27 Jan 2017 08:08:06 -0500 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 60, Issue 72 In-Reply-To: References: Message-ID: yes, node-classes are updated across all nodes, so if you add a node to an existing class it will be included from then on. But for CES nodes there is already a 'built-in' system class: cesNodes. why not use that? 
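For example, a sketch of doing exactly that with the built-in class; maxFilesToCache=100000 is simply the value from the original question, not a recommendation:

# Scope the setting to the system-defined CES node class rather than a
# custom one; nodes that are later CES-enabled become members of cesNodes
# and should pick the value up once GPFS is restarted on them.
mmchconfig maxFilesToCache=100000 -N cesNodes
mmlsconfig maxFilesToCache    # confirm how the value is scoped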
you can see all system nodeclasses by: mmlsnodeclass --system Regards, GIL SHARON Spectrum Scale (GPFS) Development Mobile: 978-302-9355 E-mail: gil at us.ibm.com From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/27/2017 07:00 AM Subject: gpfsug-discuss Digest, Vol 60, Issue 72 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Nodeclasses question (Sobey, Richard A) ---------------------------------------------------------------------- Message: 1 Date: Fri, 27 Jan 2017 11:12:25 +0000 From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Subject: [gpfsug-discuss] Nodeclasses question Message-ID: Content-Type: text/plain; charset="us-ascii" All, Can it be clarified whether specifying "-N ces" (for example, I have a custom nodeclass called ces containing CES nodes of course) will then apply changes to future nodes that join the same nodeclass? For example, "mmchconfig maxFilesToCache=100000 -N ces" will give existing nodes that new config. I then add a 5th node to the nodeclass. Will it inherit the cache value or will I need to set it again? Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170127/0d841ddb/attachment-0001.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 60, Issue 72 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mweil at wustl.edu Fri Jan 27 15:49:12 2017 From: mweil at wustl.edu (Matt Weil) Date: Fri, 27 Jan 2017 09:49:12 -0600 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: <55747bf9-b4c1-4523-8d8b-94e8f35f22f9@wustl.edu> References: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> <55747bf9-b4c1-4523-8d8b-94e8f35f22f9@wustl.edu> Message-ID: <0ad3735a-77d4-6d98-6e8a-135479f3f594@wustl.edu> turning off data seems to have helped this issue Thanks all On 1/26/17 9:37 AM, Matt Weil wrote: 100% utilized are bursts above 200,000 IO's. Any way to tell ganesha.nfsd to cache more? On 1/25/17 3:51 PM, Matt Weil wrote: [ces1,ces2,ces3] maxStatCache 80000 worker1Threads 2000 maxFilesToCache 500000 pagepool 100G maxStatCache 80000 lrocData no 378G system memory. On 1/25/17 3:29 PM, Sven Oehme wrote: have you tried to just leave lrocInodes and lrocDirectories on and turn data off ? yes data I just turned off also did you increase maxstatcache so LROC actually has some compact objects to use ? if you send value for maxfilestocache,maxfilestocache,workerthreads and available memory of the node i can provide a start point. 
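What Sven describes would look roughly like the following. This is only a sketch: the node names are the ces1/ces2/ces3 from the stanza above, and the maxStatCache value is illustrative rather than a tuned figure:

# Keep LROC for inodes and directories but stop caching file data,
# and raise maxStatCache so LROC has compact stat objects to hold.
mmchconfig lrocData=no -N ces1,ces2,ces3
mmchconfig maxStatCache=250000 -N ces1,ces2,ces3
# Both settings normally take effect after mmfsd is restarted on those nodes:
#   mmshutdown -N ces1,ces2,ces3 && mmstartup -N ces1,ces2,ces3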
On Wed, Jan 25, 2017 at 10:20 PM Matt Weil > wrote: On 1/25/17 3:00 PM, Sven Oehme wrote: Matt, the assumption was that the remote devices are slower than LROC. there is some attempts in the code to not schedule more than a maximum numbers of outstanding i/os to the LROC device, but this doesn't help in all cases and is depending on what kernel level parameters for the device are set. the best way is to reduce the max size of data to be cached into lroc. I just turned LROC file caching completely off. most if not all of the IO is metadata. Which is what I wanted to keep fast. It is amazing once you drop the latency the IO's go up way more than they ever where before. I guess we will need another nvme. sven On Wed, Jan 25, 2017 at 9:50 PM Matt Weil > wrote: Hello all, We are having an issue where the LROC on a CES node gets overrun 100% utilized. Processes then start to backup waiting for the LROC to return data. Any way to have the GPFS client go direct if LROC gets to busy? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From laurence at qsplace.co.uk Fri Jan 27 17:17:53 2017 From: laurence at qsplace.co.uk (laurence at qsplace.co.uk) Date: Fri, 27 Jan 2017 17:17:53 +0000 Subject: [gpfsug-discuss] ?spam? Nodeclasses question In-Reply-To: References: Message-ID: Richard, As Simon notes in 3.5 they were expanded and where a pain; however this has since been tidied up and now works as it "should". So any further node added to a group will inherit the relevant parts of the config. i.e. (I've snipped the boring bits out) mmlsnodeclass Node Class Name Members --------------------- ----------------------------------------------------------- site2 s2gpfs1.site2,s2gpfs2.site2 mmchconfig pagepool=2G -N site2 mmshutdown -a mmstartup -a mmdsh -N nsdnodes "mmdiag --config | grep page" s2gpfs3.site2: pagepool 1073741824 s2gpfs3.site2: pagepoolMaxPhysMemPct 75 s2gpfs2.site2: ! pagepool 2147483648 s2gpfs2.site2: pagepoolMaxPhysMemPct 75 s2gpfs1.site2: ! pagepool 2147483648 s2gpfs1.site2: pagepoolMaxPhysMemPct 75 mmchnodeclass site2 add -N s2gpfs3.site2 mmshutdown -N s2gpfs3.site2 mmstartup -N s2gpfs3.site2 mmdsh -N nsdnodes "mmdiag --config | grep page" s2gpfs2.site2: ! pagepool 2147483648 s2gpfs2.site2: pagepoolMaxPhysMemPct 75 s2gpfs1.site2: ! pagepool 2147483648 s2gpfs1.site2: pagepoolMaxPhysMemPct 75 s2gpfs3.site2: ! pagepool 2147483648 s2gpfs3.site2: pagepoolMaxPhysMemPct 75 -- Lauz On 2017-01-27 12:43, Simon Thompson (Research Computing - IT Services) wrote: > I think this depends on you FS min version. > > We had some issues where ours was still set to 3.5 I think even though > we have 4.x clients. The nodeclasses in mmlsconfig were expanded to > individual nodes. But adding a node to a node class would apply the > config to the node, though I'd expect you to have to stop/restart GPFS > on the node and not expect it to work like "mmchconfig -I" > > Simon > > From: on behalf of "Sobey, > Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > > Date: Friday, 27 January 2017 at 11:12 > To: "gpfsug-discuss at spectrumscale.org" > > Subject: ?spam? [gpfsug-discuss] Nodeclasses question > > All, > > Can it be clarified whether specifying ?-N ces? (for example, I > have a custom nodeclass called ces containing CES nodes of course) > will then apply changes to future nodes that join the same nodeclass? > > For example, ?mmchconfig maxFilesToCache=100000 ?N ces? will > give existing nodes that new config. I then add a 5th node to the > nodeclass. Will it inherit the cache value or will I need to set it > again? > > Thanks > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From r.sobey at imperial.ac.uk Fri Jan 27 21:13:28 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 27 Jan 2017 21:13:28 +0000 Subject: [gpfsug-discuss] ?spam? Nodeclasses question In-Reply-To: References: , Message-ID: Thanks Lauz and Simon. 
Next question and I presume the answer is "yes": if you specify a node explicitly that already has a certain config applied through a nodeclass, the value that has been set specific to that node should override the nodeclass setting. Correct? Richard ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of laurence at qsplace.co.uk Sent: 27 January 2017 17:17 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] ?spam? Nodeclasses question Richard, As Simon notes in 3.5 they were expanded and where a pain; however this has since been tidied up and now works as it "should". So any further node added to a group will inherit the relevant parts of the config. i.e. (I've snipped the boring bits out) mmlsnodeclass Node Class Name Members --------------------- ----------------------------------------------------------- site2 s2gpfs1.site2,s2gpfs2.site2 mmchconfig pagepool=2G -N site2 mmshutdown -a mmstartup -a mmdsh -N nsdnodes "mmdiag --config | grep page" s2gpfs3.site2: pagepool 1073741824 s2gpfs3.site2: pagepoolMaxPhysMemPct 75 s2gpfs2.site2: ! pagepool 2147483648 s2gpfs2.site2: pagepoolMaxPhysMemPct 75 s2gpfs1.site2: ! pagepool 2147483648 s2gpfs1.site2: pagepoolMaxPhysMemPct 75 mmchnodeclass site2 add -N s2gpfs3.site2 mmshutdown -N s2gpfs3.site2 mmstartup -N s2gpfs3.site2 mmdsh -N nsdnodes "mmdiag --config | grep page" s2gpfs2.site2: ! pagepool 2147483648 s2gpfs2.site2: pagepoolMaxPhysMemPct 75 s2gpfs1.site2: ! pagepool 2147483648 s2gpfs1.site2: pagepoolMaxPhysMemPct 75 s2gpfs3.site2: ! pagepool 2147483648 s2gpfs3.site2: pagepoolMaxPhysMemPct 75 -- Lauz On 2017-01-27 12:43, Simon Thompson (Research Computing - IT Services) wrote: > I think this depends on you FS min version. > > We had some issues where ours was still set to 3.5 I think even though > we have 4.x clients. The nodeclasses in mmlsconfig were expanded to > individual nodes. But adding a node to a node class would apply the > config to the node, though I'd expect you to have to stop/restart GPFS > on the node and not expect it to work like "mmchconfig -I" > > Simon > > From: on behalf of "Sobey, > Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > > Date: Friday, 27 January 2017 at 11:12 > To: "gpfsug-discuss at spectrumscale.org" > > Subject: ?spam? [gpfsug-discuss] Nodeclasses question > > All, > > Can it be clarified whether specifying "-N ces" (for example, I > have a custom nodeclass called ces containing CES nodes of course) > will then apply changes to future nodes that join the same nodeclass? > > For example, "mmchconfig maxFilesToCache=100000 -N ces" will > give existing nodes that new config. I then add a 5th node to the > nodeclass. Will it inherit the cache value or will I need to set it > again? > > Thanks > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
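One way to take the guesswork out of the precedence question is to look at what a given node has actually applied after a restart. A small sketch, using maxFilesToCache purely as an example attribute and "cesnode1" as a hypothetical member of the class:

# Show how the attribute is scoped in the committed configuration
# (per-node entries appear alongside node-class entries):
mmlsconfig maxFilesToCache
# Confirm what the running daemon on one node actually picked up:
mmdsh -N cesnode1 "/usr/lpp/mmfs/bin/mmdiag --config | grep -i maxFilesToCache"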
URL: From aaron.s.knister at nasa.gov Fri Jan 27 22:54:51 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 27 Jan 2017 17:54:51 -0500 Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs In-Reply-To: References: <061a15b7-f5e9-7c16-2e79-3236665a9368@nasa.gov> Message-ID: <239473a0-a8b7-0f13-f55d-a9e85948ce19@nasa.gov> This is rather disconcerting. We just finished upgrading our nsd servers from 3.5.0.31 to 4.1.1.10 (All clients were previously migrated from 3.5.0.31 to 4.1.1.10). After finishing that upgrade I'm now seeing these errors with some frequency (a couple every few minutes). Anyone have insight? On 1/18/17 11:58 AM, Brian Marshall wrote: > As background, we recently upgraded GPFS from 4.2.0 to 4.2.1 and > updated the Mellanox OFED on our compute cluster to allow it to move > from CentOS 7.1 to 7.2 > > We do some transient warnings from the Mellanox switch gear about > various port counters that we are tracking down with them. > > Jobs and filesystem seem stable, but the logs are concerning. > > On Wed, Jan 18, 2017 at 10:22 AM, Aaron Knister > > wrote: > > I'm curious about this too. We see these messages sometimes when > things have gone horribly wrong but also sometimes during recovery > events. Here's a recent one: > > loremds20 (manager/nsd node): > Mon Jan 16 14:19:02.048 2017: [E] VERBS RDMA rdma read error > IBV_WC_REM_ACCESS_ERR to 10.101.11.6 (lorej006) on mlx5_0 port 1 > fabnum 3 vendor_err 136 > Mon Jan 16 14:19:02.049 2017: [E] VERBS RDMA closed connection to > 10.101.11.6 (lorej006) on mlx5_0 port 1 fabnum 3 due to RDMA read > error IBV_WC_REM_ACCESS_ERR index 11 > > lorej006 (client): > Mon Jan 16 14:19:01.990 2017: [N] VERBS RDMA closed connection to > 10.101.53.18 (loremds18) on mlx5_0 port 1 fabnum 3 index 2 > Mon Jan 16 14:19:01.995 2017: [N] VERBS RDMA closed connection to > 10.101.53.19 (loremds19) on mlx5_0 port 1 fabnum 3 index 0 > Mon Jan 16 14:19:01.997 2017: [I] Recovering nodes: 10.101.53.18 > 10.101.53.19 > Mon Jan 16 14:19:02.047 2017: [W] VERBS RDMA async event > IBV_EVENT_QP_ACCESS_ERR on mlx5_0 qp 0x7fffe550f1c8. > Mon Jan 16 14:19:02.051 2017: [E] VERBS RDMA closed connection to > 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 error 733 index 1 > Mon Jan 16 14:19:02.071 2017: [I] Recovered 2 nodes for file system > tnb32. > Mon Jan 16 14:19:02.140 2017: [I] VERBS RDMA connecting to > 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 index 0 > Mon Jan 16 14:19:02.160 2017: [I] VERBS RDMA connected to > 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 sl 0 index 0 > > I had just shut down loremds18 and loremds19 so there was certainly > recovery taking place and during that time is when the error seems > to have occurred. > > I looked up the meaning of IBV_WC_REM_ACCESS_ERR here > (http://www.rdmamojo.com/2013/02/15/ibv_poll_cq/ > ) and see this: > > IBV_WC_REM_ACCESS_ERR (10) - Remote Access Error: a protection error > occurred on a remote data buffer to be read by an RDMA Read, written > by an RDMA Write or accessed by an atomic operation. This error is > reported only on RDMA operations or atomic operations. Relevant for > RC QPs. > > my take on it during recovery it seems like one end of the > connection more or less hanging up on the other end (e.g. Connection > reset by peer > /ECONNRESET). > > But like I said at the start, we also see this when there something > has gone awfully wrong. 
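Before digging further into the fabric, it can help to see how widespread the errors are and whether they coincide with recovery events. A rough sketch; the path is the default GPFS log location and "-N all" assumes mmdsh can reach every node:

# Count VERBS RDMA access errors per node in the current log:
mmdsh -N all "grep -c 'IBV_WC_REM_ACCESS_ERR' /var/adm/ras/mmfs.log.latest" | sort -t: -k2 -nr | head
# Cross-reference with node recovery activity on the same nodes:
mmdsh -N all "grep -c 'Recovering nodes' /var/adm/ras/mmfs.log.latest" | sort -t: -k2 -nr | head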
> > -Aaron > > On 1/18/17 3:59 AM, Simon Thompson (Research Computing - IT > Services) wrote: > > I'd be inclined to look at something like: > > ibqueryerrors -s > PortXmitWait,LinkDownedCounter,PortXmitDiscards,PortRcvRemotePhysicalErrors > -c > > And see if you have a high number of symbol errors, might be a cable > needs replugging or replacing. > > Simon > > From: > >> on behalf of > "J. Eric > Wonderley" > >> > Reply-To: "gpfsug-discuss at spectrumscale.org > > >" > > >> > Date: Tuesday, 17 January 2017 at 21:16 > To: "gpfsug-discuss at spectrumscale.org > > >" > > >> > Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs > > I have messages like these frequent my logs: > Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error > IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 > fabnum 0 > vendor_err 136 > Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to > 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error > IBV_WC_REM_ACCESS_ERR index 23 > > Any ideas on cause..? > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From jonathon.anderson at colorado.edu Mon Jan 30 22:10:25 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Mon, 30 Jan 2017 22:10:25 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? 
by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
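When the addresses stay unassigned like that, it can be worth capturing what CES itself believes before forcing another rebalance. A sketch only; output formats differ a little between 4.2.x levels:

mmces node list        # suspended or unhealthy nodes will not take addresses
mmces state show -a    # per-service state on each protocol node
mmces address list     # which node, if any, currently holds each CES IP
mmces address move --rebalance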
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. From olaf.weiser at de.ibm.com Tue Jan 31 08:30:19 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 31 Jan 2017 09:30:19 +0100 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jan 31 15:13:34 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 31 Jan 2017 15:13:34 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. 
[root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? 
by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Jan 31 15:42:33 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 31 Jan 2017 16:42:33 +0100 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jan 31 16:32:18 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 31 Jan 2017 16:32:18 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. 
~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? 
Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jan 31 16:35:23 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 31 Jan 2017 16:35:23 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: <1515B2FC-1B1B-4A8B-BB7B-CD7C815B662A@colorado.edu> > [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa Just to head-off any concerns that this problem is a result of the ces-ip in this command not being one of the ces ips added in my earlier examples, this is just an artifact of changing configuration during the troubleshooting process. I realized that while 10.225.71.{104,105} were allocated to this node, they were to be used for something else, and shouldn?t be under CES control; so I changed our CES addresses to 10.225.71.{102,103}. 
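For anyone hitting the same symptom, a quick cross-check of the suspected truncation is possible with commands already used in this thread; mmgetstate is pulled in only as an independent view of which nodes are actually up:

# Nodes the CES code path will treat as up (suspected to be truncated):
tsctl shownodes up | tr ',' '\n' | wc -l
# Independent count of active nodes for comparison:
mmgetstate -a | grep -cw active
# If the first number is smaller and the raw output sits near ~4000 bytes,
# the output-buffer truncation described above is probably in play:
tsctl shownodes up | wc -c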
On 1/30/17, 3:10 PM, "Jonathon A Anderson" wrote: In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. 
Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. From olaf.weiser at de.ibm.com Tue Jan 31 17:45:17 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 31 Jan 2017 17:45:17 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: Message-ID: I ll open a pmr here for my env ... the issue may hurt you in a ces env. only... but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von:"Jonathon A Anderson" An:"gpfsug main discussion list" Datum:Di. 31.01.2017 17:32Betreff:Re: [gpfsug-discuss] CES doesn't assign addresses to nodes No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. 
If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? 
Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jan 31 17:47:12 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 31 Jan 2017 17:47:12 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: Message-ID: <9A756F92-C3CF-42DF-983C-BD83334B37EB@colorado.edu> Yeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it?s only in CES. I suspect there just haven?t been that many people exporting CES out of an HPC cluster environment. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 10:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes I ll open a pmr here for my env ... the issue may hurt you in a ces env. only... 
but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von: "Jonathon A Anderson" An: "gpfsug main discussion list" Datum: Di. 31.01.2017 17:32 Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________ No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. 
which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. 
Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Jan 31 20:07:14 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 31 Jan 2017 20:07:14 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: <9A756F92-C3CF-42DF-983C-BD83334B37EB@colorado.edu> References: , <9A756F92-C3CF-42DF-983C-BD83334B37EB@colorado.edu> Message-ID: We use multicluster for our environment, storage systems in a separate cluster to hpc nodes on a separate cluster from protocol nodes. 
According to the docs, this isn't supported, but we haven't seen any issues. Note unsupported as opposed to broken. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 31 January 2017 17:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Yeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it?s only in CES. I suspect there just haven?t been that many people exporting CES out of an HPC cluster environment. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 10:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes I ll open a pmr here for my env ... the issue may hurt you in a ces env. only... but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von: "Jonathon A Anderson" An: "gpfsug main discussion list" Datum: Di. 31.01.2017 17:32 Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________ No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. 
:) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? 
by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Tue Jan 31 20:11:31 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 31 Jan 2017 20:11:31 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <9A756F92-C3CF-42DF-983C-BD83334B37EB@colorado.edu> Message-ID: Simon, This is what I?d usually do, and I?m pretty sure it?d fix the problem; but we only have two protocol nodes, so no good way to do quorum in a separate cluster of just those two. Plus, I?d just like to see the bug fixed. I suppose we could move the compute nodes to a separate cluster, and keep the protocol nodes together with the NSD servers; but then I?m back to the age-old question of ?do I technically violate the GPFS license in order to do the right thing architecturally?? (Since you have to nominate GPFS servers in the client-only cluster to manage quorum, for nodes that only have client licenses.) So far, we?re 100% legit, and it?d be better to stay that way. 
~jonathon On 1/31/17, 1:07 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (Research Computing - IT Services)" wrote: We use multicluster for our environment, storage systems in a separate cluster to hpc nodes on a separate cluster from protocol nodes. According to the docs, this isn't supported, but we haven't seen any issues. Note unsupported as opposed to broken. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 31 January 2017 17:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Yeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it?s only in CES. I suspect there just haven?t been that many people exporting CES out of an HPC cluster environment. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 10:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes I ll open a pmr here for my env ... the issue may hurt you in a ces env. only... but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von: "Jonathon A Anderson" An: "gpfsug main discussion list" Datum: Di. 31.01.2017 17:32 Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________ No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. 
[root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? 
by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Tue Jan 31 20:21:10 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 31 Jan 2017 20:21:10 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <9A756F92-C3CF-42DF-983C-BD83334B37EB@colorado.edu> , Message-ID: Ah we have separate server licensed nodes in the hpc cluster (typically we have some stuff for config management, monitoring etc, so we license those as servers). Agreed the bug should be fixed, I was meaning that we probably don't see it as the CES cluster is 4 nodes serving protocols (plus some other data access boxes). 
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 31 January 2017 20:11 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Simon, This is what I?d usually do, and I?m pretty sure it?d fix the problem; but we only have two protocol nodes, so no good way to do quorum in a separate cluster of just those two. Plus, I?d just like to see the bug fixed. I suppose we could move the compute nodes to a separate cluster, and keep the protocol nodes together with the NSD servers; but then I?m back to the age-old question of ?do I technically violate the GPFS license in order to do the right thing architecturally?? (Since you have to nominate GPFS servers in the client-only cluster to manage quorum, for nodes that only have client licenses.) So far, we?re 100% legit, and it?d be better to stay that way. ~jonathon On 1/31/17, 1:07 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (Research Computing - IT Services)" wrote: We use multicluster for our environment, storage systems in a separate cluster to hpc nodes on a separate cluster from protocol nodes. According to the docs, this isn't supported, but we haven't seen any issues. Note unsupported as opposed to broken. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 31 January 2017 17:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Yeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it?s only in CES. I suspect there just haven?t been that many people exporting CES out of an HPC cluster environment. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 10:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes I ll open a pmr here for my env ... the issue may hurt you in a ces env. only... but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von: "Jonathon A Anderson" An: "gpfsug main discussion list" Datum: Di. 31.01.2017 17:32 Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________ No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? 
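(A quick way to tell whether a given cluster is exposed to the 3983-character defect, based on the numbers in this thread; the '-opa' pattern matches the colorado.edu node names used earlier, so adjust it for your own cluster:

---
tsctl shownodes up | wc -c                  # parked at ~3983 means the list is being cut off
tsctl shownodes up | tr ',' '\n' | wc -l    # nodes the CES scripts will treat as "up"
mmlscluster | grep '\-opa' | wc -l          # nodes actually in the cluster
---

If the last two counts differ and the byte count sits at the limit, any CES node whose name falls past the cut-off will be treated as down.)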
From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." 
/usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote:

    I think I'm having the same issue described here:

    http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html

    Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804)

    We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS.

    Here's the steps I took:

    ---
    mmcrnodeclass protocol -N sgate1-opa,sgate2-opa
    mmcrnodeclass nfs -N sgate1-opa,sgate2-opa
    mmchconfig cesSharedRoot=/gpfs/summit/ces
    mmchcluster --ccr-enable
    mmchnode --ces-enable -N protocol
    mmces service enable NFS
    mmces service start NFS -N nfs
    mmces address add --ces-ip 10.225.71.104,10.225.71.105
    mmces address policy even-coverage
    mmces address move --rebalance
    ---

    This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot.

    Things I've tried:

    * disabling ces on the sgate nodes and re-running the above procedure
    * moving the cluster and filesystem managers to different snsd nodes
    * deleting and re-creating the cesSharedRoot directory

    Meanwhile, the following log entry appears in mmfs.log.latest every ~30s:

    ---
    Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104
    Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105
    Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1
    Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+
    Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+
    ---

    Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries):

    ---
    2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275
    2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333
    ---

    For the record, here's the interface I expect to get the address on sgate1:

    ---
    11: bond0: mtu 9000 qdisc noqueue state UP
        link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
        inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0
        valid_lft forever preferred_lft forever
        inet6 fe80::3efd:feff:fe08:a7c0/64 scope link
        valid_lft forever preferred_lft forever
    ---

    which is a bond of p2p1 and p2p2.

    ---
    6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000
        link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
    7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000
        link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
    ---

    A similar bond0 exists on sgate2.

    I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From olaf.weiser at de.ibm.com Tue Jan 31 22:47:23 2017
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Tue, 31 Jan 2017 22:47:23 +0000
Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes
In-Reply-To:
Message-ID:

Yeah... depending on the #nodes you're affected or not. ..... So if your remote ces cluster is small enough in terms of the #nodes ... you'll never hit into this issue
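Whether a given cluster is exposed depends, as noted above, on how long the full node list is. A minimal check, using only commands already shown in this thread; the awk column and the '-opa' match are assumptions about this site's mmlscluster output, and 3983 bytes is simply the value observed above, not a documented limit:

# Rough exposure check -- illustrative only.
tsctl shownodes up | wc -c
# Compare with the size of the full comma-separated daemon node list; if
# the full list is larger than what tsctl can return, CES is working from
# a truncated up-list.
mmlscluster | grep '\-opa' | awk '{print $2}' | paste -sd, - | wc -c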
Gesendet von IBM Verse

Simon Thompson (Research Computing - IT Services) --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ---
Von: "Simon Thompson (Research Computing - IT Services)"
An: "gpfsug main discussion list"
Datum: Di. 31.01.2017 21:07
Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes

We use multicluster for our environment, storage systems in a separate cluster to hpc nodes on a separate cluster from protocol nodes. According to the docs, this isn't supported, but we haven't seen any issues. Note unsupported as opposed to broken.

Simon
________________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu]
Sent: 31 January 2017 17:47
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes

Yeah, I searched around for places where `tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it's only in CES. I suspect there just haven't been that many people exporting CES out of an HPC cluster environment.

~jonathon

From: on behalf of Olaf Weiser
Reply-To: gpfsug main discussion list
Date: Tuesday, January 31, 2017 at 10:45 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes

I'll open a PMR here for my env ... the issue may hurt you in a CES env only ... but needs to be fixed in core gpfs.base I think.

Gesendet von IBM Verse

Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ---
Von: "Jonathon A Anderson"
An: "gpfsug main discussion list"
Datum: Di. 31.01.2017 17:32
Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes
________________________________

No, I'm having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don't have "protocol node" support, so they've pushed back on supporting this as an overall CES-rooted effort.

I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR?

Thanks.

~jonathon

From: on behalf of Olaf Weiser
Reply-To: gpfsug main discussion list
Date: Tuesday, January 31, 2017 at 8:42 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes

ok.. so obviously ... it seems that we have several issues.. the 3983 characters is obviously a defect. Have you already raised a PMR? If so, can you send me the number?
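While waiting on a fix, the current address placement can at least be inspected, and an address pinned by hand. A short sketch, assuming the mmces syntax of the 4.2.x releases discussed in this thread, with the site-specific IP and node name taken from the examples above:

# Show which CES node currently holds each protocol address
mmces address list
# List CES nodes and their state
mmces node list
# Manually pin an address to a node that the truncated up-list wrongly
# reports as down; this is the step that fails with "GPFS is down on
# this node" while the node is missing from tsctl's output.
mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa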
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From abeattie at au1.ibm.com Tue Jan 3 22:19:20 2017
From: abeattie at au1.ibm.com (Andrew Beattie)
Date: Tue, 3 Jan 2017 22:19:20 +0000
Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding
In-Reply-To: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu>
References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu>, <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu>
Message-ID:

An HTML attachment was scrubbed...
URL: From laurence at qsplace.co.uk Tue Jan 3 22:40:48 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Tue, 03 Jan 2017 22:40:48 +0000 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu>, <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov><28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu><4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov><0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov><5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu><5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu><45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> Message-ID: <0CEDE53A-B89F-4070-A681-49BC7B93D152@qsplace.co.uk> Andrew, You may have been stung by: 2.34 What considerations are there when running on SELinux? https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html?view=kc#selinux I've see this issue on a customer site myself. Matt, Could you increase the logging verbosity and check the logs further? As per http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.pdg.doc/bl1pdg_CESNFSserverlog.htm -- Lauz On 3 January 2017 22:19:20 GMT+00:00, Andrew Beattie wrote: >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Tue Jan 3 22:56:48 2017 From: Valdis.Kletnieks at vt.edu (Valdis Kletnieks) Date: Tue, 03 Jan 2017 17:56:48 -0500 Subject: [gpfsug-discuss] What is LTFS/EE now called, and what version should I be on? Message-ID: <186951.1483484208@turing-police.cc.vt.edu> So we have GPFS Advanced 4.2.1 installed, and the following RPMs: % rpm -qa 'ltfs*' | sort ltfsle-2.1.6.0-9706.x86_64 ltfsle-library-2.1.6.0-9706.x86_64 ltfsle-library-plus-2.1.6.0-9706.x86_64 ltfs-license-2.1.0-20130412_2702.x86_64 ltfs-mig-1.2.1.1-10232.x86_64 What release of "Spectrum Archive" does this correspond to, and what release do we need to be on if I upgrade GPFS to 4.2.2.1? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From janfrode at tanso.net Tue Jan 3 23:14:21 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 4 Jan 2017 00:14:21 +0100 Subject: [gpfsug-discuss] What is LTFS/EE now called, and what version should I be on? In-Reply-To: <186951.1483484208@turing-police.cc.vt.edu> References: <186951.1483484208@turing-police.cc.vt.edu> Message-ID: This looks like Spectrum Archive v1.2.1.0 (Build 10230). Newest version available on fixcentral is v1.2.2.0, but it doesn't support GPFS v4.2.2.x yet. -jf On Tue, Jan 3, 2017 at 11:56 PM, Valdis Kletnieks wrote: > So we have GPFS Advanced 4.2.1 installed, and the following RPMs: > > % rpm -qa 'ltfs*' | sort > ltfsle-2.1.6.0-9706.x86_64 > ltfsle-library-2.1.6.0-9706.x86_64 > ltfsle-library-plus-2.1.6.0-9706.x86_64 > ltfs-license-2.1.0-20130412_2702.x86_64 > ltfs-mig-1.2.1.1-10232.x86_64 > > What release of "Spectrum Archive" does this correspond to, > and what release do we need to be on if I upgrade GPFS to 4.2.2.1? 
> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Jan 4 01:21:34 2017 From: mweil at wustl.edu (Matt Weil) Date: Tue, 3 Jan 2017 19:21:34 -0600 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu> <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> Message-ID: nsds and ces nodes are RHEL 7.3 nfsv3 clients are old ubuntu lucid. we finally just removed the IP that seemed to... when moved to a ces node caused it to stop responding. it hung up a few more times but has been working fine now for the last few hours. maybe a bad client apple out there finally gave up ;-) PMR 50787 122 000 waiting on IBM. On 1/3/17 4:19 PM, Andrew Beattie wrote: > Matt > > What Operating system are you running? > > I have an open PMR at present with something very similar > when ever we publish an NFS export via the protocol nodes the nfs > service stops, although we have no issues publishing SMB exports. > > I"m waiting on some testing by the customer but L3 support have > indicated that they think there is a bug in the SElinux code, which is > causing this issue, and have suggested that we disable SElinux and try > again. > > My clients environment is currently deployed on Centos 7. > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > > ----- Original message ----- > From: Matt Weil > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: > Cc: > Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding > Date: Wed, Jan 4, 2017 6:27 AM > > this follows the IP what ever node the ip lands on. the ganesha.nfsd > process seems to stop working. any ideas? there is nothing > helpful in > the logs. > > time mount ces200:/vol/aggr14/temp403 /mnt/test > mount.nfs: mount system call failed > > real 1m0.000s > user 0m0.000s > sys 0m0.010s > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mweil at wustl.edu Wed Jan 4 01:29:36 2017 From: mweil at wustl.edu (Matt Weil) Date: Tue, 3 Jan 2017 19:29:36 -0600 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: <0CEDE53A-B89F-4070-A681-49BC7B93D152@qsplace.co.uk> References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu> <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> <0CEDE53A-B89F-4070-A681-49BC7B93D152@qsplace.co.uk> Message-ID: On 1/3/17 4:40 PM, Laurence Horrocks-Barlow wrote: > Andrew, > > You may have been stung by: > > 2.34 What considerations are there when running on SELinux? > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html?view=kc#selinux se is disabled here. Also if you strace the parent ganesha.nfsd process it dies. Is that a bug? > > I've see this issue on a customer site myself. > > > Matt, > > Could you increase the logging verbosity and check the logs further? > As per > http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.pdg.doc/bl1pdg_CESNFSserverlog.htm yes bumped it to the max of 3 not much help. > > -- Lauz > > On 3 January 2017 22:19:20 GMT+00:00, Andrew Beattie > wrote: > > Matt > > What Operating system are you running? > > I have an open PMR at present with something very similar > when ever we publish an NFS export via the protocol nodes the nfs > service stops, although we have no issues publishing SMB exports. > > I"m waiting on some testing by the customer but L3 support have > indicated that they think there is a bug in the SElinux code, > which is causing this issue, and have suggested that we disable > SElinux and try again. > > My clients environment is currently deployed on Centos 7. > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > > ----- Original message ----- > From: Matt Weil > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: > Cc: > Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding > Date: Wed, Jan 4, 2017 6:27 AM > > this follows the IP what ever node the ip lands on. the > ganesha.nfsd > process seems to stop working. any ideas? there is nothing > helpful in > the logs. > > time mount ces200:/vol/aggr14/temp403 /mnt/test > mount.nfs: mount system call failed > > real 1m0.000s > user 0m0.000s > sys 0m0.010s > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Wed Jan 4 02:16:54 2017 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Tue, 03 Jan 2017 21:16:54 -0500 Subject: [gpfsug-discuss] What is LTFS/EE now called, and what version should I be on? 
In-Reply-To: References: <186951.1483484208@turing-police.cc.vt.edu> Message-ID: <200291.1483496214@turing-police.cc.vt.edu> On Wed, 04 Jan 2017 00:14:21 +0100, Jan-Frode Myklebust said: > This looks like Spectrum Archive v1.2.1.0 (Build 10230). Newest version > available on fixcentral is v1.2.2.0, but it doesn't support GPFS v4.2.2.x > yet. That's what I was afraid of. OK, shelve that option, and call IBM for the efix. (The backstory: IBM announced a security issue in GPFS: http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009639&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E A security vulnerability has been identified in IBM Spectrum Scale (GPFS) that could allow a remote authenticated attacker to overflow a buffer and execute arbitrary code on the system with root privileges or cause the server to crash. This vulnerability is only applicable if: - file encryption is being used - the key management infrastructure has been compromised -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From rkomandu at in.ibm.com Wed Jan 4 07:17:25 2017 From: rkomandu at in.ibm.com (Ravi K Komanduri) Date: Wed, 4 Jan 2017 12:47:25 +0530 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu><28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu><4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov><0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov><5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu><5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu><45b19a50-bb70-1025-71ea-80a260623712@wustl.edu><0CEDE53A-B89F-4070-A681-49BC7B93D152@qsplace.co.uk> Message-ID: My two cents, Have the SELinux enabled on my RH7.3 cluster (where CES nodes are RH 7,3). GPFS latest version(4.2.2) is on the cluster. Non SELinux env, should mount w/o issues as well Tried mounting for 50 iters as V3 for 2 different mounts from 4 client nodes. Ran successfully. My client nodes are RH/SLES clients Could you elaborate further. With Regards, Ravi K Komanduri From: Matt Weil To: Date: 01/04/2017 07:00 AM Subject: Re: [gpfsug-discuss] CES nodes mount nfsv3 not responding Sent by: gpfsug-discuss-bounces at spectrumscale.org On 1/3/17 4:40 PM, Laurence Horrocks-Barlow wrote: Andrew, You may have been stung by: 2.34 What considerations are there when running on SELinux? https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html?view=kc#selinux se is disabled here. Also if you strace the parent ganesha.nfsd process it dies. Is that a bug? I've see this issue on a customer site myself. Matt, Could you increase the logging verbosity and check the logs further? As per http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.pdg.doc/bl1pdg_CESNFSserverlog.htm yes bumped it to the max of 3 not much help. -- Lauz On 3 January 2017 22:19:20 GMT+00:00, Andrew Beattie wrote: Matt What Operating system are you running? I have an open PMR at present with something very similar when ever we publish an NFS export via the protocol nodes the nfs service stops, although we have no issues publishing SMB exports. I"m waiting on some testing by the customer but L3 support have indicated that they think there is a bug in the SElinux code, which is causing this issue, and have suggested that we disable SElinux and try again. My clients environment is currently deployed on Centos 7. 
Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Matt Weil Sent by: gpfsug-discuss-bounces at spectrumscale.org To: Cc: Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding Date: Wed, Jan 4, 2017 6:27 AM this follows the IP what ever node the ip lands on. the ganesha.nfsd process seems to stop working. any ideas? there is nothing helpful in the logs. time mount ces200:/vol/aggr14/temp403 /mnt/test mount.nfs: mount system call failed real 1m0.000s user 0m0.000s sys 0m0.010s _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jan 4 09:06:29 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 4 Jan 2017 09:06:29 +0000 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: , Message-ID: Simon, Is this PMR still open or was the issue resolved? I'm very interested to know as 4.2.2 is on my roadmap. Thanks Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: 20 December 2016 17:14 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SMB issues Nope, just lots of messages with the same error, but different folders. I've opened a pmr with IBM and supplied the usual logs. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Christof Schmitt [christof.schmitt at us.ibm.com] Sent: 19 December 2016 17:31 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SMB issues >From this message, it does not look like a known problem. Are there other messages leading up to the one you mentioned? I would suggest reporting this through a PMR. Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 12/19/2016 08:37 AM Subject: [gpfsug-discuss] SMB issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). 
In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Wed Jan 4 10:20:30 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 4 Jan 2017 10:20:30 +0000 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: Message-ID: Its still open. I can say we are happily running 4.2.2, just not the SMB packages that go with it. So the GPFS part, I wouldn't have thought would be a problem to upgrade. Simon On 04/01/2017, 09:06, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A" wrote: >Simon, > >Is this PMR still open or was the issue resolved? I'm very interested to >know as 4.2.2 is on my roadmap. > >Thanks >Richard > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >Thompson (Research Computing - IT Services) >Sent: 20 December 2016 17:14 >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] SMB issues > > >Nope, just lots of messages with the same error, but different folders. > >I've opened a pmr with IBM and supplied the usual logs. > >Simon >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org >[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Christof Schmitt >[christof.schmitt at us.ibm.com] >Sent: 19 December 2016 17:31 >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] SMB issues > >From this message, it does not look like a known problem. Are there other >messages leading up to the one you mentioned? > >I would suggest reporting this through a PMR. > >Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ >christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) > > > >From: "Simon Thompson (Research Computing - IT Services)" > >To: "gpfsug-discuss at spectrumscale.org" > >Date: 12/19/2016 08:37 AM >Subject: [gpfsug-discuss] SMB issues >Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > >Hi All, > >We upgraded to 4.2.2.0 last week as well as to >gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. > >We've since been getting random users reporting that they get access >denied errors when trying to access folders. Some seem to work fine and >others not, but it seems to vary and change by user (for example this >morning, I could see all my folders fine, but later I could only see >some). From my Mac connecting to the SMB shares, I could connect fine to >the share, but couldn't list files in the folder (I guess this is what >users were seeing from Windows as access denied). 
> >In the log.smbd, we are seeing errors such as this: > >[2016/12/19 15:20:40.649580, 0] >../source3/lib/sysquotas.c:457(sys_get_quota) > sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! > > > >Reverting to the previous version of SMB we were running >(gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. > >Before I log a PMR, has anyone else seen this behaviour or have any >suggestions? > >Thanks > >Simon > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From laurence at qsplace.co.uk Wed Jan 4 17:13:50 2017 From: laurence at qsplace.co.uk (laurence at qsplace.co.uk) Date: Wed, 04 Jan 2017 17:13:50 +0000 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu> <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> <0CEDE53A-B89F-4070-A681-49BC7B93D152@qsplace.co.uk> Message-ID: Hi Matt, The only time I've seen strace "crash" ganesha is when having selinux enabled which ofc was related to selinux. Have you also changed NFS's logging level (also in the link given)? Check the current level with: mmnfs configuration list | grep LOG_LEVEL I find INFO or DEBUG enough to get just that little extra nugget of information you need, however if that's already at FULL_DEBUG and your still not finding anything helpful it might be time to log a PMR. --Lauz On 2017-01-04 01:29, Matt Weil wrote: > On 1/3/17 4:40 PM, Laurence Horrocks-Barlow wrote: > >> Andrew, >> >> You may have been stung by: >> >> 2.34 What considerations are there when running on SELinux? >> >> https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html?view=kc#selinux [1] > se is disabled here. > Also if you strace the parent ganesha.nfsd process it dies. Is that a bug? > >> I've see this issue on a customer site myself. >> >> Matt, >> >> Could you increase the logging verbosity and check the logs further? As per >> http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.pdg.doc/bl1pdg_CESNFSserverlog.htm [2] > yes bumped it to the max of 3 not much help. > > -- Lauz > > On 3 January 2017 22:19:20 GMT+00:00, Andrew Beattie wrote: > > Matt > > What Operating system are you running? > > I have an open PMR at present with something very similar > when ever we publish an NFS export via the protocol nodes the nfs service stops, although we have no issues publishing SMB exports. > > I"m waiting on some testing by the customer but L3 support have indicated that they think there is a bug in the SElinux code, which is causing this issue, and have suggested that we disable SElinux and try again. 
> > My clients environment is currently deployed on Centos 7. > > Andrew Beattie > Software Defined Storage - IT Specialist > > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > ----- Original message ----- > From: Matt Weil > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: > Cc: > Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding > Date: Wed, Jan 4, 2017 6:27 AM > > this follows the IP what ever node the ip lands on. the ganesha.nfsd > process seems to stop working. any ideas? there is nothing helpful in > the logs. > > time mount ces200:/vol/aggr14/temp403 /mnt/test > mount.nfs: mount system call failed > > real 1m0.000s > user 0m0.000s > sys 0m0.010s > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss [3] -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss [3] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss [3] Links: ------ [1] https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html?view=kc#selinux [2] http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.pdg.doc/bl1pdg_CESNFSserverlog.htm [3] http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Wed Jan 4 17:55:13 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Wed, 4 Jan 2017 12:55:13 -0500 Subject: [gpfsug-discuss] strange mmchnsd error? Message-ID: [root at cl001 ~]# cat chnsd_home_flh %nsd: nsd=r10f1e5 servers=cl008,cl001,cl002,cl003,cl004,cl005,cl006,cl007 %nsd: nsd=r10f6e5 servers=cl007,cl008,cl001,cl002,cl003,cl004,cl005,cl006 %nsd: nsd=r10f1e6 servers=cl006,cl007,cl008,cl001,cl002,cl003,cl004,cl005 %nsd: nsd=r10f6e6 servers=cl005,cl006,cl007,cl008,cl001,cl002,cl003,cl004 %nsd: nsd=r10f1e7 servers=cl004,cl005,cl006,cl007,cl008,cl001,cl002,cl003 %nsd: nsd=r10f6e7 servers=cl003,cl004,cl005,cl006,cl007,cl008,cl001,cl002 %nsd: nsd=r10f1e8 servers=cl002,cl003,cl004,cl005,cl006,cl007,cl008,cl001 %nsd: nsd=r10f6e8 servers=cl001,cl002,cl003,cl004,cl005,cl006,cl007,cl008 %nsd: nsd=r10f1e9 servers=cl008,cl001,cl002,cl003,cl004,cl005,cl006,cl007 %nsd: nsd=r10f6e9 servers=cl007,cl008,cl001,cl002,cl003,cl004,cl005,cl006 [root at cl001 ~]# mmchnsd -F chnsd_home_flh mmchnsd: Processing disk r10f6e5 mmchnsd: Processing disk r10f6e6 mmchnsd: Processing disk r10f6e7 mmchnsd: Processing disk r10f6e8 mmchnsd: Node cl005.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl006.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl007.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl008.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Error found while processing stanza %nsd: nsd=r10f6e8 servers=cl001,cl002,cl003,cl004,cl005,cl006,cl007,cl008 mmchnsd: Processing disk r10f1e9 mmchnsd: Processing disk r10f6e9 mmchnsd: Command failed. Examine previous error messages to determine cause. I comment out the r10f6e8 line and then it completes? 
I have some sort of fabric san issue: [root at cl005 ~]# for i in {1..8}; do ssh cl00$i lsscsi -s | grep 38xx | grep 1.97 | wc -l; done 80 80 80 80 68 72 70 72 but i'm suprised removing one line allows it to complete. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Jan 4 17:58:25 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 4 Jan 2017 17:58:25 +0000 Subject: [gpfsug-discuss] strange mmchnsd error? In-Reply-To: References: Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB064DCF61@CHI-EXCHANGEW1.w2k.jumptrading.com> ENODEV usually means that the disk device was not found on the server(s) in the server list. In this case c100[5-8] do not apparently have access to r10f6e8, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Wednesday, January 04, 2017 11:55 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] strange mmchnsd error? [root at cl001 ~]# cat chnsd_home_flh %nsd: nsd=r10f1e5 servers=cl008,cl001,cl002,cl003,cl004,cl005,cl006,cl007 %nsd: nsd=r10f6e5 servers=cl007,cl008,cl001,cl002,cl003,cl004,cl005,cl006 %nsd: nsd=r10f1e6 servers=cl006,cl007,cl008,cl001,cl002,cl003,cl004,cl005 %nsd: nsd=r10f6e6 servers=cl005,cl006,cl007,cl008,cl001,cl002,cl003,cl004 %nsd: nsd=r10f1e7 servers=cl004,cl005,cl006,cl007,cl008,cl001,cl002,cl003 %nsd: nsd=r10f6e7 servers=cl003,cl004,cl005,cl006,cl007,cl008,cl001,cl002 %nsd: nsd=r10f1e8 servers=cl002,cl003,cl004,cl005,cl006,cl007,cl008,cl001 %nsd: nsd=r10f6e8 servers=cl001,cl002,cl003,cl004,cl005,cl006,cl007,cl008 %nsd: nsd=r10f1e9 servers=cl008,cl001,cl002,cl003,cl004,cl005,cl006,cl007 %nsd: nsd=r10f6e9 servers=cl007,cl008,cl001,cl002,cl003,cl004,cl005,cl006 [root at cl001 ~]# mmchnsd -F chnsd_home_flh mmchnsd: Processing disk r10f6e5 mmchnsd: Processing disk r10f6e6 mmchnsd: Processing disk r10f6e7 mmchnsd: Processing disk r10f6e8 mmchnsd: Node cl005.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl006.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl007.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl008.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Error found while processing stanza %nsd: nsd=r10f6e8 servers=cl001,cl002,cl003,cl004,cl005,cl006,cl007,cl008 mmchnsd: Processing disk r10f1e9 mmchnsd: Processing disk r10f6e9 mmchnsd: Command failed. Examine previous error messages to determine cause. I comment out the r10f6e8 line and then it completes? I have some sort of fabric san issue: [root at cl005 ~]# for i in {1..8}; do ssh cl00$i lsscsi -s | grep 38xx | grep 1.97 | wc -l; done 80 80 80 80 68 72 70 72 but i'm suprised removing one line allows it to complete. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Jan 4 19:57:07 2017 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 4 Jan 2017 19:57:07 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server Message-ID: <76A8A489-C46E-441C-9C9A-0E515200F325@siriuscom.com> I?m getting stumped trying to test out TCT on a centos based 4.2.2.0 cluster and getting the following error when I?m trying to install the gpfs.tct.server rpm. rpm -ivh --force gpfs.tct.server-1.1.2_987.x86_64.rpm error: Failed dependencies: redhat-release-server >= 6.0 is needed by gpfs.tct.server-1-1.2.x86_64 I realize that Centos isn?t ?officially? supported but this is kind of lame to check for the redhat-release package instead of whatever library (ssl) or some such that is installed instead. Anyone able to do this or know a workaround? I did a quick search on the wiki and in previous posts on this list and didn?t see anything obvious. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jan 4 20:00:50 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 4 Jan 2017 20:00:50 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server Message-ID: Just add ??nodeps? to the rpm install line, it will go just fine. Been working just fine on my CentOS system using this method. rpm -ivh --nodeps gpfs.tct.server-1.1.2_987.x86_64.rpm Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Mark.Bush at siriuscom.com" Reply-To: gpfsug main discussion list Date: Wednesday, January 4, 2017 at 1:57 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] TCT and redhat-release-server I?m getting stumped trying to test out TCT on a centos based 4.2.2.0 cluster and getting the following error when I?m trying to install the gpfs.tct.server rpm. rpm -ivh --force gpfs.tct.server-1.1.2_987.x86_64.rpm error: Failed dependencies: redhat-release-server >= 6.0 is needed by gpfs.tct.server-1-1.2.x86_64 I realize that Centos isn?t ?officially? supported but this is kind of lame to check for the redhat-release package instead of whatever library (ssl) or some such that is installed instead. Anyone able to do this or know a workaround? I did a quick search on the wiki and in previous posts on this list and didn?t see anything obvious. 
Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevindjo at us.ibm.com Wed Jan 4 20:04:23 2017 From: kevindjo at us.ibm.com (Kevin D Johnson) Date: Wed, 4 Jan 2017 20:04:23 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server In-Reply-To: <76A8A489-C46E-441C-9C9A-0E515200F325@siriuscom.com> References: <76A8A489-C46E-441C-9C9A-0E515200F325@siriuscom.com> Message-ID: An HTML attachment was scrubbed... URL: From orichards at pixitmedia.com Wed Jan 4 20:10:11 2017 From: orichards at pixitmedia.com (Orlando Richards) Date: Wed, 4 Jan 2017 20:10:11 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server In-Reply-To: References: <76A8A489-C46E-441C-9C9A-0E515200F325@siriuscom.com> Message-ID: This is an RPM dependency check, rather than checking anything about the system state (such as the contents of /etc/redhat-release). In the past, I've built a dummy rpm with no contents to work around these. I don't think you can do a "--force" on a yum install - so you can't "yum install gpfs.tct.server" unless you do something like that. Would be great to get it removed from the rpm dependencies if possible. On 04/01/2017 20:04, Kevin D Johnson wrote: > I believe it's checking /etc/redhat-release --- if you create that > file with the appropriate red hat version number (like /etc/issue for > CentOS), it should work. > > ----- Original message ----- > From: "Mark.Bush at siriuscom.com" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [gpfsug-discuss] TCT and redhat-release-server > Date: Wed, Jan 4, 2017 2:58 PM > > I?m getting stumped trying to test out TCT on a centos based > 4.2.2.0 cluster and getting the following error when I?m trying to > install the gpfs.tct.server rpm. > > rpm -ivh --force gpfs.tct.server-1.1.2_987.x86_64.rpm > > error: Failed dependencies: > > redhat-release-server >= 6.0 is needed by gpfs.tct.server-1-1.2.x86_64 > > I realize that Centos isn?t ?officially? supported but this is > kind of lame to check for the redhat-release package instead of > whatever library (ssl) or some such that is installed instead. > > Anyone able to do this or know a workaround? I did a quick search > on the wiki and in previous posts on this list and didn?t see > anything obvious. > > Mark > > This message (including any attachments) is intended only for the > use of the individual or entity to which it is addressed and may > contain information that is non-public, proprietary, privileged, > confidential, and exempt from disclosure under applicable law. 
If > you are not the intended recipient, you are hereby notified that > any use, dissemination, distribution, or copying of this > communication is strictly prohibited. This message may be viewed > by parties at Sirius Computer Solutions other than those named in > the message header. This message does not contain an official > representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions > immediately and (i) destroy this message if a facsimile or (ii) > delete this message immediately if this is an electronic > communication. Thank you. > > Sirius Computer Solutions > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Orlando Richards* VP Product Development, Pixit Media 07930742808|orichards at pixitmedia.com www.pixitmedia.com |Tw:@pixitmedia -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevindjo at us.ibm.com Wed Jan 4 20:15:19 2017 From: kevindjo at us.ibm.com (Kevin D Johnson) Date: Wed, 4 Jan 2017 20:15:19 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server In-Reply-To: References: , <76A8A489-C46E-441C-9C9A-0E515200F325@siriuscom.com> Message-ID: An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Jan 4 20:16:37 2017 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 4 Jan 2017 20:16:37 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server In-Reply-To: References: Message-ID: <3EBE8846-7757-4957-9F01-DE4CAE558106@siriuscom.com> Success! Thanks Robert. From: "Oesterlin, Robert" Reply-To: gpfsug main discussion list Date: Wednesday, January 4, 2017 at 2:00 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] TCT and redhat-release-server Just add ??nodeps? to the rpm install line, it will go just fine. Been working just fine on my CentOS system using this method. rpm -ivh --nodeps gpfs.tct.server-1.1.2_987.x86_64.rpm Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Mark.Bush at siriuscom.com" Reply-To: gpfsug main discussion list Date: Wednesday, January 4, 2017 at 1:57 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] TCT and redhat-release-server I?m getting stumped trying to test out TCT on a centos based 4.2.2.0 cluster and getting the following error when I?m trying to install the gpfs.tct.server rpm. rpm -ivh --force gpfs.tct.server-1.1.2_987.x86_64.rpm error: Failed dependencies: redhat-release-server >= 6.0 is needed by gpfs.tct.server-1-1.2.x86_64 I realize that Centos isn?t ?officially? 
supported but this is kind of lame to check for the redhat-release package instead of whatever library (ssl) or some such that is installed instead. Anyone able to do this or know a workaround? I did a quick search on the wiki and in previous posts on this list and didn?t see anything obvious. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Thu Jan 5 20:00:36 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Thu, 5 Jan 2017 15:00:36 -0500 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? Message-ID: I have one quorum node down and attempting to add a nsd to a fs: [root at cl005 ~]# mmadddisk home -F add_1_flh_home -v no |& tee /root/adddisk_flh_home.out Verifying file system configuration information ... The following disks of home will be formatted on node cl003: r10f1e5: size 1879610 MB Extending Allocation Map Checking Allocation Map for storage pool fc_ssd400G 55 % complete on Thu Jan 5 14:43:31 2017 Lost connection to file system daemon. mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: File system home has some disks that are in a non-ready state. mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Had to use -v no (this failed once before). Anyhow I next see: [root at cl002 ~]# mmgetstate -aL Node number Node name Quorum Nodes up Total nodes GPFS state Remarks ------------------------------------------------------------------------------------ 1 cl001 0 0 8 down quorum node 2 cl002 5 6 8 active quorum node 3 cl003 5 0 8 arbitrating quorum node 4 cl004 5 6 8 active quorum node 5 cl005 5 6 8 active quorum node 6 cl006 5 6 8 active quorum node 7 cl007 5 6 8 active quorum node 8 cl008 5 6 8 active quorum node [root at cl002 ~]# mmlsdisk home disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ r10f1e5 nsd 512 1001 No Yes allocmap add up fc_ssd400G r6d2e8 nsd 512 1001 No Yes ready up fc_8T r6d3e8 nsd 512 1001 No Yes ready up fc_8T Do all quorum node have to be up and participating to do these admin type operations? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Jan 5 20:06:18 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 5 Jan 2017 20:06:18 +0000 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? 
In-Reply-To: References: Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> There may be an issue with one of the other NSDs in the file system according to the ?mmadddisk: File system home has some disks that are in a non-ready state.? message in our output. Best to check the status of the NSDs in the file system using the `mmlsdisk home` and if any disks are not ?up? then run the `mmchdisk home start -a` command after confirming that all nsdservers can see the disks. I typically use `mmdsh -N nsdnodes tspreparedisk ?s | dshbak ?c` for that. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Thursday, January 05, 2017 2:01 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] nsd not adding with one quorum node down? I have one quorum node down and attempting to add a nsd to a fs: [root at cl005 ~]# mmadddisk home -F add_1_flh_home -v no |& tee /root/adddisk_flh_home.out Verifying file system configuration information ... The following disks of home will be formatted on node cl003: r10f1e5: size 1879610 MB Extending Allocation Map Checking Allocation Map for storage pool fc_ssd400G 55 % complete on Thu Jan 5 14:43:31 2017 Lost connection to file system daemon. mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: File system home has some disks that are in a non-ready state. mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Had to use -v no (this failed once before). Anyhow I next see: [root at cl002 ~]# mmgetstate -aL Node number Node name Quorum Nodes up Total nodes GPFS state Remarks ------------------------------------------------------------------------------------ 1 cl001 0 0 8 down quorum node 2 cl002 5 6 8 active quorum node 3 cl003 5 0 8 arbitrating quorum node 4 cl004 5 6 8 active quorum node 5 cl005 5 6 8 active quorum node 6 cl006 5 6 8 active quorum node 7 cl007 5 6 8 active quorum node 8 cl008 5 6 8 active quorum node [root at cl002 ~]# mmlsdisk home disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ r10f1e5 nsd 512 1001 No Yes allocmap add up fc_ssd400G r6d2e8 nsd 512 1001 No Yes ready up fc_8T r6d3e8 nsd 512 1001 No Yes ready up fc_8T Do all quorum node have to be up and participating to do these admin type operations? ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
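A condensed sketch of the checks Bryan describes above, assuming the file system is named home as in Eric's output (note that tspreparedisk is an internal GPFS helper rather than a documented command, so treat that line as the informal check mentioned here, not a supported interface):

# show disk status/availability for the file system; anything not "ready"/"up" needs attention
mmlsdisk home
# confirm that every NSD server can see the underlying devices
mmdsh -N nsdnodes tspreparedisk -s | dshbak -c
# if any disks are down or not ready, try to bring them back online
mmchdisk home start -a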
-------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Thu Jan 5 20:13:28 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Thu, 5 Jan 2017 15:13:28 -0500 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: Bryan: Have you ever attempted to do this knowing that one quorum server is down? *all* nsdservers will not see the nsd about to be added? How about temporarily removing quorum from a nsd server...? Thanks On Thu, Jan 5, 2017 at 3:06 PM, Bryan Banister wrote: > There may be an issue with one of the other NSDs in the file system > according to the ?mmadddisk: File system home has some disks that are in > a non-ready state.? message in our output. Best to check the status of > the NSDs in the file system using the `mmlsdisk home` and if any disks are > not ?up? then run the `mmchdisk home start -a` command after confirming > that all nsdservers can see the disks. I typically use `mmdsh -N nsdnodes > tspreparedisk ?s | dshbak ?c` for that. > > > > Hope that helps, > > -Bryan > > > > *From:* gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss- > bounces at spectrumscale.org] *On Behalf Of *J. Eric Wonderley > *Sent:* Thursday, January 05, 2017 2:01 PM > *To:* gpfsug main discussion list > *Subject:* [gpfsug-discuss] nsd not adding with one quorum node down? > > > > I have one quorum node down and attempting to add a nsd to a fs: > [root at cl005 ~]# mmadddisk home -F add_1_flh_home -v no |& tee > /root/adddisk_flh_home.out > Verifying file system configuration information ... > > The following disks of home will be formatted on node cl003: > r10f1e5: size 1879610 MB > Extending Allocation Map > Checking Allocation Map for storage pool fc_ssd400G > 55 % complete on Thu Jan 5 14:43:31 2017 > Lost connection to file system daemon. > mmadddisk: tsadddisk failed. > Verifying file system configuration information ... > mmadddisk: File system home has some disks that are in a non-ready state. > mmadddisk: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. > mmadddisk: Command failed. Examine previous error messages to determine > cause. > > Had to use -v no (this failed once before). Anyhow I next see: > [root at cl002 ~]# mmgetstate -aL > > Node number Node name Quorum Nodes up Total nodes GPFS state > Remarks > ------------------------------------------------------------ > ------------------------ > 1 cl001 0 0 8 down > quorum node > 2 cl002 5 6 8 active > quorum node > 3 cl003 5 0 8 arbitrating > quorum node > 4 cl004 5 6 8 active > quorum node > 5 cl005 5 6 8 active > quorum node > 6 cl006 5 6 8 active > quorum node > 7 cl007 5 6 8 active > quorum node > 8 cl008 5 6 8 active > quorum node > [root at cl002 ~]# mmlsdisk home > disk driver sector failure holds > holds storage > name type size group metadata data status > availability pool > ------------ -------- ------ ----------- -------- ----- ------------- > ------------ ------------ > r10f1e5 nsd 512 1001 No Yes allocmap add > up fc_ssd400G > r6d2e8 nsd 512 1001 No Yes ready > up fc_8T > r6d3e8 nsd 512 1001 No Yes ready > up fc_8T > > Do all quorum node have to be up and participating to do these admin type > operations? 
> > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Jan 5 20:27:24 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 5 Jan 2017 20:27:24 +0000 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF328@CHI-EXCHANGEW1.w2k.jumptrading.com> Removing the quorum designation is an option. However I believe the file system manager must be assigned to the file system in order for the mmadddisk to work. If the file system manager is not assigned (mmlsmgr to check) or continuously is reassigned to nodes but that fails (check /var/adm/ras/mmfs.log.latest on all nodes) or is blocked from being assigned due to the apparent node recovery in the cluster indicated by the one node in the ?arbitrating? state, then the mmadddisk will not succeed. -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Thursday, January 05, 2017 2:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] nsd not adding with one quorum node down? Bryan: Have you ever attempted to do this knowing that one quorum server is down? *all* nsdservers will not see the nsd about to be added? How about temporarily removing quorum from a nsd server...? Thanks On Thu, Jan 5, 2017 at 3:06 PM, Bryan Banister > wrote: There may be an issue with one of the other NSDs in the file system according to the ?mmadddisk: File system home has some disks that are in a non-ready state.? message in our output. Best to check the status of the NSDs in the file system using the `mmlsdisk home` and if any disks are not ?up? then run the `mmchdisk home start -a` command after confirming that all nsdservers can see the disks. I typically use `mmdsh -N nsdnodes tspreparedisk ?s | dshbak ?c` for that. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Thursday, January 05, 2017 2:01 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] nsd not adding with one quorum node down? I have one quorum node down and attempting to add a nsd to a fs: [root at cl005 ~]# mmadddisk home -F add_1_flh_home -v no |& tee /root/adddisk_flh_home.out Verifying file system configuration information ... 
The following disks of home will be formatted on node cl003: r10f1e5: size 1879610 MB Extending Allocation Map Checking Allocation Map for storage pool fc_ssd400G 55 % complete on Thu Jan 5 14:43:31 2017 Lost connection to file system daemon. mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: File system home has some disks that are in a non-ready state. mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Had to use -v no (this failed once before). Anyhow I next see: [root at cl002 ~]# mmgetstate -aL Node number Node name Quorum Nodes up Total nodes GPFS state Remarks ------------------------------------------------------------------------------------ 1 cl001 0 0 8 down quorum node 2 cl002 5 6 8 active quorum node 3 cl003 5 0 8 arbitrating quorum node 4 cl004 5 6 8 active quorum node 5 cl005 5 6 8 active quorum node 6 cl006 5 6 8 active quorum node 7 cl007 5 6 8 active quorum node 8 cl008 5 6 8 active quorum node [root at cl002 ~]# mmlsdisk home disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ r10f1e5 nsd 512 1001 No Yes allocmap add up fc_ssd400G r6d2e8 nsd 512 1001 No Yes ready up fc_8T r6d3e8 nsd 512 1001 No Yes ready up fc_8T Do all quorum node have to be up and participating to do these admin type operations? ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bbanister at jumptrading.com Thu Jan 5 20:44:33 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 5 Jan 2017 20:44:33 +0000 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF328@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB064DF328@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF398@CHI-EXCHANGEW1.w2k.jumptrading.com> Looking at this further, the output says the ?The following disks of home will be formatted on node cl003:? however that node is the node in ?arbitrating? state, so I don?t see how that would work, -B From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan Banister Sent: Thursday, January 05, 2017 2:27 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] nsd not adding with one quorum node down? Removing the quorum designation is an option. However I believe the file system manager must be assigned to the file system in order for the mmadddisk to work. If the file system manager is not assigned (mmlsmgr to check) or continuously is reassigned to nodes but that fails (check /var/adm/ras/mmfs.log.latest on all nodes) or is blocked from being assigned due to the apparent node recovery in the cluster indicated by the one node in the ?arbitrating? state, then the mmadddisk will not succeed. -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Thursday, January 05, 2017 2:13 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] nsd not adding with one quorum node down? Bryan: Have you ever attempted to do this knowing that one quorum server is down? *all* nsdservers will not see the nsd about to be added? How about temporarily removing quorum from a nsd server...? Thanks On Thu, Jan 5, 2017 at 3:06 PM, Bryan Banister > wrote: There may be an issue with one of the other NSDs in the file system according to the ?mmadddisk: File system home has some disks that are in a non-ready state.? message in our output. Best to check the status of the NSDs in the file system using the `mmlsdisk home` and if any disks are not ?up? then run the `mmchdisk home start -a` command after confirming that all nsdservers can see the disks. I typically use `mmdsh -N nsdnodes tspreparedisk ?s | dshbak ?c` for that. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Thursday, January 05, 2017 2:01 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] nsd not adding with one quorum node down? I have one quorum node down and attempting to add a nsd to a fs: [root at cl005 ~]# mmadddisk home -F add_1_flh_home -v no |& tee /root/adddisk_flh_home.out Verifying file system configuration information ... The following disks of home will be formatted on node cl003: r10f1e5: size 1879610 MB Extending Allocation Map Checking Allocation Map for storage pool fc_ssd400G 55 % complete on Thu Jan 5 14:43:31 2017 Lost connection to file system daemon. mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: File system home has some disks that are in a non-ready state. mmadddisk: Propagating the cluster configuration data to all affected nodes. 
This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Had to use -v no (this failed once before). Anyhow I next see: [root at cl002 ~]# mmgetstate -aL Node number Node name Quorum Nodes up Total nodes GPFS state Remarks ------------------------------------------------------------------------------------ 1 cl001 0 0 8 down quorum node 2 cl002 5 6 8 active quorum node 3 cl003 5 0 8 arbitrating quorum node 4 cl004 5 6 8 active quorum node 5 cl005 5 6 8 active quorum node 6 cl006 5 6 8 active quorum node 7 cl007 5 6 8 active quorum node 8 cl008 5 6 8 active quorum node [root at cl002 ~]# mmlsdisk home disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ r10f1e5 nsd 512 1001 No Yes allocmap add up fc_ssd400G r6d2e8 nsd 512 1001 No Yes ready up fc_8T r6d3e8 nsd 512 1001 No Yes ready up fc_8T Do all quorum node have to be up and participating to do these admin type operations? ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Thu Jan 5 21:38:39 2017 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Thu, 05 Jan 2017 16:38:39 -0500 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF398@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB064DF328@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB064DF398@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <28063.1483652319@turing-police.cc.vt.edu> On Thu, 05 Jan 2017 20:44:33 +0000, Bryan Banister said: > Looking at this further, the output says the "The following disks of home > will be formatted on node cl003:" however that node is the node in > "arbitrating" state, so I don't see how that would work, The bigger question: If it was in "arbitrating", why was it selected as the node to do the formatting? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Thu Jan 5 21:53:17 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 05 Jan 2017 16:53:17 -0500 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? Message-ID: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> Does anyone know of a functional standard alone tool to systematically and recursively find and replicate ACLs that works well with GPFS?
* We're currently using rsync, which will replicate permissions fine, however it leaves the ACL's behind. The --perms option for rsync is blind to ACLs.
* The native linux trick below works well with ext4 after an rsync, but makes a mess on GPFS.
% getfacl -R /path/to/source > /root/perms.acl
% setfacl --restore=/root/perms.acl
* The native GPFS mmgetacl/mmputacl pair does not have a built-in recursive option.
Any ideas? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 5 22:01:18 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 5 Jan 2017 22:01:18 +0000 Subject: Re: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> Message-ID: Hi Jaime, IBM developed a patch for rsync that can replicate ACL's - we've used it and it works great - can't remember where we downloaded it from, though. Maybe someone else on the list who *isn't* having a senior moment can point you to it? Kevin > On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: > > Does anyone know of a functional standard alone tool to systematically and recursively find and replicate ACLs that works well with GPFS?
URL: From Valdis.Kletnieks at vt.edu Thu Jan 5 22:42:28 2017 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Thu, 05 Jan 2017 17:42:28 -0500 Subject: [gpfsug-discuss] TCT and CentOS In-Reply-To: References: Message-ID: <32702.1483656148@turing-police.cc.vt.edu> On Thu, 05 Jan 2017 22:18:08 +0000, "Rob Basham" said: > By way of introduction, I am TCT architect across all of IBM's storage > products, including Spectrum Scale. There have been queries as to whether or > not CentOS is supported with TCT Server on Spectrum Scale. It is not currently > supported and should not be used as a TCT Server. Is that a "we haven't qualified it and you're on your own" not supported, or "there be known dragons" not supported? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From gmcpheeters at anl.gov Thu Jan 5 23:34:04 2017 From: gmcpheeters at anl.gov (McPheeters, Gordon) Date: Thu, 5 Jan 2017 23:34:04 +0000 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? In-Reply-To: <28063.1483652319@turing-police.cc.vt.edu> References: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB064DF328@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB064DF398@CHI-EXCHANGEW1.w2k.jumptrading.com> <28063.1483652319@turing-police.cc.vt.edu> Message-ID: You might want to check the gpfs logs on the node cl003. Often the message "Lost connection to file system daemon.? means that the daemon asserted while it was doing something... hence the lost connection. If you are checking the state and seeing it in arbitrating mode immed after the command fails that also makes sense as it?s now re-joining the cluster. If you aren?t watching carefully you can miss these events due to way mmfsd will resume the old mounts, hence you check the node with ?df? and see the file system is still mounted, then assume all is well, but in fact mmfsd has died and restarted. Gordon McPheeters ALCF Storage (630) 252-6430 gmcpheeters at anl.gov On Jan 5, 2017, at 3:38 PM, Valdis.Kletnieks at vt.edu wrote: On Thu, 05 Jan 2017 20:44:33 +0000, Bryan Banister said: Looking at this further, the output says the ?The following disks of home will be formatted on node cl003:? however that node is the node in ?arbitrating? state, so I don?t see how that would work, The bigger question: If it was in "arbitrating", why was it selected as the node to do the formatting? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From robbyb at us.ibm.com Fri Jan 6 00:28:47 2017 From: robbyb at us.ibm.com (Rob Basham) Date: Fri, 6 Jan 2017 00:28:47 +0000 Subject: [gpfsug-discuss] TCT and CentOS Message-ID: An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Fri Jan 6 02:16:04 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 05 Jan 2017 21:16:04 -0500 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk> References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk> Message-ID: <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> Great guys!!! Just what I was looking for. 
Everyone is always so helpful on this forum. Thanks a lot. Jaime Quoting "Laurence Horrocks-Barlow" : > Are you talking about the GPFSUG github? > > https://github.com/gpfsug/gpfsug-tools > > The patched rsync there I believe was done by Orlando. > > -- Lauz > > > On 05/01/2017 22:01, Buterbaugh, Kevin L wrote: >> Hi Jaime, >> >> IBM developed a patch for rsync that can replicate ACL?s ? we?ve >> used it and it works great ? can?t remember where we downloaded it >> from, though. Maybe someone else on the list who *isn?t* having a >> senior moment can point you to it? >> >> Kevin >> >>> On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: >>> >>> Does anyone know of a functional standard alone tool to >>> systematically and recursively find and replicate ACLs that works >>> well with GPFS? >>> >>> * We're currently using rsync, which will replicate permissions >>> fine, however it leaves the ACL's behind. The --perms option for >>> rsync is blind to ACLs. >>> >>> * The native linux trick below works well with ext4 after an >>> rsync, but makes a mess on GPFS. >>> % getfacl -R /path/to/source > /root/perms.ac >>> % setfacl --restore=/root/perms.acl >>> >>> * The native GPFS mmgetacl/mmputacl pair does not have a built-in >>> recursive option. >>> >>> Any ideas? >>> >>> Thanks >>> Jaime >>> >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University >>> of Toronto. >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From S.J.Thompson at bham.ac.uk Fri Jan 6 07:17:46 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 6 Jan 2017 07:17:46 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk>, <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> Message-ID: Just a cautionary note, it doesn't work with symlinks as it fails to get the acl and so doesn't copy the symlink. So you may want to run a traditional rsync after just to get all your symlinks on place. (having been using this over the Christmas period to merge some filesets with acls...) 
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jaime Pinto [pinto at scinet.utoronto.ca] Sent: 06 January 2017 02:16 To: gpfsug main discussion list; Laurence Horrocks-Barlow Cc: support at scinet.utoronto.ca Subject: Re: [gpfsug-discuss] replicating ACLs across GPFS's? Great guys!!! Just what I was looking for. Everyone is always so helpful on this forum. Thanks a lot. Jaime Quoting "Laurence Horrocks-Barlow" : > Are you talking about the GPFSUG github? > > https://github.com/gpfsug/gpfsug-tools > > The patched rsync there I believe was done by Orlando. > > -- Lauz > > > On 05/01/2017 22:01, Buterbaugh, Kevin L wrote: >> Hi Jaime, >> >> IBM developed a patch for rsync that can replicate ACL?s ? we?ve >> used it and it works great ? can?t remember where we downloaded it >> from, though. Maybe someone else on the list who *isn?t* having a >> senior moment can point you to it? >> >> Kevin >> >>> On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: >>> >>> Does anyone know of a functional standard alone tool to >>> systematically and recursively find and replicate ACLs that works >>> well with GPFS? >>> >>> * We're currently using rsync, which will replicate permissions >>> fine, however it leaves the ACL's behind. The --perms option for >>> rsync is blind to ACLs. >>> >>> * The native linux trick below works well with ext4 after an >>> rsync, but makes a mess on GPFS. >>> % getfacl -R /path/to/source > /root/perms.ac >>> % setfacl --restore=/root/perms.acl >>> >>> * The native GPFS mmgetacl/mmputacl pair does not have a built-in >>> recursive option. >>> >>> Any ideas? >>> >>> Thanks >>> Jaime >>> >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University >>> of Toronto. >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jtucker at pixitmedia.com Fri Jan 6 08:29:53 2017 From: jtucker at pixitmedia.com (Jez Tucker) Date: Fri, 6 Jan 2017 08:29:53 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? 
In-Reply-To: References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> Message-ID: <4a934973-691c-977a-1d19-81102ecb3d37@pixitmedia.com> Hi, Here: https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync For those of you with Pixit Media / ArcaStream support, just install our maintained ap-rsync which has this patch and additional fixes for other 'fun' things that break between GPFS and rsync. If anyone wants to contribute to the git repo wave your arms. Jez On 05/01/17 22:01, Buterbaugh, Kevin L wrote: > Hi Jaime, > > IBM developed a patch for rsync that can replicate ACL?s ? we?ve used it and it works great ? can?t remember where we downloaded it from, though. Maybe someone else on the list who *isn?t* having a senior moment can point you to it? > > Kevin > >> On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: >> >> Does anyone know of a functional standard alone tool to systematically and recursively find and replicate ACLs that works well with GPFS? >> >> * We're currently using rsync, which will replicate permissions fine, however it leaves the ACL's behind. The --perms option for rsync is blind to ACLs. >> >> * The native linux trick below works well with ext4 after an rsync, but makes a mess on GPFS. >> % getfacl -R /path/to/source > /root/perms.ac >> % setfacl --restore=/root/perms.acl >> >> * The native GPFS mmgetacl/mmputacl pair does not have a built-in recursive option. >> >> Any ideas? >> >> Thanks >> Jaime >> >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of Toronto. >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Fri Jan 6 08:31:16 2017 From: jtucker at pixitmedia.com (Jez Tucker) Date: Fri, 6 Jan 2017 08:31:16 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk> <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> Message-ID: <6928a73b-a8fa-4255-813a-0ddd6c9579f7@pixitmedia.com> Some of the 'fun things' being such as that very issue. 
On 06/01/17 07:17, Simon Thompson (Research Computing - IT Services) wrote: > Just a cautionary note, it doesn't work with symlinks as it fails to get the acl and so doesn't copy the symlink. > > So you may want to run a traditional rsync after just to get all your symlinks on place. (having been using this over the Christmas period to merge some filesets with acls...) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jaime Pinto [pinto at scinet.utoronto.ca] > Sent: 06 January 2017 02:16 > To: gpfsug main discussion list; Laurence Horrocks-Barlow > Cc: support at scinet.utoronto.ca > Subject: Re: [gpfsug-discuss] replicating ACLs across GPFS's? > > Great guys!!! > Just what I was looking for. > Everyone is always so helpful on this forum. > Thanks a lot. > Jaime > > Quoting "Laurence Horrocks-Barlow" : > >> Are you talking about the GPFSUG github? >> >> https://github.com/gpfsug/gpfsug-tools >> >> The patched rsync there I believe was done by Orlando. >> >> -- Lauz >> >> >> On 05/01/2017 22:01, Buterbaugh, Kevin L wrote: >>> Hi Jaime, >>> >>> IBM developed a patch for rsync that can replicate ACL?s ? we?ve >>> used it and it works great ? can?t remember where we downloaded it >>> from, though. Maybe someone else on the list who *isn?t* having a >>> senior moment can point you to it? >>> >>> Kevin >>> >>>> On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: >>>> >>>> Does anyone know of a functional standard alone tool to >>>> systematically and recursively find and replicate ACLs that works >>>> well with GPFS? >>>> >>>> * We're currently using rsync, which will replicate permissions >>>> fine, however it leaves the ACL's behind. The --perms option for >>>> rsync is blind to ACLs. >>>> >>>> * The native linux trick below works well with ext4 after an >>>> rsync, but makes a mess on GPFS. >>>> % getfacl -R /path/to/source > /root/perms.ac >>>> % setfacl --restore=/root/perms.acl >>>> >>>> * The native GPFS mmgetacl/mmputacl pair does not have a built-in >>>> recursive option. >>>> >>>> Any ideas? >>>> >>>> Thanks >>>> Jaime >>>> >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University >>>> of Toronto. >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. 
> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From orichards at pixitmedia.com Fri Jan 6 08:50:43 2017 From: orichards at pixitmedia.com (Orlando Richards) Date: Fri, 6 Jan 2017 08:50:43 +0000 Subject: [gpfsug-discuss] (Re)Introduction Message-ID: Hi folks, Since I've re-joined this list with my new identity, I thought I'd ping over a brief re-intro email. Some of you will know me from my past life working for the University of Edinburgh, but in November last year I joined the team at Pixit Media / ArcaStream. For those I've not met before - I've been working with GPFS since 2007 in a University environment, initially as an HPC storage engine but quickly realised the benefits that GPFS could offer as a general file/NAS storage platform as well, and developed its use in the University of Edinburgh (and for the national UKRDF service) in that vein. These days I'm spending a lot of my time looking at the deployment, operations and support processes around GPFS - which means I get to play with all sorts of hip and trendy buzzwords :) -- *Orlando Richards* VP Product Development, Pixit Media 07930742808|orichards at pixitmedia.com www.pixitmedia.com |Tw:@pixitmedia -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From orichards at pixitmedia.com Fri Jan 6 08:51:19 2017 From: orichards at pixitmedia.com (Orlando Richards) Date: Fri, 6 Jan 2017 08:51:19 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk> <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> Message-ID: Glad to see it's still doing good work out there! :) On 06/01/2017 02:16, Jaime Pinto wrote: > Great guys!!! 
> Just what I was looking for. > Everyone is always so helpful on this forum. > Thanks a lot. > Jaime > > Quoting "Laurence Horrocks-Barlow" : > >> Are you talking about the GPFSUG github? >> >> https://github.com/gpfsug/gpfsug-tools >> >> The patched rsync there I believe was done by Orlando. >> >> -- Lauz >> >> >> On 05/01/2017 22:01, Buterbaugh, Kevin L wrote: >>> Hi Jaime, >>> >>> IBM developed a patch for rsync that can replicate ACL?s ? we?ve >>> used it and it works great ? can?t remember where we downloaded it >>> from, though. Maybe someone else on the list who *isn?t* having a >>> senior moment can point you to it? >>> >>> Kevin >>> >>>> On Jan 5, 2017, at 3:53 PM, Jaime Pinto >>>> wrote: >>>> >>>> Does anyone know of a functional standard alone tool to >>>> systematically and recursively find and replicate ACLs that works >>>> well with GPFS? >>>> >>>> * We're currently using rsync, which will replicate permissions >>>> fine, however it leaves the ACL's behind. The --perms option for >>>> rsync is blind to ACLs. >>>> >>>> * The native linux trick below works well with ext4 after an >>>> rsync, but makes a mess on GPFS. >>>> % getfacl -R /path/to/source > /root/perms.ac >>>> % setfacl --restore=/root/perms.acl >>>> >>>> * The native GPFS mmgetacl/mmputacl pair does not have a built-in >>>> recursive option. >>>> >>>> Any ideas? >>>> >>>> Thanks >>>> Jaime >>>> >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University >>>> of Toronto. >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Orlando Richards* VP Product Development, Pixit Media 07930742808|orichards at pixitmedia.com www.pixitmedia.com |Tw:@pixitmedia -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... 
From erich at uw.edu  Fri Jan  6 19:07:22 2017
From: erich at uw.edu (Eric Horst)
Date: Fri, 6 Jan 2017 11:07:22 -0800
Subject: [gpfsug-discuss] undo fileset inode allocation
Message-ID:

Greetings all,

I've been setting up and migrating to a new 225TB filesystem on 4.2.1, with separate data and metadata disks. There are about 20 independent filesets as second-level directories which hold all the files. One of the independent filesets hit its inode limit of 28M. Without carefully checking my work I accidentally changed the limit to 3.2B inodes instead of 32M inodes. This ran for 15 minutes, and when it was done mmdf showed that I had 0% metadata space free; there was previously 72% free.

Thinking about it, I reasoned that as independent filesets I might get that metadata space back if I unlinked and deleted that fileset. After doing so I find I have 11% metadata free - a far cry from the 72% I used to have.

Are there other options for undoing this mistake? Or should I not worry that I'm at 11% and assume that whatever was preallocated will be productively used over the life of this filesystem?

Thanks,
-Eric

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bbanister at jumptrading.com  Fri Jan  6 20:08:17 2017
From: bbanister at jumptrading.com (Bryan Banister)
Date: Fri, 6 Jan 2017 20:08:17 +0000
Subject: [gpfsug-discuss] undo fileset inode allocation
In-Reply-To:
References:
Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB064E1624@CHI-EXCHANGEW1.w2k.jumptrading.com>

Honestly, this sounds like you may be in a very dangerous situation, and I would HIGHLY recommend opening a PMR immediately to get direct, authoritative instruction from IBM.
-Bryan
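A quick sanity check before touching fileset inode limits helps avoid exactly this kind of slip - a hedged sketch, with "work" and "bigfileset" standing in for the real file system and fileset names, and the new limit written out in full rather than abbreviated:

    # review the current maximum and allocated inodes for the independent fileset
    mmlsfileset work bigfileset -L
    # raise the limit to 32 million, stated explicitly
    mmchfileset work bigfileset --inode-limit 32000000

As the thread above suggests, inode space that has already been preallocated is not simply handed back when the limit is lowered again, which is why the PMR route is the safe one once a change like this has run.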
From Paul.Tomlinson at awe.co.uk  Mon Jan  9 15:09:43 2017
From: Paul.Tomlinson at awe.co.uk (Paul.Tomlinson at awe.co.uk)
Date: Mon, 9 Jan 2017 15:09:43 +0000
Subject: [gpfsug-discuss] AFM Migration Issue
Message-ID: <201701091501.v09F1i5A015912 at msw1.awe.co.uk>

Hi All,

We have just completed the first data move from our old cluster to the new one using AFM Local Update as per the guide. However, we have noticed that all date stamps on the directories have the date they were created on (e.g. 9th Jan 2017), not the date from the old system (e.g. 14th April 2007), whereas all the files have the correct dates.

Has anyone else seen this issue, as we now have to convert all the directory dates back to their original dates!

From janfrode at tanso.net  Mon Jan  9 15:29:45 2017
From: janfrode at tanso.net (Jan-Frode Myklebust)
Date: Mon, 09 Jan 2017 15:29:45 +0000
Subject: Re: [gpfsug-discuss] AFM Migration Issue
In-Reply-To: <201701091501.v09F1i5A015912 at msw1.awe.co.uk>
References: <201701091501.v09F1i5A015912 at msw1.awe.co.uk>
Message-ID:

Untested, and I have no idea if it will work on the number of files and directories you have, but maybe you can fix it by rsyncing just the directories?

rsync -av --dry-run --include='*/' --exclude='*' source/ destination/

-jf
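If rsync turns out to be awkward here, the directory timestamps can also be copied back with a plain find/touch pass once the data move is complete - an untested sketch, with /oldfs and /newfs as placeholders for the source file system and the AFM fileset:

    cd /oldfs
    # re-apply each source directory's atime/mtime to the same path on the new file system
    find . -type d -print | while IFS= read -r d; do
        touch -r "$d" "/newfs/$d"
    done

touch -r only re-applies the reference directory's timestamps, so it does not modify file contents and can safely be run as a final fix-up after the migration.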
From p.childs at qmul.ac.uk  Mon Jan  9 15:48:43 2017
From: p.childs at qmul.ac.uk (Peter Childs)
Date: Mon, 9 Jan 2017 15:48:43 +0000
Subject: Re: [gpfsug-discuss] AFM Migration Issue
In-Reply-To:
References:
Message-ID:

Interesting. I'm currently doing something similar, but at the moment I am only using read-only mode to premigrate the filesets. The directory timestamps don't agree with the original, but neither are they all marked with the time they were migrated, so there is something very weird going on. (We're planning to switch them to Local Update when we move the users over to them.)

We're using an mmapplypolicy run on our old GPFS cluster to build the list of files to migrate, and have noticed that you need an ESCAPE '%/' clause on the RULE EXTERNAL LIST line, otherwise files with % in their filenames don't get migrated and throw errors.

I'm trying to work out whether empty directories, or those containing only empty directories, get migrated correctly, as you can't list them in the mmafmctl prefetch statement. (If you try, using DIRECTORIES_PLUS, they throw errors.)

I am very interested in the solution to this issue.

Peter Childs
Queen Mary, University of London

________________________________________
From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Paul.Tomlinson at awe.co.uk
Sent: Monday, January 9, 2017 3:09:43 PM
To: gpfsug-discuss at spectrumscale.org
Subject: [gpfsug-discuss] AFM Migration Issue

From Paul.Tomlinson at awe.co.uk  Mon Jan  9 16:00:04 2017
From: Paul.Tomlinson at awe.co.uk (Paul.Tomlinson at awe.co.uk)
Date: Mon, 9 Jan 2017 16:00:04 +0000
Subject: Re: [gpfsug-discuss] AFM Migration Issue
In-Reply-To:
References:
Message-ID: <201701091552.v09Fq4kj012315 at msw1.awe.co.uk>

Hi,

We have already come across the issues you have seen below, and worked around them.
If you run the pre-fetch with just the --meta-data-only, then all the date stamps are correct for the dirs., as soon as you run --list-only all the directory times change to now. We have tried rsync but this did not appear to work. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: 09 January 2017 15:49 To: gpfsug-discuss at spectrumscale.org Subject: EXTERNAL: Re: [gpfsug-discuss] AFM Migration Issue Interesting, I'm currently doing similar but currently am only using read-only to premigrate the filesets, The directory file stamps don't agree with the original but neither are they all marked when they were migrated. So there is something very weird going on..... (We're planning to switch them to Local Update when we move the users over to them) We're using a mmapplypolicy on our old gpfs cluster to get the files to migrate, and have noticed that you need a RULE EXTERNAL LIST ESCAPE '%/' line otherwise files with % in the filenames don't get migrated and through errors. I'm trying to work out if empty directories or those containing only empty directories get migrated correctly as you can't list them in the mmafmctl prefetch statement. (If you try (using DIRECTORIES_PLUS) they through errors) I am very interested in the solution to this issue. Peter Childs Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Paul.Tomlinson at awe.co.uk Sent: Monday, January 9, 2017 3:09:43 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] AFM Migration Issue Hi All, We have just completed the first data move from our old cluster to the new one using AFM Local Update as per the guide, however we have noticed that all date stamps on the directories have the date they were created on(e.g. 9th Jan 2017) , not the date from the old system (e.g. 14th April 2007), whereas all the files have the correct dates. Has anyone else seen this issue as we now have to convert all the directory dates to their original dates ! The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. 
While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR From YARD at il.ibm.com Mon Jan 9 19:12:08 2017 From: YARD at il.ibm.com (Yaron Daniel) Date: Mon, 9 Jan 2017 21:12:08 +0200 Subject: [gpfsug-discuss] AFM Migration Issue In-Reply-To: References: <201701091501.v09F1i5A015912@msw1.awe.co.uk> Message-ID: Hi Do u have nfsv4 acl's ? Try to ask from IBM support to get Sonas rsync in order to migrate the data. Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Jan-Frode Myklebust To: gpfsug main discussion list Date: 01/09/2017 05:30 PM Subject: Re: [gpfsug-discuss] AFM Migration Issue Sent by: gpfsug-discuss-bounces at spectrumscale.org Untested, and I have no idea if it will work on the number of files and directories you have, but maybe you can fix it by rsyncing just the directories? rsync -av --dry-run --include='*/' --exclude='*' source/ destination/ -jf man. 9. jan. 2017 kl. 16.09 skrev : Hi All, We have just completed the first data move from our old cluster to the new one using AFM Local Update as per the guide, however we have noticed that all date stamps on the directories have the date they were created on(e.g. 9th Jan 2017) , not the date from the old system (e.g. 14th April 2007), whereas all the files have the correct dates. Has anyone else seen this issue as we now have to convert all the directory dates to their original dates ! The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From mimarsh2 at vt.edu Mon Jan 9 20:16:55 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Mon, 9 Jan 2017 15:16:55 -0500 Subject: [gpfsug-discuss] replication and no failure groups Message-ID: All, If I have a filesystem with replication set to 2 and 1 failure group: 1) I assume replication won't actually happen, correct? 2) Will this impact performance i.e cut write performance in half even though it really only keeps 1 copy? 
End goal - I would like a single storage pool within the filesystem to be replicated without affecting the performance of all other pools(which only have a single failure group) Thanks, Brian Marshall VT - ARC -------------- next part -------------- An HTML attachment was scrubbed... URL: From YARD at il.ibm.com Mon Jan 9 20:34:29 2017 From: YARD at il.ibm.com (Yaron Daniel) Date: Mon, 9 Jan 2017 22:34:29 +0200 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: Hi 1) Yes in case u have only 1 Failure group - replication will not work. 2) Do you have 2 Storage Systems ? When using GPFS replication write stay the same - but read can be double - since it read from 2 Storage systems Hope this help - what do you try to achive , can you share your env setup ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Brian Marshall To: gpfsug main discussion list Date: 01/09/2017 10:17 PM Subject: [gpfsug-discuss] replication and no failure groups Sent by: gpfsug-discuss-bounces at spectrumscale.org All, If I have a filesystem with replication set to 2 and 1 failure group: 1) I assume replication won't actually happen, correct? 2) Will this impact performance i.e cut write performance in half even though it really only keeps 1 copy? End goal - I would like a single storage pool within the filesystem to be replicated without affecting the performance of all other pools(which only have a single failure group) Thanks, Brian Marshall VT - ARC_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From eric.wonderley at vt.edu Mon Jan 9 20:47:12 2017 From: eric.wonderley at vt.edu (J. 
Eric Wonderley) Date: Mon, 9 Jan 2017 15:47:12 -0500 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: Hi Yaron: This is the filesystem: [root at cl005 net]# mmlsdisk work disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ nsd_a_7 nsd 512 -1 No Yes ready up system nsd_b_7 nsd 512 -1 No Yes ready up system nsd_c_7 nsd 512 -1 No Yes ready up system nsd_d_7 nsd 512 -1 No Yes ready up system nsd_a_8 nsd 512 -1 No Yes ready up system nsd_b_8 nsd 512 -1 No Yes ready up system nsd_c_8 nsd 512 -1 No Yes ready up system nsd_d_8 nsd 512 -1 No Yes ready up system nsd_a_9 nsd 512 -1 No Yes ready up system nsd_b_9 nsd 512 -1 No Yes ready up system nsd_c_9 nsd 512 -1 No Yes ready up system nsd_d_9 nsd 512 -1 No Yes ready up system nsd_a_10 nsd 512 -1 No Yes ready up system nsd_b_10 nsd 512 -1 No Yes ready up system nsd_c_10 nsd 512 -1 No Yes ready up system nsd_d_10 nsd 512 -1 No Yes ready up system nsd_a_11 nsd 512 -1 No Yes ready up system nsd_b_11 nsd 512 -1 No Yes ready up system nsd_c_11 nsd 512 -1 No Yes ready up system nsd_d_11 nsd 512 -1 No Yes ready up system nsd_a_12 nsd 512 -1 No Yes ready up system nsd_b_12 nsd 512 -1 No Yes ready up system nsd_c_12 nsd 512 -1 No Yes ready up system nsd_d_12 nsd 512 -1 No Yes ready up system work_md_pf1_1 nsd 512 200 Yes No ready up system jbf1z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf2z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf3z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf4z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf5z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf6z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf7z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf8z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf1z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf2z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf3z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf4z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf5z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf6z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf7z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf8z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf1z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf2z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf3z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf4z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf5z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf6z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf7z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf8z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf1z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf2z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf3z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf4z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf5z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf6z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf7z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf8z4 nsd 4096 2034 No Yes ready up sas_ssd4T work_md_pf1_2 nsd 512 200 Yes No ready up system work_md_pf1_3 nsd 512 200 Yes No ready up system work_md_pf1_4 nsd 512 200 Yes No ready up system work_md_pf2_5 nsd 512 199 Yes No ready up system work_md_pf2_6 nsd 512 199 Yes No ready up system work_md_pf2_7 nsd 512 199 Yes No ready up system work_md_pf2_8 nsd 512 199 Yes No ready up system [root at cl005 net]# mmlsfs work -R -r -M -m -K flag value description ------------------- ------------------------ ----------------------------------- -R 2 Maximum number of data replicas -r 2 Default number of data 
replicas -M 2 Maximum number of metadata replicas -m 2 Default number of metadata replicas -K whenpossible Strict replica allocation option On Mon, Jan 9, 2017 at 3:34 PM, Yaron Daniel wrote: > Hi > > 1) Yes in case u have only 1 Failure group - replication will not work. > > 2) Do you have 2 Storage Systems ? When using GPFS replication write stay > the same - but read can be double - since it read from 2 Storage systems > > Hope this help - what do you try to achive , can you share your env setup ? > > > > Regards > > > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Server, **Storage and Data Services* > *- > Team Leader* Petach Tiqva, 49527 > *Global Technology Services* Israel > Phone: +972-3-916-5672 <+972%203-916-5672> > Fax: +972-3-916-5672 <+972%203-916-5672> > Mobile: +972-52-8395593 <+972%2052-839-5593> > e-mail: yard at il.ibm.com > *IBM Israel* > > > > > > > > From: Brian Marshall > To: gpfsug main discussion list > Date: 01/09/2017 10:17 PM > Subject: [gpfsug-discuss] replication and no failure groups > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > All, > > If I have a filesystem with replication set to 2 and 1 failure group: > > 1) I assume replication won't actually happen, correct? > > 2) Will this impact performance i.e cut write performance in half even > though it really only keeps 1 copy? > > End goal - I would like a single storage pool within the filesystem to be > replicated without affecting the performance of all other pools(which only > have a single failure group) > > Thanks, > Brian Marshall > VT - ARC_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From YARD at il.ibm.com Mon Jan 9 20:53:38 2017 From: YARD at il.ibm.com (Yaron Daniel) Date: Mon, 9 Jan 2017 20:53:38 +0000 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: Hi So - do u able to have GPFS replication for the MD Failure Groups ? I can see that u have 3 Failure Groups for Data -1, 2012,2034 , how many Storage Subsystems you have ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "J. 
Eric Wonderley" To: gpfsug main discussion list Date: 01/09/2017 10:48 PM Subject: Re: [gpfsug-discuss] replication and no failure groups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Yaron: This is the filesystem: [root at cl005 net]# mmlsdisk work disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ nsd_a_7 nsd 512 -1 No Yes ready up system nsd_b_7 nsd 512 -1 No Yes ready up system nsd_c_7 nsd 512 -1 No Yes ready up system nsd_d_7 nsd 512 -1 No Yes ready up system nsd_a_8 nsd 512 -1 No Yes ready up system nsd_b_8 nsd 512 -1 No Yes ready up system nsd_c_8 nsd 512 -1 No Yes ready up system nsd_d_8 nsd 512 -1 No Yes ready up system nsd_a_9 nsd 512 -1 No Yes ready up system nsd_b_9 nsd 512 -1 No Yes ready up system nsd_c_9 nsd 512 -1 No Yes ready up system nsd_d_9 nsd 512 -1 No Yes ready up system nsd_a_10 nsd 512 -1 No Yes ready up system nsd_b_10 nsd 512 -1 No Yes ready up system nsd_c_10 nsd 512 -1 No Yes ready up system nsd_d_10 nsd 512 -1 No Yes ready up system nsd_a_11 nsd 512 -1 No Yes ready up system nsd_b_11 nsd 512 -1 No Yes ready up system nsd_c_11 nsd 512 -1 No Yes ready up system nsd_d_11 nsd 512 -1 No Yes ready up system nsd_a_12 nsd 512 -1 No Yes ready up system nsd_b_12 nsd 512 -1 No Yes ready up system nsd_c_12 nsd 512 -1 No Yes ready up system nsd_d_12 nsd 512 -1 No Yes ready up system work_md_pf1_1 nsd 512 200 Yes No ready up system jbf1z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf2z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf3z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf4z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf5z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf6z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf7z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf8z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf1z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf2z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf3z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf4z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf5z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf6z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf7z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf8z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf1z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf2z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf3z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf4z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf5z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf6z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf7z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf8z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf1z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf2z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf3z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf4z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf5z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf6z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf7z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf8z4 nsd 4096 2034 No Yes ready up sas_ssd4T work_md_pf1_2 nsd 512 200 Yes No ready up system work_md_pf1_3 nsd 512 200 Yes No ready up system work_md_pf1_4 nsd 512 200 Yes No ready up system work_md_pf2_5 nsd 512 199 Yes No ready up system work_md_pf2_6 nsd 512 199 Yes No ready up system work_md_pf2_7 nsd 512 199 Yes No ready up system work_md_pf2_8 nsd 512 199 Yes No ready up system [root at cl005 net]# mmlsfs work -R -r -M -m -K flag value description ------------------- ------------------------ ----------------------------------- -R 2 Maximum number of 
data replicas -r 2 Default number of data replicas -M 2 Maximum number of metadata replicas -m 2 Default number of metadata replicas -K whenpossible Strict replica allocation option On Mon, Jan 9, 2017 at 3:34 PM, Yaron Daniel wrote: Hi 1) Yes in case u have only 1 Failure group - replication will not work. 2) Do you have 2 Storage Systems ? When using GPFS replication write stay the same - but read can be double - since it read from 2 Storage systems Hope this help - what do you try to achive , can you share your env setup ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services- Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Brian Marshall To: gpfsug main discussion list Date: 01/09/2017 10:17 PM Subject: [gpfsug-discuss] replication and no failure groups Sent by: gpfsug-discuss-bounces at spectrumscale.org All, If I have a filesystem with replication set to 2 and 1 failure group: 1) I assume replication won't actually happen, correct? 2) Will this impact performance i.e cut write performance in half even though it really only keeps 1 copy? End goal - I would like a single storage pool within the filesystem to be replicated without affecting the performance of all other pools(which only have a single failure group) Thanks, Brian Marshall VT - ARC_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From eric.wonderley at vt.edu Mon Jan 9 21:01:14 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Mon, 9 Jan 2017 16:01:14 -0500 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: Hi Yuran: We have 5...4x md3860fs and 1x if150. the if150 requires data replicas=2 to get the ha and protection they recommend. we have it presented in a fileset that appears in a users work area. On Mon, Jan 9, 2017 at 3:53 PM, Yaron Daniel wrote: > Hi > > So - do u able to have GPFS replication for the MD Failure Groups ? > > I can see that u have 3 Failure Groups for Data -1, 2012,2034 , how many > Storage Subsystems you have ? > > > > > Regards > > > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Server, **Storage and Data Services* > *- > Team Leader* Petach Tiqva, 49527 > *Global Technology Services* Israel > Phone: +972-3-916-5672 <+972%203-916-5672> > Fax: +972-3-916-5672 <+972%203-916-5672> > Mobile: +972-52-8395593 <+972%2052-839-5593> > e-mail: yard at il.ibm.com > *IBM Israel* > > > > > > > > From: "J. 
Eric Wonderley" > To: gpfsug main discussion list > Date: 01/09/2017 10:48 PM > Subject: Re: [gpfsug-discuss] replication and no failure groups > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Yaron: > > This is the filesystem: > > [root at cl005 net]# mmlsdisk work > disk driver sector failure holds > holds storage > name type size group metadata data status > availability pool > ------------ -------- ------ ----------- -------- ----- ------------- > ------------ ------------ > nsd_a_7 nsd 512 -1 No Yes ready > up system > nsd_b_7 nsd 512 -1 No Yes ready > up system > nsd_c_7 nsd 512 -1 No Yes ready > up system > nsd_d_7 nsd 512 -1 No Yes ready > up system > nsd_a_8 nsd 512 -1 No Yes ready > up system > nsd_b_8 nsd 512 -1 No Yes ready > up system > nsd_c_8 nsd 512 -1 No Yes ready > up system > nsd_d_8 nsd 512 -1 No Yes ready > up system > nsd_a_9 nsd 512 -1 No Yes ready > up system > nsd_b_9 nsd 512 -1 No Yes ready > up system > nsd_c_9 nsd 512 -1 No Yes ready > up system > nsd_d_9 nsd 512 -1 No Yes ready > up system > nsd_a_10 nsd 512 -1 No Yes ready > up system > nsd_b_10 nsd 512 -1 No Yes ready > up system > nsd_c_10 nsd 512 -1 No Yes ready > up system > nsd_d_10 nsd 512 -1 No Yes ready > up system > nsd_a_11 nsd 512 -1 No Yes ready > up system > nsd_b_11 nsd 512 -1 No Yes ready > up system > nsd_c_11 nsd 512 -1 No Yes ready > up system > nsd_d_11 nsd 512 -1 No Yes ready > up system > nsd_a_12 nsd 512 -1 No Yes ready > up system > nsd_b_12 nsd 512 -1 No Yes ready > up system > nsd_c_12 nsd 512 -1 No Yes ready > up system > nsd_d_12 nsd 512 -1 No Yes ready > up system > work_md_pf1_1 nsd 512 200 Yes No ready > up system > jbf1z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf2z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf3z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf4z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf5z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf6z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf7z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf8z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf1z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf2z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf3z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf4z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf5z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf6z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf7z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf8z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf1z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf2z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf3z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf4z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf5z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf6z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf7z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf8z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf1z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf2z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf3z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf4z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf5z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf6z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf7z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf8z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > work_md_pf1_2 nsd 512 200 Yes No ready > up system > work_md_pf1_3 nsd 512 200 Yes No ready > up system > work_md_pf1_4 nsd 512 200 Yes No ready > up system > work_md_pf2_5 nsd 512 199 Yes No ready > up system > 
work_md_pf2_6 nsd 512 199 Yes No ready > up system > work_md_pf2_7 nsd 512 199 Yes No ready > up system > work_md_pf2_8 nsd 512 199 Yes No ready > up system > [root at cl005 net]# mmlsfs work -R -r -M -m -K > flag value description > ------------------- ------------------------ ------------------------------ > ----- > -R 2 Maximum number of data > replicas > -r 2 Default number of data > replicas > -M 2 Maximum number of metadata > replicas > -m 2 Default number of metadata > replicas > -K whenpossible Strict replica allocation > option > > > On Mon, Jan 9, 2017 at 3:34 PM, Yaron Daniel <*YARD at il.ibm.com* > > wrote: > Hi > > 1) Yes in case u have only 1 Failure group - replication will not work. > > 2) Do you have 2 Storage Systems ? When using GPFS replication write stay > the same - but read can be double - since it read from 2 Storage systems > > Hope this help - what do you try to achive , can you share your env setup ? > > > Regards > > > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Server, **Storage and Data Services* > *- > Team Leader* Petach Tiqva, 49527 > *Global Technology Services* Israel > Phone: *+972-3-916-5672* <+972%203-916-5672> > Fax: *+972-3-916-5672* <+972%203-916-5672> > Mobile: *+972-52-8395593* <+972%2052-839-5593> > e-mail: *yard at il.ibm.com* > *IBM Israel* > > > > > > > > From: Brian Marshall <*mimarsh2 at vt.edu* > > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Date: 01/09/2017 10:17 PM > Subject: [gpfsug-discuss] replication and no failure groups > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > > ------------------------------ > > > > > All, > > If I have a filesystem with replication set to 2 and 1 failure group: > > 1) I assume replication won't actually happen, correct? > > 2) Will this impact performance i.e cut write performance in half even > though it really only keeps 1 copy? > > End goal - I would like a single storage pool within the filesystem to be > replicated without affecting the performance of all other pools(which only > have a single failure group) > > Thanks, > Brian Marshall > VT - ARC_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
From janfrode at tanso.net  Mon Jan  9 22:24:45 2017
From: janfrode at tanso.net (Jan-Frode Myklebust)
Date: Mon, 09 Jan 2017 22:24:45 +0000
Subject: Re: [gpfsug-discuss] replication and no failure groups
In-Reply-To:
References:
Message-ID:

Yaron, doesn't "-1" make each of these disks an independent failure group?

From 'man mmcrnsd': "The default is -1, which indicates this disk has no point of failure in common with any other disk."

-jf

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From vpuvvada at in.ibm.com  Tue Jan 10 08:44:19 2017
From: vpuvvada at in.ibm.com (Venkateswara R Puvvada)
Date: Tue, 10 Jan 2017 14:14:19 +0530
Subject: Re: [gpfsug-discuss] AFM Migration Issue
In-Reply-To: <201701091552.v09Fq4kj012315 at msw1.awe.co.uk>
References: <201701091501.v09F1i5A015912 at msw1.awe.co.uk> <201701091552.v09Fq4kj012315 at msw1.awe.co.uk>
Message-ID:

AFM cannot keep directory mtime in sync. Directory mtime changes during readdir when files are created inside it after the initial lookup. This is a known limitation today.

~Venkat (vpuvvada at in.ibm.com)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mimarsh2 at vt.edu  Tue Jan 10 13:24:33 2017
From: mimarsh2 at vt.edu (Brian Marshall)
Date: Tue, 10 Jan 2017 08:24:33 -0500
Subject: Re: [gpfsug-discuss] replication and no failure groups
In-Reply-To:
References:
Message-ID:

That's the answer. We hadn't read deep enough and had just assumed that -1 meant the default failure group, or no failure groups at all.

Thanks,
Brian

On Mon, Jan 9, 2017 at 5:24 PM, Jan-Frode Myklebust wrote:
> Yaron, doesn't "-1" make each of these disks an independent failure group?
>
> From 'man mmcrnsd': "The default is -1, which indicates this disk has no
> point of failure in common with any other disk."
>
> -jf

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From jtolson at us.ibm.com Tue Jan 10 20:17:01 2017 From: jtolson at us.ibm.com (John T Olson) Date: Tue, 10 Jan 2017 13:17:01 -0700 Subject: [gpfsug-discuss] Updated whitepaper published In-Reply-To: References: Message-ID: An updated white paper has been published which shows integration of the Varonis UNIX agent in Spectrum Scale for audit logging. This version of the paper is updated to include test results from new capabilities provided in Spectrum Scale version 4.2.2.1. Here is a link to the paper: https://www.ibm.com/developerworks/community/wikis/form/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/f0cc9b82-a133-41b4-83fe-3f560e95b35a/attachment/0ab62645-e0ab-4377-81e7-abd11879bb75/media/Spectrum_Scale_Varonis_Audit_Logging.pdf Thanks, John John T. Olson, Ph.D., MI.C., K.EY. Master Inventor, Software Defined Storage 957/9032-1 Tucson, AZ, 85744 (520) 799-5185, tie 321-5185 (FAX: 520-799-4237) Email: jtolson at us.ibm.com "Do or do not. There is no try." - Yoda Olson's Razor: Any situation that we, as humans, can encounter in life can be modeled by either an episode of The Simpsons or Seinfeld. -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jan 11 09:27:06 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 11 Jan 2017 09:27:06 +0000 Subject: [gpfsug-discuss] CES log files Message-ID: Which files do I need to look in to determine what's happening with CES... supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Wed Jan 11 09:54:39 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 11 Jan 2017 09:54:39 +0000 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Jan 11 11:21:00 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 11 Jan 2017 12:21:00 +0100 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: I also struggle with where to look for CES log files.. but maybe the new "mmprotocoltrace" command can be useful? # mmprotocoltrace start smb ### reproduce problem # mmprotocoltrace stop smb Check log files it has collected. -jf On Wed, Jan 11, 2017 at 10:27 AM, Sobey, Richard A wrote: > Which files do I need to look in to determine what?s happening with CES? > supposing for example a load of domain controllers were shut down and CES > had no clue how to handle this and stopped working until the DCs were > switched back on again. > > > > Mmfs.log.latest said everything was fine btw. 
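For what it's worth, when the protocol nodes go degraded like that, a rough first pass (command names as I remember them on 4.2.x -- worth double-checking against your code level) is:

# mmces address list
### which CES node currently hosts which protocol IP
# mmces state show -a
### per-node state of the SMB/NFS/AUTH services across the CES nodes
# mmhealth node show
### component health, including the CES/AUTH events behind a DEGRADED state
# mmces log level 3
### temporarily raise the CES log level, then watch /var/adm/ras/ on the node that owned the IP

The smbd/winbindd logs under /var/adm/ras should show whether the node was stuck retrying the offline DCs.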
> > > > Thanks > > Richard > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jan 11 13:59:30 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 11 Jan 2017 13:59:30 +0000 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... CES DEGRADED ads_down ... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" > Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jan 11 14:29:39 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 11 Jan 2017 14:29:39 +0000 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: What did the smb log claim on the nodes? Should be in /var/adm/ras, for example if SMB failed, then I could see that CES would mark the node as degraded. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 11 January 2017 at 13:59 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] CES log files Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. 
It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... CES DEGRADED ads_down ... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" > Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Jan 11 14:39:13 2017 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 11 Jan 2017 14:39:13 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster Message-ID: We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are connected via Infiniband (FDR14). At the time of implementation of ESS, we were instructed to enable RDMA in addition to IPoIB. Previously we only ran IPoIB on our GPFS3.5 cluster. Every since the implementation (sometime back in July of 2016) we see a lot of compute nodes being ejected. 
What usually precedes the ejection are following messages: Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_WR_FLUSH_ERR index 1 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2 Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_WR_FLUSH_ERR index 400 Even our ESS IO server sometimes ends up being ejected (case in point - yesterday morning): Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 vendor_err 135 Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 3001 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 vendor_err 135 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2671 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 vendor_err 135 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2495 Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 vendor_err 135 Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 3077 Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease renewal is overdue. Pinging to check if it is alive I've had multiple PMRs open for this issue, and I am told that our ESS needs code level upgrades in order to fix this issue. Looking at the errors, I think the issue is Infiniband related, and I am wondering if anyone on this list has seen similar issues? Thanks for your help in advance. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Wed Jan 11 15:03:13 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 11 Jan 2017 16:03:13 +0100 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From janfrode at tanso.net Wed Jan 11 15:10:03 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 11 Jan 2017 16:10:03 +0100 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: My first guess would also be rdmaSend, which the gssClientConfig.sh enables by default, but isn't scalable to large clusters. It fits with your error message: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20RDMA%20Tuning - """For GPFS version 3.5.0.11 and later, IB error IBV_WC_RNR_RETRY_EXC_ERR may occur if the cluster is too large when verbsRdmaSend is enabled Idf these errors are observed in the mmfs log, disable verbsRdmaSend on all nodes.. Additionally, out of memory errors may occur if verbsRdmaSend is enabled on very large clusters. If out of memory errors are observed, disabled verbsRdmaSend on all nodes in the cluster.""" Otherwise it would be nice if you could post your mmlsconfig to see if something else sticks out.. -jf On Wed, Jan 11, 2017 at 4:03 PM, Olaf Weiser wrote: > most likely, there's smth wrong with your IB fabric ... > you say, you run ~ 700 nodes ? ... > Are you running with *verbsRdmaSend*enabled ? ,if so, please consider to > disable - and discuss this within the PMR > another issue, you may check is - Are you running the IPoIB in connected > mode or datagram ... but as I said, please discuss this within the PMR .. > there are to much dependencies to discuss this here .. > > > cheers > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 <+49%20170%205794466> > E-Mail: olaf.weiser at de.ibm.com > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: Damir Krstic > To: gpfsug main discussion list > Date: 01/11/2017 03:39 PM > Subject: [gpfsug-discuss] nodes being ejected out of the cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our > storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are > connected via Infiniband (FDR14). At the time of implementation of ESS, we > were instructed to enable RDMA in addition to IPoIB. Previously we only ran > IPoIB on our GPFS3.5 cluster. > > Every since the implementation (sometime back in July of 2016) we see a > lot of compute nodes being ejected. 
What usually precedes the ejection are > following messages: > > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 1 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 400 > > Even our ESS IO server sometimes ends up being ejected (case in point - > yesterday morning): > > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3001 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2671 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2495 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3077 > Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease > renewal is overdue. Pinging to check if it is alive > > I've had multiple PMRs open for this issue, and I am told that our ESS > needs code level upgrades in order to fix this issue. Looking at the > errors, I think the issue is Infiniband related, and I am wondering if > anyone on this list has seen similar issues? > > Thanks for your help in advance. 
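For reference, if verbsRdmaSend does turn out to be enabled on the client nodes, the change itself is small -- roughly as follows (from memory, so please verify against the docs for your code level):

# mmlsconfig verbsRdmaSend
### confirm where it is currently set (common section vs. per node class)
# mmchconfig verbsRdmaSend=no
### repeat with -N <nodeclass> for any node class that carries its own override
### as far as I recall the new value is only picked up at daemon restart (mmshutdown/mmstartup on the affected nodes), not on the fly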
> > Damir_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Jan 11 15:15:52 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Wed, 11 Jan 2017 15:15:52 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster References: [gpfsug-discuss] nodes being ejected out of the cluster Message-ID: <5F910253243E6A47B81A9A2EB424BBA101E91A4A@NDMSMBX404.ndc.nasa.gov> The RDMA errors I think are secondary to what's going on with either your IPoIB or Ethernet fabrics that's causing I assume IPoIB communication breakdowns and expulsions. We've had entire IB fabrics go offline and if the nodes werent depending on it for daemon communication nobody got expelled. Do you have a subnet defined for your IPoIB network or are your nodes daemon interfaces already set to their IPoIB interface? Have you checked your SM logs? From: Damir Krstic Sent: 1/11/17, 9:39 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] nodes being ejected out of the cluster We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are connected via Infiniband (FDR14). At the time of implementation of ESS, we were instructed to enable RDMA in addition to IPoIB. Previously we only ran IPoIB on our GPFS3.5 cluster. Every since the implementation (sometime back in July of 2016) we see a lot of compute nodes being ejected. 
What usually precedes the ejection are following messages: Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_WR_FLUSH_ERR index 1 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2 Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_WR_FLUSH_ERR index 400 Even our ESS IO server sometimes ends up being ejected (case in point - yesterday morning): Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 vendor_err 135 Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 3001 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 vendor_err 135 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2671 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 vendor_err 135 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2495 Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 vendor_err 135 Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 3077 Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease renewal is overdue. Pinging to check if it is alive I've had multiple PMRs open for this issue, and I am told that our ESS needs code level upgrades in order to fix this issue. Looking at the errors, I think the issue is Infiniband related, and I am wondering if anyone on this list has seen similar issues? Thanks for your help in advance. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Jan 11 15:16:09 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 11 Jan 2017 15:16:09 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: Message-ID: An HTML attachment was scrubbed... 
URL: From syi at ca.ibm.com Wed Jan 11 17:30:08 2017 From: syi at ca.ibm.com (Yi Sun) Date: Wed, 11 Jan 2017 12:30:08 -0500 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: Sometime increasing CES debug level to get more info, e.g. "mmces log level 3". Here are two public wiki links (probably you already know). https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Protocol%20Node%20-%20Tuning%20and%20Analysis https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Protocols%20Problem%20Determination Yi. gpfsug-discuss-bounces at spectrumscale.org wrote on 01/11/2017 07:00:06 AM: > From: gpfsug-discuss-request at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Date: 01/11/2017 07:00 AM > Subject: gpfsug-discuss Digest, Vol 60, Issue 26 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: CES log files (Jan-Frode Myklebust) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 11 Jan 2017 12:21:00 +0100 > From: Jan-Frode Myklebust > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] CES log files > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > I also struggle with where to look for CES log files.. but maybe the new > "mmprotocoltrace" command can be useful? > > # mmprotocoltrace start smb > ### reproduce problem > # mmprotocoltrace stop smb > > Check log files it has collected. > > > -jf > > > On Wed, Jan 11, 2017 at 10:27 AM, Sobey, Richard A > wrote: > > > Which files do I need to look in to determine what?s happening with CES? > > supposing for example a load of domain controllers were shut down and CES > > had no clue how to handle this and stopped working until the DCs were > > switched back on again. > > > > > > > > Mmfs.log.latest said everything was fine btw. > > > > > > > > Thanks > > > > Richard > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: 20170111/4ea25ddf/attachment-0001.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 60, Issue 26 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Jan 11 17:53:50 2017 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 11 Jan 2017 17:53:50 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: Thanks for all the suggestions. 
Here is our mmlsconfig file. We just purchased another GL6. During the installation of the new GL6 IBM will upgrade our existing GL6 up to the latest code levels. This will happen during the week of the 23rd of Jan.

I am skeptical that the upgrade is going to fix the issue.

On our IO servers we are running in connected mode (please note that the IB interfaces are bonded):

[root at gssio1 ~]# cat /sys/class/net/ib0/mode
connected
[root at gssio1 ~]# cat /sys/class/net/ib1/mode
connected
[root at gssio1 ~]# cat /sys/class/net/ib2/mode
connected
[root at gssio1 ~]# cat /sys/class/net/ib3/mode
connected
[root at gssio2 ~]# cat /sys/class/net/ib0/mode
connected
[root at gssio2 ~]# cat /sys/class/net/ib1/mode
connected
[root at gssio2 ~]# cat /sys/class/net/ib2/mode
connected
[root at gssio2 ~]# cat /sys/class/net/ib3/mode
connected

Our login nodes are running in connected mode as well.

However, all of our compute nodes are running in datagram:

[root at mgt ~]# psh compute cat /sys/class/net/ib0/mode
qnode0758: datagram
qnode0763: datagram
qnode0760: datagram
qnode0772: datagram
qnode0773: datagram
....etc.
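Side note: the IPoIB mode is an OS-level setting rather than anything GPFS controls -- on RHEL it typically follows CONNECTED_MODE=yes/no (and the matching MTU) in ifcfg-ibX -- so the mixed picture above most likely just reflects different interface config templates on servers vs. compute nodes. A quick way to survey the whole cluster, assuming mmdsh (or psh) can reach every node, is something like:

# mmdsh -N all cat /sys/class/net/ib0/mode | awk '{print $NF}' | sort | uniq -c
### counts how many nodes report connected vs. datagram on ib0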
syncIntervalStrict yes [ems1-fdr,compute,gss_ppc64] nsdClientCksumTypeLocal ck64 nsdClientCksumTypeRemote ck64 [gss_ppc64] pagepool 72856M [ems1-fdr] pagepool 17544M [compute] pagepool 4g [ems1-fdr,qsched03-ib0,quser10-fdr,compute,gss_ppc64] verbsRdma enable [gss_ppc64] verbsPorts mlx5_0/1 mlx5_0/2 mlx5_1/1 mlx5_1/2 [ems1-fdr] verbsPorts mlx5_0/1 mlx5_0/2 [qsched03-ib0,quser10-fdr,compute] verbsPorts mlx4_0/1 [common] autoload no [ems1-fdr,compute,gss_ppc64] maxStatCache 0 [common] envVar MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1 deadlockOverloadThreshold 0 deadlockDetectionThreshold 0 adminMode central File systems in cluster ess-qstorage.it.northwestern.edu: --------------------------------------------------------- /dev/home /dev/hpc /dev/projects /dev/tthome On Wed, Jan 11, 2017 at 9:16 AM Luis Bolinches wrote: > In addition to what Olaf has said > > ESS upgrades include mellanox modules upgrades in the ESS nodes. In fact, > on those noes you should do not update those solo (unless support says so > in your PMR), so if that's been the recommendation, I suggest you look at > it. > > Changelog on ESS 4.0.4 (no idea what ESS level you are running) > > > c) Support of MLNX_OFED_LINUX-3.2-2.0.0.1 > - Updated from MLNX_OFED_LINUX-3.1-1.0.6.1 (ESS 4.0, 4.0.1, 4.0.2) > - Updated from MLNX_OFED_LINUX-3.1-1.0.0.2 (ESS 3.5.x) > - Updated from MLNX_OFED_LINUX-2.4-1.0.2 (ESS 3.0.x) > - Support for PCIe3 LP 2-port 100 Gb EDR InfiniBand adapter x16 (FC EC3E) > - Requires System FW level FW840.20 (SV840_104) > - No changes from ESS 4.0.3 > > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > > Luis Bolinches > Lab Services > http://www-03.ibm.com/systems/services/labservices/ > > IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland > Phone: +358 503112585 <+358%2050%203112585> > > "If you continually give you will continually have." Anonymous > > > > ----- Original message ----- > From: "Olaf Weiser" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > > Cc: > Subject: Re: [gpfsug-discuss] nodes being ejected out of the cluster > Date: Wed, Jan 11, 2017 5:03 PM > > most likely, there's smth wrong with your IB fabric ... > you say, you run ~ 700 nodes ? ... > Are you running with *verbsRdmaSend*enabled ? ,if so, please consider to > disable - and discuss this within the PMR > another issue, you may check is - Are you running the IPoIB in connected > mode or datagram ... but as I said, please discuss this within the PMR .. > there are to much dependencies to discuss this here .. > > > cheers > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 <+49%20170%205794466> > E-Mail: olaf.weiser at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. 
DE 99369940 > > > > From: Damir Krstic > To: gpfsug main discussion list > Date: 01/11/2017 03:39 PM > Subject: [gpfsug-discuss] nodes being ejected out of the cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our > storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are > connected via Infiniband (FDR14). At the time of implementation of ESS, we > were instructed to enable RDMA in addition to IPoIB. Previously we only ran > IPoIB on our GPFS3.5 cluster. > > Every since the implementation (sometime back in July of 2016) we see a > lot of compute nodes being ejected. What usually precedes the ejection are > following messages: > > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 1 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 400 > > Even our ESS IO server sometimes ends up being ejected (case in point - > yesterday morning): > > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3001 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2671 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2495 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3077 > Jan 10 11:24:11 
gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease > renewal is overdue. Pinging to check if it is alive > > I've had multiple PMRs open for this issue, and I am told that our ESS > needs code level upgrades in order to fix this issue. Looking at the > errors, I think the issue is Infiniband related, and I am wondering if > anyone on this list has seen similar issues? > > Thanks for your help in advance. > > Damir_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Jan 11 18:38:30 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 11 Jan 2017 18:38:30 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: And there you have: [ems1-fdr,compute,gss_ppc64] verbsRdmaSend yes Try turning this off. -jf ons. 11. jan. 2017 kl. 18.54 skrev Damir Krstic : > Thanks for all the suggestions. Here is our mmlsconfig file. We just > purchased another GL6. During the installation of the new GL6 IBM will > upgrade our existing GL6 up to the latest code levels. This will happen > during the week of 23rd of Jan. > > I am skeptical that the upgrade is going to fix the issue. > > On our IO servers we are running in connected mode (please note that IB > interfaces are bonded) > > [root at gssio1 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib3/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib3/mode > > connected > > Our login nodes are also running connected mode as well. > > However, all of our compute nodes are running in datagram: > > [root at mgt ~]# psh compute cat /sys/class/net/ib0/mode > > qnode0758: datagram > > qnode0763: datagram > > qnode0760: datagram > > qnode0772: datagram > > qnode0773: datagram > ....etc. 
> > Here is our mmlsconfig: > > [root at gssio1 ~]# mmlsconfig > > Configuration data for cluster ess-qstorage.it.northwestern.edu: > > ---------------------------------------------------------------- > > clusterName ess-qstorage.it.northwestern.edu > > clusterId 17746506346828356609 > > dmapiFileHandleSize 32 > > minReleaseLevel 4.2.0.1 > > ccrEnabled yes > > cipherList AUTHONLY > > [gss_ppc64] > > nsdRAIDBufferPoolSizePct 80 > > maxBufferDescs 2m > > prefetchPct 5 > > nsdRAIDTracks 128k > > nsdRAIDSmallBufferSize 256k > > nsdMaxWorkerThreads 3k > > nsdMinWorkerThreads 3k > > nsdRAIDSmallThreadRatio 2 > > nsdRAIDThreadsPerQueue 16 > > nsdRAIDEventLogToConsole all > > nsdRAIDFastWriteFSDataLimit 256k > > nsdRAIDFastWriteFSMetadataLimit 1M > > nsdRAIDReconstructAggressiveness 1 > > nsdRAIDFlusherBuffersLowWatermarkPct 20 > > nsdRAIDFlusherBuffersLimitPct 80 > > nsdRAIDFlusherTracksLowWatermarkPct 20 > > nsdRAIDFlusherTracksLimitPct 80 > > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > > nsdRAIDFlusherFWLogLimitMB 5000 > > nsdRAIDFlusherThreadsLowWatermark 1 > > nsdRAIDFlusherThreadsHighWatermark 512 > > nsdRAIDBlockDeviceMaxSectorsKB 8192 > > nsdRAIDBlockDeviceNrRequests 32 > > nsdRAIDBlockDeviceQueueDepth 16 > > nsdRAIDBlockDeviceScheduler deadline > > nsdRAIDMaxTransientStale2FT 1 > > nsdRAIDMaxTransientStale3FT 1 > > nsdMultiQueue 512 > > syncWorkerThreads 256 > > nsdInlineWriteMax 32k > > maxGeneralThreads 1280 > > maxReceiverThreads 128 > > nspdQueues 64 > > [common] > > maxblocksize 16m > > [ems1-fdr,compute,gss_ppc64] > > numaMemoryInterleave yes > > [gss_ppc64] > > maxFilesToCache 12k > > [ems1-fdr,compute] > > maxFilesToCache 128k > > [ems1-fdr,compute,gss_ppc64] > > flushedDataTarget 1024 > > flushedInodeTarget 1024 > > maxFileCleaners 1024 > > maxBufferCleaners 1024 > > logBufferCount 20 > > logWrapAmountPct 2 > > logWrapThreads 128 > > maxAllocRegionsPerNode 32 > > maxBackgroundDeletionThreads 16 > > maxInodeDeallocPrefetch 128 > > [gss_ppc64] > > maxMBpS 16000 > > [ems1-fdr,compute] > > maxMBpS 10000 > > [ems1-fdr,compute,gss_ppc64] > > worker1Threads 1024 > > worker3Threads 32 > > [gss_ppc64] > > ioHistorySize 64k > > [ems1-fdr,compute] > > ioHistorySize 4k > > [gss_ppc64] > > verbsRdmaMinBytes 16k > > [ems1-fdr,compute] > > verbsRdmaMinBytes 32k > > [ems1-fdr,compute,gss_ppc64] > > verbsRdmaSend yes > > [gss_ppc64] > > verbsRdmasPerConnection 16 > > [ems1-fdr,compute] > > verbsRdmasPerConnection 256 > > [gss_ppc64] > > verbsRdmasPerNode 3200 > > [ems1-fdr,compute] > > verbsRdmasPerNode 1024 > > [ems1-fdr,compute,gss_ppc64] > > verbsSendBufferMemoryMB 1024 > > verbsRdmasPerNodeOptimize yes > > verbsRdmaUseMultiCqThreads yes > > [ems1-fdr,compute] > > ignorePrefetchLUNCount yes > > [gss_ppc64] > > scatterBufferSize 256K > > [ems1-fdr,compute] > > scatterBufferSize 256k > > syncIntervalStrict yes > > [ems1-fdr,compute,gss_ppc64] > > nsdClientCksumTypeLocal ck64 > > nsdClientCksumTypeRemote ck64 > > [gss_ppc64] > > pagepool 72856M > > [ems1-fdr] > > pagepool 17544M > > [compute] > > pagepool 4g > > [ems1-fdr,qsched03-ib0,quser10-fdr,compute,gss_ppc64] > > verbsRdma enable > > [gss_ppc64] > > verbsPorts mlx5_0/1 mlx5_0/2 mlx5_1/1 mlx5_1/2 > > [ems1-fdr] > > verbsPorts mlx5_0/1 mlx5_0/2 > > [qsched03-ib0,quser10-fdr,compute] > > verbsPorts mlx4_0/1 > > [common] > > autoload no > > [ems1-fdr,compute,gss_ppc64] > > maxStatCache 0 > > [common] > > envVar MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1 > > deadlockOverloadThreshold 0 > > deadlockDetectionThreshold 0 > > adminMode 
central > > > File systems in cluster ess-qstorage.it.northwestern.edu: > > --------------------------------------------------------- > > /dev/home > > /dev/hpc > > /dev/projects > > /dev/tthome > > On Wed, Jan 11, 2017 at 9:16 AM Luis Bolinches > wrote: > > In addition to what Olaf has said > > ESS upgrades include mellanox modules upgrades in the ESS nodes. In fact, > on those noes you should do not update those solo (unless support says so > in your PMR), so if that's been the recommendation, I suggest you look at > it. > > Changelog on ESS 4.0.4 (no idea what ESS level you are running) > > > c) Support of MLNX_OFED_LINUX-3.2-2.0.0.1 > - Updated from MLNX_OFED_LINUX-3.1-1.0.6.1 (ESS 4.0, 4.0.1, 4.0.2) > - Updated from MLNX_OFED_LINUX-3.1-1.0.0.2 (ESS 3.5.x) > - Updated from MLNX_OFED_LINUX-2.4-1.0.2 (ESS 3.0.x) > - Support for PCIe3 LP 2-port 100 Gb EDR InfiniBand adapter x16 (FC EC3E) > - Requires System FW level FW840.20 (SV840_104) > - No changes from ESS 4.0.3 > > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > > Luis Bolinches > Lab Services > http://www-03.ibm.com/systems/services/labservices/ > > IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland > Phone: +358 503112585 <+358%2050%203112585> > > "If you continually give you will continually have." Anonymous > > > > ----- Original message ----- > From: "Olaf Weiser" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > > Cc: > Subject: Re: [gpfsug-discuss] nodes being ejected out of the cluster > Date: Wed, Jan 11, 2017 5:03 PM > > most likely, there's smth wrong with your IB fabric ... > you say, you run ~ 700 nodes ? ... > Are you running with *verbsRdmaSend*enabled ? ,if so, please consider to > disable - and discuss this within the PMR > another issue, you may check is - Are you running the IPoIB in connected > mode or datagram ... but as I said, please discuss this within the PMR .. > there are to much dependencies to discuss this here .. > > > cheers > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 <+49%20170%205794466> > E-Mail: olaf.weiser at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: Damir Krstic > To: gpfsug main discussion list > Date: 01/11/2017 03:39 PM > Subject: [gpfsug-discuss] nodes being ejected out of the cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our > storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are > connected via Infiniband (FDR14). At the time of implementation of ESS, we > were instructed to enable RDMA in addition to IPoIB. Previously we only ran > IPoIB on our GPFS3.5 cluster. 
> > Every since the implementation (sometime back in July of 2016) we see a > lot of compute nodes being ejected. What usually precedes the ejection are > following messages: > > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 1 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 400 > > Even our ESS IO server sometimes ends up being ejected (case in point - > yesterday morning): > > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3001 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2671 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2495 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3077 > Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease > renewal is overdue. Pinging to check if it is alive > > I've had multiple PMRs open for this issue, and I am told that our ESS > needs code level upgrades in order to fix this issue. Looking at the > errors, I think the issue is Infiniband related, and I am wondering if > anyone on this list has seen similar issues? > > Thanks for your help in advance. 
> > Damir_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Jan 11 19:22:31 2017 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 11 Jan 2017 19:22:31 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: Can this be done live? Meaning can GPFS remain up when I turn this off? Thanks, Damir On Wed, Jan 11, 2017 at 12:38 PM Jan-Frode Myklebust wrote: > And there you have: > > [ems1-fdr,compute,gss_ppc64] > verbsRdmaSend yes > > Try turning this off. > > > -jf > ons. 11. jan. 2017 kl. 18.54 skrev Damir Krstic : > > Thanks for all the suggestions. Here is our mmlsconfig file. We just > purchased another GL6. During the installation of the new GL6 IBM will > upgrade our existing GL6 up to the latest code levels. This will happen > during the week of 23rd of Jan. > > I am skeptical that the upgrade is going to fix the issue. > > On our IO servers we are running in connected mode (please note that IB > interfaces are bonded) > > [root at gssio1 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib3/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib3/mode > > connected > > Our login nodes are also running connected mode as well. > > However, all of our compute nodes are running in datagram: > > [root at mgt ~]# psh compute cat /sys/class/net/ib0/mode > > qnode0758: datagram > > qnode0763: datagram > > qnode0760: datagram > > qnode0772: datagram > > qnode0773: datagram > ....etc. 
> > Here is our mmlsconfig: > > [root at gssio1 ~]# mmlsconfig > > Configuration data for cluster ess-qstorage.it.northwestern.edu: > > ---------------------------------------------------------------- > > clusterName ess-qstorage.it.northwestern.edu > > clusterId 17746506346828356609 > > dmapiFileHandleSize 32 > > minReleaseLevel 4.2.0.1 > > ccrEnabled yes > > cipherList AUTHONLY > > [gss_ppc64] > > nsdRAIDBufferPoolSizePct 80 > > maxBufferDescs 2m > > prefetchPct 5 > > nsdRAIDTracks 128k > > nsdRAIDSmallBufferSize 256k > > nsdMaxWorkerThreads 3k > > nsdMinWorkerThreads 3k > > nsdRAIDSmallThreadRatio 2 > > nsdRAIDThreadsPerQueue 16 > > nsdRAIDEventLogToConsole all > > nsdRAIDFastWriteFSDataLimit 256k > > nsdRAIDFastWriteFSMetadataLimit 1M > > nsdRAIDReconstructAggressiveness 1 > > nsdRAIDFlusherBuffersLowWatermarkPct 20 > > nsdRAIDFlusherBuffersLimitPct 80 > > nsdRAIDFlusherTracksLowWatermarkPct 20 > > nsdRAIDFlusherTracksLimitPct 80 > > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > > nsdRAIDFlusherFWLogLimitMB 5000 > > nsdRAIDFlusherThreadsLowWatermark 1 > > nsdRAIDFlusherThreadsHighWatermark 512 > > nsdRAIDBlockDeviceMaxSectorsKB 8192 > > nsdRAIDBlockDeviceNrRequests 32 > > nsdRAIDBlockDeviceQueueDepth 16 > > nsdRAIDBlockDeviceScheduler deadline > > nsdRAIDMaxTransientStale2FT 1 > > nsdRAIDMaxTransientStale3FT 1 > > nsdMultiQueue 512 > > syncWorkerThreads 256 > > nsdInlineWriteMax 32k > > maxGeneralThreads 1280 > > maxReceiverThreads 128 > > nspdQueues 64 > > [common] > > maxblocksize 16m > > [ems1-fdr,compute,gss_ppc64] > > numaMemoryInterleave yes > > [gss_ppc64] > > maxFilesToCache 12k > > [ems1-fdr,compute] > > maxFilesToCache 128k > > [ems1-fdr,compute,gss_ppc64] > > flushedDataTarget 1024 > > flushedInodeTarget 1024 > > maxFileCleaners 1024 > > maxBufferCleaners 1024 > > logBufferCount 20 > > logWrapAmountPct 2 > > logWrapThreads 128 > > maxAllocRegionsPerNode 32 > > maxBackgroundDeletionThreads 16 > > maxInodeDeallocPrefetch 128 > > [gss_ppc64] > > maxMBpS 16000 > > [ems1-fdr,compute] > > maxMBpS 10000 > > [ems1-fdr,compute,gss_ppc64] > > worker1Threads 1024 > > worker3Threads 32 > > [gss_ppc64] > > ioHistorySize 64k > > [ems1-fdr,compute] > > ioHistorySize 4k > > [gss_ppc64] > > verbsRdmaMinBytes 16k > > [ems1-fdr,compute] > > verbsRdmaMinBytes 32k > > [ems1-fdr,compute,gss_ppc64] > > verbsRdmaSend yes > > [gss_ppc64] > > verbsRdmasPerConnection 16 > > [ems1-fdr,compute] > > verbsRdmasPerConnection 256 > > [gss_ppc64] > > verbsRdmasPerNode 3200 > > [ems1-fdr,compute] > > verbsRdmasPerNode 1024 > > [ems1-fdr,compute,gss_ppc64] > > verbsSendBufferMemoryMB 1024 > > verbsRdmasPerNodeOptimize yes > > verbsRdmaUseMultiCqThreads yes > > [ems1-fdr,compute] > > ignorePrefetchLUNCount yes > > [gss_ppc64] > > scatterBufferSize 256K > > [ems1-fdr,compute] > > scatterBufferSize 256k > > syncIntervalStrict yes > > [ems1-fdr,compute,gss_ppc64] > > nsdClientCksumTypeLocal ck64 > > nsdClientCksumTypeRemote ck64 > > [gss_ppc64] > > pagepool 72856M > > [ems1-fdr] > > pagepool 17544M > > [compute] > > pagepool 4g > > [ems1-fdr,qsched03-ib0,quser10-fdr,compute,gss_ppc64] > > verbsRdma enable > > [gss_ppc64] > > verbsPorts mlx5_0/1 mlx5_0/2 mlx5_1/1 mlx5_1/2 > > [ems1-fdr] > > verbsPorts mlx5_0/1 mlx5_0/2 > > [qsched03-ib0,quser10-fdr,compute] > > verbsPorts mlx4_0/1 > > [common] > > autoload no > > [ems1-fdr,compute,gss_ppc64] > > maxStatCache 0 > > [common] > > envVar MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1 > > deadlockOverloadThreshold 0 > > deadlockDetectionThreshold 0 > > adminMode 
central > > > File systems in cluster ess-qstorage.it.northwestern.edu: > > --------------------------------------------------------- > > /dev/home > > /dev/hpc > > /dev/projects > > /dev/tthome > > On Wed, Jan 11, 2017 at 9:16 AM Luis Bolinches > wrote: > > In addition to what Olaf has said > > ESS upgrades include mellanox modules upgrades in the ESS nodes. In fact, > on those noes you should do not update those solo (unless support says so > in your PMR), so if that's been the recommendation, I suggest you look at > it. > > Changelog on ESS 4.0.4 (no idea what ESS level you are running) > > > c) Support of MLNX_OFED_LINUX-3.2-2.0.0.1 > - Updated from MLNX_OFED_LINUX-3.1-1.0.6.1 (ESS 4.0, 4.0.1, 4.0.2) > - Updated from MLNX_OFED_LINUX-3.1-1.0.0.2 (ESS 3.5.x) > - Updated from MLNX_OFED_LINUX-2.4-1.0.2 (ESS 3.0.x) > - Support for PCIe3 LP 2-port 100 Gb EDR InfiniBand adapter x16 (FC EC3E) > - Requires System FW level FW840.20 (SV840_104) > - No changes from ESS 4.0.3 > > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > > Luis Bolinches > Lab Services > http://www-03.ibm.com/systems/services/labservices/ > > IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland > Phone: +358 503112585 <+358%2050%203112585> > > "If you continually give you will continually have." Anonymous > > > > ----- Original message ----- > From: "Olaf Weiser" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > > Cc: > Subject: Re: [gpfsug-discuss] nodes being ejected out of the cluster > Date: Wed, Jan 11, 2017 5:03 PM > > most likely, there's smth wrong with your IB fabric ... > you say, you run ~ 700 nodes ? ... > Are you running with *verbsRdmaSend*enabled ? ,if so, please consider to > disable - and discuss this within the PMR > another issue, you may check is - Are you running the IPoIB in connected > mode or datagram ... but as I said, please discuss this within the PMR .. > there are to much dependencies to discuss this here .. > > > cheers > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 <+49%20170%205794466> > E-Mail: olaf.weiser at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: Damir Krstic > To: gpfsug main discussion list > Date: 01/11/2017 03:39 PM > Subject: [gpfsug-discuss] nodes being ejected out of the cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our > storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are > connected via Infiniband (FDR14). At the time of implementation of ESS, we > were instructed to enable RDMA in addition to IPoIB. Previously we only ran > IPoIB on our GPFS3.5 cluster. 
> > Every since the implementation (sometime back in July of 2016) we see a > lot of compute nodes being ejected. What usually precedes the ejection are > following messages: > > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 1 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 400 > > Even our ESS IO server sometimes ends up being ejected (case in point - > yesterday morning): > > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3001 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2671 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2495 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3077 > Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease > renewal is overdue. Pinging to check if it is alive > > I've had multiple PMRs open for this issue, and I am told that our ESS > needs code level upgrades in order to fix this issue. Looking at the > errors, I think the issue is Infiniband related, and I am wondering if > anyone on this list has seen similar issues? > > Thanks for your help in advance. 
> > Damir_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Jan 11 19:46:00 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 11 Jan 2017 19:46:00 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: Don't think you can change it without reloading gpfs. Also it should be turned off for all nodes.. So it's a big change, unfortunately.. -jf ons. 11. jan. 2017 kl. 20.22 skrev Damir Krstic : > Can this be done live? Meaning can GPFS remain up when I turn this off? > > Thanks, > Damir > > On Wed, Jan 11, 2017 at 12:38 PM Jan-Frode Myklebust > wrote: > > And there you have: > > [ems1-fdr,compute,gss_ppc64] > verbsRdmaSend yes > > Try turning this off. > > > -jf > ons. 11. jan. 2017 kl. 18.54 skrev Damir Krstic : > > Thanks for all the suggestions. Here is our mmlsconfig file. We just > purchased another GL6. During the installation of the new GL6 IBM will > upgrade our existing GL6 up to the latest code levels. This will happen > during the week of 23rd of Jan. > > I am skeptical that the upgrade is going to fix the issue. > > On our IO servers we are running in connected mode (please note that IB > interfaces are bonded) > > [root at gssio1 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib3/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib3/mode > > connected > > Our login nodes are also running connected mode as well. > > However, all of our compute nodes are running in datagram: > > [root at mgt ~]# psh compute cat /sys/class/net/ib0/mode > > qnode0758: datagram > > qnode0763: datagram > > qnode0760: datagram > > qnode0772: datagram > > qnode0773: datagram > ....etc. 
> > Here is our mmlsconfig: > > [root at gssio1 ~]# mmlsconfig > > Configuration data for cluster ess-qstorage.it.northwestern.edu: > > ---------------------------------------------------------------- > > clusterName ess-qstorage.it.northwestern.edu > > clusterId 17746506346828356609 > > dmapiFileHandleSize 32 > > minReleaseLevel 4.2.0.1 > > ccrEnabled yes > > cipherList AUTHONLY > > [gss_ppc64] > > nsdRAIDBufferPoolSizePct 80 > > maxBufferDescs 2m > > prefetchPct 5 > > nsdRAIDTracks 128k > > nsdRAIDSmallBufferSize 256k > > nsdMaxWorkerThreads 3k > > nsdMinWorkerThreads 3k > > nsdRAIDSmallThreadRatio 2 > > nsdRAIDThreadsPerQueue 16 > > nsdRAIDEventLogToConsole all > > nsdRAIDFastWriteFSDataLimit 256k > > nsdRAIDFastWriteFSMetadataLimit 1M > > nsdRAIDReconstructAggressiveness 1 > > nsdRAIDFlusherBuffersLowWatermarkPct 20 > > nsdRAIDFlusherBuffersLimitPct 80 > > nsdRAIDFlusherTracksLowWatermarkPct 20 > > nsdRAIDFlusherTracksLimitPct 80 > > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > > nsdRAIDFlusherFWLogLimitMB 5000 > > nsdRAIDFlusherThreadsLowWatermark 1 > > nsdRAIDFlusherThreadsHighWatermark 512 > > nsdRAIDBlockDeviceMaxSectorsKB 8192 > > nsdRAIDBlockDeviceNrRequests 32 > > nsdRAIDBlockDeviceQueueDepth 16 > > nsdRAIDBlockDeviceScheduler deadline > > nsdRAIDMaxTransientStale2FT 1 > > nsdRAIDMaxTransientStale3FT 1 > > nsdMultiQueue 512 > > syncWorkerThreads 256 > > nsdInlineWriteMax 32k > > maxGeneralThreads 1280 > > maxReceiverThreads 128 > > nspdQueues 64 > > [common] > > maxblocksize 16m > > [ems1-fdr,compute,gss_ppc64] > > numaMemoryInterleave yes > > [gss_ppc64] > > maxFilesToCache 12k > > [ems1-fdr,compute] > > maxFilesToCache 128k > > [ems1-fdr,compute,gss_ppc64] > > flushedDataTarget 1024 > > flushedInodeTarget 1024 > > maxFileCleaners 1024 > > maxBufferCleaners 1024 > > logBufferCount 20 > > logWrapAmountPct 2 > > logWrapThreads 128 > > maxAllocRegionsPerNode 32 > > maxBackgroundDeletionThreads 16 > > maxInodeDeallocPrefetch 128 > > [gss_ppc64] > > maxMBpS 16000 > > [ems1-fdr,compute] > > maxMBpS 10000 > > [ems1-fdr,compute,gss_ppc64] > > worker1Threads 1024 > > worker3Threads 32 > > [gss_ppc64] > > ioHistorySize 64k > > [ems1-fdr,compute] > > ioHistorySize 4k > > [gss_ppc64] > > verbsRdmaMinBytes 16k > > [ems1-fdr,compute] > > verbsRdmaMinBytes 32k > > [ems1-fdr,compute,gss_ppc64] > > verbsRdmaSend yes > > [gss_ppc64] > > verbsRdmasPerConnection 16 > > [ems1-fdr,compute] > > verbsRdmasPerConnection 256 > > [gss_ppc64] > > verbsRdmasPerNode 3200 > > [ems1-fdr,compute] > > verbsRdmasPerNode 1024 > > [ems1-fdr,compute,gss_ppc64] > > verbsSendBufferMemoryMB 1024 > > verbsRdmasPerNodeOptimize yes > > verbsRdmaUseMultiCqThreads yes > > [ems1-fdr,compute] > > ignorePrefetchLUNCount yes > > [gss_ppc64] > > scatterBufferSize 256K > > [ems1-fdr,compute] > > scatterBufferSize 256k > > syncIntervalStrict yes > > [ems1-fdr,compute,gss_ppc64] > > nsdClientCksumTypeLocal ck64 > > nsdClientCksumTypeRemote ck64 > > [gss_ppc64] > > pagepool 72856M > > [ems1-fdr] > > pagepool 17544M > > [compute] > > pagepool 4g > > [ems1-fdr,qsched03-ib0,quser10-fdr,compute,gss_ppc64] > > verbsRdma enable > > [gss_ppc64] > > verbsPorts mlx5_0/1 mlx5_0/2 mlx5_1/1 mlx5_1/2 > > [ems1-fdr] > > verbsPorts mlx5_0/1 mlx5_0/2 > > [qsched03-ib0,quser10-fdr,compute] > > verbsPorts mlx4_0/1 > > [common] > > autoload no > > [ems1-fdr,compute,gss_ppc64] > > maxStatCache 0 > > [common] > > envVar MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1 > > deadlockOverloadThreshold 0 > > deadlockDetectionThreshold 0 > > adminMode 
central > > > File systems in cluster ess-qstorage.it.northwestern.edu: > > --------------------------------------------------------- > > /dev/home > > /dev/hpc > > /dev/projects > > /dev/tthome > > On Wed, Jan 11, 2017 at 9:16 AM Luis Bolinches > wrote: > > In addition to what Olaf has said > > ESS upgrades include mellanox modules upgrades in the ESS nodes. In fact, > on those noes you should do not update those solo (unless support says so > in your PMR), so if that's been the recommendation, I suggest you look at > it. > > Changelog on ESS 4.0.4 (no idea what ESS level you are running) > > > c) Support of MLNX_OFED_LINUX-3.2-2.0.0.1 > - Updated from MLNX_OFED_LINUX-3.1-1.0.6.1 (ESS 4.0, 4.0.1, 4.0.2) > - Updated from MLNX_OFED_LINUX-3.1-1.0.0.2 (ESS 3.5.x) > - Updated from MLNX_OFED_LINUX-2.4-1.0.2 (ESS 3.0.x) > - Support for PCIe3 LP 2-port 100 Gb EDR InfiniBand adapter x16 (FC EC3E) > - Requires System FW level FW840.20 (SV840_104) > - No changes from ESS 4.0.3 > > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > > Luis Bolinches > Lab Services > http://www-03.ibm.com/systems/services/labservices/ > > IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland > Phone: +358 503112585 <+358%2050%203112585> > > "If you continually give you will continually have." Anonymous > > > > ----- Original message ----- > From: "Olaf Weiser" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > > Cc: > Subject: Re: [gpfsug-discuss] nodes being ejected out of the cluster > Date: Wed, Jan 11, 2017 5:03 PM > > most likely, there's smth wrong with your IB fabric ... > you say, you run ~ 700 nodes ? ... > Are you running with *verbsRdmaSend*enabled ? ,if so, please consider to > disable - and discuss this within the PMR > another issue, you may check is - Are you running the IPoIB in connected > mode or datagram ... but as I said, please discuss this within the PMR .. > there are to much dependencies to discuss this here .. > > > cheers > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 <+49%20170%205794466> > E-Mail: olaf.weiser at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: Damir Krstic > To: gpfsug main discussion list > Date: 01/11/2017 03:39 PM > Subject: [gpfsug-discuss] nodes being ejected out of the cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our > storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are > connected via Infiniband (FDR14). At the time of implementation of ESS, we > were instructed to enable RDMA in addition to IPoIB. Previously we only ran > IPoIB on our GPFS3.5 cluster. 
> > Every since the implementation (sometime back in July of 2016) we see a > lot of compute nodes being ejected. What usually precedes the ejection are > following messages: > > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 1 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 400 > > Even our ESS IO server sometimes ends up being ejected (case in point - > yesterday morning): > > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3001 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2671 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2495 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3077 > Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease > renewal is overdue. Pinging to check if it is alive > > I've had multiple PMRs open for this issue, and I am told that our ESS > needs code level upgrades in order to fix this issue. Looking at the > errors, I think the issue is Infiniband related, and I am wondering if > anyone on this list has seen similar issues? > > Thanks for your help in advance. 
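To make that concrete: since the change reportedly does not take effect until mmfsd is reloaded, the usual pattern is to stage it with mmchconfig and then recycle GPFS on every node class where it was set, in a maintenance window. A rough sketch using the node classes from the mmlsconfig above (confirm the plan with support before touching a production ESS):

  # Stage the change; it takes effect at the next daemon start
  mmchconfig verbsRdmaSend=no -N ems1-fdr,compute,gss_ppc64

  # Recycle GPFS, for example clients in batches and the IO servers one at a time
  mmshutdown -N compute && mmstartup -N compute

  # Verify on a node after restart
  mmdiag --config | grep -i verbsRdmaSend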
> > Damir_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Wed Jan 11 22:33:24 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 11 Jan 2017 15:33:24 -0700 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: A winbindd process taking up 100% could be caused by the problem documented in https://bugzilla.samba.org/show_bug.cgi?id=12105 Capturing a brief strace of the affected process and reporting that through a PMR would be helpful to debug this problem and provide a fix. To answer the wider question: Log files are kept in /var/adm/ras/. In case more detailed traces are required, use the mmprotocoltrace command. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/11/2017 07:00 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... CES DEGRADED ads_down ... 
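If anyone else hits the spinning winbindd, the data being asked for here can be captured in a minute or so while the problem is live (a sketch; the PID and client IP are placeholders, and the mmprotocoltrace option syntax is worth double-checking at your code level):

  # Identify the busy winbindd and grab a short strace for the PMR
  top -b -n 1 | grep winbindd
  timeout 30 strace -f -tt -p <winbindd_pid> -o /tmp/winbindd.strace

  # Protocol logs on each CES node live under /var/adm/ras
  ls -ltr /var/adm/ras/ | tail

  # What CES itself thinks of the nodes and services at that moment
  mmces state show -a

  # For a deeper look, an SMB trace scoped to one affected client keeps the data small
  mmprotocoltrace start smb -c <client_ip>
  mmprotocoltrace stop smb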
Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From r.sobey at imperial.ac.uk Thu Jan 12 09:51:12 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 12 Jan 2017 09:51:12 +0000 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: Thanks Christof. Would this patch have made it in to CES/GPFS 4.2.1-2.. from what you say probably not? This whole incident was caused by a scheduled and extremely rare shutdown of our main datacentre for electrical testing. It's not something that's likely to happen again if at all so reproducing it will be nigh on impossible. Food for thought though! Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 11 January 2017 22:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES log files A winbindd process taking up 100% could be caused by the problem documented in https://bugzilla.samba.org/show_bug.cgi?id=12105 Capturing a brief strace of the affected process and reporting that through a PMR would be helpful to debug this problem and provide a fix. To answer the wider question: Log files are kept in /var/adm/ras/. In case more detailed traces are required, use the mmprotocoltrace command. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/11/2017 07:00 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... 
CES DEGRADED ads_down ... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From aleciarm at us.ibm.com Thu Jan 12 14:54:12 2017 From: aleciarm at us.ibm.com (Alecia A Ramsay) Date: Thu, 12 Jan 2017 09:54:12 -0500 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: The Spectrum Scale Knowledge Center does have a topic on collecting CES log files. This might be helpful (4.2.2 version): http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1pdg_ces_monitor_admin.htm Alecia A. Ramsay, PMP? Program Manager, New Technology Introduction IBM Systems - Storage aleciarm at us.ibm.com work: 919-435-6494; mobile: 651-260-4928 https://www-01.ibm.com/marketing/iwm/iwmdocs/web/cc/earlyprograms/systems.shtml From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/12/2017 04:51 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Christof. Would this patch have made it in to CES/GPFS 4.2.1-2.. from what you say probably not? This whole incident was caused by a scheduled and extremely rare shutdown of our main datacentre for electrical testing. It's not something that's likely to happen again if at all so reproducing it will be nigh on impossible. Food for thought though! Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 11 January 2017 22:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES log files A winbindd process taking up 100% could be caused by the problem documented in https://bugzilla.samba.org/show_bug.cgi?id=12105 Capturing a brief strace of the affected process and reporting that through a PMR would be helpful to debug this problem and provide a fix. To answer the wider question: Log files are kept in /var/adm/ras/. In case more detailed traces are required, use the mmprotocoltrace command. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/11/2017 07:00 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. 
Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... CES DEGRADED ads_down ... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From christof.schmitt at us.ibm.com Thu Jan 12 18:06:48 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Thu, 12 Jan 2017 11:06:48 -0700 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: It looks like the patch for the mentioned bugzilla is in 4.2.2.0, but not in 4.2.1.2. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/12/2017 02:51 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Christof. Would this patch have made it in to CES/GPFS 4.2.1-2.. from what you say probably not? This whole incident was caused by a scheduled and extremely rare shutdown of our main datacentre for electrical testing. It's not something that's likely to happen again if at all so reproducing it will be nigh on impossible. Food for thought though! 
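For anyone else trying to work out whether their CES stack already carries a given Samba fix, the quickest check is the installed packages on a protocol node plus the cluster level (a sketch, nothing more):

  # CES Samba build and base GPFS level on a protocol node
  rpm -q gpfs.smb gpfs.base

  # Cluster-wide minimum release level
  mmlsconfig minReleaseLevel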
Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 11 January 2017 22:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES log files A winbindd process taking up 100% could be caused by the problem documented in https://bugzilla.samba.org/show_bug.cgi?id=12105 Capturing a brief strace of the affected process and reporting that through a PMR would be helpful to debug this problem and provide a fix. To answer the wider question: Log files are kept in /var/adm/ras/. In case more detailed traces are required, use the mmprotocoltrace command. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/11/2017 07:00 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... CES DEGRADED ads_down ... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. 
Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mimarsh2 at vt.edu Fri Jan 13 19:50:10 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Fri, 13 Jan 2017 14:50:10 -0500 Subject: [gpfsug-discuss] Authorized Key Messages Message-ID: All, I just saw this message start popping up constantly on one our NSD Servers. [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist CCR Auth is disabled on all the NSD Servers. What other features/checks would look for the ccr keys? Thanks, Brian Marshall Virginia Tech -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Jan 13 20:14:03 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 13 Jan 2017 15:14:03 -0500 Subject: [gpfsug-discuss] Authorized Key Messages In-Reply-To: References: Message-ID: Brian, This seems to match a problem which was fixed in 4.1.1.7 and 4.2.0.0. Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Brian Marshall To: gpfsug main discussion list Date: 01/13/2017 02:50 PM Subject: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org All, I just saw this message start popping up constantly on one our NSD Servers. [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist CCR Auth is disabled on all the NSD Servers. What other features/checks would look for the ccr keys? Thanks, Brian Marshall Virginia Tech_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Fri Jan 13 20:19:25 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Fri, 13 Jan 2017 15:19:25 -0500 Subject: [gpfsug-discuss] Authorized Key Messages In-Reply-To: References: Message-ID: We are running 4.2.1 (there may be some point fixes we don't have) Any report of it being in this version? Brian On Fri, Jan 13, 2017 at 3:14 PM, Felipe Knop wrote: > Brian, > > This seems to match a problem which was fixed in 4.1.1.7 and 4.2.0.0. > > Regards, > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > > From: Brian Marshall > To: gpfsug main discussion list > Date: 01/13/2017 02:50 PM > Subject: [gpfsug-discuss] Authorized Key Messages > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > All, > > I just saw this message start popping up constantly on one our NSD Servers. > > [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist > > CCR Auth is disabled on all the NSD Servers. > > What other features/checks would look for the ccr keys? 
> > Thanks, > Brian Marshall > Virginia Tech_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Jan 13 22:58:02 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 13 Jan 2017 17:58:02 -0500 Subject: [gpfsug-discuss] Authorized Key Messages In-Reply-To: References: Message-ID: Brian, I had to check again whether the fix in question was in 4.2.0.0 (as opposed to a newer mod release), but confirmed that it seems to be. So this could be a new or different problem than the one I was thinking about. Researching a bit further, I found another potential match (internal defect number 981469), but that should be fixed in 4.2.1 as well. I have not seen recent reports of this problem. Perhaps this could be pursued via a PMR. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Brian Marshall To: gpfsug main discussion list Date: 01/13/2017 03:21 PM Subject: Re: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org We are running 4.2.1 (there may be some point fixes we don't have) Any report of it being in this version? Brian On Fri, Jan 13, 2017 at 3:14 PM, Felipe Knop wrote: Brian, This seems to match a problem which was fixed in 4.1.1.7 and 4.2.0.0. Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Brian Marshall To: gpfsug main discussion list Date: 01/13/2017 02:50 PM Subject: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org All, I just saw this message start popping up constantly on one our NSD Servers. [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist CCR Auth is disabled on all the NSD Servers. What other features/checks would look for the ccr keys? Thanks, Brian Marshall Virginia Tech_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Fri Jan 13 23:30:05 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 13 Jan 2017 18:30:05 -0500 Subject: [gpfsug-discuss] Authorized Key Messages In-Reply-To: References: Message-ID: Our intent was to have ccr turned off since all nodes are quorum in the server cluster: Considering this: [root at cl001 ~]# mmfsadm dump config | grep -i ccr ! ccrEnabled 0 ccrMaxChallengeCheckRetries 4 ccr : 0 (cluster configuration repository) ccr : 1 (cluster configuration repository) Will this disable ccr? 
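A couple of quick ways to confirm what the cluster is actually using (sketch):

  # On recent levels mmlscluster reports the repository type (CCR vs server-based) near the top
  mmlscluster | head

  # Daemon view; the leading "!" marks a value set away from the default
  mmfsadm dump config | grep -i ccrEnabled

  # The message refers to this file, which is only expected to be present when CCR is in use
  ls -l /var/mmfs/ssl/authorized_ccr_keys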
On Fri, Jan 13, 2017 at 5:58 PM, Felipe Knop wrote: > Brian, > > I had to check again whether the fix in question was in 4.2.0.0 (as > opposed to a newer mod release), but confirmed that it seems to be. So > this could be a new or different problem than the one I was thinking about. > > Researching a bit further, I found another potential match (internal > defect number 981469), but that should be fixed in 4.2.1 as well. I have > not seen recent reports of this problem. > > Perhaps this could be pursued via a PMR. > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > > From: Brian Marshall > To: gpfsug main discussion list > Date: 01/13/2017 03:21 PM > Subject: Re: [gpfsug-discuss] Authorized Key Messages > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running 4.2.1 (there may be some point fixes we don't have) > > Any report of it being in this version? > > Brian > > On Fri, Jan 13, 2017 at 3:14 PM, Felipe Knop <*knop at us.ibm.com* > > wrote: > Brian, > > This seems to match a problem which was fixed in 4.1.1.7 and 4.2.0.0. > > Regards, > > Felipe > > ---- > Felipe Knop *knop at us.ibm.com* > > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > *(845) 433-9314* <(845)%20433-9314> T/L 293-9314 > > > > > > From: Brian Marshall <*mimarsh2 at vt.edu* > > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Date: 01/13/2017 02:50 PM > Subject: [gpfsug-discuss] Authorized Key Messages > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > ------------------------------ > > > > > All, > > I just saw this message start popping up constantly on one our NSD Servers. > > [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist > > CCR Auth is disabled on all the NSD Servers. > > What other features/checks would look for the ccr keys? > > Thanks, > Brian Marshall > Virginia Tech_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Jan 13 23:48:37 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 13 Jan 2017 18:48:37 -0500 Subject: [gpfsug-discuss] Authorized Key Messages In-Reply-To: References: Message-ID: "! ccrEnabled 0" does indicate that CCR is disabled on the (server) cluster. In fact, instances of this '/var/mmfs/ssl/authorized_ccr_keys' does not exist message have been seen in clusters where CCR was disabled. It's just somewhat puzzling that the error message is appears in 4.2.1 . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "J. 
Eric Wonderley" To: gpfsug main discussion list Date: 01/13/2017 06:30 PM Subject: Re: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org Our intent was to have ccr turned off since all nodes are quorum in the server cluster: Considering this: [root at cl001 ~]# mmfsadm dump config | grep -i ccr ! ccrEnabled 0 ccrMaxChallengeCheckRetries 4 ccr : 0 (cluster configuration repository) ccr : 1 (cluster configuration repository) Will this disable ccr? On Fri, Jan 13, 2017 at 5:58 PM, Felipe Knop wrote: Brian, I had to check again whether the fix in question was in 4.2.0.0 (as opposed to a newer mod release), but confirmed that it seems to be. So this could be a new or different problem than the one I was thinking about. Researching a bit further, I found another potential match (internal defect number 981469), but that should be fixed in 4.2.1 as well. I have not seen recent reports of this problem. Perhaps this could be pursued via a PMR. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Brian Marshall To: gpfsug main discussion list Date: 01/13/2017 03:21 PM Subject: Re: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org We are running 4.2.1 (there may be some point fixes we don't have) Any report of it being in this version? Brian On Fri, Jan 13, 2017 at 3:14 PM, Felipe Knop wrote: Brian, This seems to match a problem which was fixed in 4.1.1.7 and 4.2.0.0. Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Brian Marshall To: gpfsug main discussion list Date: 01/13/2017 02:50 PM Subject: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org All, I just saw this message start popping up constantly on one our NSD Servers. [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist CCR Auth is disabled on all the NSD Servers. What other features/checks would look for the ccr keys? Thanks, Brian Marshall Virginia Tech_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Sun Jan 15 21:18:31 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Sun, 15 Jan 2017 21:18:31 +0000 Subject: [gpfsug-discuss] GUI "maintenance mode" Message-ID: Is there a way, perhaps through the CLI, to set a node in maintenance mode so the GUI alerting doesn't flag it up as being down? Pretty sure the option isn't available through the GUI's GUI if you'll pardon the expression. 
Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Tue Jan 17 21:50:53 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 17 Jan 2017 16:50:53 -0500 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM Message-ID: UG, I have a GPFS filesystem. I have a OpenStack private cloud. What is the best way for Nova Compute VMs to have access to data inside the GPFS filesystem? 1)Should VMs mount GPFS directly with a GPFS client? 2) Should the hypervisor mount GPFS and share to nova computes? 3) Should I create GPFS protocol servers that allow nova computes to mount of NFS? All advice is welcome. Best, Brian Marshall Virginia Tech -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Tue Jan 17 21:16:20 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Tue, 17 Jan 2017 16:16:20 -0500 Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs Message-ID: I have messages like these frequent my logs: Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 vendor_err 136 Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error IBV_WC_REM_ACCESS_ERR index 23 Any ideas on cause..? -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Jan 18 00:47:04 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Tue, 17 Jan 2017 19:47:04 -0500 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: References: Message-ID: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> I think the 1st option creates the challenges both with security (e.g. do you fully trust the users of your VMs not to do bad things as root either maliciously or accidentally? how do you ensure userids are properly mapped inside the guest?) and logistically (as VMs come and go how do you automate adding them/removing them to/from the GPFS cluster). I think the 2nd option is ideal perhaps using something like 9p (http://www.linux-kvm.org/page/9p_virtio) to export filesystems from the hypervisor to the guest. I'm not sure how you would integrate this with Nova and I've heard from others that there are stability issues, but I can't comment first hand. Another option might be to NFS/CIFS export the filesystems from the hypervisor to the guests via the 169.254.169.254 metadata address although I don't know how feasible that may or may not be. The advantage to using the metadata address is it should scale well and it should take the pain out of a guest mapping an IP address to its local hypervisor using an external method. Perhaps number 3 is the best way to go, especially (arguably only) if you use kerberized NFS or SMB. That way you don't have to trust anything about the guest and you theoretically should get decent performance. I'm really curious what other folks have done on this front. -Aaron On 1/17/17 4:50 PM, Brian Marshall wrote: > UG, > > I have a GPFS filesystem. > > I have a OpenStack private cloud. > > What is the best way for Nova Compute VMs to have access to data inside > the GPFS filesystem? > > 1)Should VMs mount GPFS directly with a GPFS client? > 2) Should the hypervisor mount GPFS and share to nova computes? > 3) Should I create GPFS protocol servers that allow nova computes to > mount of NFS? > > All advice is welcome. 
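For what it's worth, the protocol-server route in option 3 is mostly just CES NFS plus some cache tuning on the export nodes, which also comes up later in the thread. A rough sketch (the path, network, hostname and use of the cesNodes class are illustrative, and the export options should be checked against the mmnfs documentation):

  # Export a project directory to a tenant network via CES NFS
  mmnfs export add /gpfs/projects/projX --client "10.20.30.0/24(Access_Type=RW)"

  # Inside the VM, mount through a CES floating IP
  mount -t nfs -o vers=3 ces-vip.example.com:/gpfs/projects/projX /mnt/projX

  # Busy export nodes usually want larger caches; typically takes effect after an mmfsd restart
  mmchconfig maxFilesToCache=1000000,maxStatCache=1000000 -N cesNodes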
> > > Best, > Brian Marshall > Virginia Tech > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From malone12 at illinois.edu Wed Jan 18 03:05:15 2017 From: malone12 at illinois.edu (Maloney, John Daniel) Date: Wed, 18 Jan 2017 03:05:15 +0000 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> References: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> Message-ID: <6CADE9B2-3691-4F44-B241-DABA02385B42@illinois.edu> I agree with Aaron on option 1, trusting users to do nothing malicious would be quite a stretch for most people?s use cases. Even if they do, if their user?s credentials getting stolen, and then used by someone else it could be a real issue as the hacker wouldn?t have to get lucky and find a VM with an un-patched root escalation vulnerability. Security aside, you?ll probably want to make sure your VMs have an external IP that is able to be reached by the GPFS cluster. We found having GPFS route through the Openstack NAT to be possible, but tricky (though this was an older version of Openstack?could be better now?). Using the external IP may be the natural way for most folks, but wanted to point it out none-the-less. We haven?t done much in regards to option 2, we?ve done work using native clients on the hypervisors to provide cinder/glance storage, but not to share other data into the VM?s. Currently use option 3 to export group?s project directories to their VMs using the CES protocol nodes. It?s getting the job done right now (have close to 100 VMs mounting from it). I would definitely recommend giving your maxFilesToCache and maxStatCache parameters a big bump from defaults on the export nodes if you weren?t planning to already (set mine at 1,000,000 on each of those). We saw that become a point of contention with our user?s workloads. That change was implemented fairly recently and so far, so good. Aaron?s point about logistics from his answer to option 1 is relevant here too, especially if you have high VM turnover rate where IP addresses are recycled and different projects are getting exported. You?ll want to keep track of VM?s and exports to prevent a new VM from picking up an old IP that has access on an export it isn?t supposed to because it hasn?t been flushed out. In our situation there are 30-40 projects, all names of them known to users who ls the project directory, wouldn?t take much for them to spin up a new VM and give them all a try. I agree this is a really interesting topic, there?s a lot of ways to come at this so hopefully more folks chime in on what they?re doing. Best, J.D. Maloney Storage Engineer | Storage Enabling Technologies Group National Center for Supercomputing Applications (NCSA) On 1/17/17, 6:47 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Aaron Knister" wrote: I think the 1st option creates the challenges both with security (e.g. do you fully trust the users of your VMs not to do bad things as root either maliciously or accidentally? how do you ensure userids are properly mapped inside the guest?) and logistically (as VMs come and go how do you automate adding them/removing them to/from the GPFS cluster). 
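If you do go the option 1 route anyway, the add/remove churn described above mostly reduces to wrapping the standard node lifecycle commands in your VM provisioning and teardown hooks. A minimal, untested sketch, run from an existing cluster node; "vm123" is a placeholder hostname, and the usual remote-shell and license prerequisites are assumed to already be in place:

  # when the VM is provisioned
  mmaddnode -N vm123
  mmchlicense client --accept -N vm123
  mmstartup -N vm123

  # when the VM is destroyed
  mmshutdown -N vm123
  mmdelnode -N vm123

The root-level remote shell trust that mmaddnode relies on is exactly the relationship that makes option 1 uncomfortable in a multi-tenant cloud, which is worth keeping in mind when weighing the options.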
I think the 2nd option is ideal perhaps using something like 9p (http://www.linux-kvm.org/page/9p_virtio) to export filesystems from the hypervisor to the guest. I'm not sure how you would integrate this with Nova and I've heard from others that there are stability issues, but I can't comment first hand. Another option might be to NFS/CIFS export the filesystems from the hypervisor to the guests via the 169.254.169.254 metadata address although I don't know how feasible that may or may not be. The advantage to using the metadata address is it should scale well and it should take the pain out of a guest mapping an IP address to its local hypervisor using an external method. Perhaps number 3 is the best way to go, especially (arguably only) if you use kerberized NFS or SMB. That way you don't have to trust anything about the guest and you theoretically should get decent performance. I'm really curious what other folks have done on this front. -Aaron On 1/17/17 4:50 PM, Brian Marshall wrote: > UG, > > I have a GPFS filesystem. > > I have a OpenStack private cloud. > > What is the best way for Nova Compute VMs to have access to data inside > the GPFS filesystem? > > 1)Should VMs mount GPFS directly with a GPFS client? > 2) Should the hypervisor mount GPFS and share to nova computes? > 3) Should I create GPFS protocol servers that allow nova computes to > mount of NFS? > > All advice is welcome. > > > Best, > Brian Marshall > Virginia Tech > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Wed Jan 18 08:46:53 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 18 Jan 2017 08:46:53 +0000 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> References: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> Message-ID: >Another option might be to NFS/CIFS export the >filesystems from the hypervisor to the guests via the 169.254.169.254 >metadata address although I don't know how feasible that may or may not Doesn't the metadata IP site on the network nodes though and not the hypervisor? We currently have created interfaces on out net nodes attached to the appropriate VLAN/VXLAN and then run CES on top of that. The problem with this is if you have the same subnet existing in two networks, then you have a problem. I had some discussion with some of the IBM guys about the possibility of using a different CES protocol group and running multiple ganesha servers (maybe a container attached to the net?) so you could then have different NFS configs on different ganesha instances with CES managing a floating IP that could exist multiple times. There were some potential issues in the way the CES HA bits work though with this approach. 
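For reference, the CES address groups being discussed here can be inspected and managed with the mmces command set. A rough sketch only, with made-up IPs and group names, and the exact option spelling is worth double-checking against the 4.2.x documentation:

  mmces node list                  # which nodes are protocol nodes
  mmces address list               # current floating IPs and any group assignments
  mmces address add --ces-ip 10.10.4.100 --ces-group cloudnet
  mmces service list               # per-node NFS/SMB/OBJ state

The multiple-ganesha-instances idea goes beyond what stock CES groups provide, but groups at least let you pin particular floating IPs to the subset of CES nodes that have legs in the right VLAN/VXLAN.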
Simon From S.J.Thompson at bham.ac.uk Wed Jan 18 08:59:48 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 18 Jan 2017 08:59:48 +0000 Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs In-Reply-To: References: Message-ID: I'd be inclined to look at something like: ibqueryerrors -s PortXmitWait,LinkDownedCounter,PortXmitDiscards,PortRcvRemotePhysicalErrors -c And see if you have a high number of symbol errors, might be a cable needs replugging or replacing. Simon From: > on behalf of "J. Eric Wonderley" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 17 January 2017 at 21:16 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs I have messages like these frequent my logs: Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 vendor_err 136 Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error IBV_WC_REM_ACCESS_ERR index 23 Any ideas on cause..? -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Jan 18 15:22:51 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 18 Jan 2017 10:22:51 -0500 Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs In-Reply-To: References: Message-ID: <061a15b7-f5e9-7c16-2e79-3236665a9368@nasa.gov> I'm curious about this too. We see these messages sometimes when things have gone horribly wrong but also sometimes during recovery events. Here's a recent one: loremds20 (manager/nsd node): Mon Jan 16 14:19:02.048 2017: [E] VERBS RDMA rdma read error IBV_WC_REM_ACCESS_ERR to 10.101.11.6 (lorej006) on mlx5_0 port 1 fabnum 3 vendor_err 136 Mon Jan 16 14:19:02.049 2017: [E] VERBS RDMA closed connection to 10.101.11.6 (lorej006) on mlx5_0 port 1 fabnum 3 due to RDMA read error IBV_WC_REM_ACCESS_ERR index 11 lorej006 (client): Mon Jan 16 14:19:01.990 2017: [N] VERBS RDMA closed connection to 10.101.53.18 (loremds18) on mlx5_0 port 1 fabnum 3 index 2 Mon Jan 16 14:19:01.995 2017: [N] VERBS RDMA closed connection to 10.101.53.19 (loremds19) on mlx5_0 port 1 fabnum 3 index 0 Mon Jan 16 14:19:01.997 2017: [I] Recovering nodes: 10.101.53.18 10.101.53.19 Mon Jan 16 14:19:02.047 2017: [W] VERBS RDMA async event IBV_EVENT_QP_ACCESS_ERR on mlx5_0 qp 0x7fffe550f1c8. Mon Jan 16 14:19:02.051 2017: [E] VERBS RDMA closed connection to 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 error 733 index 1 Mon Jan 16 14:19:02.071 2017: [I] Recovered 2 nodes for file system tnb32. Mon Jan 16 14:19:02.140 2017: [I] VERBS RDMA connecting to 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 index 0 Mon Jan 16 14:19:02.160 2017: [I] VERBS RDMA connected to 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 sl 0 index 0 I had just shut down loremds18 and loremds19 so there was certainly recovery taking place and during that time is when the error seems to have occurred. I looked up the meaning of IBV_WC_REM_ACCESS_ERR here (http://www.rdmamojo.com/2013/02/15/ibv_poll_cq/) and see this: IBV_WC_REM_ACCESS_ERR (10) - Remote Access Error: a protection error occurred on a remote data buffer to be read by an RDMA Read, written by an RDMA Write or accessed by an atomic operation. This error is reported only on RDMA operations or atomic operations. Relevant for RC QPs. 
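Alongside the error-code description quoted above, a quick first pass is usually to rule out fabric-level problems and confirm what GPFS thinks it should be using. A rough checklist, assuming the standard infiniband-diags tools and default log locations:

  ibqueryerrors -s PortXmitWait,LinkDownedCounter,PortXmitDiscards,PortRcvRemotePhysicalErrors -c
  iblinkinfo                          # eyeball for links running at the wrong speed or width
  mmlsconfig verbsRdma                # confirm RDMA is meant to be enabled
  mmlsconfig verbsPorts               # the HCA ports GPFS is configured to use
  grep "VERBS RDMA" /var/adm/ras/mmfs.log.latest | tail -20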
my take on it during recovery it seems like one end of the connection more or less hanging up on the other end (e.g. Connection reset by peer /ECONNRESET). But like I said at the start, we also see this when there something has gone awfully wrong. -Aaron On 1/18/17 3:59 AM, Simon Thompson (Research Computing - IT Services) wrote: > I'd be inclined to look at something like: > > ibqueryerrors -s > PortXmitWait,LinkDownedCounter,PortXmitDiscards,PortRcvRemotePhysicalErrors > -c > > And see if you have a high number of symbol errors, might be a cable > needs replugging or replacing. > > Simon > > From: > on behalf of "J. Eric > Wonderley" > > Reply-To: "gpfsug-discuss at spectrumscale.org > " > > > Date: Tuesday, 17 January 2017 at 21:16 > To: "gpfsug-discuss at spectrumscale.org > " > > > Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs > > I have messages like these frequent my logs: > Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error > IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 > vendor_err 136 > Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to > 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error > IBV_WC_REM_ACCESS_ERR index 23 > > Any ideas on cause..? > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jan 18 15:56:16 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 18 Jan 2017 15:56:16 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Message-ID: Hi All, We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
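For the script updates mentioned above, the numeric output of mmrepquota -gn can be mapped back to names on the admin node itself. A minimal bash sketch, where gpfs0 is a placeholder device name and column alignment is not preserved:

  #!/bin/bash
  # resolve numeric GIDs from "mmrepquota -gn" back to group names locally
  mmrepquota -gn gpfs0 | while read -r first rest; do
      if [[ $first =~ ^[0-9]+$ ]]; then
          name=$(getent group "$first" | cut -d: -f1)
          echo "${name:-$first} $rest"
      else
          echo "$first $rest"    # header and section lines pass through (whitespace collapsed)
      fi
  done

Comparing that against plain mmrepquota -g output on a 4.2.1.1 node and a 4.2.2.1 node is also a cheap way to confirm the difference really is in the name lookup rather than in the quota data itself.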
URL: From jonathan.b.mills at nasa.gov Wed Jan 18 16:10:51 2017 From: jonathan.b.mills at nasa.gov (Jonathan Mills) Date: Wed, 18 Jan 2017 11:10:51 -0500 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: References: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> Message-ID: <8d41b8c8-eb84-3d1c-eec2-d26f1816108b@nasa.gov> On 1/18/17 3:46 AM, Simon Thompson (Research Computing - IT Services) wrote: > >> Another option might be to NFS/CIFS export the >> filesystems from the hypervisor to the guests via the 169.254.169.254 >> metadata address although I don't know how feasible that may or may not > > Doesn't the metadata IP site on the network nodes though and not the > hypervisor? Not when Neutron is in DVR mode. It is intercepted at the hypervisor and redirected to the neutron-ns-metadata-proxy. See below: [root at gpcc003 ~]# ip netns exec qrouter-bc4aa217-5128-4eec-b9af-67923dae319a iptables -t nat -nvL neutron-l3-agent-PREROUTING Chain neutron-l3-agent-PREROUTING (1 references) pkts bytes target prot opt in out source destination 19 1140 REDIRECT tcp -- qr-+ * 0.0.0.0/0 169.254.169.254 tcp dpt:80 redir ports 9697 281 12650 DNAT all -- rfp-bc4aa217-5 * 0.0.0.0/0 169.154.180.32 to:10.0.4.22 [root at gpcc003 ~]# ip netns exec qrouter-bc4aa217-5128-4eec-b9af-67923dae319a netstat -tulpn |grep 9697 tcp 0 0 0.0.0.0:9697 0.0.0.0:* LISTEN 28130/python2 [root at gpcc003 ~]# ps aux |grep 28130 neutron 28130 0.0 0.0 286508 41364 ? S Jan04 0:02 /usr/bin/python2 /bin/neutron-ns-metadata-proxy --pid_file=/var/lib/neutron/external/pids/bc4aa217-5128-4eec-b9af-67923dae319a.pid --metadata_proxy_socket=/var/lib/neutron/metadata_proxy --router_id=bc4aa217-5128-4eec-b9af-67923dae319a --state_path=/var/lib/neutron --metadata_port=9697 --metadata_proxy_user=989 --metadata_proxy_group=986 --verbose --log-file=neutron-ns-metadata-proxy-bc4aa217-5128-4eec-b9af-67923dae319a.log --log-dir=/var/log/neutron root 31220 0.0 0.0 112652 972 pts/1 S+ 11:08 0:00 grep --color=auto 28130 > > We currently have created interfaces on out net nodes attached to the > appropriate VLAN/VXLAN and then run CES on top of that. > > The problem with this is if you have the same subnet existing in two > networks, then you have a problem. > > I had some discussion with some of the IBM guys about the possibility of > using a different CES protocol group and running multiple ganesha servers > (maybe a container attached to the net?) so you could then have different > NFS configs on different ganesha instances with CES managing a floating IP > that could exist multiple times. > > There were some potential issues in the way the CES HA bits work though > with this approach. > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Jonathan Mills / jonathan.mills at nasa.gov NASA GSFC / NCCS HPC (606.2) Bldg 28, Rm. S230 / c. 252-412-5710 From mimarsh2 at vt.edu Wed Jan 18 16:22:12 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Wed, 18 Jan 2017 11:22:12 -0500 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: References: Message-ID: To answer some more questions: What sort of workload will your Nova VM's be running? This is largely TBD but we anticipate webapps and other non-batch ways of interacting with and post processing data that has been computed on HPC batch systems. 
For example a user might host a website that allows users to view pieces of a large data set and do some processing in private cloud or kick off larger jobs on HPC clusters How many VM's are you running? This work is still in the design / build phase. We have 48 servers slated for the project. At max maybe 500 VMs; again this is a pretty wild estimate. This is a new service we are looking to provide What is your Network interconnect between the Scale Storage cluster and the Nova Compute cluster Each nova node has a dual 10gigE connection to switches that uplink to our core 40 gigE switches were NSD Servers are directly connectly. The information so far has been awesome. Thanks everyone. I am definitely leaning towards option #3 of creating protocol servers. Are there any design/build white papers targetting the virutalization use case? Thanks, Brian On Tue, Jan 17, 2017 at 5:55 PM, Andrew Beattie wrote: > HI Brian, > > > Couple of questions for you: > > What sort of workload will your Nova VM's be running? > How many VM's are you running? > What is your Network interconnect between the Scale Storage cluster and > the Nova Compute cluster > > I have cc'd Jake Carrol from University of Queensland in on the email as I > know they have done some basic performance testing using Scale to provide > storage to Openstack. > One of the issues that they found was the Openstack network translation > was a performance limiting factor. > > I think from memory the best performance scenario they had was, when they > installed the scale client locally into the virtual machines > > > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > > ----- Original message ----- > From: Brian Marshall > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM > Date: Wed, Jan 18, 2017 7:51 AM > > UG, > > I have a GPFS filesystem. > > I have a OpenStack private cloud. > > What is the best way for Nova Compute VMs to have access to data inside > the GPFS filesystem? > > 1)Should VMs mount GPFS directly with a GPFS client? > 2) Should the hypervisor mount GPFS and share to nova computes? > 3) Should I create GPFS protocol servers that allow nova computes to mount > of NFS? > > All advice is welcome. > > > Best, > Brian Marshall > Virginia Tech > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Wed Jan 18 16:58:24 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Wed, 18 Jan 2017 11:58:24 -0500 Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs In-Reply-To: <061a15b7-f5e9-7c16-2e79-3236665a9368@nasa.gov> References: <061a15b7-f5e9-7c16-2e79-3236665a9368@nasa.gov> Message-ID: As background, we recently upgraded GPFS from 4.2.0 to 4.2.1 and updated the Mellanox OFED on our compute cluster to allow it to move from CentOS 7.1 to 7.2 We do some transient warnings from the Mellanox switch gear about various port counters that we are tracking down with them. Jobs and filesystem seem stable, but the logs are concerning. On Wed, Jan 18, 2017 at 10:22 AM, Aaron Knister wrote: > I'm curious about this too. We see these messages sometimes when things > have gone horribly wrong but also sometimes during recovery events. 
Here's > a recent one: > > loremds20 (manager/nsd node): > Mon Jan 16 14:19:02.048 2017: [E] VERBS RDMA rdma read error > IBV_WC_REM_ACCESS_ERR to 10.101.11.6 (lorej006) on mlx5_0 port 1 fabnum 3 > vendor_err 136 > Mon Jan 16 14:19:02.049 2017: [E] VERBS RDMA closed connection to > 10.101.11.6 (lorej006) on mlx5_0 port 1 fabnum 3 due to RDMA read error > IBV_WC_REM_ACCESS_ERR index 11 > > lorej006 (client): > Mon Jan 16 14:19:01.990 2017: [N] VERBS RDMA closed connection to > 10.101.53.18 (loremds18) on mlx5_0 port 1 fabnum 3 index 2 > Mon Jan 16 14:19:01.995 2017: [N] VERBS RDMA closed connection to > 10.101.53.19 (loremds19) on mlx5_0 port 1 fabnum 3 index 0 > Mon Jan 16 14:19:01.997 2017: [I] Recovering nodes: 10.101.53.18 > 10.101.53.19 > Mon Jan 16 14:19:02.047 2017: [W] VERBS RDMA async event > IBV_EVENT_QP_ACCESS_ERR on mlx5_0 qp 0x7fffe550f1c8. > Mon Jan 16 14:19:02.051 2017: [E] VERBS RDMA closed connection to > 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 error 733 index 1 > Mon Jan 16 14:19:02.071 2017: [I] Recovered 2 nodes for file system tnb32. > Mon Jan 16 14:19:02.140 2017: [I] VERBS RDMA connecting to 10.101.53.20 > (loremds20) on mlx5_0 port 1 fabnum 3 index 0 > Mon Jan 16 14:19:02.160 2017: [I] VERBS RDMA connected to 10.101.53.20 > (loremds20) on mlx5_0 port 1 fabnum 3 sl 0 index 0 > > I had just shut down loremds18 and loremds19 so there was certainly > recovery taking place and during that time is when the error seems to have > occurred. > > I looked up the meaning of IBV_WC_REM_ACCESS_ERR here ( > http://www.rdmamojo.com/2013/02/15/ibv_poll_cq/) and see this: > > IBV_WC_REM_ACCESS_ERR (10) - Remote Access Error: a protection error > occurred on a remote data buffer to be read by an RDMA Read, written by an > RDMA Write or accessed by an atomic operation. This error is reported only > on RDMA operations or atomic operations. Relevant for RC QPs. > > my take on it during recovery it seems like one end of the connection more > or less hanging up on the other end (e.g. Connection reset by peer > /ECONNRESET). > > But like I said at the start, we also see this when there something has > gone awfully wrong. > > -Aaron > > On 1/18/17 3:59 AM, Simon Thompson (Research Computing - IT Services) > wrote: > >> I'd be inclined to look at something like: >> >> ibqueryerrors -s >> PortXmitWait,LinkDownedCounter,PortXmitDiscards,PortRcvRemot >> ePhysicalErrors >> -c >> >> And see if you have a high number of symbol errors, might be a cable >> needs replugging or replacing. >> >> Simon >> >> From: > > on behalf of "J. Eric >> Wonderley" > >> Reply-To: "gpfsug-discuss at spectrumscale.org >> " >> > mscale.org>> >> Date: Tuesday, 17 January 2017 at 21:16 >> To: "gpfsug-discuss at spectrumscale.org >> " >> > mscale.org>> >> Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs >> >> I have messages like these frequent my logs: >> Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error >> IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 >> vendor_err 136 >> Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to >> 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error >> IBV_WC_REM_ACCESS_ERR index 23 >> >> Any ideas on cause..? 
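Given the GPFS and OFED level changes mentioned at the top of this message, a quick sanity pass can help separate genuine fabric errors from configuration drift. A rough sketch; note that mmfsadm is a service-level tool whose subcommands can vary by release:

  ofed_info -s                        # the OFED level actually installed
  ibstat | grep -E "State|Rate"       # ports should be Active at the expected rate
  mmlsconfig verbsRdma
  mmlsconfig verbsPorts
  mmfsadm test verbs status           # if available at your level: whether VERBS RDMA initialised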
>> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From veb2005 at med.cornell.edu Wed Jan 18 22:54:10 2017 From: veb2005 at med.cornell.edu (Vanessa Borcherding) Date: Wed, 18 Jan 2017 17:54:10 -0500 Subject: [gpfsug-discuss] Issue with X forwarding Message-ID: Hi All, We've got a new-ish 4.1.1.0 Advanced cluster and we've run into a strange problem: users who have their home directory on the GPFS filesystem cannot do X11 forwarding. They get the following error: "/usr/bin/xauth: error in locking authority file /home/user/.Xauthority" The file ~/.Xauthority is there and also a new one ~/.Xauthority-c. Similarly, "xauth -b" also fails: Attempting to break locks on authority file /home/user/.Xauthority xauth: error in locking authority file /home/user/.Xauthority This behavior happens regardless of the client involved, and happens across multiple OS/kernel versions, and if GPFS is mounted natively or via NFS export. For any given host, if the user's home directory is moved to another NFS-exported location, X forwarding works correctly. Has anyone seen this before, or have any idea as to where it's coming from? Thanks, Vanessa -- * * * * * Vanessa Borcherding Director, Scientific Computing Technology Manager - Applied Bioinformatics Core Dept. of Physiology and Biophysics Institute for Computational Biomedicine Weill Cornell Medical College (212) 746-6281 - office (917) 861-9777 - cell * * * * * -------------- next part -------------- An HTML attachment was scrubbed... URL: From farid.chabane at ymail.com Thu Jan 19 06:00:54 2017 From: farid.chabane at ymail.com (FC) Date: Thu, 19 Jan 2017 06:00:54 +0000 (UTC) Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 References: <51281598.14159900.1484805654772.ref@mail.yahoo.com> Message-ID: <51281598.14159900.1484805654772@mail.yahoo.com> Hi all, We are facing performance issues with some of our applications due to the GPFS system monitoring (mmsysmon) on CentOS 7.2. Bad performances (increase of iteration time) are seen every 30s exactly as the occurence frequency of mmsysmon ; the default monitor interval set to 30s in /var/mmfs/mmsysmon/mmsysmonitor.conf Shutting down GPFS with mmshutdown doesnt stop this process, we stopped it with the command mmsysmoncontrol and we get a stable iteration time. What are the impacts of disabling this process except losing access to mmhealth commands ? Do you have an idea of a proper way to disable it for good without doing it in rc.local or increasing the monitoring interval in the configuration file ? Thanks, Farid -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu Jan 19 08:45:20 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 19 Jan 2017 09:45:20 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 19 15:46:55 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 19 Jan 2017 15:46:55 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: Hi Olaf, The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? Thanks... Kevin On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: have you checked, where th fsmgr runs as you have nodes with different code levels mmlsmgr From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 01/18/2017 04:57 PM Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu Jan 19 16:05:41 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 19 Jan 2017 17:05:41 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 19 16:25:20 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 19 Jan 2017 16:25:20 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> Hi Olaf, We will continue upgrading clients in a rolling fashion, but with ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t consider figuring out why this is happening a waste of time and therefore having systems on both versions is a good thing. While I would prefer not to paste actual group names and GIDs into this public forum, I can assure you that on every 4.2.1.1 system that I have tried this on: 1. mmrepquota reports mostly GIDs, only a few group names 2. /etc/nsswitch.conf says to look at files first 3. the GID is in /etc/group 4. length of group name doesn?t matter I have a support contract with IBM, so I can open a PMR if necessary. I just thought someone on the list might have an idea as to what is happening or be able to point out the obvious explanation that I?m missing. ;-) Thanks? Kevin On Jan 19, 2017, at 10:05 AM, Olaf Weiser > wrote: unfortunately , I don't own a cluster right now, which has 4.2.2 to double check... SpectrumScale should resolve the GID into a name, if it find the name somewhere... but in your case.. I would say.. before we waste to much time in a version-mismatch issue.. finish the rolling migration, especially RHEL .. and then we continue meanwhile -I'll try to find a way for me here to setup up an 4.2.2. cluster cheers From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 01/19/2017 04:48 PM Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? Thanks... Kevin On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: have you checked, where th fsmgr runs as you have nodes with different code levels mmlsmgr From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 01/18/2017 04:57 PM Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. 
However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Thu Jan 19 16:36:42 2017 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Thu, 19 Jan 2017 17:36:42 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> Message-ID: <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> Just leting know, I see the same problem with 4.2.2.1 version. mmrepquota resolves only some of group names. On Thu, Jan 19, 2017 at 04:25:20PM +0000, Buterbaugh, Kevin L wrote: > Hi Olaf, > > We will continue upgrading clients in a rolling fashion, but with ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t consider figuring out why this is happening a waste of time and therefore having systems on both versions is a good thing. > > While I would prefer not to paste actual group names and GIDs into this public forum, I can assure you that on every 4.2.1.1 system that I have tried this on: > > 1. mmrepquota reports mostly GIDs, only a few group names > 2. /etc/nsswitch.conf says to look at files first > 3. the GID is in /etc/group > 4. length of group name doesn?t matter > > I have a support contract with IBM, so I can open a PMR if necessary. I just thought someone on the list might have an idea as to what is happening or be able to point out the obvious explanation that I?m missing. ;-) > > Thanks? > > Kevin > > On Jan 19, 2017, at 10:05 AM, Olaf Weiser > wrote: > > unfortunately , I don't own a cluster right now, which has 4.2.2 to double check... SpectrumScale should resolve the GID into a name, if it find the name somewhere... > > but in your case.. I would say.. before we waste to much time in a version-mismatch issue.. finish the rolling migration, especially RHEL .. and then we continue > meanwhile -I'll try to find a way for me here to setup up an 4.2.2. 
cluster > cheers > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/19/2017 04:48 PM > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi Olaf, > > The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. > > In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. > > Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? > > Thanks... > > Kevin > > On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: > > have you checked, where th fsmgr runs as you have nodes with different code levels > > mmlsmgr > > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/18/2017 04:57 PM > Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi All, > > We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. > > From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. > > However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). > > I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) > > I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? > > Kevin > > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek From peserocka at gmail.com Thu Jan 19 17:07:55 2017 From: peserocka at gmail.com (Peter Serocka) Date: Fri, 20 Jan 2017 01:07:55 +0800 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: <7D8E5B3D-6BA9-4362-984D-6A74448FA7BC@gmail.com> Any caching in effect? Like nscd which is configured separately in /etc/nscd.conf Any insights from strace?ing mmrepquota? For example, when a plain ls -l doesn?t look groups up in /etc/group but queries from nscd instead, strace output has something like: connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0 sendto(4, "\2\0\0\0\f\0\0\0\6\0\0\0group\0", 18, MSG_NOSIGNAL, NULL, 0) = 18 ? Peter > On 2017 Jan 19 Thu, at 23:46, Buterbaugh, Kevin L wrote: > > Hi Olaf, > > The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. > > In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. > > Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? > > Thanks... > > Kevin > >> On Jan 19, 2017, at 2:45 AM, Olaf Weiser wrote: >> >> have you checked, where th fsmgr runs as you have nodes with different code levels >> >> mmlsmgr >> >> >> >> >> From: "Buterbaugh, Kevin L" >> To: gpfsug main discussion list >> Date: 01/18/2017 04:57 PM >> Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Hi All, >> >> We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. >> >> From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. >> >> However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? 
It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). >> >> I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) >> >> I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? >> >> Kevin > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From olaf.weiser at de.ibm.com Thu Jan 19 17:16:27 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 19 Jan 2017 18:16:27 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> Message-ID: An HTML attachment was scrubbed... URL: From MDIETZ at de.ibm.com Thu Jan 19 18:07:32 2017 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Thu, 19 Jan 2017 19:07:32 +0100 Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 In-Reply-To: <51281598.14159900.1484805654772@mail.yahoo.com> References: <51281598.14159900.1484805654772.ref@mail.yahoo.com> <51281598.14159900.1484805654772@mail.yahoo.com> Message-ID: Hi Farid, there is no official way for disabling the system health monitoring because other components rely on it (e.g. GUI, CES, Install Toolkit,..) If you are fine with the consequences you can just delete the mmsysmonitor.conf, which will prevent the monitor from starting. During our testing we did not see a significant performance impact caused by the monitoring. In 4.2.2 some component monitors (e.g. disk) have been further improved to reduce polling and use notifications instead. Nevertheless, I would like to better understand what the issue is. What kind of workload do you run ? Do you see spikes in CPU usage every 30 seconds ? Is it the same on all cluster nodes or just on some of them ? Could you send us the output of "mmhealth node show -v" to see which monitors are active. It might make sense to open a PMR to get this issue fixed. Thanks. Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale - Release Lead Architect (4.2.X Release) System Health and Problem Determination Architect IBM Certified Software Engineer ---------------------------------------------------------------------------------------------------------- IBM Deutschland Hechtsheimer Str. 
2 55131 Mainz Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ---------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: FC To: "gpfsug-discuss at spectrumscale.org" Date: 01/19/2017 07:06 AM Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We are facing performance issues with some of our applications due to the GPFS system monitoring (mmsysmon) on CentOS 7.2. Bad performance (an increase in iteration time) is seen every 30s, exactly the occurrence frequency of mmsysmon; the default monitor interval is set to 30s in /var/mmfs/mmsysmon/mmsysmonitor.conf Shutting down GPFS with mmshutdown doesn't stop this process; we stopped it with the command mmsysmoncontrol and we get a stable iteration time. What are the impacts of disabling this process except losing access to mmhealth commands? Do you have an idea of a proper way to disable it for good without doing it in rc.local or increasing the monitoring interval in the configuration file? Thanks, Farid _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jan 19 18:21:18 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 19 Jan 2017 18:21:18 +0000 Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 In-Reply-To: References: <51281598.14159900.1484805654772.ref@mail.yahoo.com> <51281598.14159900.1484805654772@mail.yahoo.com>, Message-ID: On some of our nodes we were regularly seeing hung task timeouts in dmesg from a python process, which I vaguely thought was related to the monitoring process (though we have other python bits from openstack running on these boxes). These are all running 4.2.2.0 code. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Mathias Dietz [MDIETZ at de.ibm.com] Sent: 19 January 2017 18:07 To: FC; gpfsug main discussion list Subject: Re: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 Hi Farid, there is no official way for disabling the system health monitoring because other components rely on it (e.g. GUI, CES, Install Toolkit,..) If you are fine with the consequences you can just delete the mmsysmonitor.conf, which will prevent the monitor from starting. During our testing we did not see a significant performance impact caused by the monitoring. In 4.2.2 some component monitors (e.g. disk) have been further improved to reduce polling and use notifications instead. Nevertheless, I would like to better understand what the issue is. What kind of workload do you run ? Do you see spikes in CPU usage every 30 seconds ? Is it the same on all cluster nodes or just on some of them ? Could you send us the output of "mmhealth node show -v" to see which monitors are active. It might make sense to open a PMR to get this issue fixed. Thanks.
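To make the trade-off above concrete, the knobs referred to in this thread look roughly like the following. A sketch only, since mmsysmoncontrol is a service-level helper and its subcommand names may differ by release:

  mmhealth node show -v           # which monitors are active on this node
  mmsysmoncontrol stop            # pause the monitor while measuring the application
  mmsysmoncontrol start           # resume it afterwards
  # raising the interval rather than disabling: the 30s default mentioned above lives in
  # /var/mmfs/mmsysmon/mmsysmonitor.conf; after editing it, restart the monitor with mmsysmoncontrol.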
Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale - Release Lead Architect (4.2.X Release) System Health and Problem Determination Architect IBM Certified Software Engineer ---------------------------------------------------------------------------------------------------------- IBM Deutschland Hechtsheimer Str. 2 55131 Mainz Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ---------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: FC To: "gpfsug-discuss at spectrumscale.org" Date: 01/19/2017 07:06 AM Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, We are facing performance issues with some of our applications due to the GPFS system monitoring (mmsysmon) on CentOS 7.2. Bad performances (increase of iteration time) are seen every 30s exactly as the occurence frequency of mmsysmon ; the default monitor interval set to 30s in /var/mmfs/mmsysmon/mmsysmonitor.conf Shutting down GPFS with mmshutdown doesnt stop this process, we stopped it with the command mmsysmoncontrol and we get a stable iteration time. What are the impacts of disabling this process except losing access to mmhealth commands ? Do you have an idea of a proper way to disable it for good without doing it in rc.local or increasing the monitoring interval in the configuration file ? Thanks, Farid _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Greg.Lehmann at csiro.au Thu Jan 19 21:22:40 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Thu, 19 Jan 2017 21:22:40 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz>, Message-ID: <1484860960203.43563@csiro.au> It's not something to do with the value of the GID, like being less or greater than some number? ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Friday, 20 January 2017 3:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x in my eyes.. that's the hint .. not to wait until all 700 clients 'll have been updated .. before open PMR .. ;-) ... From: Lukas Hejtmanek To: gpfsug main discussion list Date: 01/19/2017 05:37 PM Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Just leting know, I see the same problem with 4.2.2.1 version. mmrepquota resolves only some of group names. On Thu, Jan 19, 2017 at 04:25:20PM +0000, Buterbaugh, Kevin L wrote: > Hi Olaf, > > We will continue upgrading clients in a rolling fashion, but with ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t consider figuring out why this is happening a waste of time and therefore having systems on both versions is a good thing. 
> > While I would prefer not to paste actual group names and GIDs into this public forum, I can assure you that on every 4.2.1.1 system that I have tried this on: > > 1. mmrepquota reports mostly GIDs, only a few group names > 2. /etc/nsswitch.conf says to look at files first > 3. the GID is in /etc/group > 4. length of group name doesn?t matter > > I have a support contract with IBM, so I can open a PMR if necessary. I just thought someone on the list might have an idea as to what is happening or be able to point out the obvious explanation that I?m missing. ;-) > > Thanks? > > Kevin > > On Jan 19, 2017, at 10:05 AM, Olaf Weiser > wrote: > > unfortunately , I don't own a cluster right now, which has 4.2.2 to double check... SpectrumScale should resolve the GID into a name, if it find the name somewhere... > > but in your case.. I would say.. before we waste to much time in a version-mismatch issue.. finish the rolling migration, especially RHEL .. and then we continue > meanwhile -I'll try to find a way for me here to setup up an 4.2.2. cluster > cheers > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/19/2017 04:48 PM > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi Olaf, > > The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. > > In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. > > Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? > > Thanks... > > Kevin > > On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: > > have you checked, where th fsmgr runs as you have nodes with different code levels > > mmlsmgr > > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/18/2017 04:57 PM > Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi All, > > We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. > > From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. > > However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). 
> > I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) > > I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? > > Kevin > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 19 21:51:07 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 19 Jan 2017 21:51:07 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <1484860960203.43563@csiro.au> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> Message-ID: <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> Hi All, Let me try to answer some questions that have been raised by various list members? 1. I am not using nscd. 2. getent group with either a GID or a group name resolves GID?s / names that are being printed as GIDs by mmrepquota 3. The GID?s in question are all in a normal range ? i.e. some group names that are being printed by mmrepquota have GIDs ?close? to others that are being printed as GID?s 4. strace?ing mmrepquota doesn?t show anything relating to nscd or anything that jumps out at me Here?s another point ? I am 95% sure that I have a client that was running 4.2.1.1 and mmrepquota displayed the group names ? I then upgraded GPFS on it ? no other changes ? and now it?s mostly GID?s. I?m not 100% sure because output scrolled out of my terminal buffer. Thanks to all for the suggestions ? please feel free to keep them coming. To any of the GPFS team on this mailing list, at least one other person has reported the same behavior ? is this a known bug? Kevin On Jan 19, 2017, at 3:22 PM, Greg.Lehmann at csiro.au wrote: It's not something to do with the value of the GID, like being less or greater than some number? ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Olaf Weiser > Sent: Friday, 20 January 2017 3:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x in my eyes.. that's the hint .. not to wait until all 700 clients 'll have been updated .. before open PMR .. ;-) ... 
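In the meantime, the stopgap Kevin describes (run mmrepquota with -gn and resolve the numeric IDs yourself) only needs a small wrapper. A rough sketch, assuming the GID is the first whitespace-separated field of every data row (adjust if your report layout differs); header lines and anything getent cannot resolve are passed through untouched:

    #!/bin/bash
    # Hypothetical helper: per-group quota report with group names resolved locally.
    # Usage: ./repquota-names.sh <filesystem>
    fs="$1"
    /usr/lpp/mmfs/bin/mmrepquota -gn "$fs" | while IFS= read -r line; do
        gid=${line%%[[:space:]]*}                    # first field of the row
        name=$(getent group "$gid" | cut -d: -f1)    # same lookup path the OS itself uses
        if [ -n "$name" ]; then
            printf '%-20s%s\n' "$name" "${line#"$gid"}"
        else
            printf '%s\n' "$line"                    # header line or unresolvable row: leave as-is
        fi
    done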
From: Lukas Hejtmanek > To: gpfsug main discussion list > Date: 01/19/2017 05:37 PM Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Just leting know, I see the same problem with 4.2.2.1 version. mmrepquota resolves only some of group names. On Thu, Jan 19, 2017 at 04:25:20PM +0000, Buterbaugh, Kevin L wrote: > Hi Olaf, > > We will continue upgrading clients in a rolling fashion, but with ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t consider figuring out why this is happening a waste of time and therefore having systems on both versions is a good thing. > > While I would prefer not to paste actual group names and GIDs into this public forum, I can assure you that on every 4.2.1.1 system that I have tried this on: > > 1. mmrepquota reports mostly GIDs, only a few group names > 2. /etc/nsswitch.conf says to look at files first > 3. the GID is in /etc/group > 4. length of group name doesn?t matter > > I have a support contract with IBM, so I can open a PMR if necessary. I just thought someone on the list might have an idea as to what is happening or be able to point out the obvious explanation that I?m missing. ;-) > > Thanks? > > Kevin > > On Jan 19, 2017, at 10:05 AM, Olaf Weiser > wrote: > > unfortunately , I don't own a cluster right now, which has 4.2.2 to double check... SpectrumScale should resolve the GID into a name, if it find the name somewhere... > > but in your case.. I would say.. before we waste to much time in a version-mismatch issue.. finish the rolling migration, especially RHEL .. and then we continue > meanwhile -I'll try to find a way for me here to setup up an 4.2.2. cluster > cheers > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/19/2017 04:48 PM > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi Olaf, > > The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. > > In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. > > Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? > > Thanks... > > Kevin > > On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: > > have you checked, where th fsmgr runs as you have nodes with different code levels > > mmlsmgr > > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/18/2017 04:57 PM > Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi All, > > We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. 
> > From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. > > However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). > > I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) > > I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? > > Kevin > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at uni-mainz.de Fri Jan 20 08:41:26 2017 From: martin at uni-mainz.de (Christoph Martin) Date: Fri, 20 Jan 2017 09:41:26 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> Message-ID: Hi, I have a system with two servers with GPFS 4.2.1.2 on SLES 12.1 and some clients with GPFS 4.2.2.1 on SLES 11 and Centos 7. mmrepquota shows on all systems group names. I still have to upgrade the servers to 4.2.2.1. Christoph -- ============================================================================ Christoph Martin, Leiter Unix-Systeme Zentrum f?r Datenverarbeitung, Uni-Mainz, Germany Anselm Franz von Bentzel-Weg 12, 55128 Mainz Telefon: +49(6131)3926337 Instant-Messaging: Jabber: martin at uni-mainz.de (Siehe http://www.zdv.uni-mainz.de/4010.php) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: martin.vcf Type: text/x-vcard Size: 421 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL: From Achim.Rehor at de.ibm.com Fri Jan 20 09:01:12 2017 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Fri, 20 Jan 2017 10:01:12 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu><20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From farid.chabane at ymail.com Fri Jan 20 09:02:32 2017 From: farid.chabane at ymail.com (FC) Date: Fri, 20 Jan 2017 09:02:32 +0000 (UTC) Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 In-Reply-To: References: <51281598.14159900.1484805654772.ref@mail.yahoo.com> <51281598.14159900.1484805654772@mail.yahoo.com> Message-ID: <1898813661.15589480.1484902952833@mail.yahoo.com> Hi Mathias, It's OK when we remove the configuration file, the process doens't start. The problem occurs mainly with our compute nodes (all of them) and we don't use GUI and CES. Ideed, I confirm we don't see performance impact with Linpack running on more than hundred nodes, it appears especially when there is a lot of communications wich is the case of our applications, our high speed network is based on Intel OmniPath Fabric. We are seeing irregular iteration time every 30 sec. By Enabling HyperThreading, the issue is a little bit hidden but still there. By using less cores per nodes (26 instead of 28), we don't see this behavior as if it needs one core for mmsysmon process. I agree with you, might be good idea to open a PMR... Please find below the output of mmhealth node show --verbose Node status:???????????? HEALTHY Component??????????????? Status?????????????????? Reasons ------------------------------------------------------------------- GPFS???????????????????? HEALTHY????????????????? - NETWORK????????????????? HEALTHY????????????????? - ? ib0????????????????????? HEALTHY????????????????? - FILESYSTEM?????????????? HEALTHY????????????????? - ? gpfs1??????????????????? HEALTHY????????????????? - ? gpfs2??????????????????? HEALTHY????????????????? - DISK???????????????????? HEALTHY????????????????? - Thanks Farid Le Jeudi 19 janvier 2017 19h21, Simon Thompson (Research Computing - IT Services) a ?crit : On some of our nodes we were regularly seeing procees hung timeouts in dmesg from a python process, which I vaguely thought was related to the monitoring process (though we have other python bits from openstack running on these boxes). These are all running 4.2.2.0 code Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Mathias Dietz [MDIETZ at de.ibm.com] Sent: 19 January 2017 18:07 To: FC; gpfsug main discussion list Subject: Re: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 Hi Farid, there is no official way for disabling the system health monitoring because other components rely on it (e.g. GUI, CES, Install Toolkit,..) 
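Short of removing the file outright, the gentler option Farid mentions (raising the poll interval) can also be scripted per node; a rough sketch, with the key name and the mmsysmoncontrol subcommand assumed rather than verified, so check your copy of the file and the command's usage text first:

    conf=/var/mmfs/mmsysmon/mmsysmonitor.conf
    cp -p "$conf" "$conf.bak"                          # keep the shipped settings
    # "monitorinterval" is an assumed key name; edit whichever key carries the 30s value
    sed -i 's/^monitorinterval *=.*/monitorinterval = 300/' "$conf"
    /usr/lpp/mmfs/bin/mmsysmoncontrol restart          # restart so the monitor rereads its config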
If you are fine with the consequences you can just delete the mmsysmonitor.conf, which will prevent the monitor from starting. During our testing we did not see a significant performance impact caused by the monitoring. In 4.2.2 some component monitors (e.g. disk) have been further improved to reduce polling and use notifications instead. Nevertheless, I would like to better understand what the issue is. What kind of workload do you run ? Do you see spikes in CPU usage every 30 seconds ? Is it the same on all cluster nodes or just on some of them ? Could you send us the output of "mmhealth node show -v" to see which monitors are active. It might make sense to open a PMR to get this issue fixed. Thanks. Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale - Release Lead Architect (4.2.X Release) System Health and Problem Determination Architect IBM Certified Software Engineer ---------------------------------------------------------------------------------------------------------- IBM Deutschland Hechtsheimer Str. 2 55131 Mainz Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ---------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From:? ? ? ? FC To:? ? ? ? "gpfsug-discuss at spectrumscale.org" Date:? ? ? ? 01/19/2017 07:06 AM Subject:? ? ? ? [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 Sent by:? ? ? ? gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, We are facing performance issues with some of our applications due to the GPFS system monitoring (mmsysmon) on CentOS 7.2. Bad performances (increase of iteration time) are seen every 30s exactly as the occurence frequency of mmsysmon ; the default monitor interval set to 30s in /var/mmfs/mmsysmon/mmsysmonitor.conf Shutting down GPFS with mmshutdown doesnt stop this process, we stopped it with the command mmsysmoncontrol and we get a stable iteration time. What are the impacts of disabling this process except losing access to mmhealth commands ? Do you have an idea of a proper way to disable it for good without doing it in rc.local or increasing the monitoring interval in the configuration file ? Thanks, Farid _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From st.graf at fz-juelich.de Fri Jan 20 09:45:04 2017 From: st.graf at fz-juelich.de (Stephan Graf) Date: Fri, 20 Jan 2017 10:45:04 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> Message-ID: Guten Morgen herr Rehor! Ich habe gerade geguckt. Auf dem Knoten, auf dem wir das mmlsquota -g Problem haben, sehe ich auch beim mmlsrepquota -g, dass einige Gruppen nur numerisch ausgegeben werden. Ich kann gerne einen PMR dazu ?ffnen. 
Viele Gr??e, Stephan Graf On 01/20/17 10:01, Achim Rehor wrote: fully agreed, there are PMRs open on "mmlsquota -g failes : no such group" where the handling of group names vs. ids is being tracked. a PMR on mmrepquota and a slightly different facette of a similar problem might give more and faster insight and solution. Mit freundlichen Gr??en / Kind regards Achim Rehor ________________________________ Software Technical Support Specialist AIX/ Emea HPC Support [cid:part1.A7833F18.D0EA2498 at fz-juelich.de] IBM Certified Advanced Technical Expert - Power Systems with AIX TSCC Software Service, Dept. 7922 Global Technology Services ________________________________ Phone: +49-7034-274-7862 IBM Deutschland E-Mail: Achim.Rehor at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany ________________________________ IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Reinhard Reschke, Dieter Scholz, Gregor Pillen, Ivo Koerner, Christian Noll Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 WEEE-Reg.-Nr. DE 99369940 From: Olaf Weiser/Germany/IBM at IBMDE To: gpfsug main discussion list Date: 01/19/2017 06:17 PM Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ in my eyes.. that's the hint .. not to wait until all 700 clients 'll have been updated .. before open PMR .. ;-) ... From: Lukas Hejtmanek To: gpfsug main discussion list Date: 01/19/2017 05:37 PM Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Just leting know, I see the same problem with 4.2.2.1 version. mmrepquota resolves only some of group names. On Thu, Jan 19, 2017 at 04:25:20PM +0000, Buterbaugh, Kevin L wrote: > Hi Olaf, > > We will continue upgrading clients in a rolling fashion, but with ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t consider figuring out why this is happening a waste of time and therefore having systems on both versions is a good thing. > > While I would prefer not to paste actual group names and GIDs into this public forum, I can assure you that on every 4.2.1.1 system that I have tried this on: > > 1. mmrepquota reports mostly GIDs, only a few group names > 2. /etc/nsswitch.conf says to look at files first > 3. the GID is in /etc/group > 4. length of group name doesn?t matter > > I have a support contract with IBM, so I can open a PMR if necessary. I just thought someone on the list might have an idea as to what is happening or be able to point out the obvious explanation that I?m missing. ;-) > > Thanks? > > Kevin > > On Jan 19, 2017, at 10:05 AM, Olaf Weiser > wrote: > > unfortunately , I don't own a cluster right now, which has 4.2.2 to double check... SpectrumScale should resolve the GID into a name, if it find the name somewhere... > > but in your case.. I would say.. before we waste to much time in a version-mismatch issue.. finish the rolling migration, especially RHEL .. and then we continue > meanwhile -I'll try to find a way for me here to setup up an 4.2.2. 
cluster > cheers > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/19/2017 04:48 PM > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi Olaf, > > The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. > > In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. > > Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? > > Thanks... > > Kevin > > On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: > > have you checked, where th fsmgr runs as you have nodes with different code levels > > mmlsmgr > > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/18/2017 04:57 PM > Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi All, > > We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. > > From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. > > However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). > > I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) > > I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? > > Kevin > > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Stephan Graf Juelich Supercomputing Centre Institute for Advanced Simulation Forschungszentrum Juelich GmbH 52425 Juelich, Germany Phone: +49-2461-61-6578 Fax: +49-2461-61-6656 E-mail: st.graf at fz-juelich.de WWW: http://www.fz-juelich.de/jsc/ ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From st.graf at fz-juelich.de Fri Jan 20 10:22:09 2017 From: st.graf at fz-juelich.de (Stephan Graf) Date: Fri, 20 Jan 2017 11:22:09 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> Message-ID: Sorry for the mail. I just can tell, that we are facing the same issue: We run GPFS 4.1.1.11 & 4.2.1.2 In both versions the mmlsquota -g fails. I also tried the mmrepquota -g command on GPFS 4.2.1.2, and some groups are displayed only numerical. Stephan On 01/20/17 09:41, Christoph Martin wrote: Hi, I have a system with two servers with GPFS 4.2.1.2 on SLES 12.1 and some clients with GPFS 4.2.2.1 on SLES 11 and Centos 7. mmrepquota shows on all systems group names. I still have to upgrade the servers to 4.2.2.1. 
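Since the behaviour seems to track the GPFS level doing the reporting, a side-by-side capture from one upgraded node and one node still on 4.2.1.x is a cheap way to build PMR evidence without posting real group names; a sketch with placeholder node and filesystem names (usage figures can drift between the two runs, so read only the first column of the diff):

    # Collect the same group quota report from two nodes at different code levels.
    for node in node-4211 node-4221; do                # placeholders: one node per GPFS level
        ssh "$node" '/usr/lpp/mmfs/bin/mmrepquota -g gpfs1' > "/tmp/repquota.$node"
    done
    diff /tmp/repquota.node-4211 /tmp/repquota.node-4221 | grep '^[<>]'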
Christoph _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Stephan Graf Juelich Supercomputing Centre Institute for Advanced Simulation Forschungszentrum Juelich GmbH 52425 Juelich, Germany Phone: +49-2461-61-6578 Fax: +49-2461-61-6656 E-mail: st.graf at fz-juelich.de WWW: http://www.fz-juelich.de/jsc/ ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Fri Jan 20 10:54:37 2017 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Fri, 20 Jan 2017 11:54:37 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu><20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From duersch at us.ibm.com Fri Jan 20 14:14:23 2017 From: duersch at us.ibm.com (Steve Duersch) Date: Fri, 20 Jan 2017 09:14:23 -0500 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: Kevin, Please go ahead and open a PMR. Cursorily, we don't know of an obvious known bug. Thank you. Steve Duersch Spectrum Scale 845-433-7902 IBM Poughkeepsie, New York gpfsug-discuss-bounces at spectrumscale.org wrote on 01/19/2017 04:52:02 PM: > From: gpfsug-discuss-request at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Date: 01/19/2017 04:52 PM > Subject: gpfsug-discuss Digest, Vol 60, Issue 47 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. 
Re: mmrepquota and group names in GPFS 4.2.2.x > (Buterbaugh, Kevin L) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 19 Jan 2017 21:51:07 +0000 > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS > 4.2.2.x > Message-ID: <31F584FD-A926-4D86-B365-63EA244DEE45 at vanderbilt.edu> > Content-Type: text/plain; charset="utf-8" > > Hi All, > > Let me try to answer some questions that have been raised by various > list members? > > 1. I am not using nscd. > 2. getent group with either a GID or a group name resolves GID?s / > names that are being printed as GIDs by mmrepquota > 3. The GID?s in question are all in a normal range ? i.e. some > group names that are being printed by mmrepquota have GIDs ?close? > to others that are being printed as GID?s > 4. strace?ing mmrepquota doesn?t show anything relating to nscd or > anything that jumps out at me > > Here?s another point ? I am 95% sure that I have a client that was > running 4.2.1.1 and mmrepquota displayed the group names ? I then > upgraded GPFS on it ? no other changes ? and now it?s mostly GID?s. > I?m not 100% sure because output scrolled out of my terminal buffer. > > Thanks to all for the suggestions ? please feel free to keep them > coming. To any of the GPFS team on this mailing list, at least one > other person has reported the same behavior ? is this a known bug? > > Kevin > > On Jan 19, 2017, at 3:22 PM, Greg.Lehmann at csiro.au< > mailto:Greg.Lehmann at csiro.au> wrote: > > > It's not something to do with the value of the GID, like being less > or greater than some number? > > ________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org discuss-bounces at spectrumscale.org> mailto:gpfsug-discuss-bounces at spectrumscale.org>> on behalf of Olaf > Weiser > > Sent: Friday, 20 January 2017 3:16 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > > in my eyes.. that's the hint .. not to wait until all 700 clients > 'll have been updated .. before open PMR .. ;-) ... > > > > From: Lukas Hejtmanek >> > To: gpfsug main discussion list mailto:gpfsug-discuss at spectrumscale.org>> > Date: 01/19/2017 05:37 PM > Subject: Re: [gpfsug-discuss] mmrepquota and group names in > GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org< > mailto:gpfsug-discuss-bounces at spectrumscale.org> > ________________________________ > > > > Just leting know, I see the same problem with 4.2.2.1 version. mmrepquota > resolves only some of group names. > > On Thu, Jan 19, 2017 at 04:25:20PM +0000, Buterbaugh, Kevin L wrote: > > Hi Olaf, > > > > We will continue upgrading clients in a rolling fashion, but with > ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t > consider figuring out why this is happening a waste of time and > therefore having systems on both versions is a good thing. > > > > While I would prefer not to paste actual group names and GIDs into > this public forum, I can assure you that on every 4.2.1.1 system > that I have tried this on: > > > > 1. mmrepquota reports mostly GIDs, only a few group names > > 2. /etc/nsswitch.conf says to look at files first > > 3. the GID is in /etc/group > > 4. length of group name doesn?t matter > > > > I have a support contract with IBM, so I can open a PMR if > necessary. 
I just thought someone on the list might have an idea as > to what is happening or be able to point out the obvious explanation > that I?m missing. ;-) > > > > Thanks? > > > > Kevin > > > > On Jan 19, 2017, at 10:05 AM, Olaf Weiser mailto:olaf.weiser at de.ibm.com>> wrote: > > > > unfortunately , I don't own a cluster right now, which has 4.2.2 > to double check... SpectrumScale should resolve the GID into a name, > if it find the name somewhere... > > > > but in your case.. I would say.. before we waste to much time in a > version-mismatch issue.. finish the rolling migration, especially > RHEL .. and then we continue > > meanwhile -I'll try to find a way for me here to setup up an 4.2.2. cluster > > cheers > > > > > > > > From: "Buterbaugh, Kevin L" mailto:Kevin.Buterbaugh at Vanderbilt.Edu> >> > > To: gpfsug main discussion list mailto:gpfsug-discuss at spectrumscale.org> discuss at spectrumscale.org>> > > Date: 01/19/2017 04:48 PM > > Subject: Re: [gpfsug-discuss] mmrepquota and group names in > GPFS 4.2.2.x > > Sent by: gpfsug-discuss-bounces at spectrumscale.org< > mailto:gpfsug-discuss-bounces at spectrumscale.org> discuss-bounces at spectrumscale.org> > > ________________________________ > > > > > > > > Hi Olaf, > > > > The filesystem manager runs on one of our servers, all of which > are upgraded to 4.2.2.x. > > > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf > has ?files? listed first for /etc/group. > > > > In addition to a mixture of GPFS versions, we also have a mixture > of OS versions (RHEL 6/7). AFAIK tell with all of my testing / > experimenting the only factor that seems to change the behavior of > mmrepquota in regards to GIDs versus group names is the GPFS version. > > > > Other ideas, anyone? Is anyone else in a similar situation and > can test whether they see similar behavior? > > > > Thanks... > > > > Kevin > > > > On Jan 19, 2017, at 2:45 AM, Olaf Weiser mailto:olaf.weiser at de.ibm.com>> wrote: > > > > have you checked, where th fsmgr runs as you have nodes with > different code levels > > > > mmlsmgr > > > > > > > > > > From: "Buterbaugh, Kevin L" mailto:Kevin.Buterbaugh at Vanderbilt.Edu> >> > > To: gpfsug main discussion list mailto:gpfsug-discuss at spectrumscale.org> discuss at spectrumscale.org>> > > Date: 01/18/2017 04:57 PM > > Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > > Sent by: gpfsug-discuss-bounces at spectrumscale.org< > mailto:gpfsug-discuss-bounces at spectrumscale.org> discuss-bounces at spectrumscale.org> > > ________________________________ > > > > > > > > Hi All, > > > > We recently upgraded our cluster (well, the servers are all > upgraded; the clients are still in progress) from GPFS 4.2.1.1 to > GPFS 4.2.2.1 and there appears to be a change in how mmrepquota > handles group names in its? output. I?m trying to get a handle on > it, because it is messing with some of my scripts and - more > importantly - because I don?t understand the behavior. > > > > From one of my clients which is still running GPFS 4.2.1.1 I can > run an ?mmrepquota -g ? and if the group exists in /etc/group > the group name is displayed. Of course, if the group doesn?t exist > in /etc/group, the GID is displayed. Makes sense. > > > > However, on my servers which have been upgraded to GPFS 4.2.2.1 > most - but not all - of the time I see GID numbers instead of group > names. My question is, what is the criteria GPFS 4.2.2.x is using > to decide when to display a GID instead of a group name? 
It?s > apparently *not* the length of the name of the group, because I have > output in front of me where a 13 character long group name is > displayed but a 7 character long group name is *not* displayed - > its? GID is instead (and yes, both exist in /etc/group). > > > > I know that sample output would be useful to illustrate this, but > I do not want to post group names or GIDs to a public mailing list ? > if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) > > > > I am in the process of updating scripts to use ?mmrepquota -gn > ? and then looking up the group name myself, but I want to try > to understand this. Thanks? > > > > Kevin > > > > > > ? > > Kevin Buterbaugh - Senior System Administrator > > Vanderbilt University - Advanced Computing Center for Research andEducation > > Kevin.Buterbaugh at vanderbilt.edu< > mailto:Kevin.Buterbaugh at vanderbilt.edu>- (615)875-9633 > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org< > http://spectrumscale.org> > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org< > http://spectrumscale.org> > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > Luk?? Hejtm?nek > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: 20170119/8e599938/attachment.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 60, Issue 47 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jan 20 14:33:23 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 20 Jan 2017 14:33:23 +0000 Subject: [gpfsug-discuss] Weird log message Message-ID: So today I was just trying to collect a gpfs.snap to log a ticket, and part way through the log collection it said: Month '12' out of range 0..11 at /usr/lpp/mmfs/bin/mmlogsort line 114. This is a cluster running 4.2.2.0 It carried on anyway so hardly worth me logging a ticket, but just in case someone want to pick it up internally in IBM ...? 
Simon From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jan 20 15:09:06 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 20 Jan 2017 15:09:06 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <791bb4d1-eb22-5ba5-9fcd-d7553aeebdc0@psu.edu> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> <791bb4d1-eb22-5ba5-9fcd-d7553aeebdc0@psu.edu> Message-ID: <566473A3-D5F1-4508-84AE-AE4B892C25B8@vanderbilt.edu> Hi Phil, Nope - that was the very first thought I had but on a 4.2.2.1 node I have a 13 character group name displaying and a resolvable 7 character long group name being displayed as its? GID? Kevin > On Jan 20, 2017, at 9:06 AM, Phil Pishioneri wrote: > > On 1/19/17 4:51 PM, Buterbaugh, Kevin L wrote: >> Hi All, >> >> Let me try to answer some questions that have been raised by various list members? >> >> 1. I am not using nscd. >> 2. getent group with either a GID or a group name resolves GID?s / names that are being printed as GIDs by mmrepquota >> 3. The GID?s in question are all in a normal range ? i.e. some group names that are being printed by mmrepquota have GIDs ?close? to others that are being printed as GID?s >> 4. strace?ing mmrepquota doesn?t show anything relating to nscd or anything that jumps out at me >> > > Anything unique about the lengths of the names of the affected groups? (i.e., all a certain value, all greater than some value, etc.) > > -Phil From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jan 20 15:10:05 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 20 Jan 2017 15:10:05 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: <8F3B6E42-6B37-48DF-8870-0CC5F293DCF7@vanderbilt.edu> Steve, I just opened a PMR - thanks? Kevin On Jan 20, 2017, at 8:14 AM, Steve Duersch > wrote: Kevin, Please go ahead and open a PMR. Cursorily, we don't know of an obvious known bug. Thank you. Steve Duersch Spectrum Scale 845-433-7902 IBM Poughkeepsie, New York ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Jan 20 15:32:17 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Fri, 20 Jan 2017 10:32:17 -0500 Subject: [gpfsug-discuss] Path to NSD lost when host_sas_address changed on port Message-ID: <5DDBFF8D-8927-42A7-8A81-3F0D167DDAAC@brown.edu> We have most of our GPFS NSD storage set up as pairs of RAID boxes served by failover pairs of servers. Most of it is FibreChannel, but the newest four boxes and servers are using dual port SAS controllers. Just this week, we had one server lose one out of the paths to one of the raid boxes. Took a while to realize what happened, but apparently the port2 ID changed from 51866da05cf7b001 to 51866da05cf7b002 on the fly, without rebooting. Port1 is still 51866da05cf7b000, which is the card ID (host_add). We?re running gpfs 4.2.2.1 on RHEL7.2 on these hosts. Has anyone else seen this kind of behavior? 
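One way to catch this kind of silent port-ID drift sooner is to keep a baseline of the SAS addresses the kernel exposes and compare against it from cron; a minimal sketch, assuming the sas_phy entries under /sys/class exist for these controllers (sysfs layout varies by HBA driver, so confirm the path first):

    # Record a baseline of SAS phy addresses, then log any later change.
    baseline=/var/tmp/sas_addresses.baseline
    current=$(cat /sys/class/sas_phy/phy-*/sas_address 2>/dev/null | sort -u)
    if [ ! -s "$baseline" ]; then
        printf '%s\n' "$current" > "$baseline"
    elif ! printf '%s\n' "$current" | cmp -s "$baseline" -; then
        logger -t sas-addr-watch "SAS address set differs from recorded baseline"
    fi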
First noticed these messages, 3 hours 13 minutes after boot: Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd The multipath daemon was sending lots of log messages like: Jan 10 13:49:22 storage043 multipathd: mpathw: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:64 1] Jan 10 13:49:22 storage043 multipathd: mpathaa: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:96 1] Jan 10 13:49:22 storage043 multipathd: mpathx: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:128 1] Currently worked around problem by including 00 01 and 02 for all 8 SAS cards when mapping LUN/volume to host groups. Thanks, ? ddj Dave Johnson Brown University CCV -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jan 20 15:43:56 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 20 Jan 2017 15:43:56 +0000 Subject: [gpfsug-discuss] SOBAR questions Message-ID: We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon From eric.wonderley at vt.edu Fri Jan 20 16:14:09 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 20 Jan 2017 11:14:09 -0500 Subject: [gpfsug-discuss] Path to NSD lost when host_sas_address changed on port In-Reply-To: <5DDBFF8D-8927-42A7-8A81-3F0D167DDAAC@brown.edu> References: <5DDBFF8D-8927-42A7-8A81-3F0D167DDAAC@brown.edu> Message-ID: Maybe multipath is not seeing all of the wwns? 
multipath -v3 | grep ^51855 look ok? For some unknown reason multipath does not see our sandisk array...we have to add them to the end of /etc/multipath/wwids file On Fri, Jan 20, 2017 at 10:32 AM, David D. Johnson wrote: > We have most of our GPFS NSD storage set up as pairs of RAID boxes served > by failover pairs of servers. > Most of it is FibreChannel, but the newest four boxes and servers are > using dual port SAS controllers. > Just this week, we had one server lose one out of the paths to one of the > raid boxes. Took a while > to realize what happened, but apparently the port2 ID changed from > 51866da05cf7b001 to > 51866da05cf7b002 on the fly, without rebooting. Port1 is still > 51866da05cf7b000, which is the card ID (host_add). > > We?re running gpfs 4.2.2.1 on RHEL7.2 on these hosts. > > Has anyone else seen this kind of behavior? > First noticed these messages, 3 hours 13 minutes after boot: > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > > The multipath daemon was sending lots of log messages like: > Jan 10 13:49:22 storage043 multipathd: mpathw: load table [0 4642340864 > multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 > 1 8:64 1] > Jan 10 13:49:22 storage043 multipathd: mpathaa: load table [0 4642340864 > multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 > 1 8:96 1] > Jan 10 13:49:22 storage043 multipathd: mpathx: load table [0 4642340864 > multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 > 1 8:128 1] > > Currently worked around problem by including 00 01 and 02 for all 8 SAS > cards when mapping LUN/volume to host groups. > > Thanks, > ? ddj > Dave Johnson > Brown University CCV > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Jan 20 16:27:30 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Fri, 20 Jan 2017 11:27:30 -0500 Subject: [gpfsug-discuss] Path to NSD lost when host_sas_address changed on port In-Reply-To: References: <5DDBFF8D-8927-42A7-8A81-3F0D167DDAAC@brown.edu> Message-ID: Actually, we can see all the Volume LUN WWNs such as 3600a098000a11f990000022457cf5091 1:0:0:0 sdb 8:16 14 undef ready DELL 3600a098000a0b4ea000001fd57cf50b2 1:0:0:1 sdc 8:32 9 undef ready DELL 3600a098000a11f990000024457cf576f 1:0:0:10 sdl 8:176 14 undef ready DELL (45 lines, 11 LUNs from each controller, each showing up twice, plus the boot volume) My problem involves the ID of the server's host adapter as seen by the 60 drive RAID box. 
[root at storage043 scsi]# lsscsi -Ht [0] megaraid_sas [1] mpt3sas sas:0x51866da05f388a00 [2] ahci sata: [3] ahci sata: [4] ahci sata: [5] ahci sata: [6] ahci sata: [7] ahci sata: [8] ahci sata: [9] ahci sata: [10] ahci sata: [11] ahci sata: [12] mpt3sas sas:0x51866da05cf7b000 Each card [1] and [12] is a dual port card. The address of the second port is not consistent. ? ddj > On Jan 20, 2017, at 11:14 AM, J. Eric Wonderley wrote: > > > Maybe multipath is not seeing all of the wwns? > > multipath -v3 | grep ^51855 look ok? > > For some unknown reason multipath does not see our sandisk array...we have to add them to the end of /etc/multipath/wwids file > > > On Fri, Jan 20, 2017 at 10:32 AM, David D. Johnson > wrote: > We have most of our GPFS NSD storage set up as pairs of RAID boxes served by failover pairs of servers. > Most of it is FibreChannel, but the newest four boxes and servers are using dual port SAS controllers. > Just this week, we had one server lose one out of the paths to one of the raid boxes. Took a while > to realize what happened, but apparently the port2 ID changed from 51866da05cf7b001 to > 51866da05cf7b002 on the fly, without rebooting. Port1 is still 51866da05cf7b000, which is the card ID (host_add). > > We?re running gpfs 4.2.2.1 on RHEL7.2 on these hosts. > > Has anyone else seen this kind of behavior? > First noticed these messages, 3 hours 13 minutes after boot: > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > > The multipath daemon was sending lots of log messages like: > Jan 10 13:49:22 storage043 multipathd: mpathw: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:64 1] > Jan 10 13:49:22 storage043 multipathd: mpathaa: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:96 1] > Jan 10 13:49:22 storage043 multipathd: mpathx: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:128 1] > > Currently worked around problem by including 00 01 and 02 for all 8 SAS cards when mapping LUN/volume to host groups. > > Thanks, > ? ddj > Dave Johnson > Brown University CCV > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From duersch at us.ibm.com Fri Jan 20 16:54:12 2017 From: duersch at us.ibm.com (Steve Duersch) Date: Fri, 20 Jan 2017 11:54:12 -0500 Subject: [gpfsug-discuss] Weird log message In-Reply-To: References: Message-ID: This is a known bug. It is fixed in 4.2.2.1. It does not impact any of the gathering of information. It impacts the sorting of the logs, but all the logs will be there. 
Steve Duersch Spectrum Scale 845-433-7902 IBM Poughkeepsie, New York > > Message: 1 > Date: Fri, 20 Jan 2017 14:33:23 +0000 > From: "Simon Thompson (Research Computing - IT Services)" > > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Weird log message > Message-ID: > Content-Type: text/plain; charset="us-ascii" > > > So today I was just trying to collect a gpfs.snap to log a ticket, and > part way through the log collection it said: > > Month '12' out of range 0..11 at /usr/lpp/mmfs/bin/mmlogsort line 114. > > This is a cluster running 4.2.2.0 > > It carried on anyway so hardly worth me logging a ticket, but just in case > someone want to pick it up internally in IBM ...? > > Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Jan 20 16:57:56 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 20 Jan 2017 11:57:56 -0500 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: I worked on some aspects of SOBAR, but without studying and testing the commands - I'm not in a position right now to give simple definitive answers - having said that.... Generally your questions are reasonable and the answer is: "Yes it should be possible to do that, but you might be going a bit beyond the design point.., so you'll need to try it out on a (smaller) test system with some smaller tedst files. Point by point. 1. If SOBAR is unable to restore a particular file, perhaps because the premigration did not complete -- you should only lose that particular file, and otherwise "keep going". 2. I think SOBAR helps you build a similar file system to the original, including block sizes. So you'd have to go in and tweak the file system creation step(s). I think this is reasonable... If you hit a problem... IMO that would be a fair APAR. 3. Similar to 2. From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 01/20/2017 10:44 AM Subject: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? 
This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaurang.tapase at in.ibm.com Fri Jan 20 18:04:45 2017 From: gaurang.tapase at in.ibm.com (Gaurang Tapase) Date: Fri, 20 Jan 2017 23:34:45 +0530 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: References: Message-ID: Hi Brian, For option #3, you can use GPFS Manila (OpenStack shared file system service) driver for exporting data from protocol servers to the OpenStack VMs. It was updated to support CES in the Newton release. A new feature of bringing existing filesets under Manila management has also been added recently. Thanks, Gaurang ------------------------------------------------------------------------ Gaurang S Tapase Spectrum Scale & OpenStack IBM India Storage Lab, Pune (India) Email : gaurang.tapase at in.ibm.com Phone : +91-20-42025699 (W), +91-9860082042(Cell) ------------------------------------------------------------------------- From: Brian Marshall To: gpfsug main discussion list Date: 01/18/2017 09:52 PM Subject: Re: [gpfsug-discuss] Mounting GPFS data on OpenStack VM Sent by: gpfsug-discuss-bounces at spectrumscale.org To answer some more questions: What sort of workload will your Nova VM's be running? This is largely TBD but we anticipate webapps and other non-batch ways of interacting with and post processing data that has been computed on HPC batch systems. For example a user might host a website that allows users to view pieces of a large data set and do some processing in private cloud or kick off larger jobs on HPC clusters How many VM's are you running? This work is still in the design / build phase. We have 48 servers slated for the project. At max maybe 500 VMs; again this is a pretty wild estimate. This is a new service we are looking to provide What is your Network interconnect between the Scale Storage cluster and the Nova Compute cluster Each nova node has a dual 10gigE connection to switches that uplink to our core 40 gigE switches were NSD Servers are directly connectly. The information so far has been awesome. Thanks everyone. I am definitely leaning towards option #3 of creating protocol servers. Are there any design/build white papers targetting the virutalization use case? Thanks, Brian On Tue, Jan 17, 2017 at 5:55 PM, Andrew Beattie wrote: HI Brian, Couple of questions for you: What sort of workload will your Nova VM's be running? How many VM's are you running? What is your Network interconnect between the Scale Storage cluster and the Nova Compute cluster I have cc'd Jake Carrol from University of Queensland in on the email as I know they have done some basic performance testing using Scale to provide storage to Openstack. One of the issues that they found was the Openstack network translation was a performance limiting factor. 
I think from memory the best performance scenario they had was, when they installed the scale client locally into the virtual machines Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Brian Marshall Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM Date: Wed, Jan 18, 2017 7:51 AM UG, I have a GPFS filesystem. I have a OpenStack private cloud. What is the best way for Nova Compute VMs to have access to data inside the GPFS filesystem? 1)Should VMs mount GPFS directly with a GPFS client? 2) Should the hypervisor mount GPFS and share to nova computes? 3) Should I create GPFS protocol servers that allow nova computes to mount of NFS? All advice is welcome. Best, Brian Marshall Virginia Tech _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Fri Jan 20 18:22:11 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Fri, 20 Jan 2017 13:22:11 -0500 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: References: Message-ID: Perfect. Thanks for the advice. Further: this might be a basic question: Are their design guides for building CES protocl servers? Brian On Fri, Jan 20, 2017 at 1:04 PM, Gaurang Tapase wrote: > Hi Brian, > > For option #3, you can use GPFS Manila (OpenStack shared file system > service) driver for exporting data from protocol servers to the OpenStack > VMs. > It was updated to support CES in the Newton release. > > A new feature of bringing existing filesets under Manila management has > also been added recently. > > Thanks, > Gaurang > ------------------------------------------------------------------------ > Gaurang S Tapase > Spectrum Scale & OpenStack > IBM India Storage Lab, Pune (India) > Email : gaurang.tapase at in.ibm.com > Phone : +91-20-42025699 <+91%2020%204202%205699> (W), +91-9860082042 > <+91%2098600%2082042>(Cell) > ------------------------------------------------------------------------- > > > > From: Brian Marshall > To: gpfsug main discussion list > Date: 01/18/2017 09:52 PM > Subject: Re: [gpfsug-discuss] Mounting GPFS data on OpenStack VM > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > To answer some more questions: > > What sort of workload will your Nova VM's be running? > This is largely TBD but we anticipate webapps and other non-batch ways of > interacting with and post processing data that has been computed on HPC > batch systems. For example a user might host a website that allows users > to view pieces of a large data set and do some processing in private cloud > or kick off larger jobs on HPC clusters > > How many VM's are you running? > This work is still in the design / build phase. We have 48 servers slated > for the project. At max maybe 500 VMs; again this is a pretty wild > estimate. 
This is a new service we are looking to provide > > What is your Network interconnect between the Scale Storage cluster and > the Nova Compute cluster > Each nova node has a dual 10gigE connection to switches that uplink to our > core 40 gigE switches were NSD Servers are directly connectly. > > The information so far has been awesome. Thanks everyone. I am > definitely leaning towards option #3 of creating protocol servers. Are > there any design/build white papers targetting the virutalization use case? > > Thanks, > Brian > > On Tue, Jan 17, 2017 at 5:55 PM, Andrew Beattie <*abeattie at au1.ibm.com* > > wrote: > HI Brian, > > > Couple of questions for you: > > What sort of workload will your Nova VM's be running? > How many VM's are you running? > What is your Network interconnect between the Scale Storage cluster and > the Nova Compute cluster > > I have cc'd Jake Carrol from University of Queensland in on the email as I > know they have done some basic performance testing using Scale to provide > storage to Openstack. > One of the issues that they found was the Openstack network translation > was a performance limiting factor. > > I think from memory the best performance scenario they had was, when they > installed the scale client locally into the virtual machines > > > *Andrew Beattie* > *Software Defined Storage - IT Specialist* > *Phone: *614-2133-7927 > *E-mail: **abeattie at au1.ibm.com* > > > ----- Original message ----- > From: Brian Marshall <*mimarsh2 at vt.edu* > > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Cc: > Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM > Date: Wed, Jan 18, 2017 7:51 AM > > UG, > > I have a GPFS filesystem. > > I have a OpenStack private cloud. > > What is the best way for Nova Compute VMs to have access to data inside > the GPFS filesystem? > > 1)Should VMs mount GPFS directly with a GPFS client? > 2) Should the hypervisor mount GPFS and share to nova computes? > 3) Should I create GPFS protocol servers that allow nova computes to mount > of NFS? > > All advice is welcome. > > > Best, > Brian Marshall > Virginia Tech > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ulmer at ulmer.org Fri Jan 20 22:23:07 2017 From: ulmer at ulmer.org (Stephen Ulmer) Date: Fri, 20 Jan 2017 17:23:07 -0500 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <566473A3-D5F1-4508-84AE-AE4B892C25B8@vanderbilt.edu> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> <791bb4d1-eb22-5ba5-9fcd-d7553aeebdc0@psu.edu> <566473A3-D5F1-4508-84AE-AE4B892C25B8@vanderbilt.edu> Message-ID: <3D2CE694-2A3A-4B5E-8078-238A09681BE8@ulmer.org> My list of questions that might or might not be thought provoking: How about the relative position of the items in the /etc/group file? Are all of the failures later in the file than all of the successes? Do any groups have group passwords (parsing error due to ?different" line format)? Is the /etc/group sorted by either GID or group name (not normally required, but it would be interesting to see if it changed the problem)? Is the set that is translated versus not translated consistent or do they change? (Across all axes of comparison by {node, command invocation, et al.}) Are the not translated groups more or less likely to be the default group of the owning UID? Can you translate the GID other ways? Like with ls? (I think this was in the original problem description, but I don?t remember the answer.) What is you just turn of nscd? -- Stephen > On Jan 20, 2017, at 10:09 AM, Buterbaugh, Kevin L > wrote: > > Hi Phil, > > Nope - that was the very first thought I had but on a 4.2.2.1 node I have a 13 character group name displaying and a resolvable 7 character long group name being displayed as its? GID? > > Kevin > >> On Jan 20, 2017, at 9:06 AM, Phil Pishioneri > wrote: >> >> On 1/19/17 4:51 PM, Buterbaugh, Kevin L wrote: >>> Hi All, >>> >>> Let me try to answer some questions that have been raised by various list members? >>> >>> 1. I am not using nscd. >>> 2. getent group with either a GID or a group name resolves GID?s / names that are being printed as GIDs by mmrepquota >>> 3. The GID?s in question are all in a normal range ? i.e. some group names that are being printed by mmrepquota have GIDs ?close? to others that are being printed as GID?s >>> 4. strace?ing mmrepquota doesn?t show anything relating to nscd or anything that jumps out at me >>> >> >> Anything unique about the lengths of the names of the affected groups? (i.e., all a certain value, all greater than some value, etc.) >> >> -Phil > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From leslie.james.elliott at gmail.com Fri Jan 20 22:37:15 2017 From: leslie.james.elliott at gmail.com (leslie elliott) Date: Sat, 21 Jan 2017 08:37:15 +1000 Subject: [gpfsug-discuss] CES permissions Message-ID: Hi we have an existing configuration with a home - cache relationship on linked clusters, we are running CES on the cache cluster. When data is copied to an SMB share the the afm target for the cache is marked dirty and the replication back to the home cluster stops. both clusters are running 4.2.1 We have seen this behaviour whether the acls on the home cluster file system are nfsv4 only or posix and nfsv4 the cache cluster is nfsv4 only so that we can use CES on it for SMB. 
We are using uid remapping between the cache and the home can anyone suggest why the cache is marked dirty and how we can get around this issue the other thing we would like to do is force group and posix file permissions via samba but these are not supported options in the CES installation of samba any help is appreciated leslie Leslie Elliott, Infrastructure Support Specialist Information Technology Services, The University of Queensland -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Jan 23 01:10:14 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 22 Jan 2017 20:10:14 -0500 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? Message-ID: This is going to sound like a ridiculous request, but, is there a way to cause a filesystem to panic everywhere in one "swell foop"? I'm assuming the answer will come with an appropriate disclaimer of "don't ever do this, we don't support it, it might eat your data, summon cthulu, etc.". I swear I've seen the fs manager initiate this type of operation before. I can seem to do it on a per-node basis with "mmfsadm test panic " but if I do that over all 1k nodes in my test cluster at once it results in about 45 minutes of almost total deadlock while each panic is processed by the fs manager. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From abeattie at au1.ibm.com Mon Jan 23 01:16:58 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 23 Jan 2017 01:16:58 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Mon Jan 23 01:23:34 2017 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Sun, 22 Jan 2017 20:23:34 -0500 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: Message-ID: <142910.1485134614@turing-police.cc.vt.edu> On Sun, 22 Jan 2017 20:10:14 -0500, Aaron Knister said: > This is going to sound like a ridiculous request, but, is there a way to > cause a filesystem to panic everywhere in one "swell foop"? (...) > I can seem to do it on a per-node basis with "mmfsadm test panic > " but if I do that over all 1k nodes in my test cluster at > once it results in about 45 minutes of almost total deadlock while each > panic is processed by the fs manager. Sounds like you've already found the upper bound for panicking all at once. :) What exactly are you trying to do here? Force-dismount all over the cluster due to some urgent external condition (UPS fail, whatever)? And how much do you care about file system metadata consistency and/or pending data writes? (Be prepared to Think Outside The Box - the *fastest* way may be to use a controllable power strip in the rack and cut power to your fiber channel switches, isolating the storage *real* fast....) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Mon Jan 23 01:31:06 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 22 Jan 2017 20:31:06 -0500 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? 
In-Reply-To: References: Message-ID: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> I was afraid someone would ask :) One possible use would be testing how monitoring reacts to and/or corrects stale filesystems. The use in my case is there's an issue we see quite often where a filesystem won't unmount when trying to shut down gpfs. Linux insists its still busy despite every process being killed on the node just about except init. It's a real pain because it complicates maintenance, requiring a reboot of some nodes prior to patching for example. I dug into it and it appears as though when this happens the filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm trying to debug it further but I need to actually be able to make the condition happen a few more times to debug it. A stripegroup panic isn't a surefire way but it's the only way I've found so far to trigger this behavior somewhat on demand. One way I've found to trigger a mass stripegroup panic is to induce what I call a "301 error": loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted by the system with return code 301 reason code 0 loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument and tickle a known race condition between nodes being expelled from the cluster and a manager node joining the cluster. When this happens it seems to cause a mass stripe group panic that's over in a few minutes. The trick there is that it doesn't happen every time I go through the exercise and when it does there's no guarantee the filesystem that panics is the one in use. If it's not an fs in use then it doesn't help me reproduce the error condition. I was trying to use the "mmfsadm test panic" command to try a more direct approach. Hope that helps shed some light. -Aaron On 1/22/17 8:16 PM, Andrew Beattie wrote: > Out of curiosity -- why would you want to? > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > > ----- Original message ----- > From: Aaron Knister > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? > Date: Mon, Jan 23, 2017 11:11 AM > > This is going to sound like a ridiculous request, but, is there a way to > cause a filesystem to panic everywhere in one "swell foop"? I'm assuming > the answer will come with an appropriate disclaimer of "don't ever do > this, we don't support it, it might eat your data, summon cthulu, etc.". > I swear I've seen the fs manager initiate this type of operation before. > > I can seem to do it on a per-node basis with "mmfsadm test panic > " but if I do that over all 1k nodes in my test cluster at > once it results in about 45 minutes of almost total deadlock while each > panic is processed by the fs manager. 
> > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at us.ibm.com Mon Jan 23 04:12:02 2017 From: oehmes at us.ibm.com (Sven Oehme) Date: Mon, 23 Jan 2017 04:12:02 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: What version of Scale/ GPFS code is this cluster on ? ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Aaron Knister To: Date: 01/23/2017 01:31 AM Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? Sent by: gpfsug-discuss-bounces at spectrumscale.org I was afraid someone would ask :) One possible use would be testing how monitoring reacts to and/or corrects stale filesystems. The use in my case is there's an issue we see quite often where a filesystem won't unmount when trying to shut down gpfs. Linux insists its still busy despite every process being killed on the node just about except init. It's a real pain because it complicates maintenance, requiring a reboot of some nodes prior to patching for example. I dug into it and it appears as though when this happens the filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm trying to debug it further but I need to actually be able to make the condition happen a few more times to debug it. A stripegroup panic isn't a surefire way but it's the only way I've found so far to trigger this behavior somewhat on demand. One way I've found to trigger a mass stripegroup panic is to induce what I call a "301 error": loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted by the system with return code 301 reason code 0 loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument and tickle a known race condition between nodes being expelled from the cluster and a manager node joining the cluster. When this happens it seems to cause a mass stripe group panic that's over in a few minutes. The trick there is that it doesn't happen every time I go through the exercise and when it does there's no guarantee the filesystem that panics is the one in use. If it's not an fs in use then it doesn't help me reproduce the error condition. I was trying to use the "mmfsadm test panic" command to try a more direct approach. Hope that helps shed some light. -Aaron On 1/22/17 8:16 PM, Andrew Beattie wrote: > Out of curiosity -- why would you want to? > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > > ----- Original message ----- > From: Aaron Knister > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? 
> Date: Mon, Jan 23, 2017 11:11 AM > > This is going to sound like a ridiculous request, but, is there a way to > cause a filesystem to panic everywhere in one "swell foop"? I'm assuming > the answer will come with an appropriate disclaimer of "don't ever do > this, we don't support it, it might eat your data, summon cthulu, etc.". > I swear I've seen the fs manager initiate this type of operation before. > > I can seem to do it on a per-node basis with "mmfsadm test panic > " but if I do that over all 1k nodes in my test cluster at > once it results in about 45 minutes of almost total deadlock while each > panic is processed by the fs manager. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Mon Jan 23 04:22:38 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 22 Jan 2017 23:22:38 -0500 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: It's at 4.1.1.10. On 1/22/17 11:12 PM, Sven Oehme wrote: > What version of Scale/ GPFS code is this cluster on ? > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > Inactive hide details for Aaron Knister ---01/23/2017 01:31:29 AM---I > was afraid someone would ask :) One possible use would beAaron Knister > ---01/23/2017 01:31:29 AM---I was afraid someone would ask :) One > possible use would be testing how monitoring reacts to and/or > > From: Aaron Knister > To: > Date: 01/23/2017 01:31 AM > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > I was afraid someone would ask :) > > One possible use would be testing how monitoring reacts to and/or > corrects stale filesystems. > > The use in my case is there's an issue we see quite often where a > filesystem won't unmount when trying to shut down gpfs. Linux insists > its still busy despite every process being killed on the node just about > except init. It's a real pain because it complicates maintenance, > requiring a reboot of some nodes prior to patching for example. > > I dug into it and it appears as though when this happens the > filesystem's mnt_count is ridiculously high (300,000+ in one case). 
I'm > trying to debug it further but I need to actually be able to make the > condition happen a few more times to debug it. A stripegroup panic isn't > a surefire way but it's the only way I've found so far to trigger this > behavior somewhat on demand. > > One way I've found to trigger a mass stripegroup panic is to induce what > I call a "301 error": > > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted > by the system with return code 301 reason code 0 > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument > > and tickle a known race condition between nodes being expelled from the > cluster and a manager node joining the cluster. When this happens it > seems to cause a mass stripe group panic that's over in a few minutes. > The trick there is that it doesn't happen every time I go through the > exercise and when it does there's no guarantee the filesystem that > panics is the one in use. If it's not an fs in use then it doesn't help > me reproduce the error condition. I was trying to use the "mmfsadm test > panic" command to try a more direct approach. > > Hope that helps shed some light. > > -Aaron > > On 1/22/17 8:16 PM, Andrew Beattie wrote: >> Out of curiosity -- why would you want to? >> Andrew Beattie >> Software Defined Storage - IT Specialist >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com >> >> >> >> ----- Original message ----- >> From: Aaron Knister >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug main discussion list >> Cc: >> Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? >> Date: Mon, Jan 23, 2017 11:11 AM >> >> This is going to sound like a ridiculous request, but, is there a way to >> cause a filesystem to panic everywhere in one "swell foop"? I'm assuming >> the answer will come with an appropriate disclaimer of "don't ever do >> this, we don't support it, it might eat your data, summon cthulu, etc.". >> I swear I've seen the fs manager initiate this type of operation before. >> >> I can seem to do it on a per-node basis with "mmfsadm test panic >> " but if I do that over all 1k nodes in my test cluster at >> once it results in about 45 minutes of almost total deadlock while each >> panic is processed by the fs manager. >> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Mon Jan 23 05:03:43 2017 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 23 Jan 2017 05:03:43 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? 
In-Reply-To: References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: Then i would suggest to move up to at least 4.2.1.LATEST , there is a high chance your problem might already be fixed. i see 2 potential area that got significant improvements , Token Manager recovery and Log Recovery, both are in latest 4.2.1 code enabled : 2 significant improvements on Token Recovery in 4.2.1 : 1. Extendible hashing for token hash table. This speeds up token lookup and thereby reduce tcMutex hold times for configurations with a large ratio of clients to token servers. 2. Cleaning up tokens held by failed nodes was making multiple passes over the whole token table, one for each failed node. The loops are now inverted, so it makes a single pass over the able, and for each token found, does cleanup for all failed nodes. there are multiple smaller enhancements beyond 4.2.1 but thats the minimum level you want to be. i have seen token recovery of 10's of minutes similar to what you described going down to a minute with this change. on Log Recovery - in case of an unclean unmount/shutdown of a node prior 4.2.1 the Filesystem manager would only recover one Log file at a time, using a single thread, with 4.2.1 this is now done with multiple threads and multiple log files in parallel . Sven On Mon, Jan 23, 2017 at 4:22 AM Aaron Knister wrote: > It's at 4.1.1.10. > > On 1/22/17 11:12 PM, Sven Oehme wrote: > > What version of Scale/ GPFS code is this cluster on ? > > > > ------------------------------------------ > > Sven Oehme > > Scalable Storage Research > > email: oehmes at us.ibm.com > > Phone: +1 (408) 824-8904 <(408)%20824-8904> > > IBM Almaden Research Lab > > ------------------------------------------ > > > > Inactive hide details for Aaron Knister ---01/23/2017 01:31:29 AM---I > > was afraid someone would ask :) One possible use would beAaron Knister > > ---01/23/2017 01:31:29 AM---I was afraid someone would ask :) One > > possible use would be testing how monitoring reacts to and/or > > > > From: Aaron Knister > > To: > > Date: 01/23/2017 01:31 AM > > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > ------------------------------------------------------------------------ > > > > > > > > I was afraid someone would ask :) > > > > One possible use would be testing how monitoring reacts to and/or > > corrects stale filesystems. > > > > The use in my case is there's an issue we see quite often where a > > filesystem won't unmount when trying to shut down gpfs. Linux insists > > its still busy despite every process being killed on the node just about > > except init. It's a real pain because it complicates maintenance, > > requiring a reboot of some nodes prior to patching for example. > > > > I dug into it and it appears as though when this happens the > > filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm > > trying to debug it further but I need to actually be able to make the > > condition happen a few more times to debug it. A stripegroup panic isn't > > a surefire way but it's the only way I've found so far to trigger this > > behavior somewhat on demand. 
> > > > One way I've found to trigger a mass stripegroup panic is to induce what > > I call a "301 error": > > > > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted > > by the system with return code 301 reason code 0 > > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument > > > > and tickle a known race condition between nodes being expelled from the > > cluster and a manager node joining the cluster. When this happens it > > seems to cause a mass stripe group panic that's over in a few minutes. > > The trick there is that it doesn't happen every time I go through the > > exercise and when it does there's no guarantee the filesystem that > > panics is the one in use. If it's not an fs in use then it doesn't help > > me reproduce the error condition. I was trying to use the "mmfsadm test > > panic" command to try a more direct approach. > > > > Hope that helps shed some light. > > > > -Aaron > > > > On 1/22/17 8:16 PM, Andrew Beattie wrote: > >> Out of curiosity -- why would you want to? > >> Andrew Beattie > >> Software Defined Storage - IT Specialist > >> Phone: 614-2133-7927 > >> E-mail: abeattie at au1.ibm.com > >> > >> > >> > >> ----- Original message ----- > >> From: Aaron Knister > >> Sent by: gpfsug-discuss-bounces at spectrumscale.org > >> To: gpfsug main discussion list > >> Cc: > >> Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? > >> Date: Mon, Jan 23, 2017 11:11 AM > >> > >> This is going to sound like a ridiculous request, but, is there a > way to > >> cause a filesystem to panic everywhere in one "swell foop"? I'm > assuming > >> the answer will come with an appropriate disclaimer of "don't ever > do > >> this, we don't support it, it might eat your data, summon cthulu, > etc.". > >> I swear I've seen the fs manager initiate this type of operation > before. > >> > >> I can seem to do it on a per-node basis with "mmfsadm test panic > > >> " but if I do that over all 1k nodes in my test cluster > at > >> once it results in about 45 minutes of almost total deadlock while > each > >> panic is processed by the fs manager. > >> > >> -Aaron > >> > >> -- > >> Aaron Knister > >> NASA Center for Climate Simulation (Code 606.2) > >> Goddard Space Flight Center > >> (301) 286-2776 > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at gmail.com Mon Jan 23 05:27:53 2017 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 23 Jan 2017 05:27:53 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: Aaron, hold a bit with the upgrade , i just got word that while 4.2.1+ most likely addresses the issues i mentioned, there was a defect in the initial release of the parallel log recovery code. i will get the exact minimum version you need to deploy and send another update to this thread. sven On Mon, Jan 23, 2017 at 5:03 AM Sven Oehme wrote: > Then i would suggest to move up to at least 4.2.1.LATEST , there is a high > chance your problem might already be fixed. > > i see 2 potential area that got significant improvements , Token Manager > recovery and Log Recovery, both are in latest 4.2.1 code enabled : > > 2 significant improvements on Token Recovery in 4.2.1 : > > 1. Extendible hashing for token hash table. This speeds up token lookup > and thereby reduce tcMutex hold times for configurations with a large ratio > of clients to token servers. > 2. Cleaning up tokens held by failed nodes was making multiple passes > over the whole token table, one for each failed node. The loops are now > inverted, so it makes a single pass over the able, and for each token > found, does cleanup for all failed nodes. > > there are multiple smaller enhancements beyond 4.2.1 but thats the minimum > level you want to be. i have seen token recovery of 10's of minutes similar > to what you described going down to a minute with this change. > > on Log Recovery - in case of an unclean unmount/shutdown of a node prior > 4.2.1 the Filesystem manager would only recover one Log file at a time, > using a single thread, with 4.2.1 this is now done with multiple threads > and multiple log files in parallel . > > Sven > > On Mon, Jan 23, 2017 at 4:22 AM Aaron Knister > wrote: > > It's at 4.1.1.10. > > On 1/22/17 11:12 PM, Sven Oehme wrote: > > What version of Scale/ GPFS code is this cluster on ? > > > > ------------------------------------------ > > Sven Oehme > > Scalable Storage Research > > email: oehmes at us.ibm.com > > Phone: +1 (408) 824-8904 <(408)%20824-8904> > > IBM Almaden Research Lab > > ------------------------------------------ > > > > Inactive hide details for Aaron Knister ---01/23/2017 01:31:29 AM---I > > was afraid someone would ask :) One possible use would beAaron Knister > > ---01/23/2017 01:31:29 AM---I was afraid someone would ask :) One > > possible use would be testing how monitoring reacts to and/or > > > > From: Aaron Knister > > To: > > Date: 01/23/2017 01:31 AM > > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > ------------------------------------------------------------------------ > > > > > > > > I was afraid someone would ask :) > > > > One possible use would be testing how monitoring reacts to and/or > > corrects stale filesystems. > > > > The use in my case is there's an issue we see quite often where a > > filesystem won't unmount when trying to shut down gpfs. Linux insists > > its still busy despite every process being killed on the node just about > > except init. It's a real pain because it complicates maintenance, > > requiring a reboot of some nodes prior to patching for example. 
> > > > I dug into it and it appears as though when this happens the > > filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm > > trying to debug it further but I need to actually be able to make the > > condition happen a few more times to debug it. A stripegroup panic isn't > > a surefire way but it's the only way I've found so far to trigger this > > behavior somewhat on demand. > > > > One way I've found to trigger a mass stripegroup panic is to induce what > > I call a "301 error": > > > > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted > > by the system with return code 301 reason code 0 > > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument > > > > and tickle a known race condition between nodes being expelled from the > > cluster and a manager node joining the cluster. When this happens it > > seems to cause a mass stripe group panic that's over in a few minutes. > > The trick there is that it doesn't happen every time I go through the > > exercise and when it does there's no guarantee the filesystem that > > panics is the one in use. If it's not an fs in use then it doesn't help > > me reproduce the error condition. I was trying to use the "mmfsadm test > > panic" command to try a more direct approach. > > > > Hope that helps shed some light. > > > > -Aaron > > > > On 1/22/17 8:16 PM, Andrew Beattie wrote: > >> Out of curiosity -- why would you want to? > >> Andrew Beattie > >> Software Defined Storage - IT Specialist > >> Phone: 614-2133-7927 > >> E-mail: abeattie at au1.ibm.com > >> > >> > >> > >> ----- Original message ----- > >> From: Aaron Knister > >> Sent by: gpfsug-discuss-bounces at spectrumscale.org > >> To: gpfsug main discussion list > >> Cc: > >> Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? > >> Date: Mon, Jan 23, 2017 11:11 AM > >> > >> This is going to sound like a ridiculous request, but, is there a > way to > >> cause a filesystem to panic everywhere in one "swell foop"? I'm > assuming > >> the answer will come with an appropriate disclaimer of "don't ever > do > >> this, we don't support it, it might eat your data, summon cthulu, > etc.". > >> I swear I've seen the fs manager initiate this type of operation > before. > >> > >> I can seem to do it on a per-node basis with "mmfsadm test panic > > >> " but if I do that over all 1k nodes in my test cluster > at > >> once it results in about 45 minutes of almost total deadlock while > each > >> panic is processed by the fs manager. 
> >> > >> -Aaron > >> > >> -- > >> Aaron Knister > >> NASA Center for Climate Simulation (Code 606.2) > >> Goddard Space Flight Center > >> (301) 286-2776 > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Mon Jan 23 05:40:25 2017 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Mon, 23 Jan 2017 05:40:25 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: I?ve also done the ?panic stripe group everywhere? trick on a test cluster for a large FPO filesystem solution. With FPO it?s not very hard to get a filesystem to become unmountable due to missing disks. Sometimes the best answer, especially in a scratch use-case, may be to throw the filesystem away and start again empty so that research can resume (even though there will be work loss and repeated effort for some). But the stuck mounts problem can make this a long-lived problem. In my case, I just repeatedly panic any nodes which continue to mount the filesystem and try mmdelfs until it works (usually takes a few attempts). In this case, I really don?t want/need the filesystem to be recovered. I just want the cluster to forget about it as quickly as possible. So far, in testing, the panic/destroy times aren?t bad, but I don?t have heavy user workloads running against it yet. It would be interesting to know if there were any shortcuts to skip SG manager reassignment and recovery attempts. Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sven Oehme Sent: Monday, January 23, 2017 12:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? Aaron, hold a bit with the upgrade , i just got word that while 4.2.1+ most likely addresses the issues i mentioned, there was a defect in the initial release of the parallel log recovery code. i will get the exact minimum version you need to deploy and send another update to this thread. sven On Mon, Jan 23, 2017 at 5:03 AM Sven Oehme > wrote: Then i would suggest to move up to at least 4.2.1.LATEST , there is a high chance your problem might already be fixed. 
i see 2 potential area that got significant improvements , Token Manager recovery and Log Recovery, both are in latest 4.2.1 code enabled : 2 significant improvements on Token Recovery in 4.2.1 : 1. Extendible hashing for token hash table. This speeds up token lookup and thereby reduce tcMutex hold times for configurations with a large ratio of clients to token servers. 2. Cleaning up tokens held by failed nodes was making multiple passes over the whole token table, one for each failed node. The loops are now inverted, so it makes a single pass over the able, and for each token found, does cleanup for all failed nodes. there are multiple smaller enhancements beyond 4.2.1 but thats the minimum level you want to be. i have seen token recovery of 10's of minutes similar to what you described going down to a minute with this change. on Log Recovery - in case of an unclean unmount/shutdown of a node prior 4.2.1 the Filesystem manager would only recover one Log file at a time, using a single thread, with 4.2.1 this is now done with multiple threads and multiple log files in parallel . Sven On Mon, Jan 23, 2017 at 4:22 AM Aaron Knister > wrote: It's at 4.1.1.10. On 1/22/17 11:12 PM, Sven Oehme wrote: > What version of Scale/ GPFS code is this cluster on ? > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > Inactive hide details for Aaron Knister ---01/23/2017 01:31:29 AM---I > was afraid someone would ask :) One possible use would beAaron Knister > ---01/23/2017 01:31:29 AM---I was afraid someone would ask :) One > possible use would be testing how monitoring reacts to and/or > > From: Aaron Knister > > To: > > Date: 01/23/2017 01:31 AM > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > I was afraid someone would ask :) > > One possible use would be testing how monitoring reacts to and/or > corrects stale filesystems. > > The use in my case is there's an issue we see quite often where a > filesystem won't unmount when trying to shut down gpfs. Linux insists > its still busy despite every process being killed on the node just about > except init. It's a real pain because it complicates maintenance, > requiring a reboot of some nodes prior to patching for example. > > I dug into it and it appears as though when this happens the > filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm > trying to debug it further but I need to actually be able to make the > condition happen a few more times to debug it. A stripegroup panic isn't > a surefire way but it's the only way I've found so far to trigger this > behavior somewhat on demand. > > One way I've found to trigger a mass stripegroup panic is to induce what > I call a "301 error": > > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted > by the system with return code 301 reason code 0 > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument > > and tickle a known race condition between nodes being expelled from the > cluster and a manager node joining the cluster. When this happens it > seems to cause a mass stripe group panic that's over in a few minutes. 
> The trick there is that it doesn't happen every time I go through the > exercise and when it does there's no guarantee the filesystem that > panics is the one in use. If it's not an fs in use then it doesn't help > me reproduce the error condition. I was trying to use the "mmfsadm test > panic" command to try a more direct approach. > > Hope that helps shed some light. > > -Aaron > > On 1/22/17 8:16 PM, Andrew Beattie wrote: >> Out of curiosity -- why would you want to? >> Andrew Beattie >> Software Defined Storage - IT Specialist >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com > >> >> >> >> ----- Original message ----- >> From: Aaron Knister > >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug main discussion list > >> Cc: >> Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? >> Date: Mon, Jan 23, 2017 11:11 AM >> >> This is going to sound like a ridiculous request, but, is there a way to >> cause a filesystem to panic everywhere in one "swell foop"? I'm assuming >> the answer will come with an appropriate disclaimer of "don't ever do >> this, we don't support it, it might eat your data, summon cthulu, etc.". >> I swear I've seen the fs manager initiate this type of operation before. >> >> I can seem to do it on a per-node basis with "mmfsadm test panic >> " but if I do that over all 1k nodes in my test cluster at >> once it results in about 45 minutes of almost total deadlock while each >> panic is processed by the fs manager. >> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jan 23 10:17:03 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 23 Jan 2017 10:17:03 +0000 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: Hi Mark, Thanks. I get that using it to move to a new FS version is probably beyond design. But equally, I could easily see that having to support implementing the latest FS version is a strong requirement. I.e. In a DR situation say three years down the line, it would be a new FS of (say) 5.1.1, we wouldn't want to have to go back and find 4.1.1 code, nor would we necessarily be able to even run that version (as kernels and OSes move forward). 
That?s sorta also the situation where you don't want to suddenly have to run back to IBM support because your DR solution suddenly doesn't work like it says on the tin ;-) I can test 1 and 2 relatively easily, but 3 is a bit more difficult for us to test out as the FS we want to use SOBAR on is 4.2 already. Simon From: > on behalf of Marc A Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 20 January 2017 at 16:57 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SOBAR questions I worked on some aspects of SOBAR, but without studying and testing the commands - I'm not in a position right now to give simple definitive answers - having said that.... Generally your questions are reasonable and the answer is: "Yes it should be possible to do that, but you might be going a bit beyond the design point.., so you'll need to try it out on a (smaller) test system with some smaller tedst files. Point by point. 1. If SOBAR is unable to restore a particular file, perhaps because the premigration did not complete -- you should only lose that particular file, and otherwise "keep going". 2. I think SOBAR helps you build a similar file system to the original, including block sizes. So you'd have to go in and tweak the file system creation step(s). I think this is reasonable... If you hit a problem... IMO that would be a fair APAR. 3. Similar to 2. From: "Simon Thompson (Research Computing - IT Services)" > To: "gpfsug-discuss at spectrumscale.org" > Date: 01/20/2017 10:44 AM Subject: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jan 23 15:32:41 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 23 Jan 2017 15:32:41 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <3D2CE694-2A3A-4B5E-8078-238A09681BE8@ulmer.org> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> <791bb4d1-eb22-5ba5-9fcd-d7553aeebdc0@psu.edu> <566473A3-D5F1-4508-84AE-AE4B892C25B8@vanderbilt.edu> <3D2CE694-2A3A-4B5E-8078-238A09681BE8@ulmer.org> Message-ID: <031A80F6-B00B-4AF9-963B-98E61BC537B4@vanderbilt.edu> Hi All, Stephens? very first question below has led me to figure out what the problem is ? we have one group in /etc/group that has dozens and dozens of members ? any group above that in /etc/group gets printed as a name by mmrepquota; any group below it gets printed as a GID. Wasn?t there an identical bug in mmlsquota a while back? I will update the PMR I have open with IBM. Thanks to all who took the time to respond with suggestions. Kevin On Jan 20, 2017, at 4:23 PM, Stephen Ulmer > wrote: My list of questions that might or might not be thought provoking: How about the relative position of the items in the /etc/group file? Are all of the failures later in the file than all of the successes? Do any groups have group passwords (parsing error due to ?different" line format)? Is the /etc/group sorted by either GID or group name (not normally required, but it would be interesting to see if it changed the problem)? Is the set that is translated versus not translated consistent or do they change? (Across all axes of comparison by {node, command invocation, et al.}) Are the not translated groups more or less likely to be the default group of the owning UID? Can you translate the GID other ways? Like with ls? (I think this was in the original problem description, but I don?t remember the answer.) What is you just turn of nscd? -- Stephen On Jan 20, 2017, at 10:09 AM, Buterbaugh, Kevin L > wrote: Hi Phil, Nope - that was the very first thought I had but on a 4.2.2.1 node I have a 13 character group name displaying and a resolvable 7 character long group name being displayed as its? GID? Kevin On Jan 20, 2017, at 9:06 AM, Phil Pishioneri > wrote: On 1/19/17 4:51 PM, Buterbaugh, Kevin L wrote: Hi All, Let me try to answer some questions that have been raised by various list members? 1. I am not using nscd. 2. getent group with either a GID or a group name resolves GID?s / names that are being printed as GIDs by mmrepquota 3. The GID?s in question are all in a normal range ? i.e. some group names that are being printed by mmrepquota have GIDs ?close? to others that are being printed as GID?s 4. strace?ing mmrepquota doesn?t show anything relating to nscd or anything that jumps out at me Anything unique about the lengths of the names of the affected groups? (i.e., all a certain value, all greater than some value, etc.) -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From makaplan at us.ibm.com Mon Jan 23 15:35:41 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 23 Jan 2017 10:35:41 -0500 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: Regarding back level file systems and testing... 1. Did you know that the mmcrfs command supports --version which allows you to create a back level file system? 2. If your concern is restoring from a SOBAR backup that was made a long while ago with an old version of GPFS/sobar... I'd say that should work... BUT I don't know for sure AND I'd caution that AFAIK (someone may correct me) Sobar is not intended for long term archiving of file systems. Personally ( IBM hat off ;-) ), for that I'd choose a standard, vendor-neutral archival format that is likely to be supported in the future.... My current understanding: Spectrum Scal SOBAR is for "disaster recovery" or "migrate/upgrade entire file system" -- where presumably you do Sobar backups on a regular schedule... and/or do one just before you begin an upgrade or migration to new hardware. --marc From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 01/23/2017 05:17 AM Subject: Re: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Mark, Thanks. I get that using it to move to a new FS version is probably beyond design. But equally, I could easily see that having to support implementing the latest FS version is a strong requirement. I.e. In a DR situation say three years down the line, it would be a new FS of (say) 5.1.1, we wouldn't want to have to go back and find 4.1.1 code, nor would we necessarily be able to even run that version (as kernels and OSes move forward). That?s sorta also the situation where you don't want to suddenly have to run back to IBM support because your DR solution suddenly doesn't work like it says on the tin ;-) I can test 1 and 2 relatively easily, but 3 is a bit more difficult for us to test out as the FS we want to use SOBAR on is 4.2 already. Simon From: on behalf of Marc A Kaplan Reply-To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: Friday, 20 January 2017 at 16:57 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] SOBAR questions I worked on some aspects of SOBAR, but without studying and testing the commands - I'm not in a position right now to give simple definitive answers - having said that.... Generally your questions are reasonable and the answer is: "Yes it should be possible to do that, but you might be going a bit beyond the design point.., so you'll need to try it out on a (smaller) test system with some smaller tedst files. Point by point. 1. If SOBAR is unable to restore a particular file, perhaps because the premigration did not complete -- you should only lose that particular file, and otherwise "keep going". 2. I think SOBAR helps you build a similar file system to the original, including block sizes. So you'd have to go in and tweak the file system creation step(s). I think this is reasonable... If you hit a problem... IMO that would be a fair APAR. 3. Similar to 2. 
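For anyone who wants to rehearse the back-level and block-size scenarios on a small test system first, a minimal sketch of creating a back-level test file system and then checking its format version could look like the following. The device name, NSD stanza file, block size and version string are placeholders (not values from this thread), and the exact strings accepted by --version should be checked against the mmcrfs man page for your code level:

  # create a small test file system at an older on-disk format version
  mmcrfs testfs -F /tmp/testfs-nsd.stanza -B 1M --version 4.1.1.0

  # confirm the original and current file system format versions
  mmlsfs testfs | grep -i version

That gives you a file system whose format is older than the installed code, so the SOBAR backup/restore cycle can be tried out without touching the production 4.2 file system.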
From: "Simon Thompson (Research Computing - IT Services)" < S.J.Thompson at bham.ac.uk> To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: 01/20/2017 10:44 AM Subject: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Mon Jan 23 22:04:25 2017 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 23 Jan 2017 22:04:25 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: Hi, you either need to request access to GPFS 4.2.1.0 efix16 via your PMR or need to upgrade to 4.2.2.1 both contain the fixes required. Sven On Mon, Jan 23, 2017 at 6:27 AM Sven Oehme wrote: > Aaron, > > hold a bit with the upgrade , i just got word that while 4.2.1+ most > likely addresses the issues i mentioned, there was a defect in the initial > release of the parallel log recovery code. i will get the exact minimum > version you need to deploy and send another update to this thread. > > sven > > On Mon, Jan 23, 2017 at 5:03 AM Sven Oehme wrote: > > Then i would suggest to move up to at least 4.2.1.LATEST , there is a high > chance your problem might already be fixed. > > i see 2 potential area that got significant improvements , Token Manager > recovery and Log Recovery, both are in latest 4.2.1 code enabled : > > 2 significant improvements on Token Recovery in 4.2.1 : > > 1. Extendible hashing for token hash table. This speeds up token lookup > and thereby reduce tcMutex hold times for configurations with a large ratio > of clients to token servers. > 2. 
Cleaning up tokens held by failed nodes was making multiple passes > over the whole token table, one for each failed node. The loops are now > inverted, so it makes a single pass over the able, and for each token > found, does cleanup for all failed nodes. > > there are multiple smaller enhancements beyond 4.2.1 but thats the minimum > level you want to be. i have seen token recovery of 10's of minutes similar > to what you described going down to a minute with this change. > > on Log Recovery - in case of an unclean unmount/shutdown of a node prior > 4.2.1 the Filesystem manager would only recover one Log file at a time, > using a single thread, with 4.2.1 this is now done with multiple threads > and multiple log files in parallel . > > Sven > > On Mon, Jan 23, 2017 at 4:22 AM Aaron Knister > wrote: > > It's at 4.1.1.10. > > On 1/22/17 11:12 PM, Sven Oehme wrote: > > What version of Scale/ GPFS code is this cluster on ? > > > > ------------------------------------------ > > Sven Oehme > > Scalable Storage Research > > email: oehmes at us.ibm.com > > Phone: +1 (408) 824-8904 <(408)%20824-8904> > > IBM Almaden Research Lab > > ------------------------------------------ > > > > Inactive hide details for Aaron Knister ---01/23/2017 01:31:29 AM---I > > was afraid someone would ask :) One possible use would beAaron Knister > > ---01/23/2017 01:31:29 AM---I was afraid someone would ask :) One > > possible use would be testing how monitoring reacts to and/or > > > > From: Aaron Knister > > To: > > Date: 01/23/2017 01:31 AM > > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > ------------------------------------------------------------------------ > > > > > > > > I was afraid someone would ask :) > > > > One possible use would be testing how monitoring reacts to and/or > > corrects stale filesystems. > > > > The use in my case is there's an issue we see quite often where a > > filesystem won't unmount when trying to shut down gpfs. Linux insists > > its still busy despite every process being killed on the node just about > > except init. It's a real pain because it complicates maintenance, > > requiring a reboot of some nodes prior to patching for example. > > > > I dug into it and it appears as though when this happens the > > filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm > > trying to debug it further but I need to actually be able to make the > > condition happen a few more times to debug it. A stripegroup panic isn't > > a surefire way but it's the only way I've found so far to trigger this > > behavior somewhat on demand. > > > > One way I've found to trigger a mass stripegroup panic is to induce what > > I call a "301 error": > > > > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted > > by the system with return code 301 reason code 0 > > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument > > > > and tickle a known race condition between nodes being expelled from the > > cluster and a manager node joining the cluster. When this happens it > > seems to cause a mass stripe group panic that's over in a few minutes. > > The trick there is that it doesn't happen every time I go through the > > exercise and when it does there's no guarantee the filesystem that > > panics is the one in use. If it's not an fs in use then it doesn't help > > me reproduce the error condition. I was trying to use the "mmfsadm test > > panic" command to try a more direct approach. 
> > > > Hope that helps shed some light. > > > > -Aaron > > > > On 1/22/17 8:16 PM, Andrew Beattie wrote: > >> Out of curiosity -- why would you want to? > >> Andrew Beattie > >> Software Defined Storage - IT Specialist > >> Phone: 614-2133-7927 > >> E-mail: abeattie at au1.ibm.com > >> > >> > >> > >> ----- Original message ----- > >> From: Aaron Knister > >> Sent by: gpfsug-discuss-bounces at spectrumscale.org > >> To: gpfsug main discussion list > >> Cc: > >> Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? > >> Date: Mon, Jan 23, 2017 11:11 AM > >> > >> This is going to sound like a ridiculous request, but, is there a > way to > >> cause a filesystem to panic everywhere in one "swell foop"? I'm > assuming > >> the answer will come with an appropriate disclaimer of "don't ever > do > >> this, we don't support it, it might eat your data, summon cthulu, > etc.". > >> I swear I've seen the fs manager initiate this type of operation > before. > >> > >> I can seem to do it on a per-node basis with "mmfsadm test panic > > >> " but if I do that over all 1k nodes in my test cluster > at > >> once it results in about 45 minutes of almost total deadlock while > each > >> panic is processed by the fs manager. > >> > >> -Aaron > >> > >> -- > >> Aaron Knister > >> NASA Center for Climate Simulation (Code 606.2) > >> Goddard Space Flight Center > >> (301) 286-2776 > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Jan 24 10:00:42 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 24 Jan 2017 10:00:42 +0000 Subject: [gpfsug-discuss] Manager nodes Message-ID: We are looking at moving manager processes off our NSD nodes and on to dedicated quorum/manager nodes. Are there some broad recommended hardware specs for the function of these nodes. I assume they benefit from having high memory (for some value of high, probably a function of number of clients, files, expected open files?, and probably completely incalculable, so some empirical evidence may be useful here?) (I'm going to ignore the docs that say you should have twice as much swap as RAM!) What about cores, do they benefit from high core counts or high clock rates? 
For example would I benefit more form a high core count, low clock speed, or going for higher clock speeds and reducing core count? Or is memory bandwidth more important for manager nodes? Connectivity, does token management run over IB or only over Ethernet/admin network? I.e. Should I bother adding IB cards, or just have fast Ethernet on them (my clients/NSDs all have IB). I'm looking for some hints on what I would most benefit in investing in vs keeping to budget. Thanks Simon From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jan 24 15:18:09 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 24 Jan 2017 15:18:09 +0000 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: References: Message-ID: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu> Hi Simon, FWIW, we have two servers dedicated to cluster and filesystem management functions (and 8 NSD servers). I guess you would describe our cluster as small to medium sized ? ~700 nodes and a little over 1 PB of storage. Our two managers have 2 quad core (3 GHz) CPU?s and 64 GB RAM. They?ve got 10 GbE, but we don?t use IB anywhere. We have an 8 Gb FC SAN and we do have them connected in to the SAN so that they don?t have to ask the NSD servers to do any I/O for them. I do collect statistics on all the servers and plunk them into an RRDtool database. Looking at the last 30 days the load average on the two managers is in the 5-10 range. Memory utilization seems to be almost entirely dependent on how parameters like the pagepool are set on them. HTHAL? Kevin > On Jan 24, 2017, at 4:00 AM, Simon Thompson (Research Computing - IT Services) wrote: > > We are looking at moving manager processes off our NSD nodes and on to > dedicated quorum/manager nodes. > > Are there some broad recommended hardware specs for the function of these > nodes. > > I assume they benefit from having high memory (for some value of high, > probably a function of number of clients, files, expected open files?, and > probably completely incalculable, so some empirical evidence may be useful > here?) (I'm going to ignore the docs that say you should have twice as > much swap as RAM!) > > What about cores, do they benefit from high core counts or high clock > rates? For example would I benefit more form a high core count, low clock > speed, or going for higher clock speeds and reducing core count? Or is > memory bandwidth more important for manager nodes? > > Connectivity, does token management run over IB or only over > Ethernet/admin network? I.e. Should I bother adding IB cards, or just have > fast Ethernet on them (my clients/NSDs all have IB). > > I'm looking for some hints on what I would most benefit in investing in vs > keeping to budget. > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From janfrode at tanso.net Tue Jan 24 15:51:05 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 24 Jan 2017 15:51:05 +0000 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu> References: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu> Message-ID: Just some datapoints, in hope that it helps.. I've seen metadata performance improvements by turning down hyperthreading from 8/core to 4/core on Power8. Also it helped distributing the token managers over multiple nodes (6+) instead of fewer. I would expect this to flow over IP, not IB. -jf tir. 24. 
jan. 2017 kl. 16.18 skrev Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu>: Hi Simon, FWIW, we have two servers dedicated to cluster and filesystem management functions (and 8 NSD servers). I guess you would describe our cluster as small to medium sized ? ~700 nodes and a little over 1 PB of storage. Our two managers have 2 quad core (3 GHz) CPU?s and 64 GB RAM. They?ve got 10 GbE, but we don?t use IB anywhere. We have an 8 Gb FC SAN and we do have them connected in to the SAN so that they don?t have to ask the NSD servers to do any I/O for them. I do collect statistics on all the servers and plunk them into an RRDtool database. Looking at the last 30 days the load average on the two managers is in the 5-10 range. Memory utilization seems to be almost entirely dependent on how parameters like the pagepool are set on them. HTHAL? Kevin > On Jan 24, 2017, at 4:00 AM, Simon Thompson (Research Computing - IT Services) wrote: > > We are looking at moving manager processes off our NSD nodes and on to > dedicated quorum/manager nodes. > > Are there some broad recommended hardware specs for the function of these > nodes. > > I assume they benefit from having high memory (for some value of high, > probably a function of number of clients, files, expected open files?, and > probably completely incalculable, so some empirical evidence may be useful > here?) (I'm going to ignore the docs that say you should have twice as > much swap as RAM!) > > What about cores, do they benefit from high core counts or high clock > rates? For example would I benefit more form a high core count, low clock > speed, or going for higher clock speeds and reducing core count? Or is > memory bandwidth more important for manager nodes? > > Connectivity, does token management run over IB or only over > Ethernet/admin network? I.e. Should I bother adding IB cards, or just have > fast Ethernet on them (my clients/NSDs all have IB). > > I'm looking for some hints on what I would most benefit in investing in vs > keeping to budget. > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Jan 24 16:34:16 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 24 Jan 2017 16:34:16 +0000 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: References: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu>, Message-ID: Thanks both. I was thinking of adding 4 (we have a storage cluster over two DC's, so was planning to put two in each and use them as quorum nodes as well plus one floating VM to guarantee only one sitr is quorate in the event of someone cutting a fibre...) We pretty much start at 128GB ram and go from there, so this sounds fine. Would be good if someone could comment on if token traffic goes via IB or Ethernet, maybe I can save myself a few EDR cards... 
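For reference, a minimal sketch of designating such nodes and then checking where the manager roles actually land might be the following (the node names are placeholders, not real hosts from this thread):

  # make the new nodes quorum and manager candidates
  mmchnode --quorum --manager -N mgr01,mgr02,mgr03,mgr04

  # show which nodes currently hold the cluster and file system manager roles
  mmlsmgr

  # on a manager node, look at token manager state and memory use
  mmdiag --tokenmgr
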
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode Myklebust [janfrode at tanso.net] Sent: 24 January 2017 15:51 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Manager nodes Just some datapoints, in hope that it helps.. I've seen metadata performance improvements by turning down hyperthreading from 8/core to 4/core on Power8. Also it helped distributing the token managers over multiple nodes (6+) instead of fewer. I would expect this to flow over IP, not IB. -jf tir. 24. jan. 2017 kl. 16.18 skrev Buterbaugh, Kevin L >: Hi Simon, FWIW, we have two servers dedicated to cluster and filesystem management functions (and 8 NSD servers). I guess you would describe our cluster as small to medium sized ? ~700 nodes and a little over 1 PB of storage. Our two managers have 2 quad core (3 GHz) CPU?s and 64 GB RAM. They?ve got 10 GbE, but we don?t use IB anywhere. We have an 8 Gb FC SAN and we do have them connected in to the SAN so that they don?t have to ask the NSD servers to do any I/O for them. I do collect statistics on all the servers and plunk them into an RRDtool database. Looking at the last 30 days the load average on the two managers is in the 5-10 range. Memory utilization seems to be almost entirely dependent on how parameters like the pagepool are set on them. HTHAL? Kevin > On Jan 24, 2017, at 4:00 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > We are looking at moving manager processes off our NSD nodes and on to > dedicated quorum/manager nodes. > > Are there some broad recommended hardware specs for the function of these > nodes. > > I assume they benefit from having high memory (for some value of high, > probably a function of number of clients, files, expected open files?, and > probably completely incalculable, so some empirical evidence may be useful > here?) (I'm going to ignore the docs that say you should have twice as > much swap as RAM!) > > What about cores, do they benefit from high core counts or high clock > rates? For example would I benefit more form a high core count, low clock > speed, or going for higher clock speeds and reducing core count? Or is > memory bandwidth more important for manager nodes? > > Connectivity, does token management run over IB or only over > Ethernet/admin network? I.e. Should I bother adding IB cards, or just have > fast Ethernet on them (my clients/NSDs all have IB). > > I'm looking for some hints on what I would most benefit in investing in vs > keeping to budget. > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Tue Jan 24 16:53:24 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 24 Jan 2017 16:53:24 +0000 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: References: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu>, Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB06544544@CHI-EXCHANGEW1.w2k.jumptrading.com> It goes over IP, and that could be IPoIB if you have the daemon interface or subnets configured that way, but it will go over native IB VERBS if you have rdmaVerbsSend enabled (not recommended for large clusters). 
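A quick way to check how a given cluster is currently set up in this respect is a few plain mmlsconfig queries, plus a look at the daemon log (standard log path assumed here):

  mmlsconfig verbsRdma
  mmlsconfig verbsRdmaSend
  mmlsconfig verbsPorts

  # the daemon log also shows whether VERBS RDMA came up at startup
  grep -i verbs /var/adm/ras/mmfs.log.latest

From the documentation for the send-side option: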
verbsRdmaSend Enables or disables the use of InfiniBand RDMA rather than TCP for most GPFS daemon-to-daemon communication. When disabled, only data transfers between an NSD client and NSD server are eligible for RDMA. Valid values are enable or disable. The default value is disable. The verbsRdma option must be enabled for verbsRdmaSend to have any effect. HTH, -B -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: Tuesday, January 24, 2017 10:34 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Manager nodes Thanks both. I was thinking of adding 4 (we have a storage cluster over two DC's, so was planning to put two in each and use them as quorum nodes as well plus one floating VM to guarantee only one sitr is quorate in the event of someone cutting a fibre...) We pretty much start at 128GB ram and go from there, so this sounds fine. Would be good if someone could comment on if token traffic goes via IB or Ethernet, maybe I can save myself a few EDR cards... Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode Myklebust [janfrode at tanso.net] Sent: 24 January 2017 15:51 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Manager nodes Just some datapoints, in hope that it helps.. I've seen metadata performance improvements by turning down hyperthreading from 8/core to 4/core on Power8. Also it helped distributing the token managers over multiple nodes (6+) instead of fewer. I would expect this to flow over IP, not IB. -jf tir. 24. jan. 2017 kl. 16.18 skrev Buterbaugh, Kevin L >: Hi Simon, FWIW, we have two servers dedicated to cluster and filesystem management functions (and 8 NSD servers). I guess you would describe our cluster as small to medium sized ... ~700 nodes and a little over 1 PB of storage. Our two managers have 2 quad core (3 GHz) CPU's and 64 GB RAM. They've got 10 GbE, but we don't use IB anywhere. We have an 8 Gb FC SAN and we do have them connected in to the SAN so that they don't have to ask the NSD servers to do any I/O for them. I do collect statistics on all the servers and plunk them into an RRDtool database. Looking at the last 30 days the load average on the two managers is in the 5-10 range. Memory utilization seems to be almost entirely dependent on how parameters like the pagepool are set on them. HTHAL... Kevin > On Jan 24, 2017, at 4:00 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > We are looking at moving manager processes off our NSD nodes and on to > dedicated quorum/manager nodes. > > Are there some broad recommended hardware specs for the function of these > nodes. > > I assume they benefit from having high memory (for some value of high, > probably a function of number of clients, files, expected open files?, and > probably completely incalculable, so some empirical evidence may be useful > here?) (I'm going to ignore the docs that say you should have twice as > much swap as RAM!) > > What about cores, do they benefit from high core counts or high clock > rates? For example would I benefit more form a high core count, low clock > speed, or going for higher clock speeds and reducing core count? Or is > memory bandwidth more important for manager nodes? > > Connectivity, does token management run over IB or only over > Ethernet/admin network? I.e. 
Should I bother adding IB cards, or just have > fast Ethernet on them (my clients/NSDs all have IB). > > I'm looking for some hints on what I would most benefit in investing in vs > keeping to budget. > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From UWEFALKE at de.ibm.com Tue Jan 24 17:36:22 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 24 Jan 2017 18:36:22 +0100 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu> References: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu> Message-ID: Hi, Kevin, I'd look for more cores on the expense of clock speed. You send data over routes involving much higher latencies than your CPU-memory combination has even in the slowest available clock rate, but GPFS with its multi-threaded appoach is surely happy if it can start a few more threads. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 01/24/2017 04:18 PM Subject: Re: [gpfsug-discuss] Manager nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Simon, FWIW, we have two servers dedicated to cluster and filesystem management functions (and 8 NSD servers). I guess you would describe our cluster as small to medium sized ? ~700 nodes and a little over 1 PB of storage. Our two managers have 2 quad core (3 GHz) CPU?s and 64 GB RAM. They?ve got 10 GbE, but we don?t use IB anywhere. 
We have an 8 Gb FC SAN and we do have them connected in to the SAN so that they don?t have to ask the NSD servers to do any I/O for them. I do collect statistics on all the servers and plunk them into an RRDtool database. Looking at the last 30 days the load average on the two managers is in the 5-10 range. Memory utilization seems to be almost entirely dependent on how parameters like the pagepool are set on them. HTHAL? Kevin > On Jan 24, 2017, at 4:00 AM, Simon Thompson (Research Computing - IT Services) wrote: > > We are looking at moving manager processes off our NSD nodes and on to > dedicated quorum/manager nodes. > > Are there some broad recommended hardware specs for the function of these > nodes. > > I assume they benefit from having high memory (for some value of high, > probably a function of number of clients, files, expected open files?, and > probably completely incalculable, so some empirical evidence may be useful > here?) (I'm going to ignore the docs that say you should have twice as > much swap as RAM!) > > What about cores, do they benefit from high core counts or high clock > rates? For example would I benefit more form a high core count, low clock > speed, or going for higher clock speeds and reducing core count? Or is > memory bandwidth more important for manager nodes? > > Connectivity, does token management run over IB or only over > Ethernet/admin network? I.e. Should I bother adding IB cards, or just have > fast Ethernet on them (my clients/NSDs all have IB). > > I'm looking for some hints on what I would most benefit in investing in vs > keeping to budget. > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Tue Jan 24 19:48:02 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 24 Jan 2017 19:48:02 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes Message-ID: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
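The usual commands for inspecting the current CES node and address assignment state are the following (4.2.x syntax assumed; check the mmces man page if these differ on your level):

  mmlscluster --ces
  mmces node list
  mmces address list
  mmces state show -a
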
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. From Achim.Rehor at de.ibm.com Wed Jan 25 08:58:58 2017 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Wed, 25 Jan 2017 09:58:58 +0100 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06544544@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu>, <21BC488F0AEA2245B2C3E83FC0B33DBB06544544@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Wed Jan 25 11:30:00 2017 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 25 Jan 2017 12:30:00 +0100 Subject: [gpfsug-discuss] snapshots Message-ID: <20170125113000.lwvzpekzjsjvghx5@ics.muni.cz> Hello, is there a way to get number of inodes consumed by a particular snapshot? I have a fileset with separate inodespace: Filesets in file system 'vol1': Name Status Path InodeSpace MaxInodes AllocInodes UsedInodes export Linked /gpfs/vol1/export 1 300000256 300000256 157515747 and it reports no space left on device. It seems that inodes consumed by fileset snapshots are not accounted under usedinodes. So can I somehow check how many inodes are consumed by snapshots? The 'no space left on device' IS caused by exhausted inodes, I can store more data into existing files and if I increase the inode limit, I can create new files. -- Luk?? 
Hejtm?nek From r.sobey at imperial.ac.uk Wed Jan 25 16:08:27 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 25 Jan 2017 16:08:27 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors Message-ID: Hoping someone can show me what should be obvious. I've got an LROC device configured but I want to see stats for it in the GUI: 1) On the CES node itself I've modified ZIMonSensors.cfg and under the GPFSLROC section changed it to 10: { name = "GPFSLROC" period = 10 }, 2) On the CES node restarted pmsensors. 3) On the collector node restarted pmcollector. But I can't find anywhere in the GUI that lets me look at anything LROC related. Anyone got this working? Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Jan 25 20:25:19 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 25 Jan 2017 20:25:19 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors In-Reply-To: References: Message-ID: Richard, there are no exposures of LROC counters in the Scale GUI. you need to use the grafana bridge to get graphs or the command line tools to query the data in text format. Sven On Wed, Jan 25, 2017 at 5:08 PM Sobey, Richard A wrote: > Hoping someone can show me what should be obvious. I?ve got an LROC device > configured but I want to see stats for it in the GUI: > > > > 1) On the CES node itself I?ve modified ZIMonSensors.cfg and under > the GPFSLROC section changed it to 10: > > > > { > > name = "GPFSLROC" > > period = 10 > > }, > > > > 2) On the CES node restarted pmsensors. > > 3) On the collector node restarted pmcollector. > > > > But I can?t find anywhere in the GUI that lets me look at anything LROC > related. > > > > Anyone got this working? > > > > Cheers > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jan 25 20:45:05 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 25 Jan 2017 20:45:05 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors Message-ID: <0CDC969E-7CB9-4B4E-9AAA-1BF9193BF7E2@nuance.com> For the Zimon ?GPFSLROC?, what metrics can Grafana query, I don?t see them documented or exposed anywhere: http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adv_listofmetricsPMT.htm Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Wednesday, January 25, 2017 at 2:25 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] LROC Zimon sensors Richard, there are no exposures of LROC counters in the Scale GUI. you need to use the grafana bridge to get graphs or the command line tools to query the data in text format. Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Jan 25 20:50:28 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 25 Jan 2017 14:50:28 -0600 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: References: Message-ID: Hello all, We are having an issue where the LROC on a CES node gets overrun 100% utilized. Processes then start to backup waiting for the LROC to return data. Any way to have the GPFS client go direct if LROC gets to busy? 
Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From oehmes at gmail.com Wed Jan 25 21:00:03 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 25 Jan 2017 21:00:03 +0000 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: References: Message-ID: Matt, the assumption was that the remote devices are slower than LROC. there is some attempts in the code to not schedule more than a maximum numbers of outstanding i/os to the LROC device, but this doesn't help in all cases and is depending on what kernel level parameters for the device are set. the best way is to reduce the max size of data to be cached into lroc. sven On Wed, Jan 25, 2017 at 9:50 PM Matt Weil wrote: > Hello all, > > We are having an issue where the LROC on a CES node gets overrun 100% > utilized. Processes then start to backup waiting for the LROC to > return data. Any way to have the GPFS client go direct if LROC gets to > busy? > > Thanks > Matt > > ________________________________ > The materials in this message are private and may contain Protected > Healthcare Information or other information of a sensitive nature. If you > are not the intended recipient, be advised that any unauthorized use, > disclosure, copying or the taking of any action in reliance on the contents > of this information is strictly prohibited. If you have received this email > in error, please immediately notify the sender via telephone or return mail. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jan 25 21:01:11 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 25 Jan 2017 21:01:11 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors In-Reply-To: References: , Message-ID: Ok Sven thanks, looks like I'll be checking out grafana. Richard ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: 25 January 2017 20:25 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] LROC Zimon sensors Richard, there are no exposures of LROC counters in the Scale GUI. you need to use the grafana bridge to get graphs or the command line tools to query the data in text format. Sven On Wed, Jan 25, 2017 at 5:08 PM Sobey, Richard A > wrote: Hoping someone can show me what should be obvious. I've got an LROC device configured but I want to see stats for it in the GUI: 1) On the CES node itself I've modified ZIMonSensors.cfg and under the GPFSLROC section changed it to 10: { name = "GPFSLROC" period = 10 }, 2) On the CES node restarted pmsensors. 3) On the collector node restarted pmcollector. But I can't find anywhere in the GUI that lets me look at anything LROC related. Anyone got this working? 
Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Jan 25 21:06:12 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 25 Jan 2017 21:06:12 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors In-Reply-To: <0CDC969E-7CB9-4B4E-9AAA-1BF9193BF7E2@nuance.com> References: <0CDC969E-7CB9-4B4E-9AAA-1BF9193BF7E2@nuance.com> Message-ID: Hi, i guess thats a docu gap, i will send a email trying to get this fixed. here is the list of sensors : [image: pasted1] i hope most of them are self explaining given the others are documented , if not let me know and i clarify . sven On Wed, Jan 25, 2017 at 9:45 PM Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > For the Zimon ?GPFSLROC?, what metrics can Grafana query, I don?t see them > documented or exposed anywhere: > > > > > http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adv_listofmetricsPMT.htm > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > > > *From: * on behalf of Sven > Oehme > *Reply-To: *gpfsug main discussion list > *Date: *Wednesday, January 25, 2017 at 2:25 PM > *To: *"gpfsug-discuss at spectrumscale.org" > > *Subject: *[EXTERNAL] Re: [gpfsug-discuss] LROC Zimon sensors > > > > Richard, > > > > there are no exposures of LROC counters in the Scale GUI. you need to use > the grafana bridge to get graphs or the command line tools to query the > data in text format. > > > > Sven > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pasted1 Type: image/png Size: 283191 bytes Desc: not available URL: From oehmes at gmail.com Wed Jan 25 21:08:02 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 25 Jan 2017 21:08:02 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors In-Reply-To: References: Message-ID: start here : https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/IBM%20Spectrum%20Scale%20Performance%20Monitoring%20Bridge On Wed, Jan 25, 2017 at 10:01 PM Sobey, Richard A wrote: > Ok Sven thanks, looks like I'll be checking out grafana. > > > Richard > > > ------------------------------ > *From:* gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Sven Oehme < > oehmes at gmail.com> > *Sent:* 25 January 2017 20:25 > *To:* gpfsug-discuss at spectrumscale.org > *Subject:* Re: [gpfsug-discuss] LROC Zimon sensors > > Richard, > > there are no exposures of LROC counters in the Scale GUI. you need to use > the grafana bridge to get graphs or the command line tools to query the > data in text format. > > Sven > > > On Wed, Jan 25, 2017 at 5:08 PM Sobey, Richard A > wrote: > > Hoping someone can show me what should be obvious. I?ve got an LROC device > configured but I want to see stats for it in the GUI: > > > > 1) On the CES node itself I?ve modified ZIMonSensors.cfg and under > the GPFSLROC section changed it to 10: > > > > { > > name = "GPFSLROC" > > period = 10 > > }, > > > > 2) On the CES node restarted pmsensors. 
> > 3) On the collector node restarted pmcollector. > > > > But I can?t find anywhere in the GUI that lets me look at anything LROC > related. > > > > Anyone got this working? > > > > Cheers > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Jan 25 21:20:21 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 25 Jan 2017 15:20:21 -0600 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: References: Message-ID: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> On 1/25/17 3:00 PM, Sven Oehme wrote: Matt, the assumption was that the remote devices are slower than LROC. there is some attempts in the code to not schedule more than a maximum numbers of outstanding i/os to the LROC device, but this doesn't help in all cases and is depending on what kernel level parameters for the device are set. the best way is to reduce the max size of data to be cached into lroc. I just turned LROC file caching completely off. most if not all of the IO is metadata. Which is what I wanted to keep fast. It is amazing once you drop the latency the IO's go up way more than they ever where before. I guess we will need another nvme. sven On Wed, Jan 25, 2017 at 9:50 PM Matt Weil > wrote: Hello all, We are having an issue where the LROC on a CES node gets overrun 100% utilized. Processes then start to backup waiting for the LROC to return data. Any way to have the GPFS client go direct if LROC gets to busy? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at gmail.com Wed Jan 25 21:29:50 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 25 Jan 2017 21:29:50 +0000 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> References: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> Message-ID: have you tried to just leave lrocInodes and lrocDirectories on and turn data off ? also did you increase maxstatcache so LROC actually has some compact objects to use ? if you send value for maxfilestocache,maxfilestocache,workerthreads and available memory of the node i can provide a start point. On Wed, Jan 25, 2017 at 10:20 PM Matt Weil wrote: > > > On 1/25/17 3:00 PM, Sven Oehme wrote: > > Matt, > > the assumption was that the remote devices are slower than LROC. there is > some attempts in the code to not schedule more than a maximum numbers of > outstanding i/os to the LROC device, but this doesn't help in all cases and > is depending on what kernel level parameters for the device are set. the > best way is to reduce the max size of data to be cached into lroc. > > I just turned LROC file caching completely off. most if not all of the IO > is metadata. Which is what I wanted to keep fast. It is amazing once you > drop the latency the IO's go up way more than they ever where before. I > guess we will need another nvme. > > > sven > > > On Wed, Jan 25, 2017 at 9:50 PM Matt Weil wrote: > > Hello all, > > We are having an issue where the LROC on a CES node gets overrun 100% > utilized. Processes then start to backup waiting for the LROC to > return data. Any way to have the GPFS client go direct if LROC gets to > busy? > > Thanks > Matt > > ________________________________ > The materials in this message are private and may contain Protected > Healthcare Information or other information of a sensitive nature. If you > are not the intended recipient, be advised that any unauthorized use, > disclosure, copying or the taking of any action in reliance on the contents > of this information is strictly prohibited. If you have received this email > in error, please immediately notify the sender via telephone or return mail. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > The materials in this message are private and may contain Protected > Healthcare Information or other information of a sensitive nature. If you > are not the intended recipient, be advised that any unauthorized use, > disclosure, copying or the taking of any action in reliance on the contents > of this information is strictly prohibited. If you have received this email > in error, please immediately notify the sender via telephone or return mail. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mweil at wustl.edu Wed Jan 25 21:51:43 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 25 Jan 2017 15:51:43 -0600 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: References: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> Message-ID: [ces1,ces2,ces3] maxStatCache 80000 worker1Threads 2000 maxFilesToCache 500000 pagepool 100G maxStatCache 80000 lrocData no 378G system memory. On 1/25/17 3:29 PM, Sven Oehme wrote: have you tried to just leave lrocInodes and lrocDirectories on and turn data off ? yes data I just turned off also did you increase maxstatcache so LROC actually has some compact objects to use ? if you send value for maxfilestocache,maxfilestocache,workerthreads and available memory of the node i can provide a start point. On Wed, Jan 25, 2017 at 10:20 PM Matt Weil > wrote: On 1/25/17 3:00 PM, Sven Oehme wrote: Matt, the assumption was that the remote devices are slower than LROC. there is some attempts in the code to not schedule more than a maximum numbers of outstanding i/os to the LROC device, but this doesn't help in all cases and is depending on what kernel level parameters for the device are set. the best way is to reduce the max size of data to be cached into lroc. I just turned LROC file caching completely off. most if not all of the IO is metadata. Which is what I wanted to keep fast. It is amazing once you drop the latency the IO's go up way more than they ever where before. I guess we will need another nvme. sven On Wed, Jan 25, 2017 at 9:50 PM Matt Weil > wrote: Hello all, We are having an issue where the LROC on a CES node gets overrun 100% utilized. Processes then start to backup waiting for the LROC to return data. Any way to have the GPFS client go direct if LROC gets to busy? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Thu Jan 26 15:37:54 2017 From: mweil at wustl.edu (Matt Weil) Date: Thu, 26 Jan 2017 09:37:54 -0600 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: References: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> Message-ID: <55747bf9-b4c1-4523-8d8b-94e8f35f22f9@wustl.edu> 100% utilized are bursts above 200,000 IO's. Any way to tell ganesha.nfsd to cache more? On 1/25/17 3:51 PM, Matt Weil wrote: [ces1,ces2,ces3] maxStatCache 80000 worker1Threads 2000 maxFilesToCache 500000 pagepool 100G maxStatCache 80000 lrocData no 378G system memory. On 1/25/17 3:29 PM, Sven Oehme wrote: have you tried to just leave lrocInodes and lrocDirectories on and turn data off ? yes data I just turned off also did you increase maxstatcache so LROC actually has some compact objects to use ? if you send value for maxfilestocache,maxfilestocache,workerthreads and available memory of the node i can provide a start point. On Wed, Jan 25, 2017 at 10:20 PM Matt Weil > wrote: On 1/25/17 3:00 PM, Sven Oehme wrote: Matt, the assumption was that the remote devices are slower than LROC. there is some attempts in the code to not schedule more than a maximum numbers of outstanding i/os to the LROC device, but this doesn't help in all cases and is depending on what kernel level parameters for the device are set. the best way is to reduce the max size of data to be cached into lroc. I just turned LROC file caching completely off. most if not all of the IO is metadata. Which is what I wanted to keep fast. It is amazing once you drop the latency the IO's go up way more than they ever where before. I guess we will need another nvme. sven On Wed, Jan 25, 2017 at 9:50 PM Matt Weil > wrote: Hello all, We are having an issue where the LROC on a CES node gets overrun 100% utilized. Processes then start to backup waiting for the LROC to return data. Any way to have the GPFS client go direct if LROC gets to busy? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. 
If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 26 17:15:56 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 26 Jan 2017 17:15:56 +0000 Subject: [gpfsug-discuss] mmlsquota output question Message-ID: <73AC6907-90BD-447F-9F72-4B7CBBFE2321@vanderbilt.edu> Hi All, We had 3 local GPFS filesystems on our cluster ? let?s call them gpfs0, gpfs1, and gpfs2. gpfs0 is for project space (i.e. groups can buy quota in 1 TB increments there). gpfs1 is scratch and gpfs2 is home. We are combining gpfs0 and gpfs1 into one new filesystem (gpfs3) ? we?re doing this for multiple reasons that aren?t really pertinent to my question here, but suffice it to say I have discussed our plan with some of IBM?s GPFS people and they agree that it?s the thing for us to do. gpfs3 will have a scratch fileset with no fileset quota, but user and group quotas (just like the gpfs1 filesystem currently has). We will also move all the filesets from gpfs0 over to gpfs3 - those use fileset quotas only - no user or group quotas. I have created the new gpfs3 filesystem, the scratch fileset within it, and one of the project filesets coming over from gpfs0. I?ve also moved my scratch directory to the gpfs3 scratch fileset. 
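(A rough sketch of the setup described above, for readers following along: the junction paths, limits and exact command forms below are assumptions rather than the commands that were actually run, and they presume per-fileset quota accounting is enabled on gpfs3.)

# scratch as an independent fileset, carrying user quotas only
mmcrfileset gpfs3 scratch --inode-space new
mmlinkfileset gpfs3 scratch -J /gpfs/gpfs3/scratch
mmsetquota gpfs3:scratch --user kevin --block 50G:200G --files 200000:1000000

# project filesets moved over from gpfs0 carry only a fileset quota
mmcrfileset gpfs3 fakegroup --inode-space new
mmlinkfileset gpfs3 fakegroup -J /gpfs/gpfs3/fakegroup
mmsetquota gpfs3:fakegroup --block 1T:1T

# check whether quotas and per-fileset quota accounting are enabled
mmlsfs gpfs3 -Q --perfileset-quota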
When I run mmlsquota I see (please note, I?ve changed names of things to protect the guilty): kevin at gateway: mmlsquota -u kevin --block-size auto Block Limits | File Limits Filesystem type blocks quota limit in_doubt grace | files quota limit in_doubt grace Remarks gpfs0 USR no limits Block Limits | File Limits Filesystem type blocks quota limit in_doubt grace | files quota limit in_doubt grace Remarks gpfs1 USR 2.008G 50G 200G 0 none | 3 100000 1000000 0 none Block Limits | File Limits Filesystem type blocks quota limit in_doubt grace | files quota limit in_doubt grace Remarks gpfs2 USR 11.69G 25G 35G 0 none | 8453 100000 200000 0 none Block Limits | File Limits Filesystem Fileset type blocks quota limit in_doubt grace | files quota limit in_doubt grace Remarks gpfs3 root USR no limits gpfs3 scratch USR 31.04G 50G 200G 0 none | 2134 200000 1000000 0 none gpfs3 fakegroup USR no limits kevin at gateway: My question is this ? why am I seeing the ?root? and ?fakegroup? filesets listed in the output for gpfs3? They don?t show up for gpfs0 and the also exist there. Is it possibly because there are no user quotas whatsoever for gpfs0 and there are user quotas on the gpfs3:scratch fileset? If so, that still doesn?t make sense as to why mmlsquota would think it needs to show the filesets within that filesystem that don?t have user quotas. In fact, we don?t *want* that to happen, as we have certain groups that deal with various types of restricted data and we?d prefer that their existence not be advertised to everyone on the cluster. Oh, we?re still in the process of upgrading clients on our cluster, but this output is from a client running 4.2.2.1, in case that matters. Thanks all... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Thu Jan 26 20:20:00 2017 From: mweil at wustl.edu (Matt Weil) Date: Thu, 26 Jan 2017 14:20:00 -0600 Subject: [gpfsug-discuss] LROC nvme small IO size 4 k In-Reply-To: <55747bf9-b4c1-4523-8d8b-94e8f35f22f9@wustl.edu> References: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> <55747bf9-b4c1-4523-8d8b-94e8f35f22f9@wustl.edu> Message-ID: I still see small 4k IO's going to the nvme device after changing the max_sectors_kb. Writes did increase from 64 to 512. Is that a nvme limitation. > [root at ces1 system]# cat /sys/block/nvme0n1/queue/read_ahead_kb > 8192 > [root at ces1 system]# cat /sys/block/nvme0n1/queue/nr_requests > 512 > [root at ces1 system]# cat /sys/block/nvme0n1/queue/max_sectors_kb > 8192 > [root at ces1 system]# collectl -sD --dskfilt=nvme0n1 > waiting for 1 second sample... > > # DISK STATISTICS (/sec) > # > <---------reads---------><---------writes---------><--------averages--------> > Pct > #Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize > QLen Wait SvcTim Util > nvme0n1 47187 0 11K 4 30238 0 59 512 > 6 8 0 0 34 > nvme0n1 61730 0 15K 4 14321 0 28 512 > 4 9 0 0 45 ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. 
If you have received this email in error, please immediately notify the sender via telephone or return mail. From Robert.Oesterlin at nuance.com Fri Jan 27 00:57:05 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 27 Jan 2017 00:57:05 +0000 Subject: [gpfsug-discuss] Waiter identification help - Quota related Message-ID: OK, I have a sick cluster, and it seems to be tied up with quota related RPCs like this. Any help in narrowing down what the issue is? Waiting 3.8729 sec since 19:54:09, monitored, thread 32786 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.3158 sec since 19:54:08, monitored, thread 32771 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.3173 sec since 19:54:08, monitored, thread 35829 Msg handler quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.4619 sec since 19:54:08, monitored, thread 9694 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.4967 sec since 19:54:08, monitored, thread 32357 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.6885 sec since 19:54:08, monitored, thread 32305 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.7123 sec since 19:54:08, monitored, thread 32261 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.7932 sec since 19:54:08, monitored, thread 53409 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.2954 sec since 19:54:07, monitored, thread 32905 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3058 sec since 19:54:07, monitored, thread 32573 Msg handler quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3207 sec since 19:54:07, monitored, thread 32397 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3274 sec since 19:54:07, monitored, thread 32897 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3343 sec since 19:54:07, monitored, thread 32691 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3347 sec since 19:54:07, monitored, thread 32364 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3348 sec since 19:54:07, monitored, thread 32522 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Jan 27 01:26:49 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 26 Jan 2017 20:26:49 -0500 Subject: [gpfsug-discuss] Waiter identification help - Quota related In-Reply-To: References: Message-ID: <49f984fc-4881-60fd-88a0-29701ce4ea73@nasa.gov> This might be a stretch but do you happen to have a user/fileset/group over it's hard quota or soft quota + grace period? We've had this really upset our cluster before. 
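(A quick way to check for that condition; "gpfs0" is a placeholder device name here and filtering on the grace column is only a rough heuristic, so treat this as a sketch.)

# refresh usage and report user, group and fileset quotas; entries whose
# grace columns show anything other than "none" are over a soft limit
mmrepquota -e -u -g -j gpfs0 | grep -v none

# and note which node is currently the file system manager for the device
mmlsmgr gpfs0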
At least with 3.5 each op that's done against an over quota user/group/fileset results in at least one rpc from the fs manager to every node in the cluster. Are those waiters from an fs manager node? If so perhaps briefly fire up tracing (/usr/lpp/mmfs/bin/mmtrace start) let it run for ~10 seconds then stop it (/usr/lpp/mmfs/bin/mmtrace stop) then grep for "TRACE_QUOTA" out of the resulting trcrpt file. If you see a bunch of lines that contain: TRACE_QUOTA: qu.server revoke reply type that might be what's going on. You can also see the behavior if you look at the output of mmdiag --network on your fs manager nodes and see a bunch of RPC's with all of your cluster node listed as the recipients. Can't recall what the RPC is called that you're looking for, though. Hope that helps! -Aaron On 1/26/17 7:57 PM, Oesterlin, Robert wrote: > OK, I have a sick cluster, and it seems to be tied up with quota related > RPCs like this. Any help in narrowing down what the issue is? > > > > Waiting 3.8729 sec since 19:54:09, monitored, thread 32786 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.3158 sec since 19:54:08, monitored, thread 32771 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.3173 sec since 19:54:08, monitored, thread 35829 Msg handler > quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.4619 sec since 19:54:08, monitored, thread 9694 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.4967 sec since 19:54:08, monitored, thread 32357 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.6885 sec since 19:54:08, monitored, thread 32305 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.7123 sec since 19:54:08, monitored, thread 32261 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.7932 sec since 19:54:08, monitored, thread 53409 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.2954 sec since 19:54:07, monitored, thread 32905 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3058 sec since 19:54:07, monitored, thread 32573 Msg handler > quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3207 sec since 19:54:07, monitored, thread 32397 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3274 sec since 19:54:07, monitored, thread 32897 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3343 sec since 19:54:07, monitored, thread 32691 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3347 sec since 19:54:07, monitored, thread 32364 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3348 sec since 19:54:07, monitored, thread 32522 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > > > Bob Oesterlin > Sr 
Principal Storage Engineer, Nuance > 507-269-0413 > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From r.sobey at imperial.ac.uk Fri Jan 27 11:12:25 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 27 Jan 2017 11:12:25 +0000 Subject: [gpfsug-discuss] Nodeclasses question Message-ID: All, Can it be clarified whether specifying "-N ces" (for example, I have a custom nodeclass called ces containing CES nodes of course) will then apply changes to future nodes that join the same nodeclass? For example, "mmchconfig maxFilesToCache=100000 -N ces" will give existing nodes that new config. I then add a 5th node to the nodeclass. Will it inherit the cache value or will I need to set it again? Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jan 27 12:43:40 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 27 Jan 2017 12:43:40 +0000 Subject: [gpfsug-discuss] ?spam? Nodeclasses question Message-ID: I think this depends on you FS min version. We had some issues where ours was still set to 3.5 I think even though we have 4.x clients. The nodeclasses in mmlsconfig were expanded to individual nodes. But adding a node to a node class would apply the config to the node, though I'd expect you to have to stop/restart GPFS on the node and not expect it to work like "mmchconfig -I" Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 27 January 2017 at 11:12 To: "gpfsug-discuss at spectrumscale.org" > Subject: ?spam? [gpfsug-discuss] Nodeclasses question All, Can it be clarified whether specifying ?-N ces? (for example, I have a custom nodeclass called ces containing CES nodes of course) will then apply changes to future nodes that join the same nodeclass? For example, ?mmchconfig maxFilesToCache=100000 ?N ces? will give existing nodes that new config. I then add a 5th node to the nodeclass. Will it inherit the cache value or will I need to set it again? Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From gil at us.ibm.com Fri Jan 27 13:08:06 2017 From: gil at us.ibm.com (Gil Sharon) Date: Fri, 27 Jan 2017 08:08:06 -0500 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 60, Issue 72 In-Reply-To: References: Message-ID: yes, node-classes are updated across all nodes, so if you add a node to an existing class it will be included from then on. But for CES nodes there is already a 'built-in' system class: cesNodes. why not use that? 
you can see all system nodeclasses by: mmlsnodeclass --system Regards, GIL SHARON Spectrum Scale (GPFS) Development Mobile: 978-302-9355 E-mail: gil at us.ibm.com From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/27/2017 07:00 AM Subject: gpfsug-discuss Digest, Vol 60, Issue 72 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Nodeclasses question (Sobey, Richard A) ---------------------------------------------------------------------- Message: 1 Date: Fri, 27 Jan 2017 11:12:25 +0000 From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Subject: [gpfsug-discuss] Nodeclasses question Message-ID: Content-Type: text/plain; charset="us-ascii" All, Can it be clarified whether specifying "-N ces" (for example, I have a custom nodeclass called ces containing CES nodes of course) will then apply changes to future nodes that join the same nodeclass? For example, "mmchconfig maxFilesToCache=100000 -N ces" will give existing nodes that new config. I then add a 5th node to the nodeclass. Will it inherit the cache value or will I need to set it again? Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170127/0d841ddb/attachment-0001.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 60, Issue 72 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mweil at wustl.edu Fri Jan 27 15:49:12 2017 From: mweil at wustl.edu (Matt Weil) Date: Fri, 27 Jan 2017 09:49:12 -0600 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: <55747bf9-b4c1-4523-8d8b-94e8f35f22f9@wustl.edu> References: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> <55747bf9-b4c1-4523-8d8b-94e8f35f22f9@wustl.edu> Message-ID: <0ad3735a-77d4-6d98-6e8a-135479f3f594@wustl.edu> turning off data seems to have helped this issue Thanks all On 1/26/17 9:37 AM, Matt Weil wrote: 100% utilized are bursts above 200,000 IO's. Any way to tell ganesha.nfsd to cache more? On 1/25/17 3:51 PM, Matt Weil wrote: [ces1,ces2,ces3] maxStatCache 80000 worker1Threads 2000 maxFilesToCache 500000 pagepool 100G maxStatCache 80000 lrocData no 378G system memory. On 1/25/17 3:29 PM, Sven Oehme wrote: have you tried to just leave lrocInodes and lrocDirectories on and turn data off ? yes data I just turned off also did you increase maxstatcache so LROC actually has some compact objects to use ? if you send value for maxfilestocache,maxfilestocache,workerthreads and available memory of the node i can provide a start point. 
On Wed, Jan 25, 2017 at 10:20 PM Matt Weil > wrote: On 1/25/17 3:00 PM, Sven Oehme wrote: Matt, the assumption was that the remote devices are slower than LROC. there is some attempts in the code to not schedule more than a maximum numbers of outstanding i/os to the LROC device, but this doesn't help in all cases and is depending on what kernel level parameters for the device are set. the best way is to reduce the max size of data to be cached into lroc. I just turned LROC file caching completely off. most if not all of the IO is metadata. Which is what I wanted to keep fast. It is amazing once you drop the latency the IO's go up way more than they ever where before. I guess we will need another nvme. sven On Wed, Jan 25, 2017 at 9:50 PM Matt Weil > wrote: Hello all, We are having an issue where the LROC on a CES node gets overrun 100% utilized. Processes then start to backup waiting for the LROC to return data. Any way to have the GPFS client go direct if LROC gets to busy? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From laurence at qsplace.co.uk Fri Jan 27 17:17:53 2017 From: laurence at qsplace.co.uk (laurence at qsplace.co.uk) Date: Fri, 27 Jan 2017 17:17:53 +0000 Subject: [gpfsug-discuss] ?spam? Nodeclasses question In-Reply-To: References: Message-ID: Richard, As Simon notes in 3.5 they were expanded and where a pain; however this has since been tidied up and now works as it "should". So any further node added to a group will inherit the relevant parts of the config. i.e. (I've snipped the boring bits out) mmlsnodeclass Node Class Name Members --------------------- ----------------------------------------------------------- site2 s2gpfs1.site2,s2gpfs2.site2 mmchconfig pagepool=2G -N site2 mmshutdown -a mmstartup -a mmdsh -N nsdnodes "mmdiag --config | grep page" s2gpfs3.site2: pagepool 1073741824 s2gpfs3.site2: pagepoolMaxPhysMemPct 75 s2gpfs2.site2: ! pagepool 2147483648 s2gpfs2.site2: pagepoolMaxPhysMemPct 75 s2gpfs1.site2: ! pagepool 2147483648 s2gpfs1.site2: pagepoolMaxPhysMemPct 75 mmchnodeclass site2 add -N s2gpfs3.site2 mmshutdown -N s2gpfs3.site2 mmstartup -N s2gpfs3.site2 mmdsh -N nsdnodes "mmdiag --config | grep page" s2gpfs2.site2: ! pagepool 2147483648 s2gpfs2.site2: pagepoolMaxPhysMemPct 75 s2gpfs1.site2: ! pagepool 2147483648 s2gpfs1.site2: pagepoolMaxPhysMemPct 75 s2gpfs3.site2: ! pagepool 2147483648 s2gpfs3.site2: pagepoolMaxPhysMemPct 75 -- Lauz On 2017-01-27 12:43, Simon Thompson (Research Computing - IT Services) wrote: > I think this depends on you FS min version. > > We had some issues where ours was still set to 3.5 I think even though > we have 4.x clients. The nodeclasses in mmlsconfig were expanded to > individual nodes. But adding a node to a node class would apply the > config to the node, though I'd expect you to have to stop/restart GPFS > on the node and not expect it to work like "mmchconfig -I" > > Simon > > From: on behalf of "Sobey, > Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > > Date: Friday, 27 January 2017 at 11:12 > To: "gpfsug-discuss at spectrumscale.org" > > Subject: ?spam? [gpfsug-discuss] Nodeclasses question > > All, > > Can it be clarified whether specifying ?-N ces? (for example, I > have a custom nodeclass called ces containing CES nodes of course) > will then apply changes to future nodes that join the same nodeclass? > > For example, ?mmchconfig maxFilesToCache=100000 ?N ces? will > give existing nodes that new config. I then add a 5th node to the > nodeclass. Will it inherit the cache value or will I need to set it > again? > > Thanks > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From r.sobey at imperial.ac.uk Fri Jan 27 21:13:28 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 27 Jan 2017 21:13:28 +0000 Subject: [gpfsug-discuss] ?spam? Nodeclasses question In-Reply-To: References: , Message-ID: Thanks Lauz and Simon. 
Next question and I presume the answer is "yes": if you specify a node explicitly that already has a certain config applied through a nodeclass, the value that has been set specific to that node should override the nodeclass setting. Correct? Richard ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of laurence at qsplace.co.uk Sent: 27 January 2017 17:17 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] ?spam? Nodeclasses question Richard, As Simon notes in 3.5 they were expanded and where a pain; however this has since been tidied up and now works as it "should". So any further node added to a group will inherit the relevant parts of the config. i.e. (I've snipped the boring bits out) mmlsnodeclass Node Class Name Members --------------------- ----------------------------------------------------------- site2 s2gpfs1.site2,s2gpfs2.site2 mmchconfig pagepool=2G -N site2 mmshutdown -a mmstartup -a mmdsh -N nsdnodes "mmdiag --config | grep page" s2gpfs3.site2: pagepool 1073741824 s2gpfs3.site2: pagepoolMaxPhysMemPct 75 s2gpfs2.site2: ! pagepool 2147483648 s2gpfs2.site2: pagepoolMaxPhysMemPct 75 s2gpfs1.site2: ! pagepool 2147483648 s2gpfs1.site2: pagepoolMaxPhysMemPct 75 mmchnodeclass site2 add -N s2gpfs3.site2 mmshutdown -N s2gpfs3.site2 mmstartup -N s2gpfs3.site2 mmdsh -N nsdnodes "mmdiag --config | grep page" s2gpfs2.site2: ! pagepool 2147483648 s2gpfs2.site2: pagepoolMaxPhysMemPct 75 s2gpfs1.site2: ! pagepool 2147483648 s2gpfs1.site2: pagepoolMaxPhysMemPct 75 s2gpfs3.site2: ! pagepool 2147483648 s2gpfs3.site2: pagepoolMaxPhysMemPct 75 -- Lauz On 2017-01-27 12:43, Simon Thompson (Research Computing - IT Services) wrote: > I think this depends on you FS min version. > > We had some issues where ours was still set to 3.5 I think even though > we have 4.x clients. The nodeclasses in mmlsconfig were expanded to > individual nodes. But adding a node to a node class would apply the > config to the node, though I'd expect you to have to stop/restart GPFS > on the node and not expect it to work like "mmchconfig -I" > > Simon > > From: on behalf of "Sobey, > Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > > Date: Friday, 27 January 2017 at 11:12 > To: "gpfsug-discuss at spectrumscale.org" > > Subject: ?spam? [gpfsug-discuss] Nodeclasses question > > All, > > Can it be clarified whether specifying "-N ces" (for example, I > have a custom nodeclass called ces containing CES nodes of course) > will then apply changes to future nodes that join the same nodeclass? > > For example, "mmchconfig maxFilesToCache=100000 -N ces" will > give existing nodes that new config. I then add a 5th node to the > nodeclass. Will it inherit the cache value or will I need to set it > again? > > Thanks > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aaron.s.knister at nasa.gov Fri Jan 27 22:54:51 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 27 Jan 2017 17:54:51 -0500 Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs In-Reply-To: References: <061a15b7-f5e9-7c16-2e79-3236665a9368@nasa.gov> Message-ID: <239473a0-a8b7-0f13-f55d-a9e85948ce19@nasa.gov> This is rather disconcerting. We just finished upgrading our nsd servers from 3.5.0.31 to 4.1.1.10 (All clients were previously migrated from 3.5.0.31 to 4.1.1.10). After finishing that upgrade I'm now seeing these errors with some frequency (a couple every few minutes). Anyone have insight? On 1/18/17 11:58 AM, Brian Marshall wrote: > As background, we recently upgraded GPFS from 4.2.0 to 4.2.1 and > updated the Mellanox OFED on our compute cluster to allow it to move > from CentOS 7.1 to 7.2 > > We do some transient warnings from the Mellanox switch gear about > various port counters that we are tracking down with them. > > Jobs and filesystem seem stable, but the logs are concerning. > > On Wed, Jan 18, 2017 at 10:22 AM, Aaron Knister > > wrote: > > I'm curious about this too. We see these messages sometimes when > things have gone horribly wrong but also sometimes during recovery > events. Here's a recent one: > > loremds20 (manager/nsd node): > Mon Jan 16 14:19:02.048 2017: [E] VERBS RDMA rdma read error > IBV_WC_REM_ACCESS_ERR to 10.101.11.6 (lorej006) on mlx5_0 port 1 > fabnum 3 vendor_err 136 > Mon Jan 16 14:19:02.049 2017: [E] VERBS RDMA closed connection to > 10.101.11.6 (lorej006) on mlx5_0 port 1 fabnum 3 due to RDMA read > error IBV_WC_REM_ACCESS_ERR index 11 > > lorej006 (client): > Mon Jan 16 14:19:01.990 2017: [N] VERBS RDMA closed connection to > 10.101.53.18 (loremds18) on mlx5_0 port 1 fabnum 3 index 2 > Mon Jan 16 14:19:01.995 2017: [N] VERBS RDMA closed connection to > 10.101.53.19 (loremds19) on mlx5_0 port 1 fabnum 3 index 0 > Mon Jan 16 14:19:01.997 2017: [I] Recovering nodes: 10.101.53.18 > 10.101.53.19 > Mon Jan 16 14:19:02.047 2017: [W] VERBS RDMA async event > IBV_EVENT_QP_ACCESS_ERR on mlx5_0 qp 0x7fffe550f1c8. > Mon Jan 16 14:19:02.051 2017: [E] VERBS RDMA closed connection to > 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 error 733 index 1 > Mon Jan 16 14:19:02.071 2017: [I] Recovered 2 nodes for file system > tnb32. > Mon Jan 16 14:19:02.140 2017: [I] VERBS RDMA connecting to > 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 index 0 > Mon Jan 16 14:19:02.160 2017: [I] VERBS RDMA connected to > 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 sl 0 index 0 > > I had just shut down loremds18 and loremds19 so there was certainly > recovery taking place and during that time is when the error seems > to have occurred. > > I looked up the meaning of IBV_WC_REM_ACCESS_ERR here > (http://www.rdmamojo.com/2013/02/15/ibv_poll_cq/ > ) and see this: > > IBV_WC_REM_ACCESS_ERR (10) - Remote Access Error: a protection error > occurred on a remote data buffer to be read by an RDMA Read, written > by an RDMA Write or accessed by an atomic operation. This error is > reported only on RDMA operations or atomic operations. Relevant for > RC QPs. > > my take on it during recovery it seems like one end of the > connection more or less hanging up on the other end (e.g. Connection > reset by peer > /ECONNRESET). > > But like I said at the start, we also see this when there something > has gone awfully wrong. 
> > -Aaron > > On 1/18/17 3:59 AM, Simon Thompson (Research Computing - IT > Services) wrote: > > I'd be inclined to look at something like: > > ibqueryerrors -s > PortXmitWait,LinkDownedCounter,PortXmitDiscards,PortRcvRemotePhysicalErrors > -c > > And see if you have a high number of symbol errors, might be a cable > needs replugging or replacing. > > Simon > > From: > >> on behalf of > "J. Eric > Wonderley" > >> > Reply-To: "gpfsug-discuss at spectrumscale.org > > >" > > >> > Date: Tuesday, 17 January 2017 at 21:16 > To: "gpfsug-discuss at spectrumscale.org > > >" > > >> > Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs > > I have messages like these frequent my logs: > Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error > IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 > fabnum 0 > vendor_err 136 > Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to > 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error > IBV_WC_REM_ACCESS_ERR index 23 > > Any ideas on cause..? > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From jonathon.anderson at colorado.edu Mon Jan 30 22:10:25 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Mon, 30 Jan 2017 22:10:25 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? 
by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
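(A minimal sketch of how the current assignment state can be inspected at this point; command output formats vary between releases, and these are generic queries rather than anything specific to this cluster.)

# which CES addresses are defined and where, if anywhere, they are assigned
mmces address list
# which nodes are CES-enabled and whether any carry suspended/failed flags
mmces node list
# per-node view of the CES services as the monitor sees them
mmces state show -a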
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. From olaf.weiser at de.ibm.com Tue Jan 31 08:30:19 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 31 Jan 2017 09:30:19 +0100 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jan 31 15:13:34 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 31 Jan 2017 15:13:34 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. 
[root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? 
by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Jan 31 15:42:33 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 31 Jan 2017 16:42:33 +0100 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jan 31 16:32:18 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 31 Jan 2017 16:32:18 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. 
~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? 
Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
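
(For reference, one way to see where the CES addresses actually landed after a rebalance is the stock CLI; command names here are from the 4.2.x mmces/mmlscluster man pages, not from the thread, and output formats vary by release:)

---
# where each CES address is currently assigned
mmces address list
# which nodes are CES-enabled
mmces node list
# overall state of the CES services on all protocol nodes
mmces state show -a
# the CES view of the cluster
mmlscluster --ces
---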
Things I've tried:

* disabling ces on the sgate nodes and re-running the above procedure
* moving the cluster and filesystem managers to different snsd nodes
* deleting and re-creating the cesSharedRoot directory

Meanwhile, the following log entry appears in mmfs.log.latest every ~30s:

---
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+
---

Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries):

---
2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275
2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333
---

For the record, here's the interface I expect to get the address on sgate1:

---
11: bond0: mtu 9000 qdisc noqueue state UP
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::3efd:feff:fe08:a7c0/64 scope link
       valid_lft forever preferred_lft forever
---

which is a bond of p2p1 and p2p2.

---
6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
---

A similar bond0 exists on sgate2.

I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jonathon.anderson at colorado.edu Tue Jan 31 16:35:23 2017
From: jonathon.anderson at colorado.edu (Jonathon A Anderson)
Date: Tue, 31 Jan 2017 16:35:23 +0000
Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes
In-Reply-To: 
References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu>
Message-ID: <1515B2FC-1B1B-4A8B-BB7B-CD7C815B662A@colorado.edu>

> [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa

Just to head-off any concerns that this problem is a result of the ces-ip in this command not being one of the ces ips added in my earlier examples, this is just an artifact of changing configuration during the troubleshooting process. I realized that while 10.225.71.{104,105} were allocated to this node, they were to be used for something else, and shouldn't be under CES control; so I changed our CES addresses to 10.225.71.{102,103}.
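
(A minimal sketch of that address swap, assuming the stock mmces CLI as shipped with 4.2.x; the addresses are the ones quoted above:)

---
# drop the two addresses that should not be under CES control
mmces address remove --ces-ip 10.225.71.104,10.225.71.105
# add the replacement CES addresses
mmces address add --ces-ip 10.225.71.102,10.225.71.103
# confirm the new assignment
mmces address list
---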
On 1/30/17, 3:10 PM, "Jonathon A Anderson" wrote: In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. 
Here's the steps I took:

---
mmcrnodeclass protocol -N sgate1-opa,sgate2-opa
mmcrnodeclass nfs -N sgate1-opa,sgate2-opa
mmchconfig cesSharedRoot=/gpfs/summit/ces
mmchcluster --ccr-enable
mmchnode --ces-enable -N protocol
mmces service enable NFS
mmces service start NFS -N nfs
mmces address add --ces-ip 10.225.71.104,10.225.71.105
mmces address policy even-coverage
mmces address move --rebalance
---

This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot.

Things I've tried:

* disabling ces on the sgate nodes and re-running the above procedure
* moving the cluster and filesystem managers to different snsd nodes
* deleting and re-creating the cesSharedRoot directory

Meanwhile, the following log entry appears in mmfs.log.latest every ~30s:

---
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+
---

Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries):

---
2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275
2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333
---

For the record, here's the interface I expect to get the address on sgate1:

---
11: bond0: mtu 9000 qdisc noqueue state UP
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::3efd:feff:fe08:a7c0/64 scope link
       valid_lft forever preferred_lft forever
---

which is a bond of p2p1 and p2p2.

---
6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
---

A similar bond0 exists on sgate2.

I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From olaf.weiser at de.ibm.com Tue Jan 31 17:45:17 2017
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Tue, 31 Jan 2017 17:45:17 +0000
Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes
In-Reply-To: 
Message-ID: 

I'll open a PMR here for my env ... the issue may hurt you in a CES env only... but needs to be fixed in core gpfs.base I think

Sent from IBM Verse

Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ---

From: "Jonathon A Anderson"
To: "gpfsug main discussion list"
Date: Tue. 31.01.2017 17:32
Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes

No, I'm having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don't have "protocol node" support, so they've pushed back on supporting this as an overall CES-rooted effort.

I do have a DDN case open, though: 78804.
If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? 
Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried:

* disabling ces on the sgate nodes and re-running the above procedure
* moving the cluster and filesystem managers to different snsd nodes
* deleting and re-creating the cesSharedRoot directory

Meanwhile, the following log entry appears in mmfs.log.latest every ~30s:

---
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+
---

Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries):

---
2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275
2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333
---

For the record, here's the interface I expect to get the address on sgate1:

---
11: bond0: mtu 9000 qdisc noqueue state UP
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::3efd:feff:fe08:a7c0/64 scope link
       valid_lft forever preferred_lft forever
---

which is a bond of p2p1 and p2p2.

---
6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
---

A similar bond0 exists on sgate2.

I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jonathon.anderson at colorado.edu Tue Jan 31 17:47:12 2017
From: jonathon.anderson at colorado.edu (Jonathon A Anderson)
Date: Tue, 31 Jan 2017 17:47:12 +0000
Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes
In-Reply-To: 
References: 
Message-ID: <9A756F92-C3CF-42DF-983C-BD83334B37EB@colorado.edu>

Yeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it's only in CES. I suspect there just haven't been that many people exporting CES out of an HPC cluster environment.

~jonathon

From: on behalf of Olaf Weiser
Reply-To: gpfsug main discussion list
Date: Tuesday, January 31, 2017 at 10:45 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes

I ll open a pmr here for my env ... the issue may hurt you in a ces env. only...
but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von: "Jonathon A Anderson" An: "gpfsug main discussion list" Datum: Di. 31.01.2017 17:32 Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________ No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. 
which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. 
Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Jan 31 20:07:14 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 31 Jan 2017 20:07:14 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: <9A756F92-C3CF-42DF-983C-BD83334B37EB@colorado.edu> References: , <9A756F92-C3CF-42DF-983C-BD83334B37EB@colorado.edu> Message-ID: We use multicluster for our environment, storage systems in a separate cluster to hpc nodes on a separate cluster from protocol nodes. 
According to the docs, this isn't supported, but we haven't seen any issues. Note unsupported as opposed to broken. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 31 January 2017 17:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Yeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it?s only in CES. I suspect there just haven?t been that many people exporting CES out of an HPC cluster environment. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 10:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes I ll open a pmr here for my env ... the issue may hurt you in a ces env. only... but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von: "Jonathon A Anderson" An: "gpfsug main discussion list" Datum: Di. 31.01.2017 17:32 Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________ No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. 
:) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? 
by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried:

* disabling ces on the sgate nodes and re-running the above procedure
* moving the cluster and filesystem managers to different snsd nodes
* deleting and re-creating the cesSharedRoot directory

Meanwhile, the following log entry appears in mmfs.log.latest every ~30s:

---
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+
---

Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries):

---
2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275
2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333
---

For the record, here's the interface I expect to get the address on sgate1:

---
11: bond0: mtu 9000 qdisc noqueue state UP
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::3efd:feff:fe08:a7c0/64 scope link
       valid_lft forever preferred_lft forever
---

which is a bond of p2p1 and p2p2.

---
6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
---

A similar bond0 exists on sgate2.

I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From jonathon.anderson at colorado.edu Tue Jan 31 20:11:31 2017
From: jonathon.anderson at colorado.edu (Jonathon A Anderson)
Date: Tue, 31 Jan 2017 20:11:31 +0000
Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes
In-Reply-To: 
References: <9A756F92-C3CF-42DF-983C-BD83334B37EB@colorado.edu>
Message-ID: 

Simon,

This is what I'd usually do, and I'm pretty sure it'd fix the problem; but we only have two protocol nodes, so no good way to do quorum in a separate cluster of just those two. Plus, I'd just like to see the bug fixed.

I suppose we could move the compute nodes to a separate cluster, and keep the protocol nodes together with the NSD servers; but then I'm back to the age-old question of "do I technically violate the GPFS license in order to do the right thing architecturally?" (Since you have to nominate GPFS servers in the client-only cluster to manage quorum, for nodes that only have client licenses.) So far, we're 100% legit, and it'd be better to stay that way.
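
(For reference, nominating quorum nodes inside a client-only cluster, and re-licensing them, might look roughly like the sketch below; the node names are hypothetical and the commands are the standard mmchnode/mmchlicense CLI, not anything taken from this thread:)

---
# hypothetical: promote three nodes in the client cluster to quorum roles
mmchnode --quorum -N cnode01,cnode02,cnode03
# those nodes would then need server licenses to stay compliant
mmchlicense server --accept -N cnode01,cnode02,cnode03
---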
~jonathon On 1/31/17, 1:07 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (Research Computing - IT Services)" wrote: We use multicluster for our environment, storage systems in a separate cluster to hpc nodes on a separate cluster from protocol nodes. According to the docs, this isn't supported, but we haven't seen any issues. Note unsupported as opposed to broken. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 31 January 2017 17:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Yeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it?s only in CES. I suspect there just haven?t been that many people exporting CES out of an HPC cluster environment. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 10:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes I ll open a pmr here for my env ... the issue may hurt you in a ces env. only... but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von: "Jonathon A Anderson" An: "gpfsug main discussion list" Datum: Di. 31.01.2017 17:32 Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________ No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. 
[root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? 
by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Tue Jan 31 20:21:10 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 31 Jan 2017 20:21:10 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <9A756F92-C3CF-42DF-983C-BD83334B37EB@colorado.edu> , Message-ID: Ah we have separate server licensed nodes in the hpc cluster (typically we have some stuff for config management, monitoring etc, so we license those as servers). Agreed the bug should be fixed, I was meaning that we probably don't see it as the CES cluster is 4 nodes serving protocols (plus some other data access boxes). 
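
(For completeness, a minimal sketch of the multicluster arrangement discussed in this thread, i.e. a protocol/client cluster remote-mounting a file system owned by the storage cluster; the cluster names, key paths, and device names below are placeholders rather than details from the thread, and the commands are the standard mmauth/mmremotecluster/mmremotefs CLI:)

---
# on the owning (storage) cluster
mmauth genkey new
mmauth update . -l AUTHONLY
mmauth add protocol.example -k /tmp/protocol.example_id_rsa.pub
mmauth grant protocol.example -f gpfs_summit

# on the accessing (protocol/HPC) cluster
mmremotecluster add storage.example -n nsd1,nsd2 -k /tmp/storage.example_id_rsa.pub
mmremotefs add summit -f gpfs_summit -C storage.example -T /gpfs/summit
mmmount summit -a
---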
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 31 January 2017 20:11 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Simon, This is what I?d usually do, and I?m pretty sure it?d fix the problem; but we only have two protocol nodes, so no good way to do quorum in a separate cluster of just those two. Plus, I?d just like to see the bug fixed. I suppose we could move the compute nodes to a separate cluster, and keep the protocol nodes together with the NSD servers; but then I?m back to the age-old question of ?do I technically violate the GPFS license in order to do the right thing architecturally?? (Since you have to nominate GPFS servers in the client-only cluster to manage quorum, for nodes that only have client licenses.) So far, we?re 100% legit, and it?d be better to stay that way. ~jonathon On 1/31/17, 1:07 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (Research Computing - IT Services)" wrote: We use multicluster for our environment, storage systems in a separate cluster to hpc nodes on a separate cluster from protocol nodes. According to the docs, this isn't supported, but we haven't seen any issues. Note unsupported as opposed to broken. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 31 January 2017 17:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Yeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it?s only in CES. I suspect there just haven?t been that many people exporting CES out of an HPC cluster environment. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 10:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes I ll open a pmr here for my env ... the issue may hurt you in a ces env. only... but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von: "Jonathon A Anderson" An: "gpfsug main discussion list" Datum: Di. 31.01.2017 17:32 Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________ No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? 
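(For anyone wondering whether their own cluster is large enough to hit the ~3983-character limit mentioned above, a rough back-of-envelope check is sketched below; the grep pattern and awk column are assumptions about the local mmlscluster output.)

# Sketch: estimate the size of the full comma-separated daemon node list
# and compare it with what tsctl actually returns.
mmlscluster | grep '\-opa' | awk '{print $2}' | paste -sd, - | wc -c
tsctl shownodes up | wc -c
# If the first number is well under ~3983 the truncation should not bite;
# if it is larger, some nodes will be missing from the second command's
# output and CES may wrongly treat them as down.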
From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." 
/usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
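(When the addresses stay unassigned after a restart, a quick status-and-rebalance pass looks roughly like the sketch below; `mmces address list` and `mmces state show` are assumed to be available at this code level, and the rebalance is the same command used in the procedure above.)

# Sketch: confirm where the CES IPs actually landed and nudge the
# even-coverage policy if anything is left unassigned.
mmces address list              # assumed available: IP-to-node assignment
mmces state show -a             # assumed available: per-node CES service state
mmces address move --rebalance  # same rebalance step as in the procedure above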
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From olaf.weiser at de.ibm.com Tue Jan 31 22:47:23 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 31 Jan 2017 22:47:23 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: Message-ID: Yeah... depending on the #nodes you 're affected or not. ..... So if your remote ces cluster is small enough in terms of the #nodes ... you'll neuer hit into this issue Gesendet von IBM Verse Simon Thompson (Research Computing - IT Services) --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von:"Simon Thompson (Research Computing - IT Services)" An:"gpfsug main discussion list" Datum:Di. 
31.01.2017 21:07Betreff:Re: [gpfsug-discuss] CES doesn't assign addresses to nodes We use multicluster for our environment, storage systems in a separate cluster to hpc nodes on a separate cluster from protocol nodes.According to the docs, this isn't supported, but we haven't seen any issues. Note unsupported as opposed to broken.Simon________________________________________From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu]Sent: 31 January 2017 17:47To: gpfsug main discussion listSubject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodesYeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it?s only in CES. I suspect there just haven?t been that many people exporting CES out of an HPC cluster environment.~jonathonFrom: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 10:45 AMTo: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodesI ll open a pmr here for my env ... the issue may hurt you in a ces env. only... but needs to be fixed in core gpfs.base i thi kGesendet von IBM VerseJonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ---Von:"Jonathon A Anderson" An:"gpfsug main discussion list" Datum:Di. 31.01.2017 17:32Betreff:Re: [gpfsug-discuss] CES doesn't assign addresses to nodes________________________________No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort.I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR?Thanks.~jonathonFrom: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AMTo: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodesok.. so obviously ... it seems , that we have several issues..the 3983 characters is obviously a defecthave you already raised a PMR , if so , can you send me the number ?From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PMSubject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodesSent by: gpfsug-discuss-bounces at spectrumscale.org________________________________The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread.The actual command istsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefileBut you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. 
Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster.[root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l120[root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l403Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters.[root at sgate2 ~]# tsctl shownodes up | wc -c3983Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete.[root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1shas0260-opa.rc.int.col[root at sgate2 ~]#I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :)I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though.For the record:[root at sgate2 ~]# rpm -qa | grep -i gpfsgpfs.base-4.2.1-2.x86_64gpfs.msg.en_US-4.2.1-2.noarchgpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64gpfs.gskit-8.0.50-57.x86_64gpfs.gpl-4.2.1-2.noarchnfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64gpfs.ext-4.2.1-2.x86_64gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64gpfs.docs-4.2.1-2.noarch~jonathonFrom: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AMTo: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodesHi ...same thing here.. everything after 10 nodes will be truncated..though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-)the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items...should be easy to fix..cheersolafFrom: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PMSubject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodesSent by: gpfsug-discuss-bounces at spectrumscale.org________________________________In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm?Here are the details of my investigation:## GPFS is up on sgate2[root at sgate2 ~]# mmgetstateNode number Node name GPFS state------------------------------------------ 414 sgate2-opa active## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down[root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opammces address move: GPFS is down on this node.mmces address move: Command failed. Examine previous error messages to determine cause.## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs[root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\"%s: GPFS is down on this node."## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? 
by getDownCesNodeList[root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddressdownNodeList=$(getDownCesNodeList)for downNode in $downNodeListdo if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd"## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up`[root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncsfunction getDownCesNodeList{typeset sourceFile="mmcesfuncs.sh"[[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] &&set -x$mmTRACE_ENTER "$*"typeset upnodefile=${cmdTmpDir}upnodefiletypeset downNodeList# get all CES nodes$sort -o $nodefile $mmfsCesNodes.dae$tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefiledownNodeList=$($comm -23 $nodefile $upnodefile)print -- $downNodeList} #----- end of function getDownCesNodeList --------------------## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated[root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tailshas0251-opa.rc.int.colorado.edushas0252-opa.rc.int.colorado.edushas0253-opa.rc.int.colorado.edushas0254-opa.rc.int.colorado.edushas0255-opa.rc.int.colorado.edushas0256-opa.rc.int.colorado.edushas0257-opa.rc.int.colorado.edushas0258-opa.rc.int.colorado.edushas0259-opa.rc.int.colorado.edushas0260-opa.rc.int.col[root at sgate2 ~]### I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`.On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far.
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From abeattie at au1.ibm.com  Tue Jan  3 22:19:20 2017
From: abeattie at au1.ibm.com (Andrew Beattie)
Date: Tue, 3 Jan 2017 22:19:20 +0000
Subject: Re: [gpfsug-discuss] CES nodes mount nfsv3 not responding
In-Reply-To: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu>
References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu>, <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov><28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu><4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov><0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov><5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu><5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu><45b19a50-bb70-1025-71ea-80a260623712@wustl.edu>
Message-ID: 

An HTML attachment was scrubbed...
URL: From laurence at qsplace.co.uk Tue Jan 3 22:40:48 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Tue, 03 Jan 2017 22:40:48 +0000 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu>, <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov><28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu><4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov><0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov><5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu><5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu><45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> Message-ID: <0CEDE53A-B89F-4070-A681-49BC7B93D152@qsplace.co.uk> Andrew, You may have been stung by: 2.34 What considerations are there when running on SELinux? https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html?view=kc#selinux I've see this issue on a customer site myself. Matt, Could you increase the logging verbosity and check the logs further? As per http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.pdg.doc/bl1pdg_CESNFSserverlog.htm -- Lauz On 3 January 2017 22:19:20 GMT+00:00, Andrew Beattie wrote: >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Tue Jan 3 22:56:48 2017 From: Valdis.Kletnieks at vt.edu (Valdis Kletnieks) Date: Tue, 03 Jan 2017 17:56:48 -0500 Subject: [gpfsug-discuss] What is LTFS/EE now called, and what version should I be on? Message-ID: <186951.1483484208@turing-police.cc.vt.edu> So we have GPFS Advanced 4.2.1 installed, and the following RPMs: % rpm -qa 'ltfs*' | sort ltfsle-2.1.6.0-9706.x86_64 ltfsle-library-2.1.6.0-9706.x86_64 ltfsle-library-plus-2.1.6.0-9706.x86_64 ltfs-license-2.1.0-20130412_2702.x86_64 ltfs-mig-1.2.1.1-10232.x86_64 What release of "Spectrum Archive" does this correspond to, and what release do we need to be on if I upgrade GPFS to 4.2.2.1? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From janfrode at tanso.net Tue Jan 3 23:14:21 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 4 Jan 2017 00:14:21 +0100 Subject: [gpfsug-discuss] What is LTFS/EE now called, and what version should I be on? In-Reply-To: <186951.1483484208@turing-police.cc.vt.edu> References: <186951.1483484208@turing-police.cc.vt.edu> Message-ID: This looks like Spectrum Archive v1.2.1.0 (Build 10230). Newest version available on fixcentral is v1.2.2.0, but it doesn't support GPFS v4.2.2.x yet. -jf On Tue, Jan 3, 2017 at 11:56 PM, Valdis Kletnieks wrote: > So we have GPFS Advanced 4.2.1 installed, and the following RPMs: > > % rpm -qa 'ltfs*' | sort > ltfsle-2.1.6.0-9706.x86_64 > ltfsle-library-2.1.6.0-9706.x86_64 > ltfsle-library-plus-2.1.6.0-9706.x86_64 > ltfs-license-2.1.0-20130412_2702.x86_64 > ltfs-mig-1.2.1.1-10232.x86_64 > > What release of "Spectrum Archive" does this correspond to, > and what release do we need to be on if I upgrade GPFS to 4.2.2.1? 
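(Before planning the 4.2.2.x move discussed here, it can help to capture the package inventory on the LTFS/EE node in one pass so it can be checked against the Spectrum Archive support matrix for the target GPFS release; a minimal sketch using stock rpm queries follows.)

# Sketch: record current GPFS and LTFS/Spectrum Archive package levels
# before attempting an upgrade.
rpm -qa 'gpfs*' | sort          # core Spectrum Scale packages
rpm -qa 'ltfs*' | sort          # LTFS/EE (Spectrum Archive EE) packages
rpm -qa 'nfs-ganesha*' | sort   # CES NFS stack, if protocol nodes are in play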
> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Jan 4 01:21:34 2017 From: mweil at wustl.edu (Matt Weil) Date: Tue, 3 Jan 2017 19:21:34 -0600 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu> <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> Message-ID: nsds and ces nodes are RHEL 7.3 nfsv3 clients are old ubuntu lucid. we finally just removed the IP that seemed to... when moved to a ces node caused it to stop responding. it hung up a few more times but has been working fine now for the last few hours. maybe a bad client apple out there finally gave up ;-) PMR 50787 122 000 waiting on IBM. On 1/3/17 4:19 PM, Andrew Beattie wrote: > Matt > > What Operating system are you running? > > I have an open PMR at present with something very similar > when ever we publish an NFS export via the protocol nodes the nfs > service stops, although we have no issues publishing SMB exports. > > I"m waiting on some testing by the customer but L3 support have > indicated that they think there is a bug in the SElinux code, which is > causing this issue, and have suggested that we disable SElinux and try > again. > > My clients environment is currently deployed on Centos 7. > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > > ----- Original message ----- > From: Matt Weil > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: > Cc: > Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding > Date: Wed, Jan 4, 2017 6:27 AM > > this follows the IP what ever node the ip lands on. the ganesha.nfsd > process seems to stop working. any ideas? there is nothing > helpful in > the logs. > > time mount ces200:/vol/aggr14/temp403 /mnt/test > mount.nfs: mount system call failed > > real 1m0.000s > user 0m0.000s > sys 0m0.010s > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mweil at wustl.edu Wed Jan 4 01:29:36 2017 From: mweil at wustl.edu (Matt Weil) Date: Tue, 3 Jan 2017 19:29:36 -0600 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: <0CEDE53A-B89F-4070-A681-49BC7B93D152@qsplace.co.uk> References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu> <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> <0CEDE53A-B89F-4070-A681-49BC7B93D152@qsplace.co.uk> Message-ID: On 1/3/17 4:40 PM, Laurence Horrocks-Barlow wrote: > Andrew, > > You may have been stung by: > > 2.34 What considerations are there when running on SELinux? > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html?view=kc#selinux se is disabled here. Also if you strace the parent ganesha.nfsd process it dies. Is that a bug? > > I've see this issue on a customer site myself. > > > Matt, > > Could you increase the logging verbosity and check the logs further? > As per > http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.pdg.doc/bl1pdg_CESNFSserverlog.htm yes bumped it to the max of 3 not much help. > > -- Lauz > > On 3 January 2017 22:19:20 GMT+00:00, Andrew Beattie > wrote: > > Matt > > What Operating system are you running? > > I have an open PMR at present with something very similar > when ever we publish an NFS export via the protocol nodes the nfs > service stops, although we have no issues publishing SMB exports. > > I"m waiting on some testing by the customer but L3 support have > indicated that they think there is a bug in the SElinux code, > which is causing this issue, and have suggested that we disable > SElinux and try again. > > My clients environment is currently deployed on Centos 7. > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > > ----- Original message ----- > From: Matt Weil > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: > Cc: > Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding > Date: Wed, Jan 4, 2017 6:27 AM > > this follows the IP what ever node the ip lands on. the > ganesha.nfsd > process seems to stop working. any ideas? there is nothing > helpful in > the logs. > > time mount ces200:/vol/aggr14/temp403 /mnt/test > mount.nfs: mount system call failed > > real 1m0.000s > user 0m0.000s > sys 0m0.010s > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Wed Jan 4 02:16:54 2017 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Tue, 03 Jan 2017 21:16:54 -0500 Subject: [gpfsug-discuss] What is LTFS/EE now called, and what version should I be on? 
In-Reply-To: References: <186951.1483484208@turing-police.cc.vt.edu> Message-ID: <200291.1483496214@turing-police.cc.vt.edu> On Wed, 04 Jan 2017 00:14:21 +0100, Jan-Frode Myklebust said: > This looks like Spectrum Archive v1.2.1.0 (Build 10230). Newest version > available on fixcentral is v1.2.2.0, but it doesn't support GPFS v4.2.2.x > yet. That's what I was afraid of. OK, shelve that option, and call IBM for the efix. (The backstory: IBM announced a security issue in GPFS: http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009639&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E A security vulnerability has been identified in IBM Spectrum Scale (GPFS) that could allow a remote authenticated attacker to overflow a buffer and execute arbitrary code on the system with root privileges or cause the server to crash. This vulnerability is only applicable if: - file encryption is being used - the key management infrastructure has been compromised -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From rkomandu at in.ibm.com Wed Jan 4 07:17:25 2017 From: rkomandu at in.ibm.com (Ravi K Komanduri) Date: Wed, 4 Jan 2017 12:47:25 +0530 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu><28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu><4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov><0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov><5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu><5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu><45b19a50-bb70-1025-71ea-80a260623712@wustl.edu><0CEDE53A-B89F-4070-A681-49BC7B93D152@qsplace.co.uk> Message-ID: My two cents, Have the SELinux enabled on my RH7.3 cluster (where CES nodes are RH 7,3). GPFS latest version(4.2.2) is on the cluster. Non SELinux env, should mount w/o issues as well Tried mounting for 50 iters as V3 for 2 different mounts from 4 client nodes. Ran successfully. My client nodes are RH/SLES clients Could you elaborate further. With Regards, Ravi K Komanduri From: Matt Weil To: Date: 01/04/2017 07:00 AM Subject: Re: [gpfsug-discuss] CES nodes mount nfsv3 not responding Sent by: gpfsug-discuss-bounces at spectrumscale.org On 1/3/17 4:40 PM, Laurence Horrocks-Barlow wrote: Andrew, You may have been stung by: 2.34 What considerations are there when running on SELinux? https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html?view=kc#selinux se is disabled here. Also if you strace the parent ganesha.nfsd process it dies. Is that a bug? I've see this issue on a customer site myself. Matt, Could you increase the logging verbosity and check the logs further? As per http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.pdg.doc/bl1pdg_CESNFSserverlog.htm yes bumped it to the max of 3 not much help. -- Lauz On 3 January 2017 22:19:20 GMT+00:00, Andrew Beattie wrote: Matt What Operating system are you running? I have an open PMR at present with something very similar when ever we publish an NFS export via the protocol nodes the nfs service stops, although we have no issues publishing SMB exports. I"m waiting on some testing by the customer but L3 support have indicated that they think there is a bug in the SElinux code, which is causing this issue, and have suggested that we disable SElinux and try again. My clients environment is currently deployed on Centos 7. 
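(A client-side loop along the lines of the 50-iteration test described above is easy to reproduce; a sketch follows. The export path and mount point are the ones quoted earlier in the thread and are purely illustrative.)

# Sketch: repeat the v3 mount/umount cycle and time each attempt.  A healthy
# server answers in well under a second; the 1m0s timeouts reported earlier
# indicate the ganesha side has stopped responding.
for i in $(seq 1 50); do
    time mount -o vers=3 ces200:/vol/aggr14/temp403 /mnt/test && umount /mnt/test
done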
Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Matt Weil Sent by: gpfsug-discuss-bounces at spectrumscale.org To: Cc: Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding Date: Wed, Jan 4, 2017 6:27 AM this follows the IP what ever node the ip lands on. the ganesha.nfsd process seems to stop working. any ideas? there is nothing helpful in the logs. time mount ces200:/vol/aggr14/temp403 /mnt/test mount.nfs: mount system call failed real 1m0.000s user 0m0.000s sys 0m0.010s _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jan 4 09:06:29 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 4 Jan 2017 09:06:29 +0000 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: , Message-ID: Simon, Is this PMR still open or was the issue resolved? I'm very interested to know as 4.2.2 is on my roadmap. Thanks Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: 20 December 2016 17:14 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SMB issues Nope, just lots of messages with the same error, but different folders. I've opened a pmr with IBM and supplied the usual logs. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Christof Schmitt [christof.schmitt at us.ibm.com] Sent: 19 December 2016 17:31 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SMB issues >From this message, it does not look like a known problem. Are there other messages leading up to the one you mentioned? I would suggest reporting this through a PMR. Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 12/19/2016 08:37 AM Subject: [gpfsug-discuss] SMB issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). 
In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Wed Jan 4 10:20:30 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 4 Jan 2017 10:20:30 +0000 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: Message-ID: Its still open. I can say we are happily running 4.2.2, just not the SMB packages that go with it. So the GPFS part, I wouldn't have thought would be a problem to upgrade. Simon On 04/01/2017, 09:06, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A" wrote: >Simon, > >Is this PMR still open or was the issue resolved? I'm very interested to >know as 4.2.2 is on my roadmap. > >Thanks >Richard > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >Thompson (Research Computing - IT Services) >Sent: 20 December 2016 17:14 >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] SMB issues > > >Nope, just lots of messages with the same error, but different folders. > >I've opened a pmr with IBM and supplied the usual logs. > >Simon >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org >[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Christof Schmitt >[christof.schmitt at us.ibm.com] >Sent: 19 December 2016 17:31 >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] SMB issues > >From this message, it does not look like a known problem. Are there other >messages leading up to the one you mentioned? > >I would suggest reporting this through a PMR. > >Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ >christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) > > > >From: "Simon Thompson (Research Computing - IT Services)" > >To: "gpfsug-discuss at spectrumscale.org" > >Date: 12/19/2016 08:37 AM >Subject: [gpfsug-discuss] SMB issues >Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > >Hi All, > >We upgraded to 4.2.2.0 last week as well as to >gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. > >We've since been getting random users reporting that they get access >denied errors when trying to access folders. Some seem to work fine and >others not, but it seems to vary and change by user (for example this >morning, I could see all my folders fine, but later I could only see >some). From my Mac connecting to the SMB shares, I could connect fine to >the share, but couldn't list files in the folder (I guess this is what >users were seeing from Windows as access denied). 
> >In the log.smbd, we are seeing errors such as this: > >[2016/12/19 15:20:40.649580, 0] >../source3/lib/sysquotas.c:457(sys_get_quota) > sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! > > > >Reverting to the previous version of SMB we were running >(gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. > >Before I log a PMR, has anyone else seen this behaviour or have any >suggestions? > >Thanks > >Simon > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From laurence at qsplace.co.uk Wed Jan 4 17:13:50 2017 From: laurence at qsplace.co.uk (laurence at qsplace.co.uk) Date: Wed, 04 Jan 2017 17:13:50 +0000 Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding In-Reply-To: References: <5ea87228-339b-c457-097e-1a5ccf073d55@wustl.edu> <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> <0CEDE53A-B89F-4070-A681-49BC7B93D152@qsplace.co.uk> Message-ID: Hi Matt, The only time I've seen strace "crash" ganesha is when having selinux enabled which ofc was related to selinux. Have you also changed NFS's logging level (also in the link given)? Check the current level with: mmnfs configuration list | grep LOG_LEVEL I find INFO or DEBUG enough to get just that little extra nugget of information you need, however if that's already at FULL_DEBUG and your still not finding anything helpful it might be time to log a PMR. --Lauz On 2017-01-04 01:29, Matt Weil wrote: > On 1/3/17 4:40 PM, Laurence Horrocks-Barlow wrote: > >> Andrew, >> >> You may have been stung by: >> >> 2.34 What considerations are there when running on SELinux? >> >> https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html?view=kc#selinux [1] > se is disabled here. > Also if you strace the parent ganesha.nfsd process it dies. Is that a bug? > >> I've see this issue on a customer site myself. >> >> Matt, >> >> Could you increase the logging verbosity and check the logs further? As per >> http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.pdg.doc/bl1pdg_CESNFSserverlog.htm [2] > yes bumped it to the max of 3 not much help. > > -- Lauz > > On 3 January 2017 22:19:20 GMT+00:00, Andrew Beattie wrote: > > Matt > > What Operating system are you running? > > I have an open PMR at present with something very similar > when ever we publish an NFS export via the protocol nodes the nfs service stops, although we have no issues publishing SMB exports. > > I"m waiting on some testing by the customer but L3 support have indicated that they think there is a bug in the SElinux code, which is causing this issue, and have suggested that we disable SElinux and try again. 
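(If the log level has not been raised yet, checking and bumping it as Lauz suggests looks roughly like the sketch below; `mmnfs configuration list` is quoted above, while the exact `change` syntax is an assumption -- verify it against the mmnfs man page for the installed release.)

# Sketch: confirm the current Ganesha log level, raise it while reproducing
# the hang, then drop it back down afterwards (FULL_DEBUG is very chatty).
mmnfs configuration list | grep LOG_LEVEL
mmnfs configuration change LOG_LEVEL=DEBUG   # assumed syntax -- check mmnfs help first
# ...reproduce the failing v3 mount, review the ganesha log...
mmnfs configuration change LOG_LEVEL=EVENT   # assumed default level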
> > My clients environment is currently deployed on Centos 7. > > Andrew Beattie > Software Defined Storage - IT Specialist > > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > ----- Original message ----- > From: Matt Weil > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: > Cc: > Subject: [gpfsug-discuss] CES nodes mount nfsv3 not responding > Date: Wed, Jan 4, 2017 6:27 AM > > this follows the IP what ever node the ip lands on. the ganesha.nfsd > process seems to stop working. any ideas? there is nothing helpful in > the logs. > > time mount ces200:/vol/aggr14/temp403 /mnt/test > mount.nfs: mount system call failed > > real 1m0.000s > user 0m0.000s > sys 0m0.010s > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss [3] -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss [3] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss [3] Links: ------ [1] https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html?view=kc#selinux [2] http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.pdg.doc/bl1pdg_CESNFSserverlog.htm [3] http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Wed Jan 4 17:55:13 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Wed, 4 Jan 2017 12:55:13 -0500 Subject: [gpfsug-discuss] strange mmchnsd error? Message-ID: [root at cl001 ~]# cat chnsd_home_flh %nsd: nsd=r10f1e5 servers=cl008,cl001,cl002,cl003,cl004,cl005,cl006,cl007 %nsd: nsd=r10f6e5 servers=cl007,cl008,cl001,cl002,cl003,cl004,cl005,cl006 %nsd: nsd=r10f1e6 servers=cl006,cl007,cl008,cl001,cl002,cl003,cl004,cl005 %nsd: nsd=r10f6e6 servers=cl005,cl006,cl007,cl008,cl001,cl002,cl003,cl004 %nsd: nsd=r10f1e7 servers=cl004,cl005,cl006,cl007,cl008,cl001,cl002,cl003 %nsd: nsd=r10f6e7 servers=cl003,cl004,cl005,cl006,cl007,cl008,cl001,cl002 %nsd: nsd=r10f1e8 servers=cl002,cl003,cl004,cl005,cl006,cl007,cl008,cl001 %nsd: nsd=r10f6e8 servers=cl001,cl002,cl003,cl004,cl005,cl006,cl007,cl008 %nsd: nsd=r10f1e9 servers=cl008,cl001,cl002,cl003,cl004,cl005,cl006,cl007 %nsd: nsd=r10f6e9 servers=cl007,cl008,cl001,cl002,cl003,cl004,cl005,cl006 [root at cl001 ~]# mmchnsd -F chnsd_home_flh mmchnsd: Processing disk r10f6e5 mmchnsd: Processing disk r10f6e6 mmchnsd: Processing disk r10f6e7 mmchnsd: Processing disk r10f6e8 mmchnsd: Node cl005.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl006.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl007.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl008.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Error found while processing stanza %nsd: nsd=r10f6e8 servers=cl001,cl002,cl003,cl004,cl005,cl006,cl007,cl008 mmchnsd: Processing disk r10f1e9 mmchnsd: Processing disk r10f6e9 mmchnsd: Command failed. Examine previous error messages to determine cause. I comment out the r10f6e8 line and then it completes? 
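(The ENODEV responses above generally mean those servers have no path to the LUN behind that NSD; before re-running mmchnsd it is worth confirming per-node visibility, e.g. with the sketch below. `mmlsnsd -m` and `-X` are assumed to be available on this release.)

# Sketch: check which servers can actually see the device behind r10f6e8
# (ENODEV = no device found on that node), then compare with the per-node
# device counts shown below.
mmlsnsd -m | grep r10f6e8   # NSD-to-local-device mapping as seen by the server nodes
mmlsnsd -X | grep r10f6e8   # extended view, including device type and remarks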
I have some sort of fabric san issue: [root at cl005 ~]# for i in {1..8}; do ssh cl00$i lsscsi -s | grep 38xx | grep 1.97 | wc -l; done 80 80 80 80 68 72 70 72 but i'm suprised removing one line allows it to complete. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Jan 4 17:58:25 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 4 Jan 2017 17:58:25 +0000 Subject: [gpfsug-discuss] strange mmchnsd error? In-Reply-To: References: Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB064DCF61@CHI-EXCHANGEW1.w2k.jumptrading.com> ENODEV usually means that the disk device was not found on the server(s) in the server list. In this case c100[5-8] do not apparently have access to r10f6e8, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Wednesday, January 04, 2017 11:55 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] strange mmchnsd error? [root at cl001 ~]# cat chnsd_home_flh %nsd: nsd=r10f1e5 servers=cl008,cl001,cl002,cl003,cl004,cl005,cl006,cl007 %nsd: nsd=r10f6e5 servers=cl007,cl008,cl001,cl002,cl003,cl004,cl005,cl006 %nsd: nsd=r10f1e6 servers=cl006,cl007,cl008,cl001,cl002,cl003,cl004,cl005 %nsd: nsd=r10f6e6 servers=cl005,cl006,cl007,cl008,cl001,cl002,cl003,cl004 %nsd: nsd=r10f1e7 servers=cl004,cl005,cl006,cl007,cl008,cl001,cl002,cl003 %nsd: nsd=r10f6e7 servers=cl003,cl004,cl005,cl006,cl007,cl008,cl001,cl002 %nsd: nsd=r10f1e8 servers=cl002,cl003,cl004,cl005,cl006,cl007,cl008,cl001 %nsd: nsd=r10f6e8 servers=cl001,cl002,cl003,cl004,cl005,cl006,cl007,cl008 %nsd: nsd=r10f1e9 servers=cl008,cl001,cl002,cl003,cl004,cl005,cl006,cl007 %nsd: nsd=r10f6e9 servers=cl007,cl008,cl001,cl002,cl003,cl004,cl005,cl006 [root at cl001 ~]# mmchnsd -F chnsd_home_flh mmchnsd: Processing disk r10f6e5 mmchnsd: Processing disk r10f6e6 mmchnsd: Processing disk r10f6e7 mmchnsd: Processing disk r10f6e8 mmchnsd: Node cl005.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl006.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl007.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Node cl008.cl.arc.internal returned ENODEV for disk r10f6e8. mmchnsd: Error found while processing stanza %nsd: nsd=r10f6e8 servers=cl001,cl002,cl003,cl004,cl005,cl006,cl007,cl008 mmchnsd: Processing disk r10f1e9 mmchnsd: Processing disk r10f6e9 mmchnsd: Command failed. Examine previous error messages to determine cause. I comment out the r10f6e8 line and then it completes? I have some sort of fabric san issue: [root at cl005 ~]# for i in {1..8}; do ssh cl00$i lsscsi -s | grep 38xx | grep 1.97 | wc -l; done 80 80 80 80 68 72 70 72 but i'm suprised removing one line allows it to complete. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Jan 4 19:57:07 2017 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 4 Jan 2017 19:57:07 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server Message-ID: <76A8A489-C46E-441C-9C9A-0E515200F325@siriuscom.com> I?m getting stumped trying to test out TCT on a centos based 4.2.2.0 cluster and getting the following error when I?m trying to install the gpfs.tct.server rpm. rpm -ivh --force gpfs.tct.server-1.1.2_987.x86_64.rpm error: Failed dependencies: redhat-release-server >= 6.0 is needed by gpfs.tct.server-1-1.2.x86_64 I realize that Centos isn?t ?officially? supported but this is kind of lame to check for the redhat-release package instead of whatever library (ssl) or some such that is installed instead. Anyone able to do this or know a workaround? I did a quick search on the wiki and in previous posts on this list and didn?t see anything obvious. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jan 4 20:00:50 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 4 Jan 2017 20:00:50 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server Message-ID: Just add ??nodeps? to the rpm install line, it will go just fine. Been working just fine on my CentOS system using this method. rpm -ivh --nodeps gpfs.tct.server-1.1.2_987.x86_64.rpm Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Mark.Bush at siriuscom.com" Reply-To: gpfsug main discussion list Date: Wednesday, January 4, 2017 at 1:57 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] TCT and redhat-release-server I?m getting stumped trying to test out TCT on a centos based 4.2.2.0 cluster and getting the following error when I?m trying to install the gpfs.tct.server rpm. rpm -ivh --force gpfs.tct.server-1.1.2_987.x86_64.rpm error: Failed dependencies: redhat-release-server >= 6.0 is needed by gpfs.tct.server-1-1.2.x86_64 I realize that Centos isn?t ?officially? supported but this is kind of lame to check for the redhat-release package instead of whatever library (ssl) or some such that is installed instead. Anyone able to do this or know a workaround? I did a quick search on the wiki and in previous posts on this list and didn?t see anything obvious. 
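(If you would rather keep yum/rpm dependency checking intact instead of forcing every install, another workaround, purely a sketch and obviously not an IBM-blessed package, is an empty shim RPM that simply provides redhat-release-server. A spec file along these lines (the package name is made up for illustration):

Name:           tct-release-shim
Version:        1
Release:        1
Summary:        Empty package providing redhat-release-server on CentOS hosts
License:        Public Domain
BuildArch:      noarch
Provides:       redhat-release-server = 7.0
%description
Satisfies the redhat-release-server dependency of gpfs.tct.server on CentOS.
%files

built with rpmbuild -bb tct-release-shim.spec, produces a do-nothing noarch package; once that is installed, gpfs.tct.server installs cleanly without --nodeps.)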
Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevindjo at us.ibm.com Wed Jan 4 20:04:23 2017 From: kevindjo at us.ibm.com (Kevin D Johnson) Date: Wed, 4 Jan 2017 20:04:23 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server In-Reply-To: <76A8A489-C46E-441C-9C9A-0E515200F325@siriuscom.com> References: <76A8A489-C46E-441C-9C9A-0E515200F325@siriuscom.com> Message-ID: An HTML attachment was scrubbed... URL: From orichards at pixitmedia.com Wed Jan 4 20:10:11 2017 From: orichards at pixitmedia.com (Orlando Richards) Date: Wed, 4 Jan 2017 20:10:11 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server In-Reply-To: References: <76A8A489-C46E-441C-9C9A-0E515200F325@siriuscom.com> Message-ID: This is an RPM dependency check, rather than checking anything about the system state (such as the contents of /etc/redhat-release). In the past, I've built a dummy rpm with no contents to work around these. I don't think you can do a "--force" on a yum install - so you can't "yum install gpfs.tct.server" unless you do something like that. Would be great to get it removed from the rpm dependencies if possible. On 04/01/2017 20:04, Kevin D Johnson wrote: > I believe it's checking /etc/redhat-release --- if you create that > file with the appropriate red hat version number (like /etc/issue for > CentOS), it should work. > > ----- Original message ----- > From: "Mark.Bush at siriuscom.com" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [gpfsug-discuss] TCT and redhat-release-server > Date: Wed, Jan 4, 2017 2:58 PM > > I?m getting stumped trying to test out TCT on a centos based > 4.2.2.0 cluster and getting the following error when I?m trying to > install the gpfs.tct.server rpm. > > rpm -ivh --force gpfs.tct.server-1.1.2_987.x86_64.rpm > > error: Failed dependencies: > > redhat-release-server >= 6.0 is needed by gpfs.tct.server-1-1.2.x86_64 > > I realize that Centos isn?t ?officially? supported but this is > kind of lame to check for the redhat-release package instead of > whatever library (ssl) or some such that is installed instead. > > Anyone able to do this or know a workaround? I did a quick search > on the wiki and in previous posts on this list and didn?t see > anything obvious. > > Mark > > This message (including any attachments) is intended only for the > use of the individual or entity to which it is addressed and may > contain information that is non-public, proprietary, privileged, > confidential, and exempt from disclosure under applicable law. 
If > you are not the intended recipient, you are hereby notified that > any use, dissemination, distribution, or copying of this > communication is strictly prohibited. This message may be viewed > by parties at Sirius Computer Solutions other than those named in > the message header. This message does not contain an official > representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions > immediately and (i) destroy this message if a facsimile or (ii) > delete this message immediately if this is an electronic > communication. Thank you. > > Sirius Computer Solutions > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Orlando Richards* VP Product Development, Pixit Media 07930742808|orichards at pixitmedia.com www.pixitmedia.com |Tw:@pixitmedia -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevindjo at us.ibm.com Wed Jan 4 20:15:19 2017 From: kevindjo at us.ibm.com (Kevin D Johnson) Date: Wed, 4 Jan 2017 20:15:19 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server In-Reply-To: References: , <76A8A489-C46E-441C-9C9A-0E515200F325@siriuscom.com> Message-ID: An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Jan 4 20:16:37 2017 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 4 Jan 2017 20:16:37 +0000 Subject: [gpfsug-discuss] TCT and redhat-release-server In-Reply-To: References: Message-ID: <3EBE8846-7757-4957-9F01-DE4CAE558106@siriuscom.com> Success! Thanks Robert. From: "Oesterlin, Robert" Reply-To: gpfsug main discussion list Date: Wednesday, January 4, 2017 at 2:00 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] TCT and redhat-release-server Just add ??nodeps? to the rpm install line, it will go just fine. Been working just fine on my CentOS system using this method. rpm -ivh --nodeps gpfs.tct.server-1.1.2_987.x86_64.rpm Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Mark.Bush at siriuscom.com" Reply-To: gpfsug main discussion list Date: Wednesday, January 4, 2017 at 1:57 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] TCT and redhat-release-server I?m getting stumped trying to test out TCT on a centos based 4.2.2.0 cluster and getting the following error when I?m trying to install the gpfs.tct.server rpm. rpm -ivh --force gpfs.tct.server-1.1.2_987.x86_64.rpm error: Failed dependencies: redhat-release-server >= 6.0 is needed by gpfs.tct.server-1-1.2.x86_64 I realize that Centos isn?t ?officially? 
supported but this is kind of lame to check for the redhat-release package instead of whatever library (ssl) or some such that is installed instead. Anyone able to do this or know a workaround? I did a quick search on the wiki and in previous posts on this list and didn?t see anything obvious. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Thu Jan 5 20:00:36 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Thu, 5 Jan 2017 15:00:36 -0500 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? Message-ID: I have one quorum node down and attempting to add a nsd to a fs: [root at cl005 ~]# mmadddisk home -F add_1_flh_home -v no |& tee /root/adddisk_flh_home.out Verifying file system configuration information ... The following disks of home will be formatted on node cl003: r10f1e5: size 1879610 MB Extending Allocation Map Checking Allocation Map for storage pool fc_ssd400G 55 % complete on Thu Jan 5 14:43:31 2017 Lost connection to file system daemon. mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: File system home has some disks that are in a non-ready state. mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Had to use -v no (this failed once before). Anyhow I next see: [root at cl002 ~]# mmgetstate -aL Node number Node name Quorum Nodes up Total nodes GPFS state Remarks ------------------------------------------------------------------------------------ 1 cl001 0 0 8 down quorum node 2 cl002 5 6 8 active quorum node 3 cl003 5 0 8 arbitrating quorum node 4 cl004 5 6 8 active quorum node 5 cl005 5 6 8 active quorum node 6 cl006 5 6 8 active quorum node 7 cl007 5 6 8 active quorum node 8 cl008 5 6 8 active quorum node [root at cl002 ~]# mmlsdisk home disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ r10f1e5 nsd 512 1001 No Yes allocmap add up fc_ssd400G r6d2e8 nsd 512 1001 No Yes ready up fc_8T r6d3e8 nsd 512 1001 No Yes ready up fc_8T Do all quorum node have to be up and participating to do these admin type operations? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Jan 5 20:06:18 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 5 Jan 2017 20:06:18 +0000 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? 
In-Reply-To: References: Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> There may be an issue with one of the other NSDs in the file system according to the ?mmadddisk: File system home has some disks that are in a non-ready state.? message in our output. Best to check the status of the NSDs in the file system using the `mmlsdisk home` and if any disks are not ?up? then run the `mmchdisk home start -a` command after confirming that all nsdservers can see the disks. I typically use `mmdsh -N nsdnodes tspreparedisk ?s | dshbak ?c` for that. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Thursday, January 05, 2017 2:01 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] nsd not adding with one quorum node down? I have one quorum node down and attempting to add a nsd to a fs: [root at cl005 ~]# mmadddisk home -F add_1_flh_home -v no |& tee /root/adddisk_flh_home.out Verifying file system configuration information ... The following disks of home will be formatted on node cl003: r10f1e5: size 1879610 MB Extending Allocation Map Checking Allocation Map for storage pool fc_ssd400G 55 % complete on Thu Jan 5 14:43:31 2017 Lost connection to file system daemon. mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: File system home has some disks that are in a non-ready state. mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Had to use -v no (this failed once before). Anyhow I next see: [root at cl002 ~]# mmgetstate -aL Node number Node name Quorum Nodes up Total nodes GPFS state Remarks ------------------------------------------------------------------------------------ 1 cl001 0 0 8 down quorum node 2 cl002 5 6 8 active quorum node 3 cl003 5 0 8 arbitrating quorum node 4 cl004 5 6 8 active quorum node 5 cl005 5 6 8 active quorum node 6 cl006 5 6 8 active quorum node 7 cl007 5 6 8 active quorum node 8 cl008 5 6 8 active quorum node [root at cl002 ~]# mmlsdisk home disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ r10f1e5 nsd 512 1001 No Yes allocmap add up fc_ssd400G r6d2e8 nsd 512 1001 No Yes ready up fc_8T r6d3e8 nsd 512 1001 No Yes ready up fc_8T Do all quorum node have to be up and participating to do these admin type operations? ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Thu Jan 5 20:13:28 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Thu, 5 Jan 2017 15:13:28 -0500 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: Bryan: Have you ever attempted to do this knowing that one quorum server is down? *all* nsdservers will not see the nsd about to be added? How about temporarily removing quorum from a nsd server...? Thanks On Thu, Jan 5, 2017 at 3:06 PM, Bryan Banister wrote: > There may be an issue with one of the other NSDs in the file system > according to the ?mmadddisk: File system home has some disks that are in > a non-ready state.? message in our output. Best to check the status of > the NSDs in the file system using the `mmlsdisk home` and if any disks are > not ?up? then run the `mmchdisk home start -a` command after confirming > that all nsdservers can see the disks. I typically use `mmdsh -N nsdnodes > tspreparedisk ?s | dshbak ?c` for that. > > > > Hope that helps, > > -Bryan > > > > *From:* gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss- > bounces at spectrumscale.org] *On Behalf Of *J. Eric Wonderley > *Sent:* Thursday, January 05, 2017 2:01 PM > *To:* gpfsug main discussion list > *Subject:* [gpfsug-discuss] nsd not adding with one quorum node down? > > > > I have one quorum node down and attempting to add a nsd to a fs: > [root at cl005 ~]# mmadddisk home -F add_1_flh_home -v no |& tee > /root/adddisk_flh_home.out > Verifying file system configuration information ... > > The following disks of home will be formatted on node cl003: > r10f1e5: size 1879610 MB > Extending Allocation Map > Checking Allocation Map for storage pool fc_ssd400G > 55 % complete on Thu Jan 5 14:43:31 2017 > Lost connection to file system daemon. > mmadddisk: tsadddisk failed. > Verifying file system configuration information ... > mmadddisk: File system home has some disks that are in a non-ready state. > mmadddisk: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. > mmadddisk: Command failed. Examine previous error messages to determine > cause. > > Had to use -v no (this failed once before). Anyhow I next see: > [root at cl002 ~]# mmgetstate -aL > > Node number Node name Quorum Nodes up Total nodes GPFS state > Remarks > ------------------------------------------------------------ > ------------------------ > 1 cl001 0 0 8 down > quorum node > 2 cl002 5 6 8 active > quorum node > 3 cl003 5 0 8 arbitrating > quorum node > 4 cl004 5 6 8 active > quorum node > 5 cl005 5 6 8 active > quorum node > 6 cl006 5 6 8 active > quorum node > 7 cl007 5 6 8 active > quorum node > 8 cl008 5 6 8 active > quorum node > [root at cl002 ~]# mmlsdisk home > disk driver sector failure holds > holds storage > name type size group metadata data status > availability pool > ------------ -------- ------ ----------- -------- ----- ------------- > ------------ ------------ > r10f1e5 nsd 512 1001 No Yes allocmap add > up fc_ssd400G > r6d2e8 nsd 512 1001 No Yes ready > up fc_8T > r6d3e8 nsd 512 1001 No Yes ready > up fc_8T > > Do all quorum node have to be up and participating to do these admin type > operations? 
> > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Jan 5 20:27:24 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 5 Jan 2017 20:27:24 +0000 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF328@CHI-EXCHANGEW1.w2k.jumptrading.com> Removing the quorum designation is an option. However I believe the file system manager must be assigned to the file system in order for the mmadddisk to work. If the file system manager is not assigned (mmlsmgr to check) or continuously is reassigned to nodes but that fails (check /var/adm/ras/mmfs.log.latest on all nodes) or is blocked from being assigned due to the apparent node recovery in the cluster indicated by the one node in the ?arbitrating? state, then the mmadddisk will not succeed. -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Thursday, January 05, 2017 2:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] nsd not adding with one quorum node down? Bryan: Have you ever attempted to do this knowing that one quorum server is down? *all* nsdservers will not see the nsd about to be added? How about temporarily removing quorum from a nsd server...? Thanks On Thu, Jan 5, 2017 at 3:06 PM, Bryan Banister > wrote: There may be an issue with one of the other NSDs in the file system according to the ?mmadddisk: File system home has some disks that are in a non-ready state.? message in our output. Best to check the status of the NSDs in the file system using the `mmlsdisk home` and if any disks are not ?up? then run the `mmchdisk home start -a` command after confirming that all nsdservers can see the disks. I typically use `mmdsh -N nsdnodes tspreparedisk ?s | dshbak ?c` for that. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Thursday, January 05, 2017 2:01 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] nsd not adding with one quorum node down? I have one quorum node down and attempting to add a nsd to a fs: [root at cl005 ~]# mmadddisk home -F add_1_flh_home -v no |& tee /root/adddisk_flh_home.out Verifying file system configuration information ... 
The following disks of home will be formatted on node cl003: r10f1e5: size 1879610 MB Extending Allocation Map Checking Allocation Map for storage pool fc_ssd400G 55 % complete on Thu Jan 5 14:43:31 2017 Lost connection to file system daemon. mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: File system home has some disks that are in a non-ready state. mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Had to use -v no (this failed once before). Anyhow I next see: [root at cl002 ~]# mmgetstate -aL Node number Node name Quorum Nodes up Total nodes GPFS state Remarks ------------------------------------------------------------------------------------ 1 cl001 0 0 8 down quorum node 2 cl002 5 6 8 active quorum node 3 cl003 5 0 8 arbitrating quorum node 4 cl004 5 6 8 active quorum node 5 cl005 5 6 8 active quorum node 6 cl006 5 6 8 active quorum node 7 cl007 5 6 8 active quorum node 8 cl008 5 6 8 active quorum node [root at cl002 ~]# mmlsdisk home disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ r10f1e5 nsd 512 1001 No Yes allocmap add up fc_ssd400G r6d2e8 nsd 512 1001 No Yes ready up fc_8T r6d3e8 nsd 512 1001 No Yes ready up fc_8T Do all quorum node have to be up and participating to do these admin type operations? ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bbanister at jumptrading.com Thu Jan 5 20:44:33 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 5 Jan 2017 20:44:33 +0000 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF328@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB064DF328@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF398@CHI-EXCHANGEW1.w2k.jumptrading.com> Looking at this further, the output says the ?The following disks of home will be formatted on node cl003:? however that node is the node in ?arbitrating? state, so I don?t see how that would work, -B From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan Banister Sent: Thursday, January 05, 2017 2:27 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] nsd not adding with one quorum node down? Removing the quorum designation is an option. However I believe the file system manager must be assigned to the file system in order for the mmadddisk to work. If the file system manager is not assigned (mmlsmgr to check) or continuously is reassigned to nodes but that fails (check /var/adm/ras/mmfs.log.latest on all nodes) or is blocked from being assigned due to the apparent node recovery in the cluster indicated by the one node in the ?arbitrating? state, then the mmadddisk will not succeed. -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Thursday, January 05, 2017 2:13 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] nsd not adding with one quorum node down? Bryan: Have you ever attempted to do this knowing that one quorum server is down? *all* nsdservers will not see the nsd about to be added? How about temporarily removing quorum from a nsd server...? Thanks On Thu, Jan 5, 2017 at 3:06 PM, Bryan Banister > wrote: There may be an issue with one of the other NSDs in the file system according to the ?mmadddisk: File system home has some disks that are in a non-ready state.? message in our output. Best to check the status of the NSDs in the file system using the `mmlsdisk home` and if any disks are not ?up? then run the `mmchdisk home start -a` command after confirming that all nsdservers can see the disks. I typically use `mmdsh -N nsdnodes tspreparedisk ?s | dshbak ?c` for that. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of J. Eric Wonderley Sent: Thursday, January 05, 2017 2:01 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] nsd not adding with one quorum node down? I have one quorum node down and attempting to add a nsd to a fs: [root at cl005 ~]# mmadddisk home -F add_1_flh_home -v no |& tee /root/adddisk_flh_home.out Verifying file system configuration information ... The following disks of home will be formatted on node cl003: r10f1e5: size 1879610 MB Extending Allocation Map Checking Allocation Map for storage pool fc_ssd400G 55 % complete on Thu Jan 5 14:43:31 2017 Lost connection to file system daemon. mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: File system home has some disks that are in a non-ready state. mmadddisk: Propagating the cluster configuration data to all affected nodes. 
This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Had to use -v no (this failed once before). Anyhow I next see: [root at cl002 ~]# mmgetstate -aL Node number Node name Quorum Nodes up Total nodes GPFS state Remarks ------------------------------------------------------------------------------------ 1 cl001 0 0 8 down quorum node 2 cl002 5 6 8 active quorum node 3 cl003 5 0 8 arbitrating quorum node 4 cl004 5 6 8 active quorum node 5 cl005 5 6 8 active quorum node 6 cl006 5 6 8 active quorum node 7 cl007 5 6 8 active quorum node 8 cl008 5 6 8 active quorum node [root at cl002 ~]# mmlsdisk home disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ r10f1e5 nsd 512 1001 No Yes allocmap add up fc_ssd400G r6d2e8 nsd 512 1001 No Yes ready up fc_8T r6d3e8 nsd 512 1001 No Yes ready up fc_8T Do all quorum node have to be up and participating to do these admin type operations? ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Thu Jan 5 21:38:39 2017 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Thu, 05 Jan 2017 16:38:39 -0500 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF398@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB064DF328@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB064DF398@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <28063.1483652319@turing-police.cc.vt.edu> On Thu, 05 Jan 2017 20:44:33 +0000, Bryan Banister said: > Looking at this further, the output says the ???The following disks of home > will be formatted on node cl003:??? however that node is the node in > ???arbitrating??? state, so I don???t see how that would work, The bigger question: If it was in "arbitrating", why was it selected as the node to do the formatting? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Thu Jan 5 21:53:17 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 05 Jan 2017 16:53:17 -0500 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? Message-ID: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> Does anyone know of a functional standard alone tool to systematically and recursively find and replicate ACLs that works well with GPFS? * We're currently using rsync, which will replicate permissions fine, however it leaves the ACL's behind. The --perms option for rsync is blind to ACLs. * The native linux trick below works well with ext4 after an rsync, but makes a mess on GPFS. % getfacl -R /path/to/source > /root/perms.ac % setfacl --restore=/root/perms.acl * The native GPFS mmgetacl/mmputacl pair does not have a built-in recursive option. Any ideas? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 5 22:01:18 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 5 Jan 2017 22:01:18 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> Message-ID: Hi Jaime, IBM developed a patch for rsync that can replicate ACL?s ? we?ve used it and it works great ? can?t remember where we downloaded it from, though. Maybe someone else on the list who *isn?t* having a senior moment can point you to it? Kevin > On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: > > Does anyone know of a functional standard alone tool to systematically and recursively find and replicate ACLs that works well with GPFS? 
> > * We're currently using rsync, which will replicate permissions fine, however it leaves the ACL's behind. The --perms option for rsync is blind to ACLs. > > * The native linux trick below works well with ext4 after an rsync, but makes a mess on GPFS. > % getfacl -R /path/to/source > /root/perms.ac > % setfacl --restore=/root/perms.acl > > * The native GPFS mmgetacl/mmputacl pair does not have a built-in recursive option. > > Any ideas? > > Thanks > Jaime > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From laurence at qsplace.co.uk Thu Jan 5 22:03:53 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Thu, 5 Jan 2017 22:03:53 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> Message-ID: <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk> Are you talking about the GPFSUG github? https://github.com/gpfsug/gpfsug-tools The patched rsync there I believe was done by Orlando. -- Lauz On 05/01/2017 22:01, Buterbaugh, Kevin L wrote: > Hi Jaime, > > IBM developed a patch for rsync that can replicate ACL?s ? we?ve used it and it works great ? can?t remember where we downloaded it from, though. Maybe someone else on the list who *isn?t* having a senior moment can point you to it? > > Kevin > >> On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: >> >> Does anyone know of a functional standard alone tool to systematically and recursively find and replicate ACLs that works well with GPFS? >> >> * We're currently using rsync, which will replicate permissions fine, however it leaves the ACL's behind. The --perms option for rsync is blind to ACLs. >> >> * The native linux trick below works well with ext4 after an rsync, but makes a mess on GPFS. >> % getfacl -R /path/to/source > /root/perms.ac >> % setfacl --restore=/root/perms.acl >> >> * The native GPFS mmgetacl/mmputacl pair does not have a built-in recursive option. >> >> Any ideas? >> >> Thanks >> Jaime >> >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of Toronto. >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From robbyb at us.ibm.com Thu Jan 5 22:18:08 2017 From: robbyb at us.ibm.com (Rob Basham) Date: Thu, 5 Jan 2017 22:18:08 +0000 Subject: [gpfsug-discuss] TCT and CentOS Message-ID: An HTML attachment was scrubbed... 
URL: From Valdis.Kletnieks at vt.edu Thu Jan 5 22:42:28 2017 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Thu, 05 Jan 2017 17:42:28 -0500 Subject: [gpfsug-discuss] TCT and CentOS In-Reply-To: References: Message-ID: <32702.1483656148@turing-police.cc.vt.edu> On Thu, 05 Jan 2017 22:18:08 +0000, "Rob Basham" said: > By way of introduction, I am TCT architect across all of IBM's storage > products, including Spectrum Scale. There have been queries as to whether or > not CentOS is supported with TCT Server on Spectrum Scale. It is not currently > supported and should not be used as a TCT Server. Is that a "we haven't qualified it and you're on your own" not supported, or "there be known dragons" not supported? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From gmcpheeters at anl.gov Thu Jan 5 23:34:04 2017 From: gmcpheeters at anl.gov (McPheeters, Gordon) Date: Thu, 5 Jan 2017 23:34:04 +0000 Subject: [gpfsug-discuss] nsd not adding with one quorum node down? In-Reply-To: <28063.1483652319@turing-police.cc.vt.edu> References: <21BC488F0AEA2245B2C3E83FC0B33DBB064DF28B@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB064DF328@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB064DF398@CHI-EXCHANGEW1.w2k.jumptrading.com> <28063.1483652319@turing-police.cc.vt.edu> Message-ID: You might want to check the gpfs logs on the node cl003. Often the message "Lost connection to file system daemon.? means that the daemon asserted while it was doing something... hence the lost connection. If you are checking the state and seeing it in arbitrating mode immed after the command fails that also makes sense as it?s now re-joining the cluster. If you aren?t watching carefully you can miss these events due to way mmfsd will resume the old mounts, hence you check the node with ?df? and see the file system is still mounted, then assume all is well, but in fact mmfsd has died and restarted. Gordon McPheeters ALCF Storage (630) 252-6430 gmcpheeters at anl.gov On Jan 5, 2017, at 3:38 PM, Valdis.Kletnieks at vt.edu wrote: On Thu, 05 Jan 2017 20:44:33 +0000, Bryan Banister said: Looking at this further, the output says the ?The following disks of home will be formatted on node cl003:? however that node is the node in ?arbitrating? state, so I don?t see how that would work, The bigger question: If it was in "arbitrating", why was it selected as the node to do the formatting? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From robbyb at us.ibm.com Fri Jan 6 00:28:47 2017 From: robbyb at us.ibm.com (Rob Basham) Date: Fri, 6 Jan 2017 00:28:47 +0000 Subject: [gpfsug-discuss] TCT and CentOS Message-ID: An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Fri Jan 6 02:16:04 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 05 Jan 2017 21:16:04 -0500 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk> References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk> Message-ID: <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> Great guys!!! Just what I was looking for. 
Everyone is always so helpful on this forum. Thanks a lot. Jaime Quoting "Laurence Horrocks-Barlow" : > Are you talking about the GPFSUG github? > > https://github.com/gpfsug/gpfsug-tools > > The patched rsync there I believe was done by Orlando. > > -- Lauz > > > On 05/01/2017 22:01, Buterbaugh, Kevin L wrote: >> Hi Jaime, >> >> IBM developed a patch for rsync that can replicate ACL?s ? we?ve >> used it and it works great ? can?t remember where we downloaded it >> from, though. Maybe someone else on the list who *isn?t* having a >> senior moment can point you to it? >> >> Kevin >> >>> On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: >>> >>> Does anyone know of a functional standard alone tool to >>> systematically and recursively find and replicate ACLs that works >>> well with GPFS? >>> >>> * We're currently using rsync, which will replicate permissions >>> fine, however it leaves the ACL's behind. The --perms option for >>> rsync is blind to ACLs. >>> >>> * The native linux trick below works well with ext4 after an >>> rsync, but makes a mess on GPFS. >>> % getfacl -R /path/to/source > /root/perms.ac >>> % setfacl --restore=/root/perms.acl >>> >>> * The native GPFS mmgetacl/mmputacl pair does not have a built-in >>> recursive option. >>> >>> Any ideas? >>> >>> Thanks >>> Jaime >>> >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University >>> of Toronto. >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From S.J.Thompson at bham.ac.uk Fri Jan 6 07:17:46 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 6 Jan 2017 07:17:46 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk>, <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> Message-ID: Just a cautionary note, it doesn't work with symlinks as it fails to get the acl and so doesn't copy the symlink. So you may want to run a traditional rsync after just to get all your symlinks on place. (having been using this over the Christmas period to merge some filesets with acls...) 
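In practice that just means a second, plain pass with stock rsync over the same tree once the ACL-aware copy has finished. A rough sketch (the paths here are placeholders):

   rsync -av /gpfs/fs0/src_fileset/ /gpfs/fs0/dst_fileset/   # stock rsync copies the symlinks the patched binary skipped

Files the first pass already transferred still match on size and mtime, so this second pass only sends the missing symlinks and leaves the ACLs it knows nothing about alone.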
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jaime Pinto [pinto at scinet.utoronto.ca] Sent: 06 January 2017 02:16 To: gpfsug main discussion list; Laurence Horrocks-Barlow Cc: support at scinet.utoronto.ca Subject: Re: [gpfsug-discuss] replicating ACLs across GPFS's? Great guys!!! Just what I was looking for. Everyone is always so helpful on this forum. Thanks a lot. Jaime Quoting "Laurence Horrocks-Barlow" : > Are you talking about the GPFSUG github? > > https://github.com/gpfsug/gpfsug-tools > > The patched rsync there I believe was done by Orlando. > > -- Lauz > > > On 05/01/2017 22:01, Buterbaugh, Kevin L wrote: >> Hi Jaime, >> >> IBM developed a patch for rsync that can replicate ACL?s ? we?ve >> used it and it works great ? can?t remember where we downloaded it >> from, though. Maybe someone else on the list who *isn?t* having a >> senior moment can point you to it? >> >> Kevin >> >>> On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: >>> >>> Does anyone know of a functional standard alone tool to >>> systematically and recursively find and replicate ACLs that works >>> well with GPFS? >>> >>> * We're currently using rsync, which will replicate permissions >>> fine, however it leaves the ACL's behind. The --perms option for >>> rsync is blind to ACLs. >>> >>> * The native linux trick below works well with ext4 after an >>> rsync, but makes a mess on GPFS. >>> % getfacl -R /path/to/source > /root/perms.ac >>> % setfacl --restore=/root/perms.acl >>> >>> * The native GPFS mmgetacl/mmputacl pair does not have a built-in >>> recursive option. >>> >>> Any ideas? >>> >>> Thanks >>> Jaime >>> >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University >>> of Toronto. >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jtucker at pixitmedia.com Fri Jan 6 08:29:53 2017 From: jtucker at pixitmedia.com (Jez Tucker) Date: Fri, 6 Jan 2017 08:29:53 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? 
In-Reply-To: References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> Message-ID: <4a934973-691c-977a-1d19-81102ecb3d37@pixitmedia.com> Hi, Here: https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync For those of you with Pixit Media / ArcaStream support, just install our maintained ap-rsync which has this patch and additional fixes for other 'fun' things that break between GPFS and rsync. If anyone wants to contribute to the git repo wave your arms. Jez On 05/01/17 22:01, Buterbaugh, Kevin L wrote: > Hi Jaime, > > IBM developed a patch for rsync that can replicate ACL?s ? we?ve used it and it works great ? can?t remember where we downloaded it from, though. Maybe someone else on the list who *isn?t* having a senior moment can point you to it? > > Kevin > >> On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: >> >> Does anyone know of a functional standard alone tool to systematically and recursively find and replicate ACLs that works well with GPFS? >> >> * We're currently using rsync, which will replicate permissions fine, however it leaves the ACL's behind. The --perms option for rsync is blind to ACLs. >> >> * The native linux trick below works well with ext4 after an rsync, but makes a mess on GPFS. >> % getfacl -R /path/to/source > /root/perms.ac >> % setfacl --restore=/root/perms.acl >> >> * The native GPFS mmgetacl/mmputacl pair does not have a built-in recursive option. >> >> Any ideas? >> >> Thanks >> Jaime >> >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of Toronto. >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Fri Jan 6 08:31:16 2017 From: jtucker at pixitmedia.com (Jez Tucker) Date: Fri, 6 Jan 2017 08:31:16 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk> <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> Message-ID: <6928a73b-a8fa-4255-813a-0ddd6c9579f7@pixitmedia.com> Some of the 'fun things' being such as that very issue. 
On 06/01/17 07:17, Simon Thompson (Research Computing - IT Services) wrote: > Just a cautionary note, it doesn't work with symlinks as it fails to get the acl and so doesn't copy the symlink. > > So you may want to run a traditional rsync after just to get all your symlinks on place. (having been using this over the Christmas period to merge some filesets with acls...) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jaime Pinto [pinto at scinet.utoronto.ca] > Sent: 06 January 2017 02:16 > To: gpfsug main discussion list; Laurence Horrocks-Barlow > Cc: support at scinet.utoronto.ca > Subject: Re: [gpfsug-discuss] replicating ACLs across GPFS's? > > Great guys!!! > Just what I was looking for. > Everyone is always so helpful on this forum. > Thanks a lot. > Jaime > > Quoting "Laurence Horrocks-Barlow" : > >> Are you talking about the GPFSUG github? >> >> https://github.com/gpfsug/gpfsug-tools >> >> The patched rsync there I believe was done by Orlando. >> >> -- Lauz >> >> >> On 05/01/2017 22:01, Buterbaugh, Kevin L wrote: >>> Hi Jaime, >>> >>> IBM developed a patch for rsync that can replicate ACL?s ? we?ve >>> used it and it works great ? can?t remember where we downloaded it >>> from, though. Maybe someone else on the list who *isn?t* having a >>> senior moment can point you to it? >>> >>> Kevin >>> >>>> On Jan 5, 2017, at 3:53 PM, Jaime Pinto wrote: >>>> >>>> Does anyone know of a functional standard alone tool to >>>> systematically and recursively find and replicate ACLs that works >>>> well with GPFS? >>>> >>>> * We're currently using rsync, which will replicate permissions >>>> fine, however it leaves the ACL's behind. The --perms option for >>>> rsync is blind to ACLs. >>>> >>>> * The native linux trick below works well with ext4 after an >>>> rsync, but makes a mess on GPFS. >>>> % getfacl -R /path/to/source > /root/perms.ac >>>> % setfacl --restore=/root/perms.acl >>>> >>>> * The native GPFS mmgetacl/mmputacl pair does not have a built-in >>>> recursive option. >>>> >>>> Any ideas? >>>> >>>> Thanks >>>> Jaime >>>> >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University >>>> of Toronto. >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. 
> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From orichards at pixitmedia.com Fri Jan 6 08:50:43 2017 From: orichards at pixitmedia.com (Orlando Richards) Date: Fri, 6 Jan 2017 08:50:43 +0000 Subject: [gpfsug-discuss] (Re)Introduction Message-ID: Hi folks, Since I've re-joined this list with my new identity, I thought I'd ping over a brief re-intro email. Some of you will know me from my past life working for the University of Edinburgh, but in November last year I joined the team at Pixit Media / ArcaStream. For those I've not met before - I've been working with GPFS since 2007 in a University environment, initially as an HPC storage engine but quickly realised the benefits that GPFS could offer as a general file/NAS storage platform as well, and developed its use in the University of Edinburgh (and for the national UKRDF service) in that vein. These days I'm spending a lot of my time looking at the deployment, operations and support processes around GPFS - which means I get to play with all sorts of hip and trendy buzzwords :) -- *Orlando Richards* VP Product Development, Pixit Media 07930742808|orichards at pixitmedia.com www.pixitmedia.com |Tw:@pixitmedia -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From orichards at pixitmedia.com Fri Jan 6 08:51:19 2017 From: orichards at pixitmedia.com (Orlando Richards) Date: Fri, 6 Jan 2017 08:51:19 +0000 Subject: [gpfsug-discuss] replicating ACLs across GPFS's? In-Reply-To: <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> References: <20170105165317.18905nokp87ue98t@support.scinet.utoronto.ca> <3098c044-7785-4631-6161-7f7e513029a4@qsplace.co.uk> <20170105211604.65451rh7z4l2ko9w@support.scinet.utoronto.ca> Message-ID: Glad to see it's still doing good work out there! :) On 06/01/2017 02:16, Jaime Pinto wrote: > Great guys!!! 
> Just what I was looking for. > Everyone is always so helpful on this forum. > Thanks a lot. > Jaime > > Quoting "Laurence Horrocks-Barlow" : > >> Are you talking about the GPFSUG github? >> >> https://github.com/gpfsug/gpfsug-tools >> >> The patched rsync there I believe was done by Orlando. >> >> -- Lauz >> >> >> On 05/01/2017 22:01, Buterbaugh, Kevin L wrote: >>> Hi Jaime, >>> >>> IBM developed a patch for rsync that can replicate ACL?s ? we?ve >>> used it and it works great ? can?t remember where we downloaded it >>> from, though. Maybe someone else on the list who *isn?t* having a >>> senior moment can point you to it? >>> >>> Kevin >>> >>>> On Jan 5, 2017, at 3:53 PM, Jaime Pinto >>>> wrote: >>>> >>>> Does anyone know of a functional standard alone tool to >>>> systematically and recursively find and replicate ACLs that works >>>> well with GPFS? >>>> >>>> * We're currently using rsync, which will replicate permissions >>>> fine, however it leaves the ACL's behind. The --perms option for >>>> rsync is blind to ACLs. >>>> >>>> * The native linux trick below works well with ext4 after an >>>> rsync, but makes a mess on GPFS. >>>> % getfacl -R /path/to/source > /root/perms.ac >>>> % setfacl --restore=/root/perms.acl >>>> >>>> * The native GPFS mmgetacl/mmputacl pair does not have a built-in >>>> recursive option. >>>> >>>> Any ideas? >>>> >>>> Thanks >>>> Jaime >>>> >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University >>>> of Toronto. >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Orlando Richards* VP Product Development, Pixit Media 07930742808|orichards at pixitmedia.com www.pixitmedia.com |Tw:@pixitmedia -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From erich at uw.edu Fri Jan 6 19:07:22 2017 From: erich at uw.edu (Eric Horst) Date: Fri, 6 Jan 2017 11:07:22 -0800 Subject: [gpfsug-discuss] undo fileset inode allocation Message-ID: Greetings all, I've been setting up and migrating to a new 225TB filesystem on 4.2.1. Separate data and metadata disks. There are about 20 independent filesets as second level directories which have all the files. One of the independent filesets hit its inode limit of 28M. Without carefully checking my work I accidentally changed the limit to 3.2B inodes instead of 32M inodes. This ran for 15 minutes and when it was done I see mmdf shows that I had 0% metadata space free. There was previously 72% free. Thinking about it I reasoned that as independent filesets I might get that metadata space back if I unlinked and deleted that fileset. After doing so I find I have metadata 11% free. A far cry from the 72% I used to have. Are there other options for undoing this mistake? Or should I not worry that I'm at 11% and assume that whatever was preallocated will be productively used over the life of this filesystem? Thanks, -Eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Jan 6 20:08:17 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 6 Jan 2017 20:08:17 +0000 Subject: [gpfsug-discuss] undo fileset inode allocation In-Reply-To: References: Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB064E1624@CHI-EXCHANGEW1.w2k.jumptrading.com> Honestly this sounds like you may be in a very dangerous situation and would HIGHLY recommend opening a PMR immediately to get direct, authoritative instruction from IBM, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Eric Horst Sent: Friday, January 06, 2017 1:07 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] undo fileset inode allocation Greetings all, I've been setting up and migrating to a new 225TB filesystem on 4.2.1. Separate data and metadata disks. There are about 20 independent filesets as second level directories which have all the files. One of the independent filesets hit its inode limit of 28M. Without carefully checking my work I accidentally changed the limit to 3.2B inodes instead of 32M inodes. This ran for 15 minutes and when it was done I see mmdf shows that I had 0% metadata space free. There was previously 72% free. Thinking about it I reasoned that as independent filesets I might get that metadata space back if I unlinked and deleted that fileset. After doing so I find I have metadata 11% free. A far cry from the 72% I used to have. Are there other options for undoing this mistake? Or should I not worry that I'm at 11% and assume that whatever was preallocated will be productively used over the life of this filesystem? Thanks, -Eric ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Tomlinson at awe.co.uk Mon Jan 9 15:09:43 2017 From: Paul.Tomlinson at awe.co.uk (Paul.Tomlinson at awe.co.uk) Date: Mon, 9 Jan 2017 15:09:43 +0000 Subject: [gpfsug-discuss] AFM Migration Issue Message-ID: <201701091501.v09F1i5A015912@msw1.awe.co.uk> Hi All, We have just completed the first data move from our old cluster to the new one using AFM Local Update as per the guide, however we have noticed that all date stamps on the directories have the date they were created on(e.g. 9th Jan 2017) , not the date from the old system (e.g. 14th April 2007), whereas all the files have the correct dates. Has anyone else seen this issue as we now have to convert all the directory dates to their original dates ! The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR From janfrode at tanso.net Mon Jan 9 15:29:45 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 09 Jan 2017 15:29:45 +0000 Subject: [gpfsug-discuss] AFM Migration Issue In-Reply-To: <201701091501.v09F1i5A015912@msw1.awe.co.uk> References: <201701091501.v09F1i5A015912@msw1.awe.co.uk> Message-ID: Untested, and I have no idea if it will work on the number of files and directories you have, but maybe you can fix it by rsyncing just the directories? rsync -av --dry-run --include='*/' --exclude='*' source/ destination/ -jf man. 9. jan. 2017 kl. 16.09 skrev : > Hi All, > > We have just completed the first data move from our old cluster to the new > one using AFM Local Update as per the guide, however we have noticed that > all date stamps on the directories have the date they were created on(e.g. > 9th Jan 2017) , not the date from the old system (e.g. 14th April 2007), > whereas all the files have the correct dates. > > Has anyone else seen this issue as we now have to convert all the > directory dates to their original dates ! > > > > > The information in this email and in any attachment(s) is > commercial in confidence. If you are not the named addressee(s) > or > if you receive this email in error then any distribution, copying or > use of this communication or the information in it is strictly > prohibited. Please notify us immediately by email at > admin.internet(at)awe.co.uk, and then delete this message from > your computer. While attachments are virus checked, AWE plc > does not accept any liability in respect of any virus which is not > detected. 
> > AWE Plc > Registered in England and Wales > Registration No 02763902 > AWE, Aldermaston, Reading, RG7 4PR > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jan 9 15:48:43 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 9 Jan 2017 15:48:43 +0000 Subject: [gpfsug-discuss] AFM Migration Issue In-Reply-To: <201701091501.v09F1i5A015912@msw1.awe.co.uk> References: <201701091501.v09F1i5A015912@msw1.awe.co.uk> Message-ID: Interesting, I'm currently doing similar but currently am only using read-only to premigrate the filesets, The directory file stamps don't agree with the original but neither are they all marked when they were migrated. So there is something very weird going on..... (We're planning to switch them to Local Update when we move the users over to them) We're using a mmapplypolicy on our old gpfs cluster to get the files to migrate, and have noticed that you need a RULE EXTERNAL LIST ESCAPE '%/' line otherwise files with % in the filenames don't get migrated and through errors. I'm trying to work out if empty directories or those containing only empty directories get migrated correctly as you can't list them in the mmafmctl prefetch statement. (If you try (using DIRECTORIES_PLUS) they through errors) I am very interested in the solution to this issue. Peter Childs Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Paul.Tomlinson at awe.co.uk Sent: Monday, January 9, 2017 3:09:43 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] AFM Migration Issue Hi All, We have just completed the first data move from our old cluster to the new one using AFM Local Update as per the guide, however we have noticed that all date stamps on the directories have the date they were created on(e.g. 9th Jan 2017) , not the date from the old system (e.g. 14th April 2007), whereas all the files have the correct dates. Has anyone else seen this issue as we now have to convert all the directory dates to their original dates ! The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Paul.Tomlinson at awe.co.uk Mon Jan 9 16:00:04 2017 From: Paul.Tomlinson at awe.co.uk (Paul.Tomlinson at awe.co.uk) Date: Mon, 9 Jan 2017 16:00:04 +0000 Subject: [gpfsug-discuss] AFM Migration Issue In-Reply-To: References: <201701091501.v09F1i5A015912@msw1.awe.co.uk> Message-ID: <201701091552.v09Fq4kj012315@msw1.awe.co.uk> Hi, We have already come across the issues you have seen below, and worked around them. 
If you run the pre-fetch with just the --meta-data-only, then all the date stamps are correct for the dirs., as soon as you run --list-only all the directory times change to now. We have tried rsync but this did not appear to work. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: 09 January 2017 15:49 To: gpfsug-discuss at spectrumscale.org Subject: EXTERNAL: Re: [gpfsug-discuss] AFM Migration Issue Interesting, I'm currently doing similar but currently am only using read-only to premigrate the filesets, The directory file stamps don't agree with the original but neither are they all marked when they were migrated. So there is something very weird going on..... (We're planning to switch them to Local Update when we move the users over to them) We're using a mmapplypolicy on our old gpfs cluster to get the files to migrate, and have noticed that you need a RULE EXTERNAL LIST ESCAPE '%/' line otherwise files with % in the filenames don't get migrated and through errors. I'm trying to work out if empty directories or those containing only empty directories get migrated correctly as you can't list them in the mmafmctl prefetch statement. (If you try (using DIRECTORIES_PLUS) they through errors) I am very interested in the solution to this issue. Peter Childs Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Paul.Tomlinson at awe.co.uk Sent: Monday, January 9, 2017 3:09:43 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] AFM Migration Issue Hi All, We have just completed the first data move from our old cluster to the new one using AFM Local Update as per the guide, however we have noticed that all date stamps on the directories have the date they were created on(e.g. 9th Jan 2017) , not the date from the old system (e.g. 14th April 2007), whereas all the files have the correct dates. Has anyone else seen this issue as we now have to convert all the directory dates to their original dates ! The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. 
While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR From YARD at il.ibm.com Mon Jan 9 19:12:08 2017 From: YARD at il.ibm.com (Yaron Daniel) Date: Mon, 9 Jan 2017 21:12:08 +0200 Subject: [gpfsug-discuss] AFM Migration Issue In-Reply-To: References: <201701091501.v09F1i5A015912@msw1.awe.co.uk> Message-ID: Hi Do u have nfsv4 acl's ? Try to ask from IBM support to get Sonas rsync in order to migrate the data. Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Jan-Frode Myklebust To: gpfsug main discussion list Date: 01/09/2017 05:30 PM Subject: Re: [gpfsug-discuss] AFM Migration Issue Sent by: gpfsug-discuss-bounces at spectrumscale.org Untested, and I have no idea if it will work on the number of files and directories you have, but maybe you can fix it by rsyncing just the directories? rsync -av --dry-run --include='*/' --exclude='*' source/ destination/ -jf man. 9. jan. 2017 kl. 16.09 skrev : Hi All, We have just completed the first data move from our old cluster to the new one using AFM Local Update as per the guide, however we have noticed that all date stamps on the directories have the date they were created on(e.g. 9th Jan 2017) , not the date from the old system (e.g. 14th April 2007), whereas all the files have the correct dates. Has anyone else seen this issue as we now have to convert all the directory dates to their original dates ! The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From mimarsh2 at vt.edu Mon Jan 9 20:16:55 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Mon, 9 Jan 2017 15:16:55 -0500 Subject: [gpfsug-discuss] replication and no failure groups Message-ID: All, If I have a filesystem with replication set to 2 and 1 failure group: 1) I assume replication won't actually happen, correct? 2) Will this impact performance i.e cut write performance in half even though it really only keeps 1 copy? 
End goal - I would like a single storage pool within the filesystem to be replicated without affecting the performance of all other pools(which only have a single failure group) Thanks, Brian Marshall VT - ARC -------------- next part -------------- An HTML attachment was scrubbed... URL: From YARD at il.ibm.com Mon Jan 9 20:34:29 2017 From: YARD at il.ibm.com (Yaron Daniel) Date: Mon, 9 Jan 2017 22:34:29 +0200 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: Hi 1) Yes in case u have only 1 Failure group - replication will not work. 2) Do you have 2 Storage Systems ? When using GPFS replication write stay the same - but read can be double - since it read from 2 Storage systems Hope this help - what do you try to achive , can you share your env setup ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Brian Marshall To: gpfsug main discussion list Date: 01/09/2017 10:17 PM Subject: [gpfsug-discuss] replication and no failure groups Sent by: gpfsug-discuss-bounces at spectrumscale.org All, If I have a filesystem with replication set to 2 and 1 failure group: 1) I assume replication won't actually happen, correct? 2) Will this impact performance i.e cut write performance in half even though it really only keeps 1 copy? End goal - I would like a single storage pool within the filesystem to be replicated without affecting the performance of all other pools(which only have a single failure group) Thanks, Brian Marshall VT - ARC_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From eric.wonderley at vt.edu Mon Jan 9 20:47:12 2017 From: eric.wonderley at vt.edu (J. 
Eric Wonderley) Date: Mon, 9 Jan 2017 15:47:12 -0500 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: Hi Yaron: This is the filesystem: [root at cl005 net]# mmlsdisk work disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ nsd_a_7 nsd 512 -1 No Yes ready up system nsd_b_7 nsd 512 -1 No Yes ready up system nsd_c_7 nsd 512 -1 No Yes ready up system nsd_d_7 nsd 512 -1 No Yes ready up system nsd_a_8 nsd 512 -1 No Yes ready up system nsd_b_8 nsd 512 -1 No Yes ready up system nsd_c_8 nsd 512 -1 No Yes ready up system nsd_d_8 nsd 512 -1 No Yes ready up system nsd_a_9 nsd 512 -1 No Yes ready up system nsd_b_9 nsd 512 -1 No Yes ready up system nsd_c_9 nsd 512 -1 No Yes ready up system nsd_d_9 nsd 512 -1 No Yes ready up system nsd_a_10 nsd 512 -1 No Yes ready up system nsd_b_10 nsd 512 -1 No Yes ready up system nsd_c_10 nsd 512 -1 No Yes ready up system nsd_d_10 nsd 512 -1 No Yes ready up system nsd_a_11 nsd 512 -1 No Yes ready up system nsd_b_11 nsd 512 -1 No Yes ready up system nsd_c_11 nsd 512 -1 No Yes ready up system nsd_d_11 nsd 512 -1 No Yes ready up system nsd_a_12 nsd 512 -1 No Yes ready up system nsd_b_12 nsd 512 -1 No Yes ready up system nsd_c_12 nsd 512 -1 No Yes ready up system nsd_d_12 nsd 512 -1 No Yes ready up system work_md_pf1_1 nsd 512 200 Yes No ready up system jbf1z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf2z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf3z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf4z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf5z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf6z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf7z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf8z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf1z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf2z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf3z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf4z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf5z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf6z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf7z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf8z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf1z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf2z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf3z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf4z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf5z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf6z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf7z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf8z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf1z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf2z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf3z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf4z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf5z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf6z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf7z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf8z4 nsd 4096 2034 No Yes ready up sas_ssd4T work_md_pf1_2 nsd 512 200 Yes No ready up system work_md_pf1_3 nsd 512 200 Yes No ready up system work_md_pf1_4 nsd 512 200 Yes No ready up system work_md_pf2_5 nsd 512 199 Yes No ready up system work_md_pf2_6 nsd 512 199 Yes No ready up system work_md_pf2_7 nsd 512 199 Yes No ready up system work_md_pf2_8 nsd 512 199 Yes No ready up system [root at cl005 net]# mmlsfs work -R -r -M -m -K flag value description ------------------- ------------------------ ----------------------------------- -R 2 Maximum number of data replicas -r 2 Default number of data 
replicas -M 2 Maximum number of metadata replicas -m 2 Default number of metadata replicas -K whenpossible Strict replica allocation option On Mon, Jan 9, 2017 at 3:34 PM, Yaron Daniel wrote: > Hi > > 1) Yes in case u have only 1 Failure group - replication will not work. > > 2) Do you have 2 Storage Systems ? When using GPFS replication write stay > the same - but read can be double - since it read from 2 Storage systems > > Hope this help - what do you try to achive , can you share your env setup ? > > > > Regards > > > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Server, **Storage and Data Services* > *- > Team Leader* Petach Tiqva, 49527 > *Global Technology Services* Israel > Phone: +972-3-916-5672 <+972%203-916-5672> > Fax: +972-3-916-5672 <+972%203-916-5672> > Mobile: +972-52-8395593 <+972%2052-839-5593> > e-mail: yard at il.ibm.com > *IBM Israel* > > > > > > > > From: Brian Marshall > To: gpfsug main discussion list > Date: 01/09/2017 10:17 PM > Subject: [gpfsug-discuss] replication and no failure groups > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > All, > > If I have a filesystem with replication set to 2 and 1 failure group: > > 1) I assume replication won't actually happen, correct? > > 2) Will this impact performance i.e cut write performance in half even > though it really only keeps 1 copy? > > End goal - I would like a single storage pool within the filesystem to be > replicated without affecting the performance of all other pools(which only > have a single failure group) > > Thanks, > Brian Marshall > VT - ARC_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From YARD at il.ibm.com Mon Jan 9 20:53:38 2017 From: YARD at il.ibm.com (Yaron Daniel) Date: Mon, 9 Jan 2017 20:53:38 +0000 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: Hi So - do u able to have GPFS replication for the MD Failure Groups ? I can see that u have 3 Failure Groups for Data -1, 2012,2034 , how many Storage Subsystems you have ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "J. 
Eric Wonderley" To: gpfsug main discussion list Date: 01/09/2017 10:48 PM Subject: Re: [gpfsug-discuss] replication and no failure groups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Yaron: This is the filesystem: [root at cl005 net]# mmlsdisk work disk driver sector failure holds holds storage name type size group metadata data status availability pool ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------ nsd_a_7 nsd 512 -1 No Yes ready up system nsd_b_7 nsd 512 -1 No Yes ready up system nsd_c_7 nsd 512 -1 No Yes ready up system nsd_d_7 nsd 512 -1 No Yes ready up system nsd_a_8 nsd 512 -1 No Yes ready up system nsd_b_8 nsd 512 -1 No Yes ready up system nsd_c_8 nsd 512 -1 No Yes ready up system nsd_d_8 nsd 512 -1 No Yes ready up system nsd_a_9 nsd 512 -1 No Yes ready up system nsd_b_9 nsd 512 -1 No Yes ready up system nsd_c_9 nsd 512 -1 No Yes ready up system nsd_d_9 nsd 512 -1 No Yes ready up system nsd_a_10 nsd 512 -1 No Yes ready up system nsd_b_10 nsd 512 -1 No Yes ready up system nsd_c_10 nsd 512 -1 No Yes ready up system nsd_d_10 nsd 512 -1 No Yes ready up system nsd_a_11 nsd 512 -1 No Yes ready up system nsd_b_11 nsd 512 -1 No Yes ready up system nsd_c_11 nsd 512 -1 No Yes ready up system nsd_d_11 nsd 512 -1 No Yes ready up system nsd_a_12 nsd 512 -1 No Yes ready up system nsd_b_12 nsd 512 -1 No Yes ready up system nsd_c_12 nsd 512 -1 No Yes ready up system nsd_d_12 nsd 512 -1 No Yes ready up system work_md_pf1_1 nsd 512 200 Yes No ready up system jbf1z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf2z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf3z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf4z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf5z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf6z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf7z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf8z1 nsd 4096 2012 No Yes ready up sas_ssd4T jbf1z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf2z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf3z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf4z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf5z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf6z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf7z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf8z2 nsd 4096 2012 No Yes ready up sas_ssd4T jbf1z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf2z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf3z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf4z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf5z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf6z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf7z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf8z3 nsd 4096 2034 No Yes ready up sas_ssd4T jbf1z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf2z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf3z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf4z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf5z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf6z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf7z4 nsd 4096 2034 No Yes ready up sas_ssd4T jbf8z4 nsd 4096 2034 No Yes ready up sas_ssd4T work_md_pf1_2 nsd 512 200 Yes No ready up system work_md_pf1_3 nsd 512 200 Yes No ready up system work_md_pf1_4 nsd 512 200 Yes No ready up system work_md_pf2_5 nsd 512 199 Yes No ready up system work_md_pf2_6 nsd 512 199 Yes No ready up system work_md_pf2_7 nsd 512 199 Yes No ready up system work_md_pf2_8 nsd 512 199 Yes No ready up system [root at cl005 net]# mmlsfs work -R -r -M -m -K flag value description ------------------- ------------------------ ----------------------------------- -R 2 Maximum number of 
data replicas -r 2 Default number of data replicas -M 2 Maximum number of metadata replicas -m 2 Default number of metadata replicas -K whenpossible Strict replica allocation option On Mon, Jan 9, 2017 at 3:34 PM, Yaron Daniel wrote: Hi 1) Yes in case u have only 1 Failure group - replication will not work. 2) Do you have 2 Storage Systems ? When using GPFS replication write stay the same - but read can be double - since it read from 2 Storage systems Hope this help - what do you try to achive , can you share your env setup ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services- Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Brian Marshall To: gpfsug main discussion list Date: 01/09/2017 10:17 PM Subject: [gpfsug-discuss] replication and no failure groups Sent by: gpfsug-discuss-bounces at spectrumscale.org All, If I have a filesystem with replication set to 2 and 1 failure group: 1) I assume replication won't actually happen, correct? 2) Will this impact performance i.e cut write performance in half even though it really only keeps 1 copy? End goal - I would like a single storage pool within the filesystem to be replicated without affecting the performance of all other pools(which only have a single failure group) Thanks, Brian Marshall VT - ARC_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From eric.wonderley at vt.edu Mon Jan 9 21:01:14 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Mon, 9 Jan 2017 16:01:14 -0500 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: Hi Yuran: We have 5...4x md3860fs and 1x if150. the if150 requires data replicas=2 to get the ha and protection they recommend. we have it presented in a fileset that appears in a users work area. On Mon, Jan 9, 2017 at 3:53 PM, Yaron Daniel wrote: > Hi > > So - do u able to have GPFS replication for the MD Failure Groups ? > > I can see that u have 3 Failure Groups for Data -1, 2012,2034 , how many > Storage Subsystems you have ? > > > > > Regards > > > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Server, **Storage and Data Services* > *- > Team Leader* Petach Tiqva, 49527 > *Global Technology Services* Israel > Phone: +972-3-916-5672 <+972%203-916-5672> > Fax: +972-3-916-5672 <+972%203-916-5672> > Mobile: +972-52-8395593 <+972%2052-839-5593> > e-mail: yard at il.ibm.com > *IBM Israel* > > > > > > > > From: "J. 
Eric Wonderley" > To: gpfsug main discussion list > Date: 01/09/2017 10:48 PM > Subject: Re: [gpfsug-discuss] replication and no failure groups > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Yaron: > > This is the filesystem: > > [root at cl005 net]# mmlsdisk work > disk driver sector failure holds > holds storage > name type size group metadata data status > availability pool > ------------ -------- ------ ----------- -------- ----- ------------- > ------------ ------------ > nsd_a_7 nsd 512 -1 No Yes ready > up system > nsd_b_7 nsd 512 -1 No Yes ready > up system > nsd_c_7 nsd 512 -1 No Yes ready > up system > nsd_d_7 nsd 512 -1 No Yes ready > up system > nsd_a_8 nsd 512 -1 No Yes ready > up system > nsd_b_8 nsd 512 -1 No Yes ready > up system > nsd_c_8 nsd 512 -1 No Yes ready > up system > nsd_d_8 nsd 512 -1 No Yes ready > up system > nsd_a_9 nsd 512 -1 No Yes ready > up system > nsd_b_9 nsd 512 -1 No Yes ready > up system > nsd_c_9 nsd 512 -1 No Yes ready > up system > nsd_d_9 nsd 512 -1 No Yes ready > up system > nsd_a_10 nsd 512 -1 No Yes ready > up system > nsd_b_10 nsd 512 -1 No Yes ready > up system > nsd_c_10 nsd 512 -1 No Yes ready > up system > nsd_d_10 nsd 512 -1 No Yes ready > up system > nsd_a_11 nsd 512 -1 No Yes ready > up system > nsd_b_11 nsd 512 -1 No Yes ready > up system > nsd_c_11 nsd 512 -1 No Yes ready > up system > nsd_d_11 nsd 512 -1 No Yes ready > up system > nsd_a_12 nsd 512 -1 No Yes ready > up system > nsd_b_12 nsd 512 -1 No Yes ready > up system > nsd_c_12 nsd 512 -1 No Yes ready > up system > nsd_d_12 nsd 512 -1 No Yes ready > up system > work_md_pf1_1 nsd 512 200 Yes No ready > up system > jbf1z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf2z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf3z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf4z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf5z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf6z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf7z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf8z1 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf1z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf2z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf3z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf4z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf5z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf6z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf7z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf8z2 nsd 4096 2012 No Yes ready > up sas_ssd4T > jbf1z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf2z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf3z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf4z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf5z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf6z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf7z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf8z3 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf1z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf2z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf3z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf4z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf5z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf6z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf7z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > jbf8z4 nsd 4096 2034 No Yes ready > up sas_ssd4T > work_md_pf1_2 nsd 512 200 Yes No ready > up system > work_md_pf1_3 nsd 512 200 Yes No ready > up system > work_md_pf1_4 nsd 512 200 Yes No ready > up system > work_md_pf2_5 nsd 512 199 Yes No ready > up system > 
work_md_pf2_6 nsd 512 199 Yes No ready > up system > work_md_pf2_7 nsd 512 199 Yes No ready > up system > work_md_pf2_8 nsd 512 199 Yes No ready > up system > [root at cl005 net]# mmlsfs work -R -r -M -m -K > flag value description > ------------------- ------------------------ ------------------------------ > ----- > -R 2 Maximum number of data > replicas > -r 2 Default number of data > replicas > -M 2 Maximum number of metadata > replicas > -m 2 Default number of metadata > replicas > -K whenpossible Strict replica allocation > option > > > On Mon, Jan 9, 2017 at 3:34 PM, Yaron Daniel <*YARD at il.ibm.com* > > wrote: > Hi > > 1) Yes in case u have only 1 Failure group - replication will not work. > > 2) Do you have 2 Storage Systems ? When using GPFS replication write stay > the same - but read can be double - since it read from 2 Storage systems > > Hope this help - what do you try to achive , can you share your env setup ? > > > Regards > > > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Server, **Storage and Data Services* > *- > Team Leader* Petach Tiqva, 49527 > *Global Technology Services* Israel > Phone: *+972-3-916-5672* <+972%203-916-5672> > Fax: *+972-3-916-5672* <+972%203-916-5672> > Mobile: *+972-52-8395593* <+972%2052-839-5593> > e-mail: *yard at il.ibm.com* > *IBM Israel* > > > > > > > > From: Brian Marshall <*mimarsh2 at vt.edu* > > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Date: 01/09/2017 10:17 PM > Subject: [gpfsug-discuss] replication and no failure groups > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > > ------------------------------ > > > > > All, > > If I have a filesystem with replication set to 2 and 1 failure group: > > 1) I assume replication won't actually happen, correct? > > 2) Will this impact performance i.e cut write performance in half even > though it really only keeps 1 copy? > > End goal - I would like a single storage pool within the filesystem to be > replicated without affecting the performance of all other pools(which only > have a single failure group) > > Thanks, > Brian Marshall > VT - ARC_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From janfrode at tanso.net Mon Jan 9 22:24:45 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 09 Jan 2017 22:24:45 +0000 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: Yaron, doesn't "-1" make each of these disk an independent failure group? >From 'man mmcrnsd': "The default is -1, which indicates this disk has no point of failure in common with any other disk." -jf man. 9. jan. 2017 kl. 21.54 skrev Yaron Daniel : > Hi > > So - do u able to have GPFS replication > > for the MD Failure Groups ? > > I can see that u have 3 Failure Groups > > for Data -1, 2012,2034 , how many Storage Subsystems you have ? > > > > > Regards > > > > ------------------------------ > > > > > > *YaronDaniel* 94 > > Em Ha'Moshavot Rd > > > *Server,* > > *Storageand Data Services* > *- > Team Leader* > > Petach > > Tiqva, 49527 > > > *GlobalTechnology Services* Israel > Phone: +972-3-916-5672 > Fax: +972-3-916-5672 > > > Mobile: +972-52-8395593 > > > e-mail: yard at il.ibm.com > > > > > *IBMIsrael* > > > > > > > > > > From: > > "J. Eric Wonderley" > > > > > To: > > gpfsug main discussion > > list > > Date: > > 01/09/2017 10:48 PM > Subject: > > Re: [gpfsug-discuss] > > replication and no failure groups > Sent by: > > gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Yaron: > > This is the filesystem: > > [root at cl005 net]# mmlsdisk work > disk driver > > sector failure holds holds > > storage > name type > > size group metadata data status > > availability pool > ------------ -------- ------ ----------- -------- ----- ------------- > ------------ > > ------------ > nsd_a_7 nsd > > 512 -1 No > > Yes ready up > > system > nsd_b_7 nsd > > 512 -1 No > > Yes ready up > > system > nsd_c_7 nsd > > 512 -1 No > > Yes ready up > > system > nsd_d_7 nsd > > 512 -1 No > > Yes ready up > > system > nsd_a_8 nsd > > 512 -1 No > > Yes ready up > > system > nsd_b_8 nsd > > 512 -1 No > > Yes ready up > > system > nsd_c_8 nsd > > 512 -1 No > > Yes ready up > > system > nsd_d_8 nsd > > 512 -1 No > > Yes ready up > > system > nsd_a_9 nsd > > 512 -1 No > > Yes ready up > > system > nsd_b_9 nsd > > 512 -1 No > > Yes ready up > > system > nsd_c_9 nsd > > 512 -1 No > > Yes ready up > > system > nsd_d_9 nsd > > 512 -1 No > > Yes ready up > > system > nsd_a_10 nsd > > 512 -1 No > > Yes ready up > > system > nsd_b_10 nsd > > 512 -1 No > > Yes ready up > > system > nsd_c_10 nsd > > 512 -1 No > > Yes ready up > > system > nsd_d_10 nsd > > 512 -1 No > > Yes ready up > > system > nsd_a_11 nsd > > 512 -1 No > > Yes ready up > > system > nsd_b_11 nsd > > 512 -1 No > > Yes ready up > > system > nsd_c_11 nsd > > 512 -1 No > > Yes ready up > > system > nsd_d_11 nsd > > 512 -1 No > > Yes ready up > > system > nsd_a_12 nsd > > 512 -1 No > > Yes ready up > > system > nsd_b_12 nsd > > 512 -1 No > > Yes ready up > > system > nsd_c_12 nsd > > 512 -1 No > > Yes ready up > > system > nsd_d_12 nsd > > 512 -1 No > > Yes ready up > > system > work_md_pf1_1 nsd 512 > > 200 Yes No ready > > up system > > > jbf1z1 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf2z1 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf3z1 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf4z1 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf5z1 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf6z1 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf7z1 nsd > > 
4096 2012 No > > Yes ready up > > sas_ssd4T > jbf8z1 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf1z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf2z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf3z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf4z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf5z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf6z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf7z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf8z2 nsd > > 4096 2012 No > > Yes ready up > > sas_ssd4T > jbf1z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf2z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf3z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf4z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf5z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf6z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf7z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf8z3 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf1z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf2z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf3z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf4z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf5z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf6z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf7z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > jbf8z4 nsd > > 4096 2034 No > > Yes ready up > > sas_ssd4T > work_md_pf1_2 nsd 512 > > 200 Yes No ready > > up system > > > work_md_pf1_3 nsd 512 > > 200 Yes No ready > > up system > > > work_md_pf1_4 nsd 512 > > 200 Yes No ready > > up system > > > work_md_pf2_5 nsd 512 > > 199 Yes No ready > > up system > > > work_md_pf2_6 nsd 512 > > 199 Yes No ready > > up system > > > work_md_pf2_7 nsd 512 > > 199 Yes No ready > > up system > > > work_md_pf2_8 nsd 512 > > 199 Yes No ready > > up system > > > [root at cl005 net]# mmlsfs work -R -r -M -m -K > flag > > value > > description > ------------------- ------------------------ > ----------------------------------- > -R > > 2 > > Maximum number of data replicas > -r > > 2 > > Default number of data replicas > -M > > 2 > > Maximum number of metadata replicas > -m > > 2 > > Default number of metadata replicas > -K > > whenpossible > > Strict replica allocation option > > > On Mon, Jan 9, 2017 at 3:34 PM, Yaron Daniel <*YARD at il.ibm.com* > > > > wrote: > Hi > > 1) Yes in case u have only 1 Failure group - replication will not work. > > 2) Do you have 2 Storage Systems ? When using GPFS replication write > > stay the same - but read can be double - since it read from 2 Storage > systems > > Hope this help - what do you try to achive , can you share your env setup > > ? 
> > > Regards > > > > ------------------------------ > > > > > > *YaronDaniel* 94 > > Em Ha'Moshavot Rd > > > *Server,* > > *Storageand Data Services* > > > *-Team Leader* Petach > > Tiqva, 49527 > > > *GlobalTechnology Services* Israel > Phone: *+972-3-916-5672* <+972%203-916-5672> > Fax: *+972-3-916-5672* <+972%203-916-5672> > > > Mobile: *+972-52-8395593* <+972%2052-839-5593> > > > e-mail: *yard at il.ibm.com* > > > > > *IBMIsrael* > > > > > > > > > > From: Brian > > Marshall <*mimarsh2 at vt.edu* > > To: gpfsug > > main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Date: 01/09/2017 > > 10:17 PM > Subject: [gpfsug-discuss] > > replication and no failure groups > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > > ------------------------------ > > > > > All, > > If I have a filesystem with replication set to 2 and 1 failure group: > > 1) I assume replication won't actually happen, correct? > > 2) Will this impact performance i.e cut write performance in half even > > though it really only keeps 1 copy? > > End goal - I would like a single storage pool within the filesystem to > > be replicated without affecting the performance of all other pools(which > > only have a single failure group) > > Thanks, > Brian Marshall > VT - ARC_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From vpuvvada at in.ibm.com Tue Jan 10 08:44:19 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 10 Jan 2017 14:14:19 +0530 Subject: [gpfsug-discuss] AFM Migration Issue In-Reply-To: <201701091552.v09Fq4kj012315@msw1.awe.co.uk> References: <201701091501.v09F1i5A015912@msw1.awe.co.uk> <201701091552.v09Fq4kj012315@msw1.awe.co.uk> Message-ID: AFM cannot keep directory mtime in sync. Directory mtime changes during readdir when files are created inside it after initial lookup. This is a known limitation today. ~Venkat (vpuvvada at in.ibm.com) From: To: Date: 01/09/2017 09:30 PM Subject: Re: [gpfsug-discuss] AFM Migration Issue Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We have already come across the issues you have seen below, and worked around them. If you run the pre-fetch with just the --meta-data-only, then all the date stamps are correct for the dirs., as soon as you run --list-only all the directory times change to now. We have tried rsync but this did not appear to work. 
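Where the directory timestamps do need to be put back after a migration like this, one low-tech option is to carry the mtimes over from the old filesystem once all file creation on the new side has finished. A minimal, untested sketch, assuming both filesystems are still mounted, the directory layout is identical, and the paths below are placeholders:

OLD=/gpfs/oldfs/fileset
NEW=/gpfs/newfs/fileset
# Restore each directory's mtime from the old tree. Run this only after the
# last file has been created, since creating entries bumps the parent's mtime.
( cd "$OLD" && find . -type d -print0 ) | while IFS= read -r -d '' d; do
    touch -m -r "$OLD/$d" "$NEW/$d"
done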
-----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: 09 January 2017 15:49 To: gpfsug-discuss at spectrumscale.org Subject: EXTERNAL: Re: [gpfsug-discuss] AFM Migration Issue

Interesting - I'm currently doing something similar, but am only using read-only to premigrate the filesets. The directory timestamps don't agree with the original, but neither do they all carry the date they were migrated, so there is something very weird going on..... (We're planning to switch them to Local Update when we move the users over to them.)

We're using mmapplypolicy on our old GPFS cluster to get the files to migrate, and have noticed that you need a RULE EXTERNAL LIST ESCAPE '%/' line, otherwise files with % in the filenames don't get migrated and throw errors.

I'm trying to work out if empty directories, or those containing only empty directories, get migrated correctly, as you can't list them in the mmafmctl prefetch statement. (If you try, using DIRECTORIES_PLUS, they throw errors.)

I am very interested in the solution to this issue.

Peter Childs Queen Mary, University of London

________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Paul.Tomlinson at awe.co.uk Sent: Monday, January 9, 2017 3:09:43 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] AFM Migration Issue

Hi All, We have just completed the first data move from our old cluster to the new one using AFM Local Update as per the guide, however we have noticed that all date stamps on the directories have the date they were created on (e.g. 9th Jan 2017), not the date from the old system (e.g. 14th April 2007), whereas all the files have the correct dates. Has anyone else seen this issue, as we now have to convert all the directory dates back to their original dates!

The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected.

AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR

_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected.
AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Tue Jan 10 13:24:33 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 10 Jan 2017 08:24:33 -0500 Subject: [gpfsug-discuss] replication and no failure groups In-Reply-To: References: Message-ID: That`s the answer. We hadn`t read deep enough and just assumed that -1 meant default failure group or no failure groups at all. Thanks, Brian On Mon, Jan 9, 2017 at 5:24 PM, Jan-Frode Myklebust wrote: > Yaron, doesn't "-1" make each of these disk an independent failure group? > > From 'man mmcrnsd': > > "The default is -1, which indicates this disk has no point of failure in > common with any other disk." > > > -jf > > > man. 9. jan. 2017 kl. 21.54 skrev Yaron Daniel : > >> Hi >> >> So - do u able to have GPFS replication >> >> for the MD Failure Groups ? >> >> I can see that u have 3 Failure Groups >> >> for Data -1, 2012,2034 , how many Storage Subsystems you have ? >> >> >> >> >> Regards >> >> >> >> ------------------------------ >> >> >> >> >> >> *YaronDaniel* 94 >> >> Em Ha'Moshavot Rd >> >> >> *Server,* >> >> *Storageand Data Services* >> *- >> Team Leader* >> >> Petach >> >> Tiqva, 49527 >> >> >> *GlobalTechnology Services* Israel >> Phone: +972-3-916-5672 <+972%203-916-5672> >> Fax: +972-3-916-5672 <+972%203-916-5672> >> >> >> Mobile: +972-52-8395593 <+972%2052-839-5593> >> >> >> e-mail: yard at il.ibm.com >> >> >> >> >> *IBMIsrael* >> >> >> >> >> >> >> >> >> >> From: >> >> "J. Eric Wonderley" >> >> >> >> >> To: >> >> gpfsug main discussion >> >> list >> >> Date: >> >> 01/09/2017 10:48 PM >> Subject: >> >> Re: [gpfsug-discuss] >> >> replication and no failure groups >> Sent by: >> >> gpfsug-discuss-bounces at spectrumscale.org >> ------------------------------ >> >> >> >> Hi Yaron: >> >> This is the filesystem: >> >> [root at cl005 net]# mmlsdisk work >> disk driver >> >> sector failure holds holds >> >> storage >> name type >> >> size group metadata data status >> >> availability pool >> ------------ -------- ------ ----------- -------- ----- ------------- >> ------------ >> >> ------------ >> nsd_a_7 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_b_7 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_c_7 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_d_7 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_a_8 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_b_8 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_c_8 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_d_8 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_a_9 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_b_9 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_c_9 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_d_9 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_a_10 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_b_10 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_c_10 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_d_10 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_a_11 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_b_11 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_c_11 nsd 
>> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_d_11 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_a_12 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_b_12 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_c_12 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> nsd_d_12 nsd >> >> 512 -1 No >> >> Yes ready up >> >> system >> work_md_pf1_1 nsd 512 >> >> 200 Yes No ready >> >> up system >> >> >> jbf1z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf2z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf3z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf4z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf5z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf6z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf7z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf8z1 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf1z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf2z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf3z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf4z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf5z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf6z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf7z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf8z2 nsd >> >> 4096 2012 No >> >> Yes ready up >> >> sas_ssd4T >> jbf1z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf2z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf3z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf4z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf5z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf6z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf7z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf8z3 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf1z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf2z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf3z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf4z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf5z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf6z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf7z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> jbf8z4 nsd >> >> 4096 2034 No >> >> Yes ready up >> >> sas_ssd4T >> work_md_pf1_2 nsd 512 >> >> 200 Yes No ready >> >> up system >> >> >> work_md_pf1_3 nsd 512 >> >> 200 Yes No ready >> >> up system >> >> >> work_md_pf1_4 nsd 512 >> >> 200 Yes No ready >> >> up system >> >> >> work_md_pf2_5 nsd 512 >> >> 199 Yes No ready >> >> up system >> >> >> work_md_pf2_6 nsd 512 >> >> 199 Yes No ready >> >> up system >> >> >> work_md_pf2_7 nsd 512 >> >> 199 Yes No ready >> >> up system >> >> >> work_md_pf2_8 nsd 512 >> >> 199 Yes No ready >> >> up system >> >> >> [root at cl005 net]# mmlsfs work -R -r -M -m -K >> flag >> >> value >> >> description >> ------------------- ------------------------ >> ----------------------------------- >> -R >> >> 2 >> >> Maximum number of data replicas >> -r >> >> 2 >> >> Default number of data replicas >> -M >> >> 2 >> >> Maximum number of metadata replicas >> -m >> >> 2 >> >> Default number of metadata replicas >> -K >> >> whenpossible >> >> Strict replica allocation 
option >> >> >> On Mon, Jan 9, 2017 at 3:34 PM, Yaron Daniel <*YARD at il.ibm.com* >> > >> >> wrote: >> Hi >> >> 1) Yes in case u have only 1 Failure group - replication will not work. >> >> 2) Do you have 2 Storage Systems ? When using GPFS replication write >> >> stay the same - but read can be double - since it read from 2 Storage >> systems >> >> Hope this help - what do you try to achive , can you share your env setup >> >> ? >> >> >> Regards >> >> >> >> ------------------------------ >> >> >> >> >> >> *YaronDaniel* 94 >> >> Em Ha'Moshavot Rd >> >> >> *Server,* >> >> *Storageand Data Services* >> >> >> *-Team Leader* Petach >> >> Tiqva, 49527 >> >> >> *GlobalTechnology Services* Israel >> Phone: *+972-3-916-5672* <+972%203-916-5672> >> Fax: *+972-3-916-5672* <+972%203-916-5672> >> >> >> Mobile: *+972-52-8395593* <+972%2052-839-5593> >> >> >> e-mail: *yard at il.ibm.com* >> >> >> >> >> *IBMIsrael* >> >> >> >> >> >> >> >> >> >> From: Brian >> >> Marshall <*mimarsh2 at vt.edu* > >> To: gpfsug >> >> main discussion list <*gpfsug-discuss at spectrumscale.org* >> > >> Date: 01/09/2017 >> >> 10:17 PM >> Subject: [gpfsug-discuss] >> >> replication and no failure groups >> Sent by: *gpfsug-discuss-bounces at spectrumscale.org* >> >> >> ------------------------------ >> >> >> >> >> All, >> >> If I have a filesystem with replication set to 2 and 1 failure group: >> >> 1) I assume replication won't actually happen, correct? >> >> 2) Will this impact performance i.e cut write performance in half even >> >> though it really only keeps 1 copy? >> >> End goal - I would like a single storage pool within the filesystem to >> >> be replicated without affecting the performance of all other pools(which >> >> only have a single failure group) >> >> Thanks, >> Brian Marshall >> VT - ARC_______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at *spectrumscale.org* >> *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at *spectrumscale.org* >> *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> _______________________________________________ >> >> gpfsug-discuss mailing list >> >> gpfsug-discuss at spectrumscale.org >> >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Tue Jan 10 17:59:52 2017 From: mweil at wustl.edu (Matt Weil) Date: Tue, 10 Jan 2017 11:59:52 -0600 Subject: [gpfsug-discuss] CES nodes Hyper threads or no Message-ID: <5376d22b-abdc-7ead-5ea8-ae9da3073c4f@wustl.edu> All, I typically turn Hyper threading off on storage nodes. So I did on our CES nodes as well. Now they are running at a load of over 100 and have 25% cpu idle. With two 8 cores I am now wondering if hyper threading would help or did we just under size them :-(. These are nfs v3 servers only with lroc enabled. Load average: 156.13 160.40 158.97 any opinions on if it would help. 
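For what it's worth, this is roughly how I'm looking at it on one of the CES nodes (assuming the CES NFS server shows up as ganesha.nfsd here - adjust to your environment):

# confirm whether SMT/hyper-threading is actually off
lscpu | egrep 'Thread|Core|Socket'

# watch the per-thread load of the NFS server rather than just the 1-minute load average
top -H -p $(pidof ganesha.nfsd)

If most of those threads sit in R state with the cores pegged, more logical cores should help; if they are mostly in D state we are probably waiting on the filesystem rather than the CPUs.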
Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From jtolson at us.ibm.com Tue Jan 10 20:17:01 2017 From: jtolson at us.ibm.com (John T Olson) Date: Tue, 10 Jan 2017 13:17:01 -0700 Subject: [gpfsug-discuss] Updated whitepaper published In-Reply-To: References: Message-ID: An updated white paper has been published which shows integration of the Varonis UNIX agent in Spectrum Scale for audit logging. This version of the paper is updated to include test results from new capabilities provided in Spectrum Scale version 4.2.2.1. Here is a link to the paper: https://www.ibm.com/developerworks/community/wikis/form/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/f0cc9b82-a133-41b4-83fe-3f560e95b35a/attachment/0ab62645-e0ab-4377-81e7-abd11879bb75/media/Spectrum_Scale_Varonis_Audit_Logging.pdf Thanks, John John T. Olson, Ph.D., MI.C., K.EY. Master Inventor, Software Defined Storage 957/9032-1 Tucson, AZ, 85744 (520) 799-5185, tie 321-5185 (FAX: 520-799-4237) Email: jtolson at us.ibm.com "Do or do not. There is no try." - Yoda Olson's Razor: Any situation that we, as humans, can encounter in life can be modeled by either an episode of The Simpsons or Seinfeld. -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jan 11 09:27:06 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 11 Jan 2017 09:27:06 +0000 Subject: [gpfsug-discuss] CES log files Message-ID: Which files do I need to look in to determine what's happening with CES... supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Wed Jan 11 09:54:39 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 11 Jan 2017 09:54:39 +0000 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Jan 11 11:21:00 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 11 Jan 2017 12:21:00 +0100 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: I also struggle with where to look for CES log files.. but maybe the new "mmprotocoltrace" command can be useful? # mmprotocoltrace start smb ### reproduce problem # mmprotocoltrace stop smb Check log files it has collected. -jf On Wed, Jan 11, 2017 at 10:27 AM, Sobey, Richard A wrote: > Which files do I need to look in to determine what?s happening with CES? > supposing for example a load of domain controllers were shut down and CES > had no clue how to handle this and stopped working until the DCs were > switched back on again. > > > > Mmfs.log.latest said everything was fine btw. 
> > > > Thanks > > Richard > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jan 11 13:59:30 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 11 Jan 2017 13:59:30 +0000 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... CES DEGRADED ads_down ... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" > Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jan 11 14:29:39 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 11 Jan 2017 14:29:39 +0000 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: What did the smb log claim on the nodes? Should be in /var/adm/ras, for example if SMB failed, then I could see that CES would mark the node as degraded. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 11 January 2017 at 13:59 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] CES log files Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. 
It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... CES DEGRADED ads_down ... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" > Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Jan 11 14:39:13 2017 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 11 Jan 2017 14:39:13 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster Message-ID: We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are connected via Infiniband (FDR14). At the time of implementation of ESS, we were instructed to enable RDMA in addition to IPoIB. Previously we only ran IPoIB on our GPFS3.5 cluster. Every since the implementation (sometime back in July of 2016) we see a lot of compute nodes being ejected. 
What usually precedes the ejection are following messages: Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_WR_FLUSH_ERR index 1 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2 Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_WR_FLUSH_ERR index 400 Even our ESS IO server sometimes ends up being ejected (case in point - yesterday morning): Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 vendor_err 135 Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 3001 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 vendor_err 135 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2671 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 vendor_err 135 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2495 Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 vendor_err 135 Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 3077 Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease renewal is overdue. Pinging to check if it is alive I've had multiple PMRs open for this issue, and I am told that our ESS needs code level upgrades in order to fix this issue. Looking at the errors, I think the issue is Infiniband related, and I am wondering if anyone on this list has seen similar issues? Thanks for your help in advance. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Wed Jan 11 15:03:13 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 11 Jan 2017 16:03:13 +0100 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From janfrode at tanso.net Wed Jan 11 15:10:03 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 11 Jan 2017 16:10:03 +0100 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: My first guess would also be rdmaSend, which the gssClientConfig.sh enables by default, but isn't scalable to large clusters. It fits with your error message: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Best%20Practices%20RDMA%20Tuning - """For GPFS version 3.5.0.11 and later, IB error IBV_WC_RNR_RETRY_EXC_ERR may occur if the cluster is too large when verbsRdmaSend is enabled Idf these errors are observed in the mmfs log, disable verbsRdmaSend on all nodes.. Additionally, out of memory errors may occur if verbsRdmaSend is enabled on very large clusters. If out of memory errors are observed, disabled verbsRdmaSend on all nodes in the cluster.""" Otherwise it would be nice if you could post your mmlsconfig to see if something else sticks out.. -jf On Wed, Jan 11, 2017 at 4:03 PM, Olaf Weiser wrote: > most likely, there's smth wrong with your IB fabric ... > you say, you run ~ 700 nodes ? ... > Are you running with *verbsRdmaSend*enabled ? ,if so, please consider to > disable - and discuss this within the PMR > another issue, you may check is - Are you running the IPoIB in connected > mode or datagram ... but as I said, please discuss this within the PMR .. > there are to much dependencies to discuss this here .. > > > cheers > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 <+49%20170%205794466> > E-Mail: olaf.weiser at de.ibm.com > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: Damir Krstic > To: gpfsug main discussion list > Date: 01/11/2017 03:39 PM > Subject: [gpfsug-discuss] nodes being ejected out of the cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our > storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are > connected via Infiniband (FDR14). At the time of implementation of ESS, we > were instructed to enable RDMA in addition to IPoIB. Previously we only ran > IPoIB on our GPFS3.5 cluster. > > Every since the implementation (sometime back in July of 2016) we see a > lot of compute nodes being ejected. 
What usually precedes the ejection are > following messages: > > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 1 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 400 > > Even our ESS IO server sometimes ends up being ejected (case in point - > yesterday morning): > > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3001 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2671 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2495 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3077 > Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease > renewal is overdue. Pinging to check if it is alive > > I've had multiple PMRs open for this issue, and I am told that our ESS > needs code level upgrades in order to fix this issue. Looking at the > errors, I think the issue is Infiniband related, and I am wondering if > anyone on this list has seen similar issues? > > Thanks for your help in advance. 
> > Damir_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Jan 11 15:15:52 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Wed, 11 Jan 2017 15:15:52 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster References: [gpfsug-discuss] nodes being ejected out of the cluster Message-ID: <5F910253243E6A47B81A9A2EB424BBA101E91A4A@NDMSMBX404.ndc.nasa.gov> The RDMA errors I think are secondary to what's going on with either your IPoIB or Ethernet fabrics that's causing I assume IPoIB communication breakdowns and expulsions. We've had entire IB fabrics go offline and if the nodes werent depending on it for daemon communication nobody got expelled. Do you have a subnet defined for your IPoIB network or are your nodes daemon interfaces already set to their IPoIB interface? Have you checked your SM logs? From: Damir Krstic Sent: 1/11/17, 9:39 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] nodes being ejected out of the cluster We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are connected via Infiniband (FDR14). At the time of implementation of ESS, we were instructed to enable RDMA in addition to IPoIB. Previously we only ran IPoIB on our GPFS3.5 cluster. Every since the implementation (sometime back in July of 2016) we see a lot of compute nodes being ejected. 
What usually precedes the ejection are following messages: Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_WR_FLUSH_ERR index 1 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2 Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135 Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_WR_FLUSH_ERR index 400 Even our ESS IO server sometimes ends up being ejected (case in point - yesterday morning): Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 vendor_err 135 Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 3001 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 vendor_err 135 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2671 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 vendor_err 135 Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 2495 Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 vendor_err 135 Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error IBV_WC_RNR_RETRY_EXC_ERR index 3077 Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease renewal is overdue. Pinging to check if it is alive I've had multiple PMRs open for this issue, and I am told that our ESS needs code level upgrades in order to fix this issue. Looking at the errors, I think the issue is Infiniband related, and I am wondering if anyone on this list has seen similar issues? Thanks for your help in advance. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Jan 11 15:16:09 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 11 Jan 2017 15:16:09 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: Message-ID: An HTML attachment was scrubbed... 
URL: From syi at ca.ibm.com Wed Jan 11 17:30:08 2017 From: syi at ca.ibm.com (Yi Sun) Date: Wed, 11 Jan 2017 12:30:08 -0500 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: Sometime increasing CES debug level to get more info, e.g. "mmces log level 3". Here are two public wiki links (probably you already know). https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Protocol%20Node%20-%20Tuning%20and%20Analysis https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Protocols%20Problem%20Determination Yi. gpfsug-discuss-bounces at spectrumscale.org wrote on 01/11/2017 07:00:06 AM: > From: gpfsug-discuss-request at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Date: 01/11/2017 07:00 AM > Subject: gpfsug-discuss Digest, Vol 60, Issue 26 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: CES log files (Jan-Frode Myklebust) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 11 Jan 2017 12:21:00 +0100 > From: Jan-Frode Myklebust > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] CES log files > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > I also struggle with where to look for CES log files.. but maybe the new > "mmprotocoltrace" command can be useful? > > # mmprotocoltrace start smb > ### reproduce problem > # mmprotocoltrace stop smb > > Check log files it has collected. > > > -jf > > > On Wed, Jan 11, 2017 at 10:27 AM, Sobey, Richard A > wrote: > > > Which files do I need to look in to determine what?s happening with CES? > > supposing for example a load of domain controllers were shut down and CES > > had no clue how to handle this and stopped working until the DCs were > > switched back on again. > > > > > > > > Mmfs.log.latest said everything was fine btw. > > > > > > > > Thanks > > > > Richard > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: 20170111/4ea25ddf/attachment-0001.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 60, Issue 26 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Jan 11 17:53:50 2017 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 11 Jan 2017 17:53:50 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: Thanks for all the suggestions. 
Here is our mmlsconfig file. We just purchased another GL6. During the installation of the new GL6 IBM will upgrade our existing GL6 up to the latest code levels. This will happen during the week of 23rd of Jan. I am skeptical that the upgrade is going to fix the issue. On our IO servers we are running in connected mode (please note that IB interfaces are bonded) [root at gssio1 ~]# cat /sys/class/net/ib0/mode connected [root at gssio1 ~]# cat /sys/class/net/ib1/mode connected [root at gssio1 ~]# cat /sys/class/net/ib2/mode connected [root at gssio1 ~]# cat /sys/class/net/ib3/mode connected [root at gssio2 ~]# cat /sys/class/net/ib0/mode connected [root at gssio2 ~]# cat /sys/class/net/ib1/mode connected [root at gssio2 ~]# cat /sys/class/net/ib2/mode connected [root at gssio2 ~]# cat /sys/class/net/ib3/mode connected Our login nodes are also running connected mode as well. However, all of our compute nodes are running in datagram: [root at mgt ~]# psh compute cat /sys/class/net/ib0/mode qnode0758: datagram qnode0763: datagram qnode0760: datagram qnode0772: datagram qnode0773: datagram ....etc. Here is our mmlsconfig: [root at gssio1 ~]# mmlsconfig Configuration data for cluster ess-qstorage.it.northwestern.edu: ---------------------------------------------------------------- clusterName ess-qstorage.it.northwestern.edu clusterId 17746506346828356609 dmapiFileHandleSize 32 minReleaseLevel 4.2.0.1 ccrEnabled yes cipherList AUTHONLY [gss_ppc64] nsdRAIDBufferPoolSizePct 80 maxBufferDescs 2m prefetchPct 5 nsdRAIDTracks 128k nsdRAIDSmallBufferSize 256k nsdMaxWorkerThreads 3k nsdMinWorkerThreads 3k nsdRAIDSmallThreadRatio 2 nsdRAIDThreadsPerQueue 16 nsdRAIDEventLogToConsole all nsdRAIDFastWriteFSDataLimit 256k nsdRAIDFastWriteFSMetadataLimit 1M nsdRAIDReconstructAggressiveness 1 nsdRAIDFlusherBuffersLowWatermarkPct 20 nsdRAIDFlusherBuffersLimitPct 80 nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDFlusherTracksLimitPct 80 nsdRAIDFlusherFWLogHighWatermarkMB 1000 nsdRAIDFlusherFWLogLimitMB 5000 nsdRAIDFlusherThreadsLowWatermark 1 nsdRAIDFlusherThreadsHighWatermark 512 nsdRAIDBlockDeviceMaxSectorsKB 8192 nsdRAIDBlockDeviceNrRequests 32 nsdRAIDBlockDeviceQueueDepth 16 nsdRAIDBlockDeviceScheduler deadline nsdRAIDMaxTransientStale2FT 1 nsdRAIDMaxTransientStale3FT 1 nsdMultiQueue 512 syncWorkerThreads 256 nsdInlineWriteMax 32k maxGeneralThreads 1280 maxReceiverThreads 128 nspdQueues 64 [common] maxblocksize 16m [ems1-fdr,compute,gss_ppc64] numaMemoryInterleave yes [gss_ppc64] maxFilesToCache 12k [ems1-fdr,compute] maxFilesToCache 128k [ems1-fdr,compute,gss_ppc64] flushedDataTarget 1024 flushedInodeTarget 1024 maxFileCleaners 1024 maxBufferCleaners 1024 logBufferCount 20 logWrapAmountPct 2 logWrapThreads 128 maxAllocRegionsPerNode 32 maxBackgroundDeletionThreads 16 maxInodeDeallocPrefetch 128 [gss_ppc64] maxMBpS 16000 [ems1-fdr,compute] maxMBpS 10000 [ems1-fdr,compute,gss_ppc64] worker1Threads 1024 worker3Threads 32 [gss_ppc64] ioHistorySize 64k [ems1-fdr,compute] ioHistorySize 4k [gss_ppc64] verbsRdmaMinBytes 16k [ems1-fdr,compute] verbsRdmaMinBytes 32k [ems1-fdr,compute,gss_ppc64] verbsRdmaSend yes [gss_ppc64] verbsRdmasPerConnection 16 [ems1-fdr,compute] verbsRdmasPerConnection 256 [gss_ppc64] verbsRdmasPerNode 3200 [ems1-fdr,compute] verbsRdmasPerNode 1024 [ems1-fdr,compute,gss_ppc64] verbsSendBufferMemoryMB 1024 verbsRdmasPerNodeOptimize yes verbsRdmaUseMultiCqThreads yes [ems1-fdr,compute] ignorePrefetchLUNCount yes [gss_ppc64] scatterBufferSize 256K [ems1-fdr,compute] scatterBufferSize 256k 
syncIntervalStrict yes [ems1-fdr,compute,gss_ppc64] nsdClientCksumTypeLocal ck64 nsdClientCksumTypeRemote ck64 [gss_ppc64] pagepool 72856M [ems1-fdr] pagepool 17544M [compute] pagepool 4g [ems1-fdr,qsched03-ib0,quser10-fdr,compute,gss_ppc64] verbsRdma enable [gss_ppc64] verbsPorts mlx5_0/1 mlx5_0/2 mlx5_1/1 mlx5_1/2 [ems1-fdr] verbsPorts mlx5_0/1 mlx5_0/2 [qsched03-ib0,quser10-fdr,compute] verbsPorts mlx4_0/1 [common] autoload no [ems1-fdr,compute,gss_ppc64] maxStatCache 0 [common] envVar MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1 deadlockOverloadThreshold 0 deadlockDetectionThreshold 0 adminMode central File systems in cluster ess-qstorage.it.northwestern.edu: --------------------------------------------------------- /dev/home /dev/hpc /dev/projects /dev/tthome On Wed, Jan 11, 2017 at 9:16 AM Luis Bolinches wrote: > In addition to what Olaf has said > > ESS upgrades include mellanox modules upgrades in the ESS nodes. In fact, > on those noes you should do not update those solo (unless support says so > in your PMR), so if that's been the recommendation, I suggest you look at > it. > > Changelog on ESS 4.0.4 (no idea what ESS level you are running) > > > c) Support of MLNX_OFED_LINUX-3.2-2.0.0.1 > - Updated from MLNX_OFED_LINUX-3.1-1.0.6.1 (ESS 4.0, 4.0.1, 4.0.2) > - Updated from MLNX_OFED_LINUX-3.1-1.0.0.2 (ESS 3.5.x) > - Updated from MLNX_OFED_LINUX-2.4-1.0.2 (ESS 3.0.x) > - Support for PCIe3 LP 2-port 100 Gb EDR InfiniBand adapter x16 (FC EC3E) > - Requires System FW level FW840.20 (SV840_104) > - No changes from ESS 4.0.3 > > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > > Luis Bolinches > Lab Services > http://www-03.ibm.com/systems/services/labservices/ > > IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland > Phone: +358 503112585 <+358%2050%203112585> > > "If you continually give you will continually have." Anonymous > > > > ----- Original message ----- > From: "Olaf Weiser" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > > Cc: > Subject: Re: [gpfsug-discuss] nodes being ejected out of the cluster > Date: Wed, Jan 11, 2017 5:03 PM > > most likely, there's smth wrong with your IB fabric ... > you say, you run ~ 700 nodes ? ... > Are you running with *verbsRdmaSend*enabled ? ,if so, please consider to > disable - and discuss this within the PMR > another issue, you may check is - Are you running the IPoIB in connected > mode or datagram ... but as I said, please discuss this within the PMR .. > there are to much dependencies to discuss this here .. > > > cheers > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 <+49%20170%205794466> > E-Mail: olaf.weiser at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. 
DE 99369940 > > > > From: Damir Krstic > To: gpfsug main discussion list > Date: 01/11/2017 03:39 PM > Subject: [gpfsug-discuss] nodes being ejected out of the cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our > storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are > connected via Infiniband (FDR14). At the time of implementation of ESS, we > were instructed to enable RDMA in addition to IPoIB. Previously we only ran > IPoIB on our GPFS3.5 cluster. > > Every since the implementation (sometime back in July of 2016) we see a > lot of compute nodes being ejected. What usually precedes the ejection are > following messages: > > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 1 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 400 > > Even our ESS IO server sometimes ends up being ejected (case in point - > yesterday morning): > > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3001 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2671 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2495 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3077 > Jan 10 11:24:11 
gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease > renewal is overdue. Pinging to check if it is alive > > I've had multiple PMRs open for this issue, and I am told that our ESS > needs code level upgrades in order to fix this issue. Looking at the > errors, I think the issue is Infiniband related, and I am wondering if > anyone on this list has seen similar issues? > > Thanks for your help in advance. > > Damir_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Jan 11 18:38:30 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 11 Jan 2017 18:38:30 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: And there you have: [ems1-fdr,compute,gss_ppc64] verbsRdmaSend yes Try turning this off. -jf ons. 11. jan. 2017 kl. 18.54 skrev Damir Krstic : > Thanks for all the suggestions. Here is our mmlsconfig file. We just > purchased another GL6. During the installation of the new GL6 IBM will > upgrade our existing GL6 up to the latest code levels. This will happen > during the week of 23rd of Jan. > > I am skeptical that the upgrade is going to fix the issue. > > On our IO servers we are running in connected mode (please note that IB > interfaces are bonded) > > [root at gssio1 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib3/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib3/mode > > connected > > Our login nodes are also running connected mode as well. > > However, all of our compute nodes are running in datagram: > > [root at mgt ~]# psh compute cat /sys/class/net/ib0/mode > > qnode0758: datagram > > qnode0763: datagram > > qnode0760: datagram > > qnode0772: datagram > > qnode0773: datagram > ....etc. 
> > Here is our mmlsconfig: > > [root at gssio1 ~]# mmlsconfig > > Configuration data for cluster ess-qstorage.it.northwestern.edu: > > ---------------------------------------------------------------- > > clusterName ess-qstorage.it.northwestern.edu > > clusterId 17746506346828356609 > > dmapiFileHandleSize 32 > > minReleaseLevel 4.2.0.1 > > ccrEnabled yes > > cipherList AUTHONLY > > [gss_ppc64] > > nsdRAIDBufferPoolSizePct 80 > > maxBufferDescs 2m > > prefetchPct 5 > > nsdRAIDTracks 128k > > nsdRAIDSmallBufferSize 256k > > nsdMaxWorkerThreads 3k > > nsdMinWorkerThreads 3k > > nsdRAIDSmallThreadRatio 2 > > nsdRAIDThreadsPerQueue 16 > > nsdRAIDEventLogToConsole all > > nsdRAIDFastWriteFSDataLimit 256k > > nsdRAIDFastWriteFSMetadataLimit 1M > > nsdRAIDReconstructAggressiveness 1 > > nsdRAIDFlusherBuffersLowWatermarkPct 20 > > nsdRAIDFlusherBuffersLimitPct 80 > > nsdRAIDFlusherTracksLowWatermarkPct 20 > > nsdRAIDFlusherTracksLimitPct 80 > > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > > nsdRAIDFlusherFWLogLimitMB 5000 > > nsdRAIDFlusherThreadsLowWatermark 1 > > nsdRAIDFlusherThreadsHighWatermark 512 > > nsdRAIDBlockDeviceMaxSectorsKB 8192 > > nsdRAIDBlockDeviceNrRequests 32 > > nsdRAIDBlockDeviceQueueDepth 16 > > nsdRAIDBlockDeviceScheduler deadline > > nsdRAIDMaxTransientStale2FT 1 > > nsdRAIDMaxTransientStale3FT 1 > > nsdMultiQueue 512 > > syncWorkerThreads 256 > > nsdInlineWriteMax 32k > > maxGeneralThreads 1280 > > maxReceiverThreads 128 > > nspdQueues 64 > > [common] > > maxblocksize 16m > > [ems1-fdr,compute,gss_ppc64] > > numaMemoryInterleave yes > > [gss_ppc64] > > maxFilesToCache 12k > > [ems1-fdr,compute] > > maxFilesToCache 128k > > [ems1-fdr,compute,gss_ppc64] > > flushedDataTarget 1024 > > flushedInodeTarget 1024 > > maxFileCleaners 1024 > > maxBufferCleaners 1024 > > logBufferCount 20 > > logWrapAmountPct 2 > > logWrapThreads 128 > > maxAllocRegionsPerNode 32 > > maxBackgroundDeletionThreads 16 > > maxInodeDeallocPrefetch 128 > > [gss_ppc64] > > maxMBpS 16000 > > [ems1-fdr,compute] > > maxMBpS 10000 > > [ems1-fdr,compute,gss_ppc64] > > worker1Threads 1024 > > worker3Threads 32 > > [gss_ppc64] > > ioHistorySize 64k > > [ems1-fdr,compute] > > ioHistorySize 4k > > [gss_ppc64] > > verbsRdmaMinBytes 16k > > [ems1-fdr,compute] > > verbsRdmaMinBytes 32k > > [ems1-fdr,compute,gss_ppc64] > > verbsRdmaSend yes > > [gss_ppc64] > > verbsRdmasPerConnection 16 > > [ems1-fdr,compute] > > verbsRdmasPerConnection 256 > > [gss_ppc64] > > verbsRdmasPerNode 3200 > > [ems1-fdr,compute] > > verbsRdmasPerNode 1024 > > [ems1-fdr,compute,gss_ppc64] > > verbsSendBufferMemoryMB 1024 > > verbsRdmasPerNodeOptimize yes > > verbsRdmaUseMultiCqThreads yes > > [ems1-fdr,compute] > > ignorePrefetchLUNCount yes > > [gss_ppc64] > > scatterBufferSize 256K > > [ems1-fdr,compute] > > scatterBufferSize 256k > > syncIntervalStrict yes > > [ems1-fdr,compute,gss_ppc64] > > nsdClientCksumTypeLocal ck64 > > nsdClientCksumTypeRemote ck64 > > [gss_ppc64] > > pagepool 72856M > > [ems1-fdr] > > pagepool 17544M > > [compute] > > pagepool 4g > > [ems1-fdr,qsched03-ib0,quser10-fdr,compute,gss_ppc64] > > verbsRdma enable > > [gss_ppc64] > > verbsPorts mlx5_0/1 mlx5_0/2 mlx5_1/1 mlx5_1/2 > > [ems1-fdr] > > verbsPorts mlx5_0/1 mlx5_0/2 > > [qsched03-ib0,quser10-fdr,compute] > > verbsPorts mlx4_0/1 > > [common] > > autoload no > > [ems1-fdr,compute,gss_ppc64] > > maxStatCache 0 > > [common] > > envVar MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1 > > deadlockOverloadThreshold 0 > > deadlockDetectionThreshold 0 > > adminMode 
central > > > File systems in cluster ess-qstorage.it.northwestern.edu: > > --------------------------------------------------------- > > /dev/home > > /dev/hpc > > /dev/projects > > /dev/tthome > > On Wed, Jan 11, 2017 at 9:16 AM Luis Bolinches > wrote: > > In addition to what Olaf has said > > ESS upgrades include mellanox modules upgrades in the ESS nodes. In fact, > on those noes you should do not update those solo (unless support says so > in your PMR), so if that's been the recommendation, I suggest you look at > it. > > Changelog on ESS 4.0.4 (no idea what ESS level you are running) > > > c) Support of MLNX_OFED_LINUX-3.2-2.0.0.1 > - Updated from MLNX_OFED_LINUX-3.1-1.0.6.1 (ESS 4.0, 4.0.1, 4.0.2) > - Updated from MLNX_OFED_LINUX-3.1-1.0.0.2 (ESS 3.5.x) > - Updated from MLNX_OFED_LINUX-2.4-1.0.2 (ESS 3.0.x) > - Support for PCIe3 LP 2-port 100 Gb EDR InfiniBand adapter x16 (FC EC3E) > - Requires System FW level FW840.20 (SV840_104) > - No changes from ESS 4.0.3 > > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > > Luis Bolinches > Lab Services > http://www-03.ibm.com/systems/services/labservices/ > > IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland > Phone: +358 503112585 <+358%2050%203112585> > > "If you continually give you will continually have." Anonymous > > > > ----- Original message ----- > From: "Olaf Weiser" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > > Cc: > Subject: Re: [gpfsug-discuss] nodes being ejected out of the cluster > Date: Wed, Jan 11, 2017 5:03 PM > > most likely, there's smth wrong with your IB fabric ... > you say, you run ~ 700 nodes ? ... > Are you running with *verbsRdmaSend*enabled ? ,if so, please consider to > disable - and discuss this within the PMR > another issue, you may check is - Are you running the IPoIB in connected > mode or datagram ... but as I said, please discuss this within the PMR .. > there are to much dependencies to discuss this here .. > > > cheers > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 <+49%20170%205794466> > E-Mail: olaf.weiser at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: Damir Krstic > To: gpfsug main discussion list > Date: 01/11/2017 03:39 PM > Subject: [gpfsug-discuss] nodes being ejected out of the cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our > storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are > connected via Infiniband (FDR14). At the time of implementation of ESS, we > were instructed to enable RDMA in addition to IPoIB. Previously we only ran > IPoIB on our GPFS3.5 cluster. 
> > Every since the implementation (sometime back in July of 2016) we see a > lot of compute nodes being ejected. What usually precedes the ejection are > following messages: > > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 1 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 400 > > Even our ESS IO server sometimes ends up being ejected (case in point - > yesterday morning): > > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3001 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2671 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2495 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3077 > Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease > renewal is overdue. Pinging to check if it is alive > > I've had multiple PMRs open for this issue, and I am told that our ESS > needs code level upgrades in order to fix this issue. Looking at the > errors, I think the issue is Infiniband related, and I am wondering if > anyone on this list has seen similar issues? > > Thanks for your help in advance. 
> > Damir_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Jan 11 19:22:31 2017 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 11 Jan 2017 19:22:31 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: Can this be done live? Meaning can GPFS remain up when I turn this off? Thanks, Damir On Wed, Jan 11, 2017 at 12:38 PM Jan-Frode Myklebust wrote: > And there you have: > > [ems1-fdr,compute,gss_ppc64] > verbsRdmaSend yes > > Try turning this off. > > > -jf > ons. 11. jan. 2017 kl. 18.54 skrev Damir Krstic : > > Thanks for all the suggestions. Here is our mmlsconfig file. We just > purchased another GL6. During the installation of the new GL6 IBM will > upgrade our existing GL6 up to the latest code levels. This will happen > during the week of 23rd of Jan. > > I am skeptical that the upgrade is going to fix the issue. > > On our IO servers we are running in connected mode (please note that IB > interfaces are bonded) > > [root at gssio1 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib3/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib3/mode > > connected > > Our login nodes are also running connected mode as well. > > However, all of our compute nodes are running in datagram: > > [root at mgt ~]# psh compute cat /sys/class/net/ib0/mode > > qnode0758: datagram > > qnode0763: datagram > > qnode0760: datagram > > qnode0772: datagram > > qnode0773: datagram > ....etc. 
> > Here is our mmlsconfig: > > [root at gssio1 ~]# mmlsconfig > > Configuration data for cluster ess-qstorage.it.northwestern.edu: > > ---------------------------------------------------------------- > > clusterName ess-qstorage.it.northwestern.edu > > clusterId 17746506346828356609 > > dmapiFileHandleSize 32 > > minReleaseLevel 4.2.0.1 > > ccrEnabled yes > > cipherList AUTHONLY > > [gss_ppc64] > > nsdRAIDBufferPoolSizePct 80 > > maxBufferDescs 2m > > prefetchPct 5 > > nsdRAIDTracks 128k > > nsdRAIDSmallBufferSize 256k > > nsdMaxWorkerThreads 3k > > nsdMinWorkerThreads 3k > > nsdRAIDSmallThreadRatio 2 > > nsdRAIDThreadsPerQueue 16 > > nsdRAIDEventLogToConsole all > > nsdRAIDFastWriteFSDataLimit 256k > > nsdRAIDFastWriteFSMetadataLimit 1M > > nsdRAIDReconstructAggressiveness 1 > > nsdRAIDFlusherBuffersLowWatermarkPct 20 > > nsdRAIDFlusherBuffersLimitPct 80 > > nsdRAIDFlusherTracksLowWatermarkPct 20 > > nsdRAIDFlusherTracksLimitPct 80 > > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > > nsdRAIDFlusherFWLogLimitMB 5000 > > nsdRAIDFlusherThreadsLowWatermark 1 > > nsdRAIDFlusherThreadsHighWatermark 512 > > nsdRAIDBlockDeviceMaxSectorsKB 8192 > > nsdRAIDBlockDeviceNrRequests 32 > > nsdRAIDBlockDeviceQueueDepth 16 > > nsdRAIDBlockDeviceScheduler deadline > > nsdRAIDMaxTransientStale2FT 1 > > nsdRAIDMaxTransientStale3FT 1 > > nsdMultiQueue 512 > > syncWorkerThreads 256 > > nsdInlineWriteMax 32k > > maxGeneralThreads 1280 > > maxReceiverThreads 128 > > nspdQueues 64 > > [common] > > maxblocksize 16m > > [ems1-fdr,compute,gss_ppc64] > > numaMemoryInterleave yes > > [gss_ppc64] > > maxFilesToCache 12k > > [ems1-fdr,compute] > > maxFilesToCache 128k > > [ems1-fdr,compute,gss_ppc64] > > flushedDataTarget 1024 > > flushedInodeTarget 1024 > > maxFileCleaners 1024 > > maxBufferCleaners 1024 > > logBufferCount 20 > > logWrapAmountPct 2 > > logWrapThreads 128 > > maxAllocRegionsPerNode 32 > > maxBackgroundDeletionThreads 16 > > maxInodeDeallocPrefetch 128 > > [gss_ppc64] > > maxMBpS 16000 > > [ems1-fdr,compute] > > maxMBpS 10000 > > [ems1-fdr,compute,gss_ppc64] > > worker1Threads 1024 > > worker3Threads 32 > > [gss_ppc64] > > ioHistorySize 64k > > [ems1-fdr,compute] > > ioHistorySize 4k > > [gss_ppc64] > > verbsRdmaMinBytes 16k > > [ems1-fdr,compute] > > verbsRdmaMinBytes 32k > > [ems1-fdr,compute,gss_ppc64] > > verbsRdmaSend yes > > [gss_ppc64] > > verbsRdmasPerConnection 16 > > [ems1-fdr,compute] > > verbsRdmasPerConnection 256 > > [gss_ppc64] > > verbsRdmasPerNode 3200 > > [ems1-fdr,compute] > > verbsRdmasPerNode 1024 > > [ems1-fdr,compute,gss_ppc64] > > verbsSendBufferMemoryMB 1024 > > verbsRdmasPerNodeOptimize yes > > verbsRdmaUseMultiCqThreads yes > > [ems1-fdr,compute] > > ignorePrefetchLUNCount yes > > [gss_ppc64] > > scatterBufferSize 256K > > [ems1-fdr,compute] > > scatterBufferSize 256k > > syncIntervalStrict yes > > [ems1-fdr,compute,gss_ppc64] > > nsdClientCksumTypeLocal ck64 > > nsdClientCksumTypeRemote ck64 > > [gss_ppc64] > > pagepool 72856M > > [ems1-fdr] > > pagepool 17544M > > [compute] > > pagepool 4g > > [ems1-fdr,qsched03-ib0,quser10-fdr,compute,gss_ppc64] > > verbsRdma enable > > [gss_ppc64] > > verbsPorts mlx5_0/1 mlx5_0/2 mlx5_1/1 mlx5_1/2 > > [ems1-fdr] > > verbsPorts mlx5_0/1 mlx5_0/2 > > [qsched03-ib0,quser10-fdr,compute] > > verbsPorts mlx4_0/1 > > [common] > > autoload no > > [ems1-fdr,compute,gss_ppc64] > > maxStatCache 0 > > [common] > > envVar MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1 > > deadlockOverloadThreshold 0 > > deadlockDetectionThreshold 0 > > adminMode 
central > > > File systems in cluster ess-qstorage.it.northwestern.edu: > > --------------------------------------------------------- > > /dev/home > > /dev/hpc > > /dev/projects > > /dev/tthome > > On Wed, Jan 11, 2017 at 9:16 AM Luis Bolinches > wrote: > > In addition to what Olaf has said > > ESS upgrades include mellanox modules upgrades in the ESS nodes. In fact, > on those noes you should do not update those solo (unless support says so > in your PMR), so if that's been the recommendation, I suggest you look at > it. > > Changelog on ESS 4.0.4 (no idea what ESS level you are running) > > > c) Support of MLNX_OFED_LINUX-3.2-2.0.0.1 > - Updated from MLNX_OFED_LINUX-3.1-1.0.6.1 (ESS 4.0, 4.0.1, 4.0.2) > - Updated from MLNX_OFED_LINUX-3.1-1.0.0.2 (ESS 3.5.x) > - Updated from MLNX_OFED_LINUX-2.4-1.0.2 (ESS 3.0.x) > - Support for PCIe3 LP 2-port 100 Gb EDR InfiniBand adapter x16 (FC EC3E) > - Requires System FW level FW840.20 (SV840_104) > - No changes from ESS 4.0.3 > > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > > Luis Bolinches > Lab Services > http://www-03.ibm.com/systems/services/labservices/ > > IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland > Phone: +358 503112585 <+358%2050%203112585> > > "If you continually give you will continually have." Anonymous > > > > ----- Original message ----- > From: "Olaf Weiser" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > > Cc: > Subject: Re: [gpfsug-discuss] nodes being ejected out of the cluster > Date: Wed, Jan 11, 2017 5:03 PM > > most likely, there's smth wrong with your IB fabric ... > you say, you run ~ 700 nodes ? ... > Are you running with *verbsRdmaSend*enabled ? ,if so, please consider to > disable - and discuss this within the PMR > another issue, you may check is - Are you running the IPoIB in connected > mode or datagram ... but as I said, please discuss this within the PMR .. > there are to much dependencies to discuss this here .. > > > cheers > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 <+49%20170%205794466> > E-Mail: olaf.weiser at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: Damir Krstic > To: gpfsug main discussion list > Date: 01/11/2017 03:39 PM > Subject: [gpfsug-discuss] nodes being ejected out of the cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our > storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are > connected via Infiniband (FDR14). At the time of implementation of ESS, we > were instructed to enable RDMA in addition to IPoIB. Previously we only ran > IPoIB on our GPFS3.5 cluster. 
> > Every since the implementation (sometime back in July of 2016) we see a > lot of compute nodes being ejected. What usually precedes the ejection are > following messages: > > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 1 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 400 > > Even our ESS IO server sometimes ends up being ejected (case in point - > yesterday morning): > > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3001 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2671 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2495 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3077 > Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease > renewal is overdue. Pinging to check if it is alive > > I've had multiple PMRs open for this issue, and I am told that our ESS > needs code level upgrades in order to fix this issue. Looking at the > errors, I think the issue is Infiniband related, and I am wondering if > anyone on this list has seen similar issues? > > Thanks for your help in advance. 
> > Damir_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Jan 11 19:46:00 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 11 Jan 2017 19:46:00 +0000 Subject: [gpfsug-discuss] nodes being ejected out of the cluster In-Reply-To: References: Message-ID: Don't think you can change it without reloading gpfs. Also it should be turned off for all nodes.. So it's a big change, unfortunately.. -jf ons. 11. jan. 2017 kl. 20.22 skrev Damir Krstic : > Can this be done live? Meaning can GPFS remain up when I turn this off? > > Thanks, > Damir > > On Wed, Jan 11, 2017 at 12:38 PM Jan-Frode Myklebust > wrote: > > And there you have: > > [ems1-fdr,compute,gss_ppc64] > verbsRdmaSend yes > > Try turning this off. > > > -jf > ons. 11. jan. 2017 kl. 18.54 skrev Damir Krstic : > > Thanks for all the suggestions. Here is our mmlsconfig file. We just > purchased another GL6. During the installation of the new GL6 IBM will > upgrade our existing GL6 up to the latest code levels. This will happen > during the week of 23rd of Jan. > > I am skeptical that the upgrade is going to fix the issue. > > On our IO servers we are running in connected mode (please note that IB > interfaces are bonded) > > [root at gssio1 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio1 ~]# cat /sys/class/net/ib3/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib0/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib1/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib2/mode > > connected > > [root at gssio2 ~]# cat /sys/class/net/ib3/mode > > connected > > Our login nodes are also running connected mode as well. > > However, all of our compute nodes are running in datagram: > > [root at mgt ~]# psh compute cat /sys/class/net/ib0/mode > > qnode0758: datagram > > qnode0763: datagram > > qnode0760: datagram > > qnode0772: datagram > > qnode0773: datagram > ....etc. 
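For reference, the reload Jan-Frode describes would look roughly like the following. This is only a sketch: it assumes the parameter should be cleared for the same node classes that currently carry verbsRdmaSend=yes in mmlsconfig, and that a full outage window is acceptable, since the value is only picked up when the daemon starts:

mmchconfig verbsRdmaSend=no -N ems1-fdr,compute,gss_ppc64
mmshutdown -a        # stops GPFS (and unmounts the file systems) cluster-wide
mmstartup -a         # daemons restart and read the new setting
mmdiag --config | grep -i verbsRdmaSend   # spot-check the active value on a node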
> > Here is our mmlsconfig: > > [root at gssio1 ~]# mmlsconfig > > Configuration data for cluster ess-qstorage.it.northwestern.edu: > > ---------------------------------------------------------------- > > clusterName ess-qstorage.it.northwestern.edu > > clusterId 17746506346828356609 > > dmapiFileHandleSize 32 > > minReleaseLevel 4.2.0.1 > > ccrEnabled yes > > cipherList AUTHONLY > > [gss_ppc64] > > nsdRAIDBufferPoolSizePct 80 > > maxBufferDescs 2m > > prefetchPct 5 > > nsdRAIDTracks 128k > > nsdRAIDSmallBufferSize 256k > > nsdMaxWorkerThreads 3k > > nsdMinWorkerThreads 3k > > nsdRAIDSmallThreadRatio 2 > > nsdRAIDThreadsPerQueue 16 > > nsdRAIDEventLogToConsole all > > nsdRAIDFastWriteFSDataLimit 256k > > nsdRAIDFastWriteFSMetadataLimit 1M > > nsdRAIDReconstructAggressiveness 1 > > nsdRAIDFlusherBuffersLowWatermarkPct 20 > > nsdRAIDFlusherBuffersLimitPct 80 > > nsdRAIDFlusherTracksLowWatermarkPct 20 > > nsdRAIDFlusherTracksLimitPct 80 > > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > > nsdRAIDFlusherFWLogLimitMB 5000 > > nsdRAIDFlusherThreadsLowWatermark 1 > > nsdRAIDFlusherThreadsHighWatermark 512 > > nsdRAIDBlockDeviceMaxSectorsKB 8192 > > nsdRAIDBlockDeviceNrRequests 32 > > nsdRAIDBlockDeviceQueueDepth 16 > > nsdRAIDBlockDeviceScheduler deadline > > nsdRAIDMaxTransientStale2FT 1 > > nsdRAIDMaxTransientStale3FT 1 > > nsdMultiQueue 512 > > syncWorkerThreads 256 > > nsdInlineWriteMax 32k > > maxGeneralThreads 1280 > > maxReceiverThreads 128 > > nspdQueues 64 > > [common] > > maxblocksize 16m > > [ems1-fdr,compute,gss_ppc64] > > numaMemoryInterleave yes > > [gss_ppc64] > > maxFilesToCache 12k > > [ems1-fdr,compute] > > maxFilesToCache 128k > > [ems1-fdr,compute,gss_ppc64] > > flushedDataTarget 1024 > > flushedInodeTarget 1024 > > maxFileCleaners 1024 > > maxBufferCleaners 1024 > > logBufferCount 20 > > logWrapAmountPct 2 > > logWrapThreads 128 > > maxAllocRegionsPerNode 32 > > maxBackgroundDeletionThreads 16 > > maxInodeDeallocPrefetch 128 > > [gss_ppc64] > > maxMBpS 16000 > > [ems1-fdr,compute] > > maxMBpS 10000 > > [ems1-fdr,compute,gss_ppc64] > > worker1Threads 1024 > > worker3Threads 32 > > [gss_ppc64] > > ioHistorySize 64k > > [ems1-fdr,compute] > > ioHistorySize 4k > > [gss_ppc64] > > verbsRdmaMinBytes 16k > > [ems1-fdr,compute] > > verbsRdmaMinBytes 32k > > [ems1-fdr,compute,gss_ppc64] > > verbsRdmaSend yes > > [gss_ppc64] > > verbsRdmasPerConnection 16 > > [ems1-fdr,compute] > > verbsRdmasPerConnection 256 > > [gss_ppc64] > > verbsRdmasPerNode 3200 > > [ems1-fdr,compute] > > verbsRdmasPerNode 1024 > > [ems1-fdr,compute,gss_ppc64] > > verbsSendBufferMemoryMB 1024 > > verbsRdmasPerNodeOptimize yes > > verbsRdmaUseMultiCqThreads yes > > [ems1-fdr,compute] > > ignorePrefetchLUNCount yes > > [gss_ppc64] > > scatterBufferSize 256K > > [ems1-fdr,compute] > > scatterBufferSize 256k > > syncIntervalStrict yes > > [ems1-fdr,compute,gss_ppc64] > > nsdClientCksumTypeLocal ck64 > > nsdClientCksumTypeRemote ck64 > > [gss_ppc64] > > pagepool 72856M > > [ems1-fdr] > > pagepool 17544M > > [compute] > > pagepool 4g > > [ems1-fdr,qsched03-ib0,quser10-fdr,compute,gss_ppc64] > > verbsRdma enable > > [gss_ppc64] > > verbsPorts mlx5_0/1 mlx5_0/2 mlx5_1/1 mlx5_1/2 > > [ems1-fdr] > > verbsPorts mlx5_0/1 mlx5_0/2 > > [qsched03-ib0,quser10-fdr,compute] > > verbsPorts mlx4_0/1 > > [common] > > autoload no > > [ems1-fdr,compute,gss_ppc64] > > maxStatCache 0 > > [common] > > envVar MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1 > > deadlockOverloadThreshold 0 > > deadlockDetectionThreshold 0 > > adminMode 
central > > > File systems in cluster ess-qstorage.it.northwestern.edu: > > --------------------------------------------------------- > > /dev/home > > /dev/hpc > > /dev/projects > > /dev/tthome > > On Wed, Jan 11, 2017 at 9:16 AM Luis Bolinches > wrote: > > In addition to what Olaf has said > > ESS upgrades include mellanox modules upgrades in the ESS nodes. In fact, > on those noes you should do not update those solo (unless support says so > in your PMR), so if that's been the recommendation, I suggest you look at > it. > > Changelog on ESS 4.0.4 (no idea what ESS level you are running) > > > c) Support of MLNX_OFED_LINUX-3.2-2.0.0.1 > - Updated from MLNX_OFED_LINUX-3.1-1.0.6.1 (ESS 4.0, 4.0.1, 4.0.2) > - Updated from MLNX_OFED_LINUX-3.1-1.0.0.2 (ESS 3.5.x) > - Updated from MLNX_OFED_LINUX-2.4-1.0.2 (ESS 3.0.x) > - Support for PCIe3 LP 2-port 100 Gb EDR InfiniBand adapter x16 (FC EC3E) > - Requires System FW level FW840.20 (SV840_104) > - No changes from ESS 4.0.3 > > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > > Luis Bolinches > Lab Services > http://www-03.ibm.com/systems/services/labservices/ > > IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland > Phone: +358 503112585 <+358%2050%203112585> > > "If you continually give you will continually have." Anonymous > > > > ----- Original message ----- > From: "Olaf Weiser" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > > Cc: > Subject: Re: [gpfsug-discuss] nodes being ejected out of the cluster > Date: Wed, Jan 11, 2017 5:03 PM > > most likely, there's smth wrong with your IB fabric ... > you say, you run ~ 700 nodes ? ... > Are you running with *verbsRdmaSend*enabled ? ,if so, please consider to > disable - and discuss this within the PMR > another issue, you may check is - Are you running the IPoIB in connected > mode or datagram ... but as I said, please discuss this within the PMR .. > there are to much dependencies to discuss this here .. > > > cheers > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 <+49%20170%205794466> > E-Mail: olaf.weiser at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: Damir Krstic > To: gpfsug main discussion list > Date: 01/11/2017 03:39 PM > Subject: [gpfsug-discuss] nodes being ejected out of the cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our > storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are > connected via Infiniband (FDR14). At the time of implementation of ESS, we > were instructed to enable RDMA in addition to IPoIB. Previously we only ran > IPoIB on our GPFS3.5 cluster. 
> > Every since the implementation (sometime back in July of 2016) we see a > lot of compute nodes being ejected. What usually precedes the ejection are > following messages: > > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 1 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum > 0 vendor_err 135 > Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error > IBV_WC_WR_FLUSH_ERR index 400 > > Even our ESS IO server sometimes ends up being ejected (case in point - > yesterday morning): > > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3001 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2671 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum > 0 vendor_err 135 > Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 2495 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error > IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum > 0 vendor_err 135 > Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to > 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error > IBV_WC_RNR_RETRY_EXC_ERR index 3077 > Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease > renewal is overdue. Pinging to check if it is alive > > I've had multiple PMRs open for this issue, and I am told that our ESS > needs code level upgrades in order to fix this issue. Looking at the > errors, I think the issue is Infiniband related, and I am wondering if > anyone on this list has seen similar issues? > > Thanks for your help in advance. 
> > Damir_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Wed Jan 11 22:33:24 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 11 Jan 2017 15:33:24 -0700 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: A winbindd process taking up 100% could be caused by the problem documented in https://bugzilla.samba.org/show_bug.cgi?id=12105 Capturing a brief strace of the affected process and reporting that through a PMR would be helpful to debug this problem and provide a fix. To answer the wider question: Log files are kept in /var/adm/ras/. In case more detailed traces are required, use the mmprotocoltrace command. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/11/2017 07:00 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... CES DEGRADED ads_down ... 
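Capturing the brief strace Christof asks for is quick to do the next time a winbindd process spins at 100%. A rough sketch, where the PID and the 30-second window are purely illustrative (find the busy PID with top first), and the mmprotocoltrace call assumes the smb trace component is the one wanted on the affected CES node:

strace -f -tt -p <winbindd_pid> -o /tmp/winbindd.strace &
sleep 30; kill %1            # half a minute of trace is usually enough to attach to a PMR
mmprotocoltrace start smb    # optional: protocol-level tracing while reproducing
mmprotocoltrace stop smb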
Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From r.sobey at imperial.ac.uk Thu Jan 12 09:51:12 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 12 Jan 2017 09:51:12 +0000 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: Thanks Christof. Would this patch have made it in to CES/GPFS 4.2.1-2.. from what you say probably not? This whole incident was caused by a scheduled and extremely rare shutdown of our main datacentre for electrical testing. It's not something that's likely to happen again if at all so reproducing it will be nigh on impossible. Food for thought though! Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 11 January 2017 22:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES log files A winbindd process taking up 100% could be caused by the problem documented in https://bugzilla.samba.org/show_bug.cgi?id=12105 Capturing a brief strace of the affected process and reporting that through a PMR would be helpful to debug this problem and provide a fix. To answer the wider question: Log files are kept in /var/adm/ras/. In case more detailed traces are required, use the mmprotocoltrace command. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/11/2017 07:00 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... 
CES DEGRADED ads_down ... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From aleciarm at us.ibm.com Thu Jan 12 14:54:12 2017 From: aleciarm at us.ibm.com (Alecia A Ramsay) Date: Thu, 12 Jan 2017 09:54:12 -0500 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: The Spectrum Scale Knowledge Center does have a topic on collecting CES log files. This might be helpful (4.2.2 version): http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1pdg_ces_monitor_admin.htm Alecia A. Ramsay, PMP? Program Manager, New Technology Introduction IBM Systems - Storage aleciarm at us.ibm.com work: 919-435-6494; mobile: 651-260-4928 https://www-01.ibm.com/marketing/iwm/iwmdocs/web/cc/earlyprograms/systems.shtml From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/12/2017 04:51 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Christof. Would this patch have made it in to CES/GPFS 4.2.1-2.. from what you say probably not? This whole incident was caused by a scheduled and extremely rare shutdown of our main datacentre for electrical testing. It's not something that's likely to happen again if at all so reproducing it will be nigh on impossible. Food for thought though! Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 11 January 2017 22:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES log files A winbindd process taking up 100% could be caused by the problem documented in https://bugzilla.samba.org/show_bug.cgi?id=12105 Capturing a brief strace of the affected process and reporting that through a PMR would be helpful to debug this problem and provide a fix. To answer the wider question: Log files are kept in /var/adm/ras/. In case more detailed traces are required, use the mmprotocoltrace command. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/11/2017 07:00 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. 
Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... CES DEGRADED ads_down ... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From christof.schmitt at us.ibm.com Thu Jan 12 18:06:48 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Thu, 12 Jan 2017 11:06:48 -0700 Subject: [gpfsug-discuss] CES log files In-Reply-To: References: Message-ID: It looks like the patch for the mentioned bugzilla is in 4.2.2.0, but not in 4.2.1.2. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/12/2017 02:51 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Christof. Would this patch have made it in to CES/GPFS 4.2.1-2.. from what you say probably not? This whole incident was caused by a scheduled and extremely rare shutdown of our main datacentre for electrical testing. It's not something that's likely to happen again if at all so reproducing it will be nigh on impossible. Food for thought though! 
Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 11 January 2017 22:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES log files A winbindd process taking up 100% could be caused by the problem documented in https://bugzilla.samba.org/show_bug.cgi?id=12105 Capturing a brief strace of the affected process and reporting that through a PMR would be helpful to debug this problem and provide a fix. To answer the wider question: Log files are kept in /var/adm/ras/. In case more detailed traces are required, use the mmprotocoltrace command. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 01/11/2017 07:00 AM Subject: Re: [gpfsug-discuss] CES log files Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks. Some of the node would just say ?failed? or ?degraded? with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond and winbindd process would take up 100% CPU as seen through top with no users on it. Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all. JF ? I?ll look at the protocol tracing next time this happens. It?s a rare thing that three DCs go offline at once but even so there should have been enough resiliency to cope. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie Sent: 11 January 2017 09:55 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES log files mmhealth might be a good place to start CES should probably throw a message along the lines of the following: mmhealth shows something is wrong with AD server: ... CES DEGRADED ads_down ... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug-discuss at spectrumscale.org'" Cc: Subject: [gpfsug-discuss] CES log files Date: Wed, Jan 11, 2017 7:27 PM Which files do I need to look in to determine what?s happening with CES? supposing for example a load of domain controllers were shut down and CES had no clue how to handle this and stopped working until the DCs were switched back on again. Mmfs.log.latest said everything was fine btw. 
Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mimarsh2 at vt.edu Fri Jan 13 19:50:10 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Fri, 13 Jan 2017 14:50:10 -0500 Subject: [gpfsug-discuss] Authorized Key Messages Message-ID: All, I just saw this message start popping up constantly on one our NSD Servers. [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist CCR Auth is disabled on all the NSD Servers. What other features/checks would look for the ccr keys? Thanks, Brian Marshall Virginia Tech -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Jan 13 20:14:03 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 13 Jan 2017 15:14:03 -0500 Subject: [gpfsug-discuss] Authorized Key Messages In-Reply-To: References: Message-ID: Brian, This seems to match a problem which was fixed in 4.1.1.7 and 4.2.0.0. Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Brian Marshall To: gpfsug main discussion list Date: 01/13/2017 02:50 PM Subject: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org All, I just saw this message start popping up constantly on one our NSD Servers. [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist CCR Auth is disabled on all the NSD Servers. What other features/checks would look for the ccr keys? Thanks, Brian Marshall Virginia Tech_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Fri Jan 13 20:19:25 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Fri, 13 Jan 2017 15:19:25 -0500 Subject: [gpfsug-discuss] Authorized Key Messages In-Reply-To: References: Message-ID: We are running 4.2.1 (there may be some point fixes we don't have) Any report of it being in this version? Brian On Fri, Jan 13, 2017 at 3:14 PM, Felipe Knop wrote: > Brian, > > This seems to match a problem which was fixed in 4.1.1.7 and 4.2.0.0. > > Regards, > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > > From: Brian Marshall > To: gpfsug main discussion list > Date: 01/13/2017 02:50 PM > Subject: [gpfsug-discuss] Authorized Key Messages > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > All, > > I just saw this message start popping up constantly on one our NSD Servers. > > [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist > > CCR Auth is disabled on all the NSD Servers. > > What other features/checks would look for the ccr keys? 
> > Thanks, > Brian Marshall > Virginia Tech_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Jan 13 22:58:02 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 13 Jan 2017 17:58:02 -0500 Subject: [gpfsug-discuss] Authorized Key Messages In-Reply-To: References: Message-ID: Brian, I had to check again whether the fix in question was in 4.2.0.0 (as opposed to a newer mod release), but confirmed that it seems to be. So this could be a new or different problem than the one I was thinking about. Researching a bit further, I found another potential match (internal defect number 981469), but that should be fixed in 4.2.1 as well. I have not seen recent reports of this problem. Perhaps this could be pursued via a PMR. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Brian Marshall To: gpfsug main discussion list Date: 01/13/2017 03:21 PM Subject: Re: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org We are running 4.2.1 (there may be some point fixes we don't have) Any report of it being in this version? Brian On Fri, Jan 13, 2017 at 3:14 PM, Felipe Knop wrote: Brian, This seems to match a problem which was fixed in 4.1.1.7 and 4.2.0.0. Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Brian Marshall To: gpfsug main discussion list Date: 01/13/2017 02:50 PM Subject: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org All, I just saw this message start popping up constantly on one our NSD Servers. [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist CCR Auth is disabled on all the NSD Servers. What other features/checks would look for the ccr keys? Thanks, Brian Marshall Virginia Tech_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Fri Jan 13 23:30:05 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 13 Jan 2017 18:30:05 -0500 Subject: [gpfsug-discuss] Authorized Key Messages In-Reply-To: References: Message-ID: Our intent was to have ccr turned off since all nodes are quorum in the server cluster: Considering this: [root at cl001 ~]# mmfsadm dump config | grep -i ccr ! ccrEnabled 0 ccrMaxChallengeCheckRetries 4 ccr : 0 (cluster configuration repository) ccr : 1 (cluster configuration repository) Will this disable ccr? 
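A quick cross-check of what the cluster itself reports is mmlscluster, which shows the repository type. The rest below is only a sketch for the case where CCR turns out to still be enabled and the intent really is server-based configuration; it assumes the daemon can be stopped cluster-wide first, and the primary/secondary node names are placeholders:

mmlscluster | grep -i repository    # 'Repository type: CCR' means CCR is still on
mmshutdown -a
mmchcluster --ccr-disable -p <primary_config_server> -s <secondary_config_server>
mmstartup -a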
On Fri, Jan 13, 2017 at 5:58 PM, Felipe Knop wrote: > Brian, > > I had to check again whether the fix in question was in 4.2.0.0 (as > opposed to a newer mod release), but confirmed that it seems to be. So > this could be a new or different problem than the one I was thinking about. > > Researching a bit further, I found another potential match (internal > defect number 981469), but that should be fixed in 4.2.1 as well. I have > not seen recent reports of this problem. > > Perhaps this could be pursued via a PMR. > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > > From: Brian Marshall > To: gpfsug main discussion list > Date: 01/13/2017 03:21 PM > Subject: Re: [gpfsug-discuss] Authorized Key Messages > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We are running 4.2.1 (there may be some point fixes we don't have) > > Any report of it being in this version? > > Brian > > On Fri, Jan 13, 2017 at 3:14 PM, Felipe Knop <*knop at us.ibm.com* > > wrote: > Brian, > > This seems to match a problem which was fixed in 4.1.1.7 and 4.2.0.0. > > Regards, > > Felipe > > ---- > Felipe Knop *knop at us.ibm.com* > > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > *(845) 433-9314* <(845)%20433-9314> T/L 293-9314 > > > > > > From: Brian Marshall <*mimarsh2 at vt.edu* > > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Date: 01/13/2017 02:50 PM > Subject: [gpfsug-discuss] Authorized Key Messages > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > ------------------------------ > > > > > All, > > I just saw this message start popping up constantly on one our NSD Servers. > > [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist > > CCR Auth is disabled on all the NSD Servers. > > What other features/checks would look for the ccr keys? > > Thanks, > Brian Marshall > Virginia Tech_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Jan 13 23:48:37 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 13 Jan 2017 18:48:37 -0500 Subject: [gpfsug-discuss] Authorized Key Messages In-Reply-To: References: Message-ID: "! ccrEnabled 0" does indicate that CCR is disabled on the (server) cluster. In fact, instances of this '/var/mmfs/ssl/authorized_ccr_keys' does not exist message have been seen in clusters where CCR was disabled. It's just somewhat puzzling that the error message is appears in 4.2.1 . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "J. 
Eric Wonderley" To: gpfsug main discussion list Date: 01/13/2017 06:30 PM Subject: Re: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org Our intent was to have ccr turned off since all nodes are quorum in the server cluster: Considering this: [root at cl001 ~]# mmfsadm dump config | grep -i ccr ! ccrEnabled 0 ccrMaxChallengeCheckRetries 4 ccr : 0 (cluster configuration repository) ccr : 1 (cluster configuration repository) Will this disable ccr? On Fri, Jan 13, 2017 at 5:58 PM, Felipe Knop wrote: Brian, I had to check again whether the fix in question was in 4.2.0.0 (as opposed to a newer mod release), but confirmed that it seems to be. So this could be a new or different problem than the one I was thinking about. Researching a bit further, I found another potential match (internal defect number 981469), but that should be fixed in 4.2.1 as well. I have not seen recent reports of this problem. Perhaps this could be pursued via a PMR. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Brian Marshall To: gpfsug main discussion list Date: 01/13/2017 03:21 PM Subject: Re: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org We are running 4.2.1 (there may be some point fixes we don't have) Any report of it being in this version? Brian On Fri, Jan 13, 2017 at 3:14 PM, Felipe Knop wrote: Brian, This seems to match a problem which was fixed in 4.1.1.7 and 4.2.0.0. Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Brian Marshall To: gpfsug main discussion list Date: 01/13/2017 02:50 PM Subject: [gpfsug-discuss] Authorized Key Messages Sent by: gpfsug-discuss-bounces at spectrumscale.org All, I just saw this message start popping up constantly on one our NSD Servers. [N] Auth: '/var/mmfs/ssl/authorized_ccr_keys' does not exist CCR Auth is disabled on all the NSD Servers. What other features/checks would look for the ccr keys? Thanks, Brian Marshall Virginia Tech_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Sun Jan 15 21:18:31 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Sun, 15 Jan 2017 21:18:31 +0000 Subject: [gpfsug-discuss] GUI "maintenance mode" Message-ID: Is there a way, perhaps through the CLI, to set a node in maintenance mode so the GUI alerting doesn't flag it up as being down? Pretty sure the option isn't available through the GUI's GUI if you'll pardon the expression. 
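The closest thing I've found only covers protocol nodes - something along these lines before we patch a CES node (syntax roughly from memory and the node name is made up, so treat it as a sketch):

# take the node out of CES before maintenance so its addresses move elsewhere
mmces node suspend -N cesnode01
# ...patch/reboot, then bring it back into the address pool
mmces node resume -N cesnode01

That does nothing for an NSD server the GUI thinks has gone away though, which is really what I'm after.
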
Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Tue Jan 17 21:50:53 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 17 Jan 2017 16:50:53 -0500 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM Message-ID: UG, I have a GPFS filesystem. I have a OpenStack private cloud. What is the best way for Nova Compute VMs to have access to data inside the GPFS filesystem? 1)Should VMs mount GPFS directly with a GPFS client? 2) Should the hypervisor mount GPFS and share to nova computes? 3) Should I create GPFS protocol servers that allow nova computes to mount of NFS? All advice is welcome. Best, Brian Marshall Virginia Tech -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Tue Jan 17 21:16:20 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Tue, 17 Jan 2017 16:16:20 -0500 Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs Message-ID: I have messages like these frequent my logs: Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 vendor_err 136 Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error IBV_WC_REM_ACCESS_ERR index 23 Any ideas on cause..? -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Jan 18 00:47:04 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Tue, 17 Jan 2017 19:47:04 -0500 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: References: Message-ID: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> I think the 1st option creates the challenges both with security (e.g. do you fully trust the users of your VMs not to do bad things as root either maliciously or accidentally? how do you ensure userids are properly mapped inside the guest?) and logistically (as VMs come and go how do you automate adding them/removing them to/from the GPFS cluster). I think the 2nd option is ideal perhaps using something like 9p (http://www.linux-kvm.org/page/9p_virtio) to export filesystems from the hypervisor to the guest. I'm not sure how you would integrate this with Nova and I've heard from others that there are stability issues, but I can't comment first hand. Another option might be to NFS/CIFS export the filesystems from the hypervisor to the guests via the 169.254.169.254 metadata address although I don't know how feasible that may or may not be. The advantage to using the metadata address is it should scale well and it should take the pain out of a guest mapping an IP address to its local hypervisor using an external method. Perhaps number 3 is the best way to go, especially (arguably only) if you use kerberized NFS or SMB. That way you don't have to trust anything about the guest and you theoretically should get decent performance. I'm really curious what other folks have done on this front. -Aaron On 1/17/17 4:50 PM, Brian Marshall wrote: > UG, > > I have a GPFS filesystem. > > I have a OpenStack private cloud. > > What is the best way for Nova Compute VMs to have access to data inside > the GPFS filesystem? > > 1)Should VMs mount GPFS directly with a GPFS client? > 2) Should the hypervisor mount GPFS and share to nova computes? > 3) Should I create GPFS protocol servers that allow nova computes to > mount of NFS? > > All advice is welcome. 
> > > Best, > Brian Marshall > Virginia Tech > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From malone12 at illinois.edu Wed Jan 18 03:05:15 2017 From: malone12 at illinois.edu (Maloney, John Daniel) Date: Wed, 18 Jan 2017 03:05:15 +0000 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> References: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> Message-ID: <6CADE9B2-3691-4F44-B241-DABA02385B42@illinois.edu> I agree with Aaron on option 1, trusting users to do nothing malicious would be quite a stretch for most people?s use cases. Even if they do, if their user?s credentials getting stolen, and then used by someone else it could be a real issue as the hacker wouldn?t have to get lucky and find a VM with an un-patched root escalation vulnerability. Security aside, you?ll probably want to make sure your VMs have an external IP that is able to be reached by the GPFS cluster. We found having GPFS route through the Openstack NAT to be possible, but tricky (though this was an older version of Openstack?could be better now?). Using the external IP may be the natural way for most folks, but wanted to point it out none-the-less. We haven?t done much in regards to option 2, we?ve done work using native clients on the hypervisors to provide cinder/glance storage, but not to share other data into the VM?s. Currently use option 3 to export group?s project directories to their VMs using the CES protocol nodes. It?s getting the job done right now (have close to 100 VMs mounting from it). I would definitely recommend giving your maxFilesToCache and maxStatCache parameters a big bump from defaults on the export nodes if you weren?t planning to already (set mine at 1,000,000 on each of those). We saw that become a point of contention with our user?s workloads. That change was implemented fairly recently and so far, so good. Aaron?s point about logistics from his answer to option 1 is relevant here too, especially if you have high VM turnover rate where IP addresses are recycled and different projects are getting exported. You?ll want to keep track of VM?s and exports to prevent a new VM from picking up an old IP that has access on an export it isn?t supposed to because it hasn?t been flushed out. In our situation there are 30-40 projects, all names of them known to users who ls the project directory, wouldn?t take much for them to spin up a new VM and give them all a try. I agree this is a really interesting topic, there?s a lot of ways to come at this so hopefully more folks chime in on what they?re doing. Best, J.D. Maloney Storage Engineer | Storage Enabling Technologies Group National Center for Supercomputing Applications (NCSA) On 1/17/17, 6:47 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Aaron Knister" wrote: I think the 1st option creates the challenges both with security (e.g. do you fully trust the users of your VMs not to do bad things as root either maliciously or accidentally? how do you ensure userids are properly mapped inside the guest?) and logistically (as VMs come and go how do you automate adding them/removing them to/from the GPFS cluster). 
I think the 2nd option is ideal perhaps using something like 9p (http://www.linux-kvm.org/page/9p_virtio) to export filesystems from the hypervisor to the guest. I'm not sure how you would integrate this with Nova and I've heard from others that there are stability issues, but I can't comment first hand. Another option might be to NFS/CIFS export the filesystems from the hypervisor to the guests via the 169.254.169.254 metadata address although I don't know how feasible that may or may not be. The advantage to using the metadata address is it should scale well and it should take the pain out of a guest mapping an IP address to its local hypervisor using an external method. Perhaps number 3 is the best way to go, especially (arguably only) if you use kerberized NFS or SMB. That way you don't have to trust anything about the guest and you theoretically should get decent performance. I'm really curious what other folks have done on this front. -Aaron On 1/17/17 4:50 PM, Brian Marshall wrote: > UG, > > I have a GPFS filesystem. > > I have a OpenStack private cloud. > > What is the best way for Nova Compute VMs to have access to data inside > the GPFS filesystem? > > 1)Should VMs mount GPFS directly with a GPFS client? > 2) Should the hypervisor mount GPFS and share to nova computes? > 3) Should I create GPFS protocol servers that allow nova computes to > mount of NFS? > > All advice is welcome. > > > Best, > Brian Marshall > Virginia Tech > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Wed Jan 18 08:46:53 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 18 Jan 2017 08:46:53 +0000 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> References: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> Message-ID: >Another option might be to NFS/CIFS export the >filesystems from the hypervisor to the guests via the 169.254.169.254 >metadata address although I don't know how feasible that may or may not Doesn't the metadata IP site on the network nodes though and not the hypervisor? We currently have created interfaces on out net nodes attached to the appropriate VLAN/VXLAN and then run CES on top of that. The problem with this is if you have the same subnet existing in two networks, then you have a problem. I had some discussion with some of the IBM guys about the possibility of using a different CES protocol group and running multiple ganesha servers (maybe a container attached to the net?) so you could then have different NFS configs on different ganesha instances with CES managing a floating IP that could exist multiple times. There were some potential issues in the way the CES HA bits work though with this approach. 
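In case it's useful, the net node side of that looks roughly like this for us. Interface names, VLAN id and addresses below are invented for the example, and the mmces syntax is from memory, so check it against your release:

# expose the provider VLAN on each protocol/network node
ip link add link bond0 name bond0.451 type vlan id 451
ip link set bond0.451 up
# static address in the tenant subnet so CES has something to alias onto
ip addr add 10.10.45.5/24 dev bond0.451

# then add a floating CES address in the same subnet for the guests to mount from
mmces address add --ces-ip 10.10.45.10
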
Simon From S.J.Thompson at bham.ac.uk Wed Jan 18 08:59:48 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 18 Jan 2017 08:59:48 +0000 Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs In-Reply-To: References: Message-ID: I'd be inclined to look at something like: ibqueryerrors -s PortXmitWait,LinkDownedCounter,PortXmitDiscards,PortRcvRemotePhysicalErrors -c And see if you have a high number of symbol errors, might be a cable needs replugging or replacing. Simon From: > on behalf of "J. Eric Wonderley" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 17 January 2017 at 21:16 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs I have messages like these frequent my logs: Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 vendor_err 136 Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error IBV_WC_REM_ACCESS_ERR index 23 Any ideas on cause..? -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Jan 18 15:22:51 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 18 Jan 2017 10:22:51 -0500 Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs In-Reply-To: References: Message-ID: <061a15b7-f5e9-7c16-2e79-3236665a9368@nasa.gov> I'm curious about this too. We see these messages sometimes when things have gone horribly wrong but also sometimes during recovery events. Here's a recent one: loremds20 (manager/nsd node): Mon Jan 16 14:19:02.048 2017: [E] VERBS RDMA rdma read error IBV_WC_REM_ACCESS_ERR to 10.101.11.6 (lorej006) on mlx5_0 port 1 fabnum 3 vendor_err 136 Mon Jan 16 14:19:02.049 2017: [E] VERBS RDMA closed connection to 10.101.11.6 (lorej006) on mlx5_0 port 1 fabnum 3 due to RDMA read error IBV_WC_REM_ACCESS_ERR index 11 lorej006 (client): Mon Jan 16 14:19:01.990 2017: [N] VERBS RDMA closed connection to 10.101.53.18 (loremds18) on mlx5_0 port 1 fabnum 3 index 2 Mon Jan 16 14:19:01.995 2017: [N] VERBS RDMA closed connection to 10.101.53.19 (loremds19) on mlx5_0 port 1 fabnum 3 index 0 Mon Jan 16 14:19:01.997 2017: [I] Recovering nodes: 10.101.53.18 10.101.53.19 Mon Jan 16 14:19:02.047 2017: [W] VERBS RDMA async event IBV_EVENT_QP_ACCESS_ERR on mlx5_0 qp 0x7fffe550f1c8. Mon Jan 16 14:19:02.051 2017: [E] VERBS RDMA closed connection to 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 error 733 index 1 Mon Jan 16 14:19:02.071 2017: [I] Recovered 2 nodes for file system tnb32. Mon Jan 16 14:19:02.140 2017: [I] VERBS RDMA connecting to 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 index 0 Mon Jan 16 14:19:02.160 2017: [I] VERBS RDMA connected to 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 sl 0 index 0 I had just shut down loremds18 and loremds19 so there was certainly recovery taking place and during that time is when the error seems to have occurred. I looked up the meaning of IBV_WC_REM_ACCESS_ERR here (http://www.rdmamojo.com/2013/02/15/ibv_poll_cq/) and see this: IBV_WC_REM_ACCESS_ERR (10) - Remote Access Error: a protection error occurred on a remote data buffer to be read by an RDMA Read, written by an RDMA Write or accessed by an atomic operation. This error is reported only on RDMA operations or atomic operations. Relevant for RC QPs. 
my take on it during recovery it seems like one end of the connection more or less hanging up on the other end (e.g. Connection reset by peer /ECONNRESET). But like I said at the start, we also see this when there something has gone awfully wrong. -Aaron On 1/18/17 3:59 AM, Simon Thompson (Research Computing - IT Services) wrote: > I'd be inclined to look at something like: > > ibqueryerrors -s > PortXmitWait,LinkDownedCounter,PortXmitDiscards,PortRcvRemotePhysicalErrors > -c > > And see if you have a high number of symbol errors, might be a cable > needs replugging or replacing. > > Simon > > From: > on behalf of "J. Eric > Wonderley" > > Reply-To: "gpfsug-discuss at spectrumscale.org > " > > > Date: Tuesday, 17 January 2017 at 21:16 > To: "gpfsug-discuss at spectrumscale.org > " > > > Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs > > I have messages like these frequent my logs: > Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error > IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 > vendor_err 136 > Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to > 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error > IBV_WC_REM_ACCESS_ERR index 23 > > Any ideas on cause..? > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jan 18 15:56:16 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 18 Jan 2017 15:56:16 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Message-ID: Hi All, We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.b.mills at nasa.gov Wed Jan 18 16:10:51 2017 From: jonathan.b.mills at nasa.gov (Jonathan Mills) Date: Wed, 18 Jan 2017 11:10:51 -0500 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: References: <012f8e22-1b04-1f12-0bba-d4ba235d8762@nasa.gov> Message-ID: <8d41b8c8-eb84-3d1c-eec2-d26f1816108b@nasa.gov> On 1/18/17 3:46 AM, Simon Thompson (Research Computing - IT Services) wrote: > >> Another option might be to NFS/CIFS export the >> filesystems from the hypervisor to the guests via the 169.254.169.254 >> metadata address although I don't know how feasible that may or may not > > Doesn't the metadata IP site on the network nodes though and not the > hypervisor? Not when Neutron is in DVR mode. It is intercepted at the hypervisor and redirected to the neutron-ns-metadata-proxy. See below: [root at gpcc003 ~]# ip netns exec qrouter-bc4aa217-5128-4eec-b9af-67923dae319a iptables -t nat -nvL neutron-l3-agent-PREROUTING Chain neutron-l3-agent-PREROUTING (1 references) pkts bytes target prot opt in out source destination 19 1140 REDIRECT tcp -- qr-+ * 0.0.0.0/0 169.254.169.254 tcp dpt:80 redir ports 9697 281 12650 DNAT all -- rfp-bc4aa217-5 * 0.0.0.0/0 169.154.180.32 to:10.0.4.22 [root at gpcc003 ~]# ip netns exec qrouter-bc4aa217-5128-4eec-b9af-67923dae319a netstat -tulpn |grep 9697 tcp 0 0 0.0.0.0:9697 0.0.0.0:* LISTEN 28130/python2 [root at gpcc003 ~]# ps aux |grep 28130 neutron 28130 0.0 0.0 286508 41364 ? S Jan04 0:02 /usr/bin/python2 /bin/neutron-ns-metadata-proxy --pid_file=/var/lib/neutron/external/pids/bc4aa217-5128-4eec-b9af-67923dae319a.pid --metadata_proxy_socket=/var/lib/neutron/metadata_proxy --router_id=bc4aa217-5128-4eec-b9af-67923dae319a --state_path=/var/lib/neutron --metadata_port=9697 --metadata_proxy_user=989 --metadata_proxy_group=986 --verbose --log-file=neutron-ns-metadata-proxy-bc4aa217-5128-4eec-b9af-67923dae319a.log --log-dir=/var/log/neutron root 31220 0.0 0.0 112652 972 pts/1 S+ 11:08 0:00 grep --color=auto 28130 > > We currently have created interfaces on out net nodes attached to the > appropriate VLAN/VXLAN and then run CES on top of that. > > The problem with this is if you have the same subnet existing in two > networks, then you have a problem. > > I had some discussion with some of the IBM guys about the possibility of > using a different CES protocol group and running multiple ganesha servers > (maybe a container attached to the net?) so you could then have different > NFS configs on different ganesha instances with CES managing a floating IP > that could exist multiple times. > > There were some potential issues in the way the CES HA bits work though > with this approach. > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Jonathan Mills / jonathan.mills at nasa.gov NASA GSFC / NCCS HPC (606.2) Bldg 28, Rm. S230 / c. 252-412-5710 From mimarsh2 at vt.edu Wed Jan 18 16:22:12 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Wed, 18 Jan 2017 11:22:12 -0500 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: References: Message-ID: To answer some more questions: What sort of workload will your Nova VM's be running? This is largely TBD but we anticipate webapps and other non-batch ways of interacting with and post processing data that has been computed on HPC batch systems. 
For example a user might host a website that allows users to view pieces of a large data set and do some processing in private cloud or kick off larger jobs on HPC clusters. How many VM's are you running? This work is still in the design / build phase. We have 48 servers slated for the project. At max maybe 500 VMs; again this is a pretty wild estimate. This is a new service we are looking to provide. What is your Network interconnect between the Scale Storage cluster and the Nova Compute cluster? Each nova node has a dual 10gigE connection to switches that uplink to our core 40 gigE switches where NSD Servers are directly connected. The information so far has been awesome. Thanks everyone. I am definitely leaning towards option #3 of creating protocol servers. Are there any design/build white papers targeting the virtualization use case? Thanks, Brian On Tue, Jan 17, 2017 at 5:55 PM, Andrew Beattie wrote: > HI Brian, > > > Couple of questions for you: > > What sort of workload will your Nova VM's be running? > How many VM's are you running? > What is your Network interconnect between the Scale Storage cluster and > the Nova Compute cluster
Here's > a recent one: > > loremds20 (manager/nsd node): > Mon Jan 16 14:19:02.048 2017: [E] VERBS RDMA rdma read error > IBV_WC_REM_ACCESS_ERR to 10.101.11.6 (lorej006) on mlx5_0 port 1 fabnum 3 > vendor_err 136 > Mon Jan 16 14:19:02.049 2017: [E] VERBS RDMA closed connection to > 10.101.11.6 (lorej006) on mlx5_0 port 1 fabnum 3 due to RDMA read error > IBV_WC_REM_ACCESS_ERR index 11 > > lorej006 (client): > Mon Jan 16 14:19:01.990 2017: [N] VERBS RDMA closed connection to > 10.101.53.18 (loremds18) on mlx5_0 port 1 fabnum 3 index 2 > Mon Jan 16 14:19:01.995 2017: [N] VERBS RDMA closed connection to > 10.101.53.19 (loremds19) on mlx5_0 port 1 fabnum 3 index 0 > Mon Jan 16 14:19:01.997 2017: [I] Recovering nodes: 10.101.53.18 > 10.101.53.19 > Mon Jan 16 14:19:02.047 2017: [W] VERBS RDMA async event > IBV_EVENT_QP_ACCESS_ERR on mlx5_0 qp 0x7fffe550f1c8. > Mon Jan 16 14:19:02.051 2017: [E] VERBS RDMA closed connection to > 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 error 733 index 1 > Mon Jan 16 14:19:02.071 2017: [I] Recovered 2 nodes for file system tnb32. > Mon Jan 16 14:19:02.140 2017: [I] VERBS RDMA connecting to 10.101.53.20 > (loremds20) on mlx5_0 port 1 fabnum 3 index 0 > Mon Jan 16 14:19:02.160 2017: [I] VERBS RDMA connected to 10.101.53.20 > (loremds20) on mlx5_0 port 1 fabnum 3 sl 0 index 0 > > I had just shut down loremds18 and loremds19 so there was certainly > recovery taking place and during that time is when the error seems to have > occurred. > > I looked up the meaning of IBV_WC_REM_ACCESS_ERR here ( > http://www.rdmamojo.com/2013/02/15/ibv_poll_cq/) and see this: > > IBV_WC_REM_ACCESS_ERR (10) - Remote Access Error: a protection error > occurred on a remote data buffer to be read by an RDMA Read, written by an > RDMA Write or accessed by an atomic operation. This error is reported only > on RDMA operations or atomic operations. Relevant for RC QPs. > > my take on it during recovery it seems like one end of the connection more > or less hanging up on the other end (e.g. Connection reset by peer > /ECONNRESET). > > But like I said at the start, we also see this when there something has > gone awfully wrong. > > -Aaron > > On 1/18/17 3:59 AM, Simon Thompson (Research Computing - IT Services) > wrote: > >> I'd be inclined to look at something like: >> >> ibqueryerrors -s >> PortXmitWait,LinkDownedCounter,PortXmitDiscards,PortRcvRemot >> ePhysicalErrors >> -c >> >> And see if you have a high number of symbol errors, might be a cable >> needs replugging or replacing. >> >> Simon >> >> From: > > on behalf of "J. Eric >> Wonderley" > >> Reply-To: "gpfsug-discuss at spectrumscale.org >> " >> > mscale.org>> >> Date: Tuesday, 17 January 2017 at 21:16 >> To: "gpfsug-discuss at spectrumscale.org >> " >> > mscale.org>> >> Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs >> >> I have messages like these frequent my logs: >> Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error >> IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 >> vendor_err 136 >> Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to >> 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error >> IBV_WC_REM_ACCESS_ERR index 23 >> >> Any ideas on cause..? 
>> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From veb2005 at med.cornell.edu Wed Jan 18 22:54:10 2017 From: veb2005 at med.cornell.edu (Vanessa Borcherding) Date: Wed, 18 Jan 2017 17:54:10 -0500 Subject: [gpfsug-discuss] Issue with X forwarding Message-ID: Hi All, We've got a new-ish 4.1.1.0 Advanced cluster and we've run into a strange problem: users who have their home directory on the GPFS filesystem cannot do X11 forwarding. They get the following error: "/usr/bin/xauth: error in locking authority file /home/user/.Xauthority" The file ~/.Xauthority is there and also a new one ~/.Xauthority-c. Similarly, "xauth -b" also fails: Attempting to break locks on authority file /home/user/.Xauthority xauth: error in locking authority file /home/user/.Xauthority This behavior happens regardless of the client involved, and happens across multiple OS/kernel versions, and if GPFS is mounted natively or via NFS export. For any given host, if the user's home directory is moved to another NFS-exported location, X forwarding works correctly. Has anyone seen this before, or have any idea as to where it's coming from? Thanks, Vanessa -- * * * * * Vanessa Borcherding Director, Scientific Computing Technology Manager - Applied Bioinformatics Core Dept. of Physiology and Biophysics Institute for Computational Biomedicine Weill Cornell Medical College (212) 746-6281 - office (917) 861-9777 - cell * * * * * -------------- next part -------------- An HTML attachment was scrubbed... URL: From farid.chabane at ymail.com Thu Jan 19 06:00:54 2017 From: farid.chabane at ymail.com (FC) Date: Thu, 19 Jan 2017 06:00:54 +0000 (UTC) Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 References: <51281598.14159900.1484805654772.ref@mail.yahoo.com> Message-ID: <51281598.14159900.1484805654772@mail.yahoo.com> Hi all, We are facing performance issues with some of our applications due to the GPFS system monitoring (mmsysmon) on CentOS 7.2. Bad performances (increase of iteration time) are seen every 30s exactly as the occurence frequency of mmsysmon ; the default monitor interval set to 30s in /var/mmfs/mmsysmon/mmsysmonitor.conf Shutting down GPFS with mmshutdown doesnt stop this process, we stopped it with the command mmsysmoncontrol and we get a stable iteration time. What are the impacts of disabling this process except losing access to mmhealth commands ? Do you have an idea of a proper way to disable it for good without doing it in rc.local or increasing the monitoring interval in the configuration file ? Thanks, Farid -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu Jan 19 08:45:20 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 19 Jan 2017 09:45:20 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 19 15:46:55 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 19 Jan 2017 15:46:55 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: Hi Olaf, The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? Thanks... Kevin On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: have you checked, where th fsmgr runs as you have nodes with different code levels mmlsmgr From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 01/18/2017 04:57 PM Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu Jan 19 16:05:41 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 19 Jan 2017 17:05:41 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 19 16:25:20 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 19 Jan 2017 16:25:20 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> Hi Olaf, We will continue upgrading clients in a rolling fashion, but with ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t consider figuring out why this is happening a waste of time and therefore having systems on both versions is a good thing. While I would prefer not to paste actual group names and GIDs into this public forum, I can assure you that on every 4.2.1.1 system that I have tried this on: 1. mmrepquota reports mostly GIDs, only a few group names 2. /etc/nsswitch.conf says to look at files first 3. the GID is in /etc/group 4. length of group name doesn?t matter I have a support contract with IBM, so I can open a PMR if necessary. I just thought someone on the list might have an idea as to what is happening or be able to point out the obvious explanation that I?m missing. ;-) Thanks? Kevin On Jan 19, 2017, at 10:05 AM, Olaf Weiser > wrote: unfortunately , I don't own a cluster right now, which has 4.2.2 to double check... SpectrumScale should resolve the GID into a name, if it find the name somewhere... but in your case.. I would say.. before we waste to much time in a version-mismatch issue.. finish the rolling migration, especially RHEL .. and then we continue meanwhile -I'll try to find a way for me here to setup up an 4.2.2. cluster cheers From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 01/19/2017 04:48 PM Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? Thanks... Kevin On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: have you checked, where th fsmgr runs as you have nodes with different code levels mmlsmgr From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 01/18/2017 04:57 PM Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. 
However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Thu Jan 19 16:36:42 2017 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Thu, 19 Jan 2017 17:36:42 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> Message-ID: <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> Just leting know, I see the same problem with 4.2.2.1 version. mmrepquota resolves only some of group names. On Thu, Jan 19, 2017 at 04:25:20PM +0000, Buterbaugh, Kevin L wrote: > Hi Olaf, > > We will continue upgrading clients in a rolling fashion, but with ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t consider figuring out why this is happening a waste of time and therefore having systems on both versions is a good thing. > > While I would prefer not to paste actual group names and GIDs into this public forum, I can assure you that on every 4.2.1.1 system that I have tried this on: > > 1. mmrepquota reports mostly GIDs, only a few group names > 2. /etc/nsswitch.conf says to look at files first > 3. the GID is in /etc/group > 4. length of group name doesn?t matter > > I have a support contract with IBM, so I can open a PMR if necessary. I just thought someone on the list might have an idea as to what is happening or be able to point out the obvious explanation that I?m missing. ;-) > > Thanks? > > Kevin > > On Jan 19, 2017, at 10:05 AM, Olaf Weiser > wrote: > > unfortunately , I don't own a cluster right now, which has 4.2.2 to double check... SpectrumScale should resolve the GID into a name, if it find the name somewhere... > > but in your case.. I would say.. before we waste to much time in a version-mismatch issue.. finish the rolling migration, especially RHEL .. and then we continue > meanwhile -I'll try to find a way for me here to setup up an 4.2.2. 
cluster > cheers > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/19/2017 04:48 PM > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi Olaf, > > The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. > > In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. > > Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? > > Thanks... > > Kevin > > On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: > > have you checked, where th fsmgr runs as you have nodes with different code levels > > mmlsmgr > > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/18/2017 04:57 PM > Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi All, > > We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. > > From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. > > However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). > > I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) > > I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? > > Kevin > > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek From peserocka at gmail.com Thu Jan 19 17:07:55 2017 From: peserocka at gmail.com (Peter Serocka) Date: Fri, 20 Jan 2017 01:07:55 +0800 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: <7D8E5B3D-6BA9-4362-984D-6A74448FA7BC@gmail.com> Any caching in effect? Like nscd which is configured separately in /etc/nscd.conf Any insights from strace?ing mmrepquota? For example, when a plain ls -l doesn?t look groups up in /etc/group but queries from nscd instead, strace output has something like: connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0 sendto(4, "\2\0\0\0\f\0\0\0\6\0\0\0group\0", 18, MSG_NOSIGNAL, NULL, 0) = 18 ? Peter > On 2017 Jan 19 Thu, at 23:46, Buterbaugh, Kevin L wrote: > > Hi Olaf, > > The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. > > In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. > > Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? > > Thanks... > > Kevin > >> On Jan 19, 2017, at 2:45 AM, Olaf Weiser wrote: >> >> have you checked, where th fsmgr runs as you have nodes with different code levels >> >> mmlsmgr >> >> >> >> >> From: "Buterbaugh, Kevin L" >> To: gpfsug main discussion list >> Date: 01/18/2017 04:57 PM >> Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Hi All, >> >> We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. >> >> From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. >> >> However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? 
It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). >> >> I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) >> >> I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? >> >> Kevin > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From olaf.weiser at de.ibm.com Thu Jan 19 17:16:27 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 19 Jan 2017 18:16:27 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> Message-ID: An HTML attachment was scrubbed... URL: From MDIETZ at de.ibm.com Thu Jan 19 18:07:32 2017 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Thu, 19 Jan 2017 19:07:32 +0100 Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 In-Reply-To: <51281598.14159900.1484805654772@mail.yahoo.com> References: <51281598.14159900.1484805654772.ref@mail.yahoo.com> <51281598.14159900.1484805654772@mail.yahoo.com> Message-ID: Hi Farid, there is no official way for disabling the system health monitoring because other components rely on it (e.g. GUI, CES, Install Toolkit,..) If you are fine with the consequences you can just delete the mmsysmonitor.conf, which will prevent the monitor from starting. During our testing we did not see a significant performance impact caused by the monitoring. In 4.2.2 some component monitors (e.g. disk) have been further improved to reduce polling and use notifications instead. Nevertheless, I would like to better understand what the issue is. What kind of workload do you run ? Do you see spikes in CPU usage every 30 seconds ? Is it the same on all cluster nodes or just on some of them ? Could you send us the output of "mmhealth node show -v" to see which monitors are active. It might make sense to open a PMR to get this issue fixed. Thanks. Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale - Release Lead Architect (4.2.X Release) System Health and Problem Determination Architect IBM Certified Software Engineer ---------------------------------------------------------------------------------------------------------- IBM Deutschland Hechtsheimer Str. 
2 55131 Mainz Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ---------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: FC To: "gpfsug-discuss at spectrumscale.org" Date: 01/19/2017 07:06 AM Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We are facing performance issues with some of our applications due to the GPFS system monitoring (mmsysmon) on CentOS 7.2. Bad performances (increase of iteration time) are seen every 30s exactly as the occurence frequency of mmsysmon ; the default monitor interval set to 30s in /var/mmfs/mmsysmon/mmsysmonitor.conf Shutting down GPFS with mmshutdown doesnt stop this process, we stopped it with the command mmsysmoncontrol and we get a stable iteration time. What are the impacts of disabling this process except losing access to mmhealth commands ? Do you have an idea of a proper way to disable it for good without doing it in rc.local or increasing the monitoring interval in the configuration file ? Thanks, Farid _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jan 19 18:21:18 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 19 Jan 2017 18:21:18 +0000 Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 In-Reply-To: References: <51281598.14159900.1484805654772.ref@mail.yahoo.com> <51281598.14159900.1484805654772@mail.yahoo.com>, Message-ID: On some of our nodes we were regularly seeing procees hung timeouts in dmesg from a python process, which I vaguely thought was related to the monitoring process (though we have other python bits from openstack running on these boxes). These are all running 4.2.2.0 code Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Mathias Dietz [MDIETZ at de.ibm.com] Sent: 19 January 2017 18:07 To: FC; gpfsug main discussion list Subject: Re: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 Hi Farid, there is no official way for disabling the system health monitoring because other components rely on it (e.g. GUI, CES, Install Toolkit,..) If you are fine with the consequences you can just delete the mmsysmonitor.conf, which will prevent the monitor from starting. During our testing we did not see a significant performance impact caused by the monitoring. In 4.2.2 some component monitors (e.g. disk) have been further improved to reduce polling and use notifications instead. Nevertheless, I would like to better understand what the issue is. What kind of workload do you run ? Do you see spikes in CPU usage every 30 seconds ? Is it the same on all cluster nodes or just on some of them ? Could you send us the output of "mmhealth node show -v" to see which monitors are active. It might make sense to open a PMR to get this issue fixed. Thanks. 
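For anyone following along, the two knobs discussed in this thread can be inspected roughly like this - a sketch only, since the exact key name in mmsysmonitor.conf and the mmsysmoncontrol subcommands should be treated as assumptions and verified against your release:

  grep -i interval /var/mmfs/mmsysmon/mmsysmonitor.conf    (show the current polling interval)
  mmsysmoncontrol stop                                     (pause the health monitor, as Farid did)
  mmsysmoncontrol start                                    (resume it afterwards)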
Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale - Release Lead Architect (4.2.X Release) System Health and Problem Determination Architect IBM Certified Software Engineer ---------------------------------------------------------------------------------------------------------- IBM Deutschland Hechtsheimer Str. 2 55131 Mainz Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ---------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: FC To: "gpfsug-discuss at spectrumscale.org" Date: 01/19/2017 07:06 AM Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, We are facing performance issues with some of our applications due to the GPFS system monitoring (mmsysmon) on CentOS 7.2. Bad performances (increase of iteration time) are seen every 30s exactly as the occurence frequency of mmsysmon ; the default monitor interval set to 30s in /var/mmfs/mmsysmon/mmsysmonitor.conf Shutting down GPFS with mmshutdown doesnt stop this process, we stopped it with the command mmsysmoncontrol and we get a stable iteration time. What are the impacts of disabling this process except losing access to mmhealth commands ? Do you have an idea of a proper way to disable it for good without doing it in rc.local or increasing the monitoring interval in the configuration file ? Thanks, Farid _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Greg.Lehmann at csiro.au Thu Jan 19 21:22:40 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Thu, 19 Jan 2017 21:22:40 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz>, Message-ID: <1484860960203.43563@csiro.au> It's not something to do with the value of the GID, like being less or greater than some number? ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Friday, 20 January 2017 3:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x in my eyes.. that's the hint .. not to wait until all 700 clients 'll have been updated .. before open PMR .. ;-) ... From: Lukas Hejtmanek To: gpfsug main discussion list Date: 01/19/2017 05:37 PM Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Just leting know, I see the same problem with 4.2.2.1 version. mmrepquota resolves only some of group names. On Thu, Jan 19, 2017 at 04:25:20PM +0000, Buterbaugh, Kevin L wrote: > Hi Olaf, > > We will continue upgrading clients in a rolling fashion, but with ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t consider figuring out why this is happening a waste of time and therefore having systems on both versions is a good thing. 
> > While I would prefer not to paste actual group names and GIDs into this public forum, I can assure you that on every 4.2.1.1 system that I have tried this on: > > 1. mmrepquota reports mostly GIDs, only a few group names > 2. /etc/nsswitch.conf says to look at files first > 3. the GID is in /etc/group > 4. length of group name doesn?t matter > > I have a support contract with IBM, so I can open a PMR if necessary. I just thought someone on the list might have an idea as to what is happening or be able to point out the obvious explanation that I?m missing. ;-) > > Thanks? > > Kevin > > On Jan 19, 2017, at 10:05 AM, Olaf Weiser > wrote: > > unfortunately , I don't own a cluster right now, which has 4.2.2 to double check... SpectrumScale should resolve the GID into a name, if it find the name somewhere... > > but in your case.. I would say.. before we waste to much time in a version-mismatch issue.. finish the rolling migration, especially RHEL .. and then we continue > meanwhile -I'll try to find a way for me here to setup up an 4.2.2. cluster > cheers > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/19/2017 04:48 PM > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi Olaf, > > The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. > > In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. > > Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? > > Thanks... > > Kevin > > On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: > > have you checked, where th fsmgr runs as you have nodes with different code levels > > mmlsmgr > > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/18/2017 04:57 PM > Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi All, > > We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. > > From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. > > However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). 
> > I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list - if you want to know what those are, you'll have to ask Vladimir Putin... ;-) > > I am in the process of updating scripts to use 'mmrepquota -gn ' and then looking up the group name myself, but I want to try to understand this. Thanks... > > Kevin > > > - > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Lukáš Hejtmánek _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 19 21:51:07 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 19 Jan 2017 21:51:07 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <1484860960203.43563@csiro.au> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> Message-ID: <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> Hi All, Let me try to answer some questions that have been raised by various list members... 1. I am not using nscd. 2. getent group with either a GID or a group name resolves GIDs / names that are being printed as GIDs by mmrepquota 3. The GIDs in question are all in a normal range - i.e. some group names that are being printed by mmrepquota have GIDs 'close' to others that are being printed as GIDs 4. strace'ing mmrepquota doesn't show anything relating to nscd or anything that jumps out at me Here's another point - I am 95% sure that I have a client that was running 4.2.1.1 and mmrepquota displayed the group names - I then upgraded GPFS on it - no other changes - and now it's mostly GIDs. I'm not 100% sure because output scrolled out of my terminal buffer. Thanks to all for the suggestions - please feel free to keep them coming. To any of the GPFS team on this mailing list, at least one other person has reported the same behavior - is this a known bug? Kevin On Jan 19, 2017, at 3:22 PM, Greg.Lehmann at csiro.au wrote: It's not something to do with the value of the GID, like being less or greater than some number? ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Friday, 20 January 2017 3:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x in my eyes.. that's the hint .. not to wait until all 700 clients 'll have been updated .. before open PMR .. ;-) ...
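For anyone hitting the same behaviour, the interim workaround Kevin mentions - taking numeric IDs from mmrepquota and resolving them outside GPFS - can be sketched roughly as below. The file system name gpfs1, the number of header lines skipped, and the column positions are assumptions; check them against your own mmrepquota output first.

  mmrepquota -gn gpfs1 | awk 'NR>2 {print $1}' | sort -u | while read gid; do
      name=$(getent group "$gid" | cut -d: -f1)
      echo "$gid ${name:-UNRESOLVED}"
  done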
From: Lukas Hejtmanek > To: gpfsug main discussion list > Date: 01/19/2017 05:37 PM Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Just leting know, I see the same problem with 4.2.2.1 version. mmrepquota resolves only some of group names. On Thu, Jan 19, 2017 at 04:25:20PM +0000, Buterbaugh, Kevin L wrote: > Hi Olaf, > > We will continue upgrading clients in a rolling fashion, but with ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t consider figuring out why this is happening a waste of time and therefore having systems on both versions is a good thing. > > While I would prefer not to paste actual group names and GIDs into this public forum, I can assure you that on every 4.2.1.1 system that I have tried this on: > > 1. mmrepquota reports mostly GIDs, only a few group names > 2. /etc/nsswitch.conf says to look at files first > 3. the GID is in /etc/group > 4. length of group name doesn?t matter > > I have a support contract with IBM, so I can open a PMR if necessary. I just thought someone on the list might have an idea as to what is happening or be able to point out the obvious explanation that I?m missing. ;-) > > Thanks? > > Kevin > > On Jan 19, 2017, at 10:05 AM, Olaf Weiser > wrote: > > unfortunately , I don't own a cluster right now, which has 4.2.2 to double check... SpectrumScale should resolve the GID into a name, if it find the name somewhere... > > but in your case.. I would say.. before we waste to much time in a version-mismatch issue.. finish the rolling migration, especially RHEL .. and then we continue > meanwhile -I'll try to find a way for me here to setup up an 4.2.2. cluster > cheers > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/19/2017 04:48 PM > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi Olaf, > > The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. > > In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. > > Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? > > Thanks... > > Kevin > > On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: > > have you checked, where th fsmgr runs as you have nodes with different code levels > > mmlsmgr > > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/18/2017 04:57 PM > Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi All, > > We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. 
> > From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. > > However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). > > I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) > > I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? > > Kevin > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at uni-mainz.de Fri Jan 20 08:41:26 2017 From: martin at uni-mainz.de (Christoph Martin) Date: Fri, 20 Jan 2017 09:41:26 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> Message-ID: Hi, I have a system with two servers with GPFS 4.2.1.2 on SLES 12.1 and some clients with GPFS 4.2.2.1 on SLES 11 and Centos 7. mmrepquota shows on all systems group names. I still have to upgrade the servers to 4.2.2.1. Christoph -- ============================================================================ Christoph Martin, Leiter Unix-Systeme Zentrum f?r Datenverarbeitung, Uni-Mainz, Germany Anselm Franz von Bentzel-Weg 12, 55128 Mainz Telefon: +49(6131)3926337 Instant-Messaging: Jabber: martin at uni-mainz.de (Siehe http://www.zdv.uni-mainz.de/4010.php) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: martin.vcf Type: text/x-vcard Size: 421 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL: From Achim.Rehor at de.ibm.com Fri Jan 20 09:01:12 2017 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Fri, 20 Jan 2017 10:01:12 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu><20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From farid.chabane at ymail.com Fri Jan 20 09:02:32 2017 From: farid.chabane at ymail.com (FC) Date: Fri, 20 Jan 2017 09:02:32 +0000 (UTC) Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 In-Reply-To: References: <51281598.14159900.1484805654772.ref@mail.yahoo.com> <51281598.14159900.1484805654772@mail.yahoo.com> Message-ID: <1898813661.15589480.1484902952833@mail.yahoo.com> Hi Mathias, It's OK when we remove the configuration file, the process doens't start. The problem occurs mainly with our compute nodes (all of them) and we don't use GUI and CES. Ideed, I confirm we don't see performance impact with Linpack running on more than hundred nodes, it appears especially when there is a lot of communications wich is the case of our applications, our high speed network is based on Intel OmniPath Fabric. We are seeing irregular iteration time every 30 sec. By Enabling HyperThreading, the issue is a little bit hidden but still there. By using less cores per nodes (26 instead of 28), we don't see this behavior as if it needs one core for mmsysmon process. I agree with you, might be good idea to open a PMR... Please find below the output of mmhealth node show --verbose Node status:???????????? HEALTHY Component??????????????? Status?????????????????? Reasons ------------------------------------------------------------------- GPFS???????????????????? HEALTHY????????????????? - NETWORK????????????????? HEALTHY????????????????? - ? ib0????????????????????? HEALTHY????????????????? - FILESYSTEM?????????????? HEALTHY????????????????? - ? gpfs1??????????????????? HEALTHY????????????????? - ? gpfs2??????????????????? HEALTHY????????????????? - DISK???????????????????? HEALTHY????????????????? - Thanks Farid Le Jeudi 19 janvier 2017 19h21, Simon Thompson (Research Computing - IT Services) a ?crit : On some of our nodes we were regularly seeing procees hung timeouts in dmesg from a python process, which I vaguely thought was related to the monitoring process (though we have other python bits from openstack running on these boxes). These are all running 4.2.2.0 code Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Mathias Dietz [MDIETZ at de.ibm.com] Sent: 19 January 2017 18:07 To: FC; gpfsug main discussion list Subject: Re: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 Hi Farid, there is no official way for disabling the system health monitoring because other components rely on it (e.g. GUI, CES, Install Toolkit,..) 
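One way to check whether the 30-second hiccups described above line up with the monitor waking up is simply to sample the mmsysmon process while a job runs. A rough sketch - matching on the process name is an assumption, since the monitor is reportedly a Python daemon and may appear under a slightly different name on your nodes:

  pidstat -u -p $(pgrep -f mmsysmon | head -1) 1 60
  top -b -d 1 -n 60 | grep -i mmsysmon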
If you are fine with the consequences you can just delete the mmsysmonitor.conf, which will prevent the monitor from starting. During our testing we did not see a significant performance impact caused by the monitoring. In 4.2.2 some component monitors (e.g. disk) have been further improved to reduce polling and use notifications instead. Nevertheless, I would like to better understand what the issue is. What kind of workload do you run? Do you see spikes in CPU usage every 30 seconds? Is it the same on all cluster nodes or just on some of them? Could you send us the output of "mmhealth node show -v" to see which monitors are active. It might make sense to open a PMR to get this issue fixed. Thanks. Mit freundlichen Grüßen / Kind regards Mathias Dietz Spectrum Scale - Release Lead Architect (4.2.X Release) System Health and Problem Determination Architect IBM Certified Software Engineer ---------------------------------------------------------------------------------------------------------- IBM Deutschland Hechtsheimer Str. 2 55131 Mainz Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ---------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: FC To: "gpfsug-discuss at spectrumscale.org" Date: 01/19/2017 07:06 AM Subject: [gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 4.2.1.1 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, We are facing performance issues with some of our applications due to the GPFS system monitoring (mmsysmon) on CentOS 7.2. Bad performance (an increase in iteration time) is seen every 30s, exactly the occurrence frequency of mmsysmon; the default monitor interval is set to 30s in /var/mmfs/mmsysmon/mmsysmonitor.conf. Shutting down GPFS with mmshutdown doesn't stop this process; we stopped it with the command mmsysmoncontrol and we get a stable iteration time. What are the impacts of disabling this process except losing access to mmhealth commands? Do you have an idea of a proper way to disable it for good without doing it in rc.local or increasing the monitoring interval in the configuration file? Thanks, Farid _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From st.graf at fz-juelich.de Fri Jan 20 09:45:04 2017 From: st.graf at fz-juelich.de (Stephan Graf) Date: Fri, 20 Jan 2017 10:45:04 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> Message-ID: Good morning, Mr. Rehor! I just had a look. On the node where we have the mmlsquota -g problem, I also see with mmrepquota -g that some groups are printed only numerically. I am happy to open a PMR for this.
Viele Gr??e, Stephan Graf On 01/20/17 10:01, Achim Rehor wrote: fully agreed, there are PMRs open on "mmlsquota -g failes : no such group" where the handling of group names vs. ids is being tracked. a PMR on mmrepquota and a slightly different facette of a similar problem might give more and faster insight and solution. Mit freundlichen Gr??en / Kind regards Achim Rehor ________________________________ Software Technical Support Specialist AIX/ Emea HPC Support [cid:part1.A7833F18.D0EA2498 at fz-juelich.de] IBM Certified Advanced Technical Expert - Power Systems with AIX TSCC Software Service, Dept. 7922 Global Technology Services ________________________________ Phone: +49-7034-274-7862 IBM Deutschland E-Mail: Achim.Rehor at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany ________________________________ IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Reinhard Reschke, Dieter Scholz, Gregor Pillen, Ivo Koerner, Christian Noll Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 WEEE-Reg.-Nr. DE 99369940 From: Olaf Weiser/Germany/IBM at IBMDE To: gpfsug main discussion list Date: 01/19/2017 06:17 PM Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ in my eyes.. that's the hint .. not to wait until all 700 clients 'll have been updated .. before open PMR .. ;-) ... From: Lukas Hejtmanek To: gpfsug main discussion list Date: 01/19/2017 05:37 PM Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Just leting know, I see the same problem with 4.2.2.1 version. mmrepquota resolves only some of group names. On Thu, Jan 19, 2017 at 04:25:20PM +0000, Buterbaugh, Kevin L wrote: > Hi Olaf, > > We will continue upgrading clients in a rolling fashion, but with ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t consider figuring out why this is happening a waste of time and therefore having systems on both versions is a good thing. > > While I would prefer not to paste actual group names and GIDs into this public forum, I can assure you that on every 4.2.1.1 system that I have tried this on: > > 1. mmrepquota reports mostly GIDs, only a few group names > 2. /etc/nsswitch.conf says to look at files first > 3. the GID is in /etc/group > 4. length of group name doesn?t matter > > I have a support contract with IBM, so I can open a PMR if necessary. I just thought someone on the list might have an idea as to what is happening or be able to point out the obvious explanation that I?m missing. ;-) > > Thanks? > > Kevin > > On Jan 19, 2017, at 10:05 AM, Olaf Weiser > wrote: > > unfortunately , I don't own a cluster right now, which has 4.2.2 to double check... SpectrumScale should resolve the GID into a name, if it find the name somewhere... > > but in your case.. I would say.. before we waste to much time in a version-mismatch issue.. finish the rolling migration, especially RHEL .. and then we continue > meanwhile -I'll try to find a way for me here to setup up an 4.2.2. 
cluster > cheers > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/19/2017 04:48 PM > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi Olaf, > > The filesystem manager runs on one of our servers, all of which are upgraded to 4.2.2.x. > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf has ?files? listed first for /etc/group. > > In addition to a mixture of GPFS versions, we also have a mixture of OS versions (RHEL 6/7). AFAIK tell with all of my testing / experimenting the only factor that seems to change the behavior of mmrepquota in regards to GIDs versus group names is the GPFS version. > > Other ideas, anyone? Is anyone else in a similar situation and can test whether they see similar behavior? > > Thanks... > > Kevin > > On Jan 19, 2017, at 2:45 AM, Olaf Weiser > wrote: > > have you checked, where th fsmgr runs as you have nodes with different code levels > > mmlsmgr > > > > > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Date: 01/18/2017 04:57 PM > Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ________________________________ > > > > Hi All, > > We recently upgraded our cluster (well, the servers are all upgraded; the clients are still in progress) from GPFS 4.2.1.1 to GPFS 4.2.2.1 and there appears to be a change in how mmrepquota handles group names in its? output. I?m trying to get a handle on it, because it is messing with some of my scripts and - more importantly - because I don?t understand the behavior. > > From one of my clients which is still running GPFS 4.2.1.1 I can run an ?mmrepquota -g ? and if the group exists in /etc/group the group name is displayed. Of course, if the group doesn?t exist in /etc/group, the GID is displayed. Makes sense. > > However, on my servers which have been upgraded to GPFS 4.2.2.1 most - but not all - of the time I see GID numbers instead of group names. My question is, what is the criteria GPFS 4.2.2.x is using to decide when to display a GID instead of a group name? It?s apparently *not* the length of the name of the group, because I have output in front of me where a 13 character long group name is displayed but a 7 character long group name is *not* displayed - its? GID is instead (and yes, both exist in /etc/group). > > I know that sample output would be useful to illustrate this, but I do not want to post group names or GIDs to a public mailing list ? if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) > > I am in the process of updating scripts to use ?mmrepquota -gn ? and then looking up the group name myself, but I want to try to understand this. Thanks? > > Kevin > > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Stephan Graf Juelich Supercomputing Centre Institute for Advanced Simulation Forschungszentrum Juelich GmbH 52425 Juelich, Germany Phone: +49-2461-61-6578 Fax: +49-2461-61-6656 E-mail: st.graf at fz-juelich.de WWW: http://www.fz-juelich.de/jsc/ ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From st.graf at fz-juelich.de Fri Jan 20 10:22:09 2017 From: st.graf at fz-juelich.de (Stephan Graf) Date: Fri, 20 Jan 2017 11:22:09 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> Message-ID: Sorry for the mail. I just can tell, that we are facing the same issue: We run GPFS 4.1.1.11 & 4.2.1.2 In both versions the mmlsquota -g fails. I also tried the mmrepquota -g command on GPFS 4.2.1.2, and some groups are displayed only numerical. Stephan On 01/20/17 09:41, Christoph Martin wrote: Hi, I have a system with two servers with GPFS 4.2.1.2 on SLES 12.1 and some clients with GPFS 4.2.2.1 on SLES 11 and Centos 7. mmrepquota shows on all systems group names. I still have to upgrade the servers to 4.2.2.1. 
Christoph _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Stephan Graf Juelich Supercomputing Centre Institute for Advanced Simulation Forschungszentrum Juelich GmbH 52425 Juelich, Germany Phone: +49-2461-61-6578 Fax: +49-2461-61-6656 E-mail: st.graf at fz-juelich.de WWW: http://www.fz-juelich.de/jsc/ ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Fri Jan 20 10:54:37 2017 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Fri, 20 Jan 2017 11:54:37 +0100 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu><20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From duersch at us.ibm.com Fri Jan 20 14:14:23 2017 From: duersch at us.ibm.com (Steve Duersch) Date: Fri, 20 Jan 2017 09:14:23 -0500 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: Kevin, Please go ahead and open a PMR. Cursorily, we don't know of an obvious known bug. Thank you. Steve Duersch Spectrum Scale 845-433-7902 IBM Poughkeepsie, New York gpfsug-discuss-bounces at spectrumscale.org wrote on 01/19/2017 04:52:02 PM: > From: gpfsug-discuss-request at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Date: 01/19/2017 04:52 PM > Subject: gpfsug-discuss Digest, Vol 60, Issue 47 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. 
Re: mmrepquota and group names in GPFS 4.2.2.x > (Buterbaugh, Kevin L) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 19 Jan 2017 21:51:07 +0000 > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS > 4.2.2.x > Message-ID: <31F584FD-A926-4D86-B365-63EA244DEE45 at vanderbilt.edu> > Content-Type: text/plain; charset="utf-8" > > Hi All, > > Let me try to answer some questions that have been raised by various > list members? > > 1. I am not using nscd. > 2. getent group with either a GID or a group name resolves GID?s / > names that are being printed as GIDs by mmrepquota > 3. The GID?s in question are all in a normal range ? i.e. some > group names that are being printed by mmrepquota have GIDs ?close? > to others that are being printed as GID?s > 4. strace?ing mmrepquota doesn?t show anything relating to nscd or > anything that jumps out at me > > Here?s another point ? I am 95% sure that I have a client that was > running 4.2.1.1 and mmrepquota displayed the group names ? I then > upgraded GPFS on it ? no other changes ? and now it?s mostly GID?s. > I?m not 100% sure because output scrolled out of my terminal buffer. > > Thanks to all for the suggestions ? please feel free to keep them > coming. To any of the GPFS team on this mailing list, at least one > other person has reported the same behavior ? is this a known bug? > > Kevin > > On Jan 19, 2017, at 3:22 PM, Greg.Lehmann at csiro.au< > mailto:Greg.Lehmann at csiro.au> wrote: > > > It's not something to do with the value of the GID, like being less > or greater than some number? > > ________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org discuss-bounces at spectrumscale.org> mailto:gpfsug-discuss-bounces at spectrumscale.org>> on behalf of Olaf > Weiser > > Sent: Friday, 20 January 2017 3:16 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > > in my eyes.. that's the hint .. not to wait until all 700 clients > 'll have been updated .. before open PMR .. ;-) ... > > > > From: Lukas Hejtmanek >> > To: gpfsug main discussion list mailto:gpfsug-discuss at spectrumscale.org>> > Date: 01/19/2017 05:37 PM > Subject: Re: [gpfsug-discuss] mmrepquota and group names in > GPFS 4.2.2.x > Sent by: gpfsug-discuss-bounces at spectrumscale.org< > mailto:gpfsug-discuss-bounces at spectrumscale.org> > ________________________________ > > > > Just leting know, I see the same problem with 4.2.2.1 version. mmrepquota > resolves only some of group names. > > On Thu, Jan 19, 2017 at 04:25:20PM +0000, Buterbaugh, Kevin L wrote: > > Hi Olaf, > > > > We will continue upgrading clients in a rolling fashion, but with > ~700 of them, it?ll be a few weeks. And to me that?s good ? I don?t > consider figuring out why this is happening a waste of time and > therefore having systems on both versions is a good thing. > > > > While I would prefer not to paste actual group names and GIDs into > this public forum, I can assure you that on every 4.2.1.1 system > that I have tried this on: > > > > 1. mmrepquota reports mostly GIDs, only a few group names > > 2. /etc/nsswitch.conf says to look at files first > > 3. the GID is in /etc/group > > 4. length of group name doesn?t matter > > > > I have a support contract with IBM, so I can open a PMR if > necessary. 
I just thought someone on the list might have an idea as > to what is happening or be able to point out the obvious explanation > that I?m missing. ;-) > > > > Thanks? > > > > Kevin > > > > On Jan 19, 2017, at 10:05 AM, Olaf Weiser mailto:olaf.weiser at de.ibm.com>> wrote: > > > > unfortunately , I don't own a cluster right now, which has 4.2.2 > to double check... SpectrumScale should resolve the GID into a name, > if it find the name somewhere... > > > > but in your case.. I would say.. before we waste to much time in a > version-mismatch issue.. finish the rolling migration, especially > RHEL .. and then we continue > > meanwhile -I'll try to find a way for me here to setup up an 4.2.2. cluster > > cheers > > > > > > > > From: "Buterbaugh, Kevin L" mailto:Kevin.Buterbaugh at Vanderbilt.Edu> >> > > To: gpfsug main discussion list mailto:gpfsug-discuss at spectrumscale.org> discuss at spectrumscale.org>> > > Date: 01/19/2017 04:48 PM > > Subject: Re: [gpfsug-discuss] mmrepquota and group names in > GPFS 4.2.2.x > > Sent by: gpfsug-discuss-bounces at spectrumscale.org< > mailto:gpfsug-discuss-bounces at spectrumscale.org> discuss-bounces at spectrumscale.org> > > ________________________________ > > > > > > > > Hi Olaf, > > > > The filesystem manager runs on one of our servers, all of which > are upgraded to 4.2.2.x. > > > > Also, I didn?t mention this yesterday but our /etc/nsswitch.conf > has ?files? listed first for /etc/group. > > > > In addition to a mixture of GPFS versions, we also have a mixture > of OS versions (RHEL 6/7). AFAIK tell with all of my testing / > experimenting the only factor that seems to change the behavior of > mmrepquota in regards to GIDs versus group names is the GPFS version. > > > > Other ideas, anyone? Is anyone else in a similar situation and > can test whether they see similar behavior? > > > > Thanks... > > > > Kevin > > > > On Jan 19, 2017, at 2:45 AM, Olaf Weiser mailto:olaf.weiser at de.ibm.com>> wrote: > > > > have you checked, where th fsmgr runs as you have nodes with > different code levels > > > > mmlsmgr > > > > > > > > > > From: "Buterbaugh, Kevin L" mailto:Kevin.Buterbaugh at Vanderbilt.Edu> >> > > To: gpfsug main discussion list mailto:gpfsug-discuss at spectrumscale.org> discuss at spectrumscale.org>> > > Date: 01/18/2017 04:57 PM > > Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x > > Sent by: gpfsug-discuss-bounces at spectrumscale.org< > mailto:gpfsug-discuss-bounces at spectrumscale.org> discuss-bounces at spectrumscale.org> > > ________________________________ > > > > > > > > Hi All, > > > > We recently upgraded our cluster (well, the servers are all > upgraded; the clients are still in progress) from GPFS 4.2.1.1 to > GPFS 4.2.2.1 and there appears to be a change in how mmrepquota > handles group names in its? output. I?m trying to get a handle on > it, because it is messing with some of my scripts and - more > importantly - because I don?t understand the behavior. > > > > From one of my clients which is still running GPFS 4.2.1.1 I can > run an ?mmrepquota -g ? and if the group exists in /etc/group > the group name is displayed. Of course, if the group doesn?t exist > in /etc/group, the GID is displayed. Makes sense. > > > > However, on my servers which have been upgraded to GPFS 4.2.2.1 > most - but not all - of the time I see GID numbers instead of group > names. My question is, what is the criteria GPFS 4.2.2.x is using > to decide when to display a GID instead of a group name? 
It?s > apparently *not* the length of the name of the group, because I have > output in front of me where a 13 character long group name is > displayed but a 7 character long group name is *not* displayed - > its? GID is instead (and yes, both exist in /etc/group). > > > > I know that sample output would be useful to illustrate this, but > I do not want to post group names or GIDs to a public mailing list ? > if you want to know what those are, you?ll have to ask Vladimir Putin? ;-) > > > > I am in the process of updating scripts to use ?mmrepquota -gn > ? and then looking up the group name myself, but I want to try > to understand this. Thanks? > > > > Kevin > > > > > > ? > > Kevin Buterbaugh - Senior System Administrator > > Vanderbilt University - Advanced Computing Center for Research andEducation > > Kevin.Buterbaugh at vanderbilt.edu< > mailto:Kevin.Buterbaugh at vanderbilt.edu>- (615)875-9633 > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org< > http://spectrumscale.org> > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org< > http://spectrumscale.org> > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > Luk?? Hejtm?nek > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: 20170119/8e599938/attachment.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 60, Issue 47 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jan 20 14:33:23 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 20 Jan 2017 14:33:23 +0000 Subject: [gpfsug-discuss] Weird log message Message-ID: So today I was just trying to collect a gpfs.snap to log a ticket, and part way through the log collection it said: Month '12' out of range 0..11 at /usr/lpp/mmfs/bin/mmlogsort line 114. This is a cluster running 4.2.2.0 It carried on anyway so hardly worth me logging a ticket, but just in case someone want to pick it up internally in IBM ...? 
Simon From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jan 20 15:09:06 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 20 Jan 2017 15:09:06 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: <791bb4d1-eb22-5ba5-9fcd-d7553aeebdc0@psu.edu> References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> <791bb4d1-eb22-5ba5-9fcd-d7553aeebdc0@psu.edu> Message-ID: <566473A3-D5F1-4508-84AE-AE4B892C25B8@vanderbilt.edu> Hi Phil, Nope - that was the very first thought I had but on a 4.2.2.1 node I have a 13 character group name displaying and a resolvable 7 character long group name being displayed as its? GID? Kevin > On Jan 20, 2017, at 9:06 AM, Phil Pishioneri wrote: > > On 1/19/17 4:51 PM, Buterbaugh, Kevin L wrote: >> Hi All, >> >> Let me try to answer some questions that have been raised by various list members? >> >> 1. I am not using nscd. >> 2. getent group with either a GID or a group name resolves GID?s / names that are being printed as GIDs by mmrepquota >> 3. The GID?s in question are all in a normal range ? i.e. some group names that are being printed by mmrepquota have GIDs ?close? to others that are being printed as GID?s >> 4. strace?ing mmrepquota doesn?t show anything relating to nscd or anything that jumps out at me >> > > Anything unique about the lengths of the names of the affected groups? (i.e., all a certain value, all greater than some value, etc.) > > -Phil From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jan 20 15:10:05 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 20 Jan 2017 15:10:05 +0000 Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x In-Reply-To: References: Message-ID: <8F3B6E42-6B37-48DF-8870-0CC5F293DCF7@vanderbilt.edu> Steve, I just opened a PMR - thanks? Kevin On Jan 20, 2017, at 8:14 AM, Steve Duersch > wrote: Kevin, Please go ahead and open a PMR. Cursorily, we don't know of an obvious known bug. Thank you. Steve Duersch Spectrum Scale 845-433-7902 IBM Poughkeepsie, New York ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Jan 20 15:32:17 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Fri, 20 Jan 2017 10:32:17 -0500 Subject: [gpfsug-discuss] Path to NSD lost when host_sas_address changed on port Message-ID: <5DDBFF8D-8927-42A7-8A81-3F0D167DDAAC@brown.edu> We have most of our GPFS NSD storage set up as pairs of RAID boxes served by failover pairs of servers. Most of it is FibreChannel, but the newest four boxes and servers are using dual port SAS controllers. Just this week, we had one server lose one out of the paths to one of the raid boxes. Took a while to realize what happened, but apparently the port2 ID changed from 51866da05cf7b001 to 51866da05cf7b002 on the fly, without rebooting. Port1 is still 51866da05cf7b000, which is the card ID (host_add). We?re running gpfs 4.2.2.1 on RHEL7.2 on these hosts. Has anyone else seen this kind of behavior? 
First noticed these messages, 3 hours 13 minutes after boot: Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd The multipath daemon was sending lots of log messages like: Jan 10 13:49:22 storage043 multipathd: mpathw: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:64 1] Jan 10 13:49:22 storage043 multipathd: mpathaa: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:96 1] Jan 10 13:49:22 storage043 multipathd: mpathx: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:128 1] Currently worked around the problem by including 00, 01 and 02 for all 8 SAS cards when mapping LUN/volume to host groups. Thanks, - ddj Dave Johnson Brown University CCV _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jan 20 15:43:56 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 20 Jan 2017 15:43:56 +0000 Subject: [gpfsug-discuss] SOBAR questions Message-ID: We've recently been looking at deploying SOBAR to support DR of some of our file-systems. I have some questions (as ever!) that I can't see clearly documented, so I was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? We are happy to take the hit that those files will never be available again, but some are multi-TB files which change daily and we can't stream them to tape effectively. 2. When doing a restore, does the block size of the new file-system we restore into have to match the original? For example, the old FS used 1MB blocks and the new FS is created with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size)? 3. If the file-system was originally created with older GPFS code but has since been upgraded, does restore work, and does the client code level matter? E.g. we have a file-system that was originally 3.5.x and it's been upgraded over time to 4.2.2.0. Will this work if the client code was, say, 4.2.2.5 (with an appropriate FS version)? E.g. mmlsfs lists "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was a 4.2.2.5 release that created a version 16.01 file-system as the new FS - what would happen? This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon From eric.wonderley at vt.edu Fri Jan 20 16:14:09 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 20 Jan 2017 11:14:09 -0500 Subject: [gpfsug-discuss] Path to NSD lost when host_sas_address changed on port In-Reply-To: <5DDBFF8D-8927-42A7-8A81-3F0D167DDAAC@brown.edu> References: <5DDBFF8D-8927-42A7-8A81-3F0D167DDAAC@brown.edu> Message-ID: Maybe multipath is not seeing all of the wwns?
multipath -v3 | grep ^51855 look ok? For some unknown reason multipath does not see our sandisk array...we have to add them to the end of /etc/multipath/wwids file On Fri, Jan 20, 2017 at 10:32 AM, David D. Johnson wrote: > We have most of our GPFS NSD storage set up as pairs of RAID boxes served > by failover pairs of servers. > Most of it is FibreChannel, but the newest four boxes and servers are > using dual port SAS controllers. > Just this week, we had one server lose one out of the paths to one of the > raid boxes. Took a while > to realize what happened, but apparently the port2 ID changed from > 51866da05cf7b001 to > 51866da05cf7b002 on the fly, without rebooting. Port1 is still > 51866da05cf7b000, which is the card ID (host_add). > > We?re running gpfs 4.2.2.1 on RHEL7.2 on these hosts. > > Has anyone else seen this kind of behavior? > First noticed these messages, 3 hours 13 minutes after boot: > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from > build_and_issue_cmd > > The multipath daemon was sending lots of log messages like: > Jan 10 13:49:22 storage043 multipathd: mpathw: load table [0 4642340864 > multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 > 1 8:64 1] > Jan 10 13:49:22 storage043 multipathd: mpathaa: load table [0 4642340864 > multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 > 1 8:96 1] > Jan 10 13:49:22 storage043 multipathd: mpathx: load table [0 4642340864 > multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 > 1 8:128 1] > > Currently worked around problem by including 00 01 and 02 for all 8 SAS > cards when mapping LUN/volume to host groups. > > Thanks, > ? ddj > Dave Johnson > Brown University CCV > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Jan 20 16:27:30 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Fri, 20 Jan 2017 11:27:30 -0500 Subject: [gpfsug-discuss] Path to NSD lost when host_sas_address changed on port In-Reply-To: References: <5DDBFF8D-8927-42A7-8A81-3F0D167DDAAC@brown.edu> Message-ID: Actually, we can see all the Volume LUN WWNs such as 3600a098000a11f990000022457cf5091 1:0:0:0 sdb 8:16 14 undef ready DELL 3600a098000a0b4ea000001fd57cf50b2 1:0:0:1 sdc 8:32 9 undef ready DELL 3600a098000a11f990000024457cf576f 1:0:0:10 sdl 8:176 14 undef ready DELL (45 lines, 11 LUNs from each controller, each showing up twice, plus the boot volume) My problem involves the ID of the server's host adapter as seen by the 60 drive RAID box. 
[root at storage043 scsi]# lsscsi -Ht [0] megaraid_sas [1] mpt3sas sas:0x51866da05f388a00 [2] ahci sata: [3] ahci sata: [4] ahci sata: [5] ahci sata: [6] ahci sata: [7] ahci sata: [8] ahci sata: [9] ahci sata: [10] ahci sata: [11] ahci sata: [12] mpt3sas sas:0x51866da05cf7b000 Each card [1] and [12] is a dual port card. The address of the second port is not consistent. ? ddj > On Jan 20, 2017, at 11:14 AM, J. Eric Wonderley wrote: > > > Maybe multipath is not seeing all of the wwns? > > multipath -v3 | grep ^51855 look ok? > > For some unknown reason multipath does not see our sandisk array...we have to add them to the end of /etc/multipath/wwids file > > > On Fri, Jan 20, 2017 at 10:32 AM, David D. Johnson > wrote: > We have most of our GPFS NSD storage set up as pairs of RAID boxes served by failover pairs of servers. > Most of it is FibreChannel, but the newest four boxes and servers are using dual port SAS controllers. > Just this week, we had one server lose one out of the paths to one of the raid boxes. Took a while > to realize what happened, but apparently the port2 ID changed from 51866da05cf7b001 to > 51866da05cf7b002 on the fly, without rebooting. Port1 is still 51866da05cf7b000, which is the card ID (host_add). > > We?re running gpfs 4.2.2.1 on RHEL7.2 on these hosts. > > Has anyone else seen this kind of behavior? > First noticed these messages, 3 hours 13 minutes after boot: > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > Jan 10 13:15:53 storage043 kernel: megasas: Err returned from build_and_issue_cmd > > The multipath daemon was sending lots of log messages like: > Jan 10 13:49:22 storage043 multipathd: mpathw: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:64 1] > Jan 10 13:49:22 storage043 multipathd: mpathaa: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:96 1] > Jan 10 13:49:22 storage043 multipathd: mpathx: load table [0 4642340864 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:128 1] > > Currently worked around problem by including 00 01 and 02 for all 8 SAS cards when mapping LUN/volume to host groups. > > Thanks, > ? ddj > Dave Johnson > Brown University CCV > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From duersch at us.ibm.com Fri Jan 20 16:54:12 2017 From: duersch at us.ibm.com (Steve Duersch) Date: Fri, 20 Jan 2017 11:54:12 -0500 Subject: [gpfsug-discuss] Weird log message In-Reply-To: References: Message-ID: This is a known bug. It is fixed in 4.2.2.1. It does not impact any of the gathering of information. It impacts the sorting of the logs, but all the logs will be there. 
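
A rough way to confirm whether a given node already carries that fix, assuming the usual Linux RPM packaging:

rpm -q gpfs.base     # installed gpfs.base package level, e.g. gpfs.base-4.2.2-1
mmdiag --version     # build level reported by the GPFS daemon on that node
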
Steve Duersch Spectrum Scale 845-433-7902 IBM Poughkeepsie, New York > > Message: 1 > Date: Fri, 20 Jan 2017 14:33:23 +0000 > From: "Simon Thompson (Research Computing - IT Services)" > > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Weird log message > Message-ID: > Content-Type: text/plain; charset="us-ascii" > > > So today I was just trying to collect a gpfs.snap to log a ticket, and > part way through the log collection it said: > > Month '12' out of range 0..11 at /usr/lpp/mmfs/bin/mmlogsort line 114. > > This is a cluster running 4.2.2.0 > > It carried on anyway so hardly worth me logging a ticket, but just in case > someone want to pick it up internally in IBM ...? > > Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Jan 20 16:57:56 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 20 Jan 2017 11:57:56 -0500 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: I worked on some aspects of SOBAR, but without studying and testing the commands - I'm not in a position right now to give simple definitive answers - having said that.... Generally your questions are reasonable and the answer is: "Yes it should be possible to do that, but you might be going a bit beyond the design point.., so you'll need to try it out on a (smaller) test system with some smaller tedst files. Point by point. 1. If SOBAR is unable to restore a particular file, perhaps because the premigration did not complete -- you should only lose that particular file, and otherwise "keep going". 2. I think SOBAR helps you build a similar file system to the original, including block sizes. So you'd have to go in and tweak the file system creation step(s). I think this is reasonable... If you hit a problem... IMO that would be a fair APAR. 3. Similar to 2. From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 01/20/2017 10:44 AM Subject: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? 
This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaurang.tapase at in.ibm.com Fri Jan 20 18:04:45 2017 From: gaurang.tapase at in.ibm.com (Gaurang Tapase) Date: Fri, 20 Jan 2017 23:34:45 +0530 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: References: Message-ID: Hi Brian, For option #3, you can use GPFS Manila (OpenStack shared file system service) driver for exporting data from protocol servers to the OpenStack VMs. It was updated to support CES in the Newton release. A new feature of bringing existing filesets under Manila management has also been added recently. Thanks, Gaurang ------------------------------------------------------------------------ Gaurang S Tapase Spectrum Scale & OpenStack IBM India Storage Lab, Pune (India) Email : gaurang.tapase at in.ibm.com Phone : +91-20-42025699 (W), +91-9860082042(Cell) ------------------------------------------------------------------------- From: Brian Marshall To: gpfsug main discussion list Date: 01/18/2017 09:52 PM Subject: Re: [gpfsug-discuss] Mounting GPFS data on OpenStack VM Sent by: gpfsug-discuss-bounces at spectrumscale.org To answer some more questions: What sort of workload will your Nova VM's be running? This is largely TBD but we anticipate webapps and other non-batch ways of interacting with and post processing data that has been computed on HPC batch systems. For example a user might host a website that allows users to view pieces of a large data set and do some processing in private cloud or kick off larger jobs on HPC clusters How many VM's are you running? This work is still in the design / build phase. We have 48 servers slated for the project. At max maybe 500 VMs; again this is a pretty wild estimate. This is a new service we are looking to provide What is your Network interconnect between the Scale Storage cluster and the Nova Compute cluster Each nova node has a dual 10gigE connection to switches that uplink to our core 40 gigE switches were NSD Servers are directly connectly. The information so far has been awesome. Thanks everyone. I am definitely leaning towards option #3 of creating protocol servers. Are there any design/build white papers targetting the virutalization use case? Thanks, Brian On Tue, Jan 17, 2017 at 5:55 PM, Andrew Beattie wrote: HI Brian, Couple of questions for you: What sort of workload will your Nova VM's be running? How many VM's are you running? What is your Network interconnect between the Scale Storage cluster and the Nova Compute cluster I have cc'd Jake Carrol from University of Queensland in on the email as I know they have done some basic performance testing using Scale to provide storage to Openstack. One of the issues that they found was the Openstack network translation was a performance limiting factor. 
I think from memory the best performance scenario they had was, when they installed the scale client locally into the virtual machines Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Brian Marshall Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM Date: Wed, Jan 18, 2017 7:51 AM UG, I have a GPFS filesystem. I have a OpenStack private cloud. What is the best way for Nova Compute VMs to have access to data inside the GPFS filesystem? 1)Should VMs mount GPFS directly with a GPFS client? 2) Should the hypervisor mount GPFS and share to nova computes? 3) Should I create GPFS protocol servers that allow nova computes to mount of NFS? All advice is welcome. Best, Brian Marshall Virginia Tech _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Fri Jan 20 18:22:11 2017 From: mimarsh2 at vt.edu (Brian Marshall) Date: Fri, 20 Jan 2017 13:22:11 -0500 Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM In-Reply-To: References: Message-ID: Perfect. Thanks for the advice. Further: this might be a basic question: Are their design guides for building CES protocl servers? Brian On Fri, Jan 20, 2017 at 1:04 PM, Gaurang Tapase wrote: > Hi Brian, > > For option #3, you can use GPFS Manila (OpenStack shared file system > service) driver for exporting data from protocol servers to the OpenStack > VMs. > It was updated to support CES in the Newton release. > > A new feature of bringing existing filesets under Manila management has > also been added recently. > > Thanks, > Gaurang > ------------------------------------------------------------------------ > Gaurang S Tapase > Spectrum Scale & OpenStack > IBM India Storage Lab, Pune (India) > Email : gaurang.tapase at in.ibm.com > Phone : +91-20-42025699 <+91%2020%204202%205699> (W), +91-9860082042 > <+91%2098600%2082042>(Cell) > ------------------------------------------------------------------------- > > > > From: Brian Marshall > To: gpfsug main discussion list > Date: 01/18/2017 09:52 PM > Subject: Re: [gpfsug-discuss] Mounting GPFS data on OpenStack VM > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > To answer some more questions: > > What sort of workload will your Nova VM's be running? > This is largely TBD but we anticipate webapps and other non-batch ways of > interacting with and post processing data that has been computed on HPC > batch systems. For example a user might host a website that allows users > to view pieces of a large data set and do some processing in private cloud > or kick off larger jobs on HPC clusters > > How many VM's are you running? > This work is still in the design / build phase. We have 48 servers slated > for the project. At max maybe 500 VMs; again this is a pretty wild > estimate. 
This is a new service we are looking to provide > > What is your Network interconnect between the Scale Storage cluster and > the Nova Compute cluster > Each nova node has a dual 10gigE connection to switches that uplink to our > core 40 gigE switches were NSD Servers are directly connectly. > > The information so far has been awesome. Thanks everyone. I am > definitely leaning towards option #3 of creating protocol servers. Are > there any design/build white papers targetting the virutalization use case? > > Thanks, > Brian > > On Tue, Jan 17, 2017 at 5:55 PM, Andrew Beattie <*abeattie at au1.ibm.com* > > wrote: > HI Brian, > > > Couple of questions for you: > > What sort of workload will your Nova VM's be running? > How many VM's are you running? > What is your Network interconnect between the Scale Storage cluster and > the Nova Compute cluster > > I have cc'd Jake Carrol from University of Queensland in on the email as I > know they have done some basic performance testing using Scale to provide > storage to Openstack. > One of the issues that they found was the Openstack network translation > was a performance limiting factor. > > I think from memory the best performance scenario they had was, when they > installed the scale client locally into the virtual machines > > > *Andrew Beattie* > *Software Defined Storage - IT Specialist* > *Phone: *614-2133-7927 > *E-mail: **abeattie at au1.ibm.com* > > > ----- Original message ----- > From: Brian Marshall <*mimarsh2 at vt.edu* > > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Cc: > Subject: [gpfsug-discuss] Mounting GPFS data on OpenStack VM > Date: Wed, Jan 18, 2017 7:51 AM > > UG, > > I have a GPFS filesystem. > > I have a OpenStack private cloud. > > What is the best way for Nova Compute VMs to have access to data inside > the GPFS filesystem? > > 1)Should VMs mount GPFS directly with a GPFS client? > 2) Should the hypervisor mount GPFS and share to nova computes? > 3) Should I create GPFS protocol servers that allow nova computes to mount > of NFS? > > All advice is welcome. > > > Best, > Brian Marshall > Virginia Tech > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From ulmer at ulmer.org  Fri Jan 20 22:23:07 2017
From: ulmer at ulmer.org (Stephen Ulmer)
Date: Fri, 20 Jan 2017 17:23:07 -0500
Subject: Re: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x
In-Reply-To: <566473A3-D5F1-4508-84AE-AE4B892C25B8@vanderbilt.edu>
References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> <791bb4d1-eb22-5ba5-9fcd-d7553aeebdc0@psu.edu> <566473A3-D5F1-4508-84AE-AE4B892C25B8@vanderbilt.edu>
Message-ID: <3D2CE694-2A3A-4B5E-8078-238A09681BE8@ulmer.org>

My list of questions that might or might not be thought provoking:

How about the relative position of the items in the /etc/group file? Are all of the failures later in the file than all of the successes?
Do any groups have group passwords (a parsing error due to a "different" line format)?
Is the /etc/group file sorted by either GID or group name (not normally required, but it would be interesting to see if it changed the problem)?
Is the set that is translated versus not translated consistent, or does it change? (Across all axes of comparison: {node, command invocation, et al.})
Are the not-translated groups more or less likely to be the default group of the owning UID?
Can you translate the GID other ways? Like with ls? (I think this was in the original problem description, but I don't remember the answer.)
What if you just turn off nscd?

--
Stephen

> On Jan 20, 2017, at 10:09 AM, Buterbaugh, Kevin L > wrote:
>
> Hi Phil,
>
> Nope - that was the very first thought I had but on a 4.2.2.1 node I have a 13 character group name displaying and a resolvable 7 character long group name being displayed as its GID...
>
> Kevin
>
>> On Jan 20, 2017, at 9:06 AM, Phil Pishioneri > wrote:
>>
>> On 1/19/17 4:51 PM, Buterbaugh, Kevin L wrote:
>>> Hi All,
>>>
>>> Let me try to answer some questions that have been raised by various list members...
>>>
>>> 1. I am not using nscd.
>>> 2. getent group with either a GID or a group name resolves GIDs / names that are being printed as GIDs by mmrepquota
>>> 3. The GIDs in question are all in a normal range - i.e. some group names that are being printed by mmrepquota have GIDs "close" to others that are being printed as GIDs
>>> 4. strace'ing mmrepquota doesn't show anything relating to nscd or anything that jumps out at me
>>>
>>
>> Anything unique about the lengths of the names of the affected groups? (i.e., all a certain value, all greater than some value, etc.)
>>
>> -Phil
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From leslie.james.elliott at gmail.com  Fri Jan 20 22:37:15 2017
From: leslie.james.elliott at gmail.com (leslie elliott)
Date: Sat, 21 Jan 2017 08:37:15 +1000
Subject: [gpfsug-discuss] CES permissions
Message-ID:

Hi

We have an existing configuration with a home - cache relationship on linked clusters, and we are running CES on the cache cluster. When data is copied to an SMB share, the AFM target for the cache is marked dirty and the replication back to the home cluster stops. Both clusters are running 4.2.1.

We have seen this behaviour whether the ACLs on the home cluster file system are nfsv4 only, or posix and nfsv4. The cache cluster is nfsv4 only so that we can use CES on it for SMB.
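
A rough way to see what state the cache fileset is actually in before digging further, where fs0 and cachefset are placeholder names for the cache file system and AFM fileset:

mmafmctl fs0 getstate -j cachefset    # reports the fileset target, cache state (Active, Dirty, ...) and queue length
mmlsfileset fs0 cachefset --afm -L    # shows the AFM attributes configured on the cache fileset
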
We are using uid remapping between the cache and the home can anyone suggest why the cache is marked dirty and how we can get around this issue the other thing we would like to do is force group and posix file permissions via samba but these are not supported options in the CES installation of samba any help is appreciated leslie Leslie Elliott, Infrastructure Support Specialist Information Technology Services, The University of Queensland -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Jan 23 01:10:14 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 22 Jan 2017 20:10:14 -0500 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? Message-ID: This is going to sound like a ridiculous request, but, is there a way to cause a filesystem to panic everywhere in one "swell foop"? I'm assuming the answer will come with an appropriate disclaimer of "don't ever do this, we don't support it, it might eat your data, summon cthulu, etc.". I swear I've seen the fs manager initiate this type of operation before. I can seem to do it on a per-node basis with "mmfsadm test panic " but if I do that over all 1k nodes in my test cluster at once it results in about 45 minutes of almost total deadlock while each panic is processed by the fs manager. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From abeattie at au1.ibm.com Mon Jan 23 01:16:58 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 23 Jan 2017 01:16:58 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Mon Jan 23 01:23:34 2017 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Sun, 22 Jan 2017 20:23:34 -0500 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: Message-ID: <142910.1485134614@turing-police.cc.vt.edu> On Sun, 22 Jan 2017 20:10:14 -0500, Aaron Knister said: > This is going to sound like a ridiculous request, but, is there a way to > cause a filesystem to panic everywhere in one "swell foop"? (...) > I can seem to do it on a per-node basis with "mmfsadm test panic > " but if I do that over all 1k nodes in my test cluster at > once it results in about 45 minutes of almost total deadlock while each > panic is processed by the fs manager. Sounds like you've already found the upper bound for panicking all at once. :) What exactly are you trying to do here? Force-dismount all over the cluster due to some urgent external condition (UPS fail, whatever)? And how much do you care about file system metadata consistency and/or pending data writes? (Be prepared to Think Outside The Box - the *fastest* way may be to use a controllable power strip in the rack and cut power to your fiber channel switches, isolating the storage *real* fast....) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Mon Jan 23 01:31:06 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 22 Jan 2017 20:31:06 -0500 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? 
In-Reply-To: References: Message-ID: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> I was afraid someone would ask :) One possible use would be testing how monitoring reacts to and/or corrects stale filesystems. The use in my case is there's an issue we see quite often where a filesystem won't unmount when trying to shut down gpfs. Linux insists its still busy despite every process being killed on the node just about except init. It's a real pain because it complicates maintenance, requiring a reboot of some nodes prior to patching for example. I dug into it and it appears as though when this happens the filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm trying to debug it further but I need to actually be able to make the condition happen a few more times to debug it. A stripegroup panic isn't a surefire way but it's the only way I've found so far to trigger this behavior somewhat on demand. One way I've found to trigger a mass stripegroup panic is to induce what I call a "301 error": loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted by the system with return code 301 reason code 0 loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument and tickle a known race condition between nodes being expelled from the cluster and a manager node joining the cluster. When this happens it seems to cause a mass stripe group panic that's over in a few minutes. The trick there is that it doesn't happen every time I go through the exercise and when it does there's no guarantee the filesystem that panics is the one in use. If it's not an fs in use then it doesn't help me reproduce the error condition. I was trying to use the "mmfsadm test panic" command to try a more direct approach. Hope that helps shed some light. -Aaron On 1/22/17 8:16 PM, Andrew Beattie wrote: > Out of curiosity -- why would you want to? > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > > ----- Original message ----- > From: Aaron Knister > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? > Date: Mon, Jan 23, 2017 11:11 AM > > This is going to sound like a ridiculous request, but, is there a way to > cause a filesystem to panic everywhere in one "swell foop"? I'm assuming > the answer will come with an appropriate disclaimer of "don't ever do > this, we don't support it, it might eat your data, summon cthulu, etc.". > I swear I've seen the fs manager initiate this type of operation before. > > I can seem to do it on a per-node basis with "mmfsadm test panic > " but if I do that over all 1k nodes in my test cluster at > once it results in about 45 minutes of almost total deadlock while each > panic is processed by the fs manager. 
> > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at us.ibm.com Mon Jan 23 04:12:02 2017 From: oehmes at us.ibm.com (Sven Oehme) Date: Mon, 23 Jan 2017 04:12:02 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: What version of Scale/ GPFS code is this cluster on ? ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Aaron Knister To: Date: 01/23/2017 01:31 AM Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? Sent by: gpfsug-discuss-bounces at spectrumscale.org I was afraid someone would ask :) One possible use would be testing how monitoring reacts to and/or corrects stale filesystems. The use in my case is there's an issue we see quite often where a filesystem won't unmount when trying to shut down gpfs. Linux insists its still busy despite every process being killed on the node just about except init. It's a real pain because it complicates maintenance, requiring a reboot of some nodes prior to patching for example. I dug into it and it appears as though when this happens the filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm trying to debug it further but I need to actually be able to make the condition happen a few more times to debug it. A stripegroup panic isn't a surefire way but it's the only way I've found so far to trigger this behavior somewhat on demand. One way I've found to trigger a mass stripegroup panic is to induce what I call a "301 error": loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted by the system with return code 301 reason code 0 loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument and tickle a known race condition between nodes being expelled from the cluster and a manager node joining the cluster. When this happens it seems to cause a mass stripe group panic that's over in a few minutes. The trick there is that it doesn't happen every time I go through the exercise and when it does there's no guarantee the filesystem that panics is the one in use. If it's not an fs in use then it doesn't help me reproduce the error condition. I was trying to use the "mmfsadm test panic" command to try a more direct approach. Hope that helps shed some light. -Aaron On 1/22/17 8:16 PM, Andrew Beattie wrote: > Out of curiosity -- why would you want to? > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > > ----- Original message ----- > From: Aaron Knister > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? 
> Date: Mon, Jan 23, 2017 11:11 AM > > This is going to sound like a ridiculous request, but, is there a way to > cause a filesystem to panic everywhere in one "swell foop"? I'm assuming > the answer will come with an appropriate disclaimer of "don't ever do > this, we don't support it, it might eat your data, summon cthulu, etc.". > I swear I've seen the fs manager initiate this type of operation before. > > I can seem to do it on a per-node basis with "mmfsadm test panic > " but if I do that over all 1k nodes in my test cluster at > once it results in about 45 minutes of almost total deadlock while each > panic is processed by the fs manager. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Mon Jan 23 04:22:38 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 22 Jan 2017 23:22:38 -0500 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: It's at 4.1.1.10. On 1/22/17 11:12 PM, Sven Oehme wrote: > What version of Scale/ GPFS code is this cluster on ? > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > Inactive hide details for Aaron Knister ---01/23/2017 01:31:29 AM---I > was afraid someone would ask :) One possible use would beAaron Knister > ---01/23/2017 01:31:29 AM---I was afraid someone would ask :) One > possible use would be testing how monitoring reacts to and/or > > From: Aaron Knister > To: > Date: 01/23/2017 01:31 AM > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > I was afraid someone would ask :) > > One possible use would be testing how monitoring reacts to and/or > corrects stale filesystems. > > The use in my case is there's an issue we see quite often where a > filesystem won't unmount when trying to shut down gpfs. Linux insists > its still busy despite every process being killed on the node just about > except init. It's a real pain because it complicates maintenance, > requiring a reboot of some nodes prior to patching for example. > > I dug into it and it appears as though when this happens the > filesystem's mnt_count is ridiculously high (300,000+ in one case). 
I'm > trying to debug it further but I need to actually be able to make the > condition happen a few more times to debug it. A stripegroup panic isn't > a surefire way but it's the only way I've found so far to trigger this > behavior somewhat on demand. > > One way I've found to trigger a mass stripegroup panic is to induce what > I call a "301 error": > > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted > by the system with return code 301 reason code 0 > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument > > and tickle a known race condition between nodes being expelled from the > cluster and a manager node joining the cluster. When this happens it > seems to cause a mass stripe group panic that's over in a few minutes. > The trick there is that it doesn't happen every time I go through the > exercise and when it does there's no guarantee the filesystem that > panics is the one in use. If it's not an fs in use then it doesn't help > me reproduce the error condition. I was trying to use the "mmfsadm test > panic" command to try a more direct approach. > > Hope that helps shed some light. > > -Aaron > > On 1/22/17 8:16 PM, Andrew Beattie wrote: >> Out of curiosity -- why would you want to? >> Andrew Beattie >> Software Defined Storage - IT Specialist >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com >> >> >> >> ----- Original message ----- >> From: Aaron Knister >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug main discussion list >> Cc: >> Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? >> Date: Mon, Jan 23, 2017 11:11 AM >> >> This is going to sound like a ridiculous request, but, is there a way to >> cause a filesystem to panic everywhere in one "swell foop"? I'm assuming >> the answer will come with an appropriate disclaimer of "don't ever do >> this, we don't support it, it might eat your data, summon cthulu, etc.". >> I swear I've seen the fs manager initiate this type of operation before. >> >> I can seem to do it on a per-node basis with "mmfsadm test panic >> " but if I do that over all 1k nodes in my test cluster at >> once it results in about 45 minutes of almost total deadlock while each >> panic is processed by the fs manager. >> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Mon Jan 23 05:03:43 2017 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 23 Jan 2017 05:03:43 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? 
In-Reply-To: References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: Then i would suggest to move up to at least 4.2.1.LATEST , there is a high chance your problem might already be fixed. i see 2 potential area that got significant improvements , Token Manager recovery and Log Recovery, both are in latest 4.2.1 code enabled : 2 significant improvements on Token Recovery in 4.2.1 : 1. Extendible hashing for token hash table. This speeds up token lookup and thereby reduce tcMutex hold times for configurations with a large ratio of clients to token servers. 2. Cleaning up tokens held by failed nodes was making multiple passes over the whole token table, one for each failed node. The loops are now inverted, so it makes a single pass over the able, and for each token found, does cleanup for all failed nodes. there are multiple smaller enhancements beyond 4.2.1 but thats the minimum level you want to be. i have seen token recovery of 10's of minutes similar to what you described going down to a minute with this change. on Log Recovery - in case of an unclean unmount/shutdown of a node prior 4.2.1 the Filesystem manager would only recover one Log file at a time, using a single thread, with 4.2.1 this is now done with multiple threads and multiple log files in parallel . Sven On Mon, Jan 23, 2017 at 4:22 AM Aaron Knister wrote: > It's at 4.1.1.10. > > On 1/22/17 11:12 PM, Sven Oehme wrote: > > What version of Scale/ GPFS code is this cluster on ? > > > > ------------------------------------------ > > Sven Oehme > > Scalable Storage Research > > email: oehmes at us.ibm.com > > Phone: +1 (408) 824-8904 <(408)%20824-8904> > > IBM Almaden Research Lab > > ------------------------------------------ > > > > Inactive hide details for Aaron Knister ---01/23/2017 01:31:29 AM---I > > was afraid someone would ask :) One possible use would beAaron Knister > > ---01/23/2017 01:31:29 AM---I was afraid someone would ask :) One > > possible use would be testing how monitoring reacts to and/or > > > > From: Aaron Knister > > To: > > Date: 01/23/2017 01:31 AM > > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > ------------------------------------------------------------------------ > > > > > > > > I was afraid someone would ask :) > > > > One possible use would be testing how monitoring reacts to and/or > > corrects stale filesystems. > > > > The use in my case is there's an issue we see quite often where a > > filesystem won't unmount when trying to shut down gpfs. Linux insists > > its still busy despite every process being killed on the node just about > > except init. It's a real pain because it complicates maintenance, > > requiring a reboot of some nodes prior to patching for example. > > > > I dug into it and it appears as though when this happens the > > filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm > > trying to debug it further but I need to actually be able to make the > > condition happen a few more times to debug it. A stripegroup panic isn't > > a surefire way but it's the only way I've found so far to trigger this > > behavior somewhat on demand. 
> > > > One way I've found to trigger a mass stripegroup panic is to induce what > > I call a "301 error": > > > > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted > > by the system with return code 301 reason code 0 > > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument > > > > and tickle a known race condition between nodes being expelled from the > > cluster and a manager node joining the cluster. When this happens it > > seems to cause a mass stripe group panic that's over in a few minutes. > > The trick there is that it doesn't happen every time I go through the > > exercise and when it does there's no guarantee the filesystem that > > panics is the one in use. If it's not an fs in use then it doesn't help > > me reproduce the error condition. I was trying to use the "mmfsadm test > > panic" command to try a more direct approach. > > > > Hope that helps shed some light. > > > > -Aaron > > > > On 1/22/17 8:16 PM, Andrew Beattie wrote: > >> Out of curiosity -- why would you want to? > >> Andrew Beattie > >> Software Defined Storage - IT Specialist > >> Phone: 614-2133-7927 > >> E-mail: abeattie at au1.ibm.com > >> > >> > >> > >> ----- Original message ----- > >> From: Aaron Knister > >> Sent by: gpfsug-discuss-bounces at spectrumscale.org > >> To: gpfsug main discussion list > >> Cc: > >> Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? > >> Date: Mon, Jan 23, 2017 11:11 AM > >> > >> This is going to sound like a ridiculous request, but, is there a > way to > >> cause a filesystem to panic everywhere in one "swell foop"? I'm > assuming > >> the answer will come with an appropriate disclaimer of "don't ever > do > >> this, we don't support it, it might eat your data, summon cthulu, > etc.". > >> I swear I've seen the fs manager initiate this type of operation > before. > >> > >> I can seem to do it on a per-node basis with "mmfsadm test panic > > >> " but if I do that over all 1k nodes in my test cluster > at > >> once it results in about 45 minutes of almost total deadlock while > each > >> panic is processed by the fs manager. > >> > >> -Aaron > >> > >> -- > >> Aaron Knister > >> NASA Center for Climate Simulation (Code 606.2) > >> Goddard Space Flight Center > >> (301) 286-2776 > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at gmail.com Mon Jan 23 05:27:53 2017 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 23 Jan 2017 05:27:53 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: Aaron, hold a bit with the upgrade , i just got word that while 4.2.1+ most likely addresses the issues i mentioned, there was a defect in the initial release of the parallel log recovery code. i will get the exact minimum version you need to deploy and send another update to this thread. sven On Mon, Jan 23, 2017 at 5:03 AM Sven Oehme wrote: > Then i would suggest to move up to at least 4.2.1.LATEST , there is a high > chance your problem might already be fixed. > > i see 2 potential area that got significant improvements , Token Manager > recovery and Log Recovery, both are in latest 4.2.1 code enabled : > > 2 significant improvements on Token Recovery in 4.2.1 : > > 1. Extendible hashing for token hash table. This speeds up token lookup > and thereby reduce tcMutex hold times for configurations with a large ratio > of clients to token servers. > 2. Cleaning up tokens held by failed nodes was making multiple passes > over the whole token table, one for each failed node. The loops are now > inverted, so it makes a single pass over the able, and for each token > found, does cleanup for all failed nodes. > > there are multiple smaller enhancements beyond 4.2.1 but thats the minimum > level you want to be. i have seen token recovery of 10's of minutes similar > to what you described going down to a minute with this change. > > on Log Recovery - in case of an unclean unmount/shutdown of a node prior > 4.2.1 the Filesystem manager would only recover one Log file at a time, > using a single thread, with 4.2.1 this is now done with multiple threads > and multiple log files in parallel . > > Sven > > On Mon, Jan 23, 2017 at 4:22 AM Aaron Knister > wrote: > > It's at 4.1.1.10. > > On 1/22/17 11:12 PM, Sven Oehme wrote: > > What version of Scale/ GPFS code is this cluster on ? > > > > ------------------------------------------ > > Sven Oehme > > Scalable Storage Research > > email: oehmes at us.ibm.com > > Phone: +1 (408) 824-8904 <(408)%20824-8904> > > IBM Almaden Research Lab > > ------------------------------------------ > > > > Inactive hide details for Aaron Knister ---01/23/2017 01:31:29 AM---I > > was afraid someone would ask :) One possible use would beAaron Knister > > ---01/23/2017 01:31:29 AM---I was afraid someone would ask :) One > > possible use would be testing how monitoring reacts to and/or > > > > From: Aaron Knister > > To: > > Date: 01/23/2017 01:31 AM > > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > ------------------------------------------------------------------------ > > > > > > > > I was afraid someone would ask :) > > > > One possible use would be testing how monitoring reacts to and/or > > corrects stale filesystems. > > > > The use in my case is there's an issue we see quite often where a > > filesystem won't unmount when trying to shut down gpfs. Linux insists > > its still busy despite every process being killed on the node just about > > except init. It's a real pain because it complicates maintenance, > > requiring a reboot of some nodes prior to patching for example. 
> > > > I dug into it and it appears as though when this happens the > > filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm > > trying to debug it further but I need to actually be able to make the > > condition happen a few more times to debug it. A stripegroup panic isn't > > a surefire way but it's the only way I've found so far to trigger this > > behavior somewhat on demand. > > > > One way I've found to trigger a mass stripegroup panic is to induce what > > I call a "301 error": > > > > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted > > by the system with return code 301 reason code 0 > > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument > > > > and tickle a known race condition between nodes being expelled from the > > cluster and a manager node joining the cluster. When this happens it > > seems to cause a mass stripe group panic that's over in a few minutes. > > The trick there is that it doesn't happen every time I go through the > > exercise and when it does there's no guarantee the filesystem that > > panics is the one in use. If it's not an fs in use then it doesn't help > > me reproduce the error condition. I was trying to use the "mmfsadm test > > panic" command to try a more direct approach. > > > > Hope that helps shed some light. > > > > -Aaron > > > > On 1/22/17 8:16 PM, Andrew Beattie wrote: > >> Out of curiosity -- why would you want to? > >> Andrew Beattie > >> Software Defined Storage - IT Specialist > >> Phone: 614-2133-7927 > >> E-mail: abeattie at au1.ibm.com > >> > >> > >> > >> ----- Original message ----- > >> From: Aaron Knister > >> Sent by: gpfsug-discuss-bounces at spectrumscale.org > >> To: gpfsug main discussion list > >> Cc: > >> Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? > >> Date: Mon, Jan 23, 2017 11:11 AM > >> > >> This is going to sound like a ridiculous request, but, is there a > way to > >> cause a filesystem to panic everywhere in one "swell foop"? I'm > assuming > >> the answer will come with an appropriate disclaimer of "don't ever > do > >> this, we don't support it, it might eat your data, summon cthulu, > etc.". > >> I swear I've seen the fs manager initiate this type of operation > before. > >> > >> I can seem to do it on a per-node basis with "mmfsadm test panic > > >> " but if I do that over all 1k nodes in my test cluster > at > >> once it results in about 45 minutes of almost total deadlock while > each > >> panic is processed by the fs manager. 
> >> > >> -Aaron > >> > >> -- > >> Aaron Knister > >> NASA Center for Climate Simulation (Code 606.2) > >> Goddard Space Flight Center > >> (301) 286-2776 > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Mon Jan 23 05:40:25 2017 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Mon, 23 Jan 2017 05:40:25 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: I?ve also done the ?panic stripe group everywhere? trick on a test cluster for a large FPO filesystem solution. With FPO it?s not very hard to get a filesystem to become unmountable due to missing disks. Sometimes the best answer, especially in a scratch use-case, may be to throw the filesystem away and start again empty so that research can resume (even though there will be work loss and repeated effort for some). But the stuck mounts problem can make this a long-lived problem. In my case, I just repeatedly panic any nodes which continue to mount the filesystem and try mmdelfs until it works (usually takes a few attempts). In this case, I really don?t want/need the filesystem to be recovered. I just want the cluster to forget about it as quickly as possible. So far, in testing, the panic/destroy times aren?t bad, but I don?t have heavy user workloads running against it yet. It would be interesting to know if there were any shortcuts to skip SG manager reassignment and recovery attempts. Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sven Oehme Sent: Monday, January 23, 2017 12:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? Aaron, hold a bit with the upgrade , i just got word that while 4.2.1+ most likely addresses the issues i mentioned, there was a defect in the initial release of the parallel log recovery code. i will get the exact minimum version you need to deploy and send another update to this thread. sven On Mon, Jan 23, 2017 at 5:03 AM Sven Oehme > wrote: Then i would suggest to move up to at least 4.2.1.LATEST , there is a high chance your problem might already be fixed. 
i see 2 potential area that got significant improvements , Token Manager recovery and Log Recovery, both are in latest 4.2.1 code enabled : 2 significant improvements on Token Recovery in 4.2.1 : 1. Extendible hashing for token hash table. This speeds up token lookup and thereby reduce tcMutex hold times for configurations with a large ratio of clients to token servers. 2. Cleaning up tokens held by failed nodes was making multiple passes over the whole token table, one for each failed node. The loops are now inverted, so it makes a single pass over the able, and for each token found, does cleanup for all failed nodes. there are multiple smaller enhancements beyond 4.2.1 but thats the minimum level you want to be. i have seen token recovery of 10's of minutes similar to what you described going down to a minute with this change. on Log Recovery - in case of an unclean unmount/shutdown of a node prior 4.2.1 the Filesystem manager would only recover one Log file at a time, using a single thread, with 4.2.1 this is now done with multiple threads and multiple log files in parallel . Sven On Mon, Jan 23, 2017 at 4:22 AM Aaron Knister > wrote: It's at 4.1.1.10. On 1/22/17 11:12 PM, Sven Oehme wrote: > What version of Scale/ GPFS code is this cluster on ? > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > Inactive hide details for Aaron Knister ---01/23/2017 01:31:29 AM---I > was afraid someone would ask :) One possible use would beAaron Knister > ---01/23/2017 01:31:29 AM---I was afraid someone would ask :) One > possible use would be testing how monitoring reacts to and/or > > From: Aaron Knister > > To: > > Date: 01/23/2017 01:31 AM > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > I was afraid someone would ask :) > > One possible use would be testing how monitoring reacts to and/or > corrects stale filesystems. > > The use in my case is there's an issue we see quite often where a > filesystem won't unmount when trying to shut down gpfs. Linux insists > its still busy despite every process being killed on the node just about > except init. It's a real pain because it complicates maintenance, > requiring a reboot of some nodes prior to patching for example. > > I dug into it and it appears as though when this happens the > filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm > trying to debug it further but I need to actually be able to make the > condition happen a few more times to debug it. A stripegroup panic isn't > a surefire way but it's the only way I've found so far to trigger this > behavior somewhat on demand. > > One way I've found to trigger a mass stripegroup panic is to induce what > I call a "301 error": > > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted > by the system with return code 301 reason code 0 > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument > > and tickle a known race condition between nodes being expelled from the > cluster and a manager node joining the cluster. When this happens it > seems to cause a mass stripe group panic that's over in a few minutes. 
> The trick there is that it doesn't happen every time I go through the > exercise and when it does there's no guarantee the filesystem that > panics is the one in use. If it's not an fs in use then it doesn't help > me reproduce the error condition. I was trying to use the "mmfsadm test > panic" command to try a more direct approach. > > Hope that helps shed some light. > > -Aaron > > On 1/22/17 8:16 PM, Andrew Beattie wrote: >> Out of curiosity -- why would you want to? >> Andrew Beattie >> Software Defined Storage - IT Specialist >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com > >> >> >> >> ----- Original message ----- >> From: Aaron Knister > >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug main discussion list > >> Cc: >> Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? >> Date: Mon, Jan 23, 2017 11:11 AM >> >> This is going to sound like a ridiculous request, but, is there a way to >> cause a filesystem to panic everywhere in one "swell foop"? I'm assuming >> the answer will come with an appropriate disclaimer of "don't ever do >> this, we don't support it, it might eat your data, summon cthulu, etc.". >> I swear I've seen the fs manager initiate this type of operation before. >> >> I can seem to do it on a per-node basis with "mmfsadm test panic >> " but if I do that over all 1k nodes in my test cluster at >> once it results in about 45 minutes of almost total deadlock while each >> panic is processed by the fs manager. >> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jan 23 10:17:03 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 23 Jan 2017 10:17:03 +0000 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: Hi Mark, Thanks. I get that using it to move to a new FS version is probably beyond design. But equally, I could easily see that having to support implementing the latest FS version is a strong requirement. I.e. In a DR situation say three years down the line, it would be a new FS of (say) 5.1.1, we wouldn't want to have to go back and find 4.1.1 code, nor would we necessarily be able to even run that version (as kernels and OSes move forward). 
That?s sorta also the situation where you don't want to suddenly have to run back to IBM support because your DR solution suddenly doesn't work like it says on the tin ;-) I can test 1 and 2 relatively easily, but 3 is a bit more difficult for us to test out as the FS we want to use SOBAR on is 4.2 already. Simon From: > on behalf of Marc A Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 20 January 2017 at 16:57 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SOBAR questions I worked on some aspects of SOBAR, but without studying and testing the commands - I'm not in a position right now to give simple definitive answers - having said that.... Generally your questions are reasonable and the answer is: "Yes it should be possible to do that, but you might be going a bit beyond the design point.., so you'll need to try it out on a (smaller) test system with some smaller tedst files. Point by point. 1. If SOBAR is unable to restore a particular file, perhaps because the premigration did not complete -- you should only lose that particular file, and otherwise "keep going". 2. I think SOBAR helps you build a similar file system to the original, including block sizes. So you'd have to go in and tweak the file system creation step(s). I think this is reasonable... If you hit a problem... IMO that would be a fair APAR. 3. Similar to 2. From: "Simon Thompson (Research Computing - IT Services)" > To: "gpfsug-discuss at spectrumscale.org" > Date: 01/20/2017 10:44 AM Subject: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
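For readers coming to SOBAR cold, the questions above sit on top of a fairly small command set, so a rough outline may help. The command names below are the documented SOBAR tools; the options, paths and file system name are illustrative assumptions only and should be checked against the man pages for the release in use:

   # on the production cluster (premigration of file data to the HSM pool is a separate, ongoing activity)
   mmbackupconfig fsname -o /backup/fsname.config     # save the file system configuration
   mmimgbackup fsname                                 # write the metadata image backup

   # on the recovery cluster
   mmrestoreconfig fsname -i /backup/fsname.config    # recreate the configuration
   mmimgrestore fsname /backup/imagepath              # rebuild the metadata; file data is recalled from the HSM copy on demand (argument form is an assumption)
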
URL: 

From Kevin.Buterbaugh at Vanderbilt.Edu  Mon Jan 23 15:32:41 2017
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Mon, 23 Jan 2017 15:32:41 +0000
Subject: [gpfsug-discuss] mmrepquota and group names in GPFS 4.2.2.x
In-Reply-To: <3D2CE694-2A3A-4B5E-8078-238A09681BE8@ulmer.org>
References: <639AF4BD-0382-44EF-8D26-8F6149613A65@vanderbilt.edu> <20170119163642.oyjlsmkdpfygk2fj@ics.muni.cz> <1484860960203.43563@csiro.au> <31F584FD-A926-4D86-B365-63EA244DEE45@vanderbilt.edu> <791bb4d1-eb22-5ba5-9fcd-d7553aeebdc0@psu.edu> <566473A3-D5F1-4508-84AE-AE4B892C25B8@vanderbilt.edu> <3D2CE694-2A3A-4B5E-8078-238A09681BE8@ulmer.org>
Message-ID: <031A80F6-B00B-4AF9-963B-98E61BC537B4@vanderbilt.edu>

Hi All,

Stephen's very first question below has led me to figure out what the problem is: we have one group in /etc/group that has dozens and dozens of members. Any group above that in /etc/group gets printed as a name by mmrepquota; any group below it gets printed as a GID. Wasn't there an identical bug in mmlsquota a while back?

I will update the PMR I have open with IBM. Thanks to all who took the time to respond with suggestions.

Kevin

On Jan 20, 2017, at 4:23 PM, Stephen Ulmer wrote:

My list of questions that might or might not be thought provoking:
How about the relative position of the items in the /etc/group file? Are all of the failures later in the file than all of the successes?
Do any groups have group passwords (parsing error due to "different" line format)?
Is the /etc/group sorted by either GID or group name (not normally required, but it would be interesting to see if it changed the problem)?
Is the set that is translated versus not translated consistent or do they change? (Across all axes of comparison by {node, command invocation, et al.})
Are the not translated groups more or less likely to be the default group of the owning UID?
Can you translate the GID other ways? Like with ls? (I think this was in the original problem description, but I don't remember the answer.)
What if you just turn off nscd?

--
Stephen

On Jan 20, 2017, at 10:09 AM, Buterbaugh, Kevin L wrote:

Hi Phil,

Nope - that was the very first thought I had, but on a 4.2.2.1 node I have a 13 character group name displaying and a resolvable 7 character long group name being displayed as its GID?

Kevin

On Jan 20, 2017, at 9:06 AM, Phil Pishioneri wrote:

On 1/19/17 4:51 PM, Buterbaugh, Kevin L wrote:
Hi All,
Let me try to answer some questions that have been raised by various list members:
1. I am not using nscd.
2. getent group with either a GID or a group name resolves GIDs / names that are being printed as GIDs by mmrepquota
3. The GIDs in question are all in a normal range, i.e. some group names that are being printed by mmrepquota have GIDs "close" to others that are being printed as GIDs
4. strace'ing mmrepquota doesn't show anything relating to nscd or anything that jumps out at me

Anything unique about the lengths of the names of the affected groups? (i.e., all a certain value, all greater than some value, etc.)

-Phil
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
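If anyone else hits the same symptom, a quick check along the lines of Kevin's finding might look like the sketch below; the 1024-character threshold and the example GID are arbitrary placeholders:

   # flag unusually long entries in /etc/group and show their member counts
   awk -F: 'length($0) > 1024 {n=split($4,m,","); print $1 ": " length($0) " chars, " n " members"}' /etc/group

   # confirm that a GID mmrepquota prints numerically still resolves through NSS
   getent group 1234
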
URL: From makaplan at us.ibm.com Mon Jan 23 15:35:41 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 23 Jan 2017 10:35:41 -0500 Subject: [gpfsug-discuss] SOBAR questions In-Reply-To: References: Message-ID: Regarding back level file systems and testing... 1. Did you know that the mmcrfs command supports --version which allows you to create a back level file system? 2. If your concern is restoring from a SOBAR backup that was made a long while ago with an old version of GPFS/sobar... I'd say that should work... BUT I don't know for sure AND I'd caution that AFAIK (someone may correct me) Sobar is not intended for long term archiving of file systems. Personally ( IBM hat off ;-) ), for that I'd choose a standard, vendor-neutral archival format that is likely to be supported in the future.... My current understanding: Spectrum Scal SOBAR is for "disaster recovery" or "migrate/upgrade entire file system" -- where presumably you do Sobar backups on a regular schedule... and/or do one just before you begin an upgrade or migration to new hardware. --marc From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 01/23/2017 05:17 AM Subject: Re: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Mark, Thanks. I get that using it to move to a new FS version is probably beyond design. But equally, I could easily see that having to support implementing the latest FS version is a strong requirement. I.e. In a DR situation say three years down the line, it would be a new FS of (say) 5.1.1, we wouldn't want to have to go back and find 4.1.1 code, nor would we necessarily be able to even run that version (as kernels and OSes move forward). That?s sorta also the situation where you don't want to suddenly have to run back to IBM support because your DR solution suddenly doesn't work like it says on the tin ;-) I can test 1 and 2 relatively easily, but 3 is a bit more difficult for us to test out as the FS we want to use SOBAR on is 4.2 already. Simon From: on behalf of Marc A Kaplan Reply-To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: Friday, 20 January 2017 at 16:57 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] SOBAR questions I worked on some aspects of SOBAR, but without studying and testing the commands - I'm not in a position right now to give simple definitive answers - having said that.... Generally your questions are reasonable and the answer is: "Yes it should be possible to do that, but you might be going a bit beyond the design point.., so you'll need to try it out on a (smaller) test system with some smaller tedst files. Point by point. 1. If SOBAR is unable to restore a particular file, perhaps because the premigration did not complete -- you should only lose that particular file, and otherwise "keep going". 2. I think SOBAR helps you build a similar file system to the original, including block sizes. So you'd have to go in and tweak the file system creation step(s). I think this is reasonable... If you hit a problem... IMO that would be a fair APAR. 3. Similar to 2. 
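To exercise the --version option Marc mentions, something along these lines should do it on a test cluster; the version string, file system name and stanza file are placeholders, and any other mmcrfs options would be added as usual:

   # create a file system constrained to an older on-disk format level
   mmcrfs testfs -F /tmp/testfs.nsd.stanza --version 4.1.1.0

   # confirm the resulting format version
   mmlsfs testfs -V
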
From: "Simon Thompson (Research Computing - IT Services)" < S.J.Thompson at bham.ac.uk> To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: 01/20/2017 10:44 AM Subject: [gpfsug-discuss] SOBAR questions Sent by: gpfsug-discuss-bounces at spectrumscale.org We've recently been looking at deploying SOBAR to support DR of some of our file-systems, I have some questions (as ever!) that I can't see are clearly documented, so was wondering if anyone has any insight on this. 1. If we elect not to premigrate certain files, are we still able to use SOBAR? We are happy to take a hit that those files will never be available again, but some are multi TB files which change daily and we can't stream to tape effectively. 2. When doing a restore, does the block size of the new SOBAR'd to file-system have to match? For example the old FS was 1MB blocks, the new FS we create with 2MB blocks. Will this work (this strikes me as one way we might be able to migrate an FS to a new block size?)? 3. If the file-system was originally created with an older GPFS code but has since been upgraded, does restore work, and does it matter what client code? E.g. We have a file-system that was originally 3.5.x, its been upgraded over time to 4.2.2.0. Will this work if the client code was say 4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01 (3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file system version". Say there was 4.2.2.5 which created version 16.01 file-system as the new FS, what would happen? This sort of detail is missing from: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_sobarrestore.htm But is probably quite important for us to know! Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Mon Jan 23 22:04:25 2017 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 23 Jan 2017 22:04:25 +0000 Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? In-Reply-To: References: <0ee21a83-e9b7-6ab7-4abb-739d15e8d98f@nasa.gov> Message-ID: Hi, you either need to request access to GPFS 4.2.1.0 efix16 via your PMR or need to upgrade to 4.2.2.1 both contain the fixes required. Sven On Mon, Jan 23, 2017 at 6:27 AM Sven Oehme wrote: > Aaron, > > hold a bit with the upgrade , i just got word that while 4.2.1+ most > likely addresses the issues i mentioned, there was a defect in the initial > release of the parallel log recovery code. i will get the exact minimum > version you need to deploy and send another update to this thread. > > sven > > On Mon, Jan 23, 2017 at 5:03 AM Sven Oehme wrote: > > Then i would suggest to move up to at least 4.2.1.LATEST , there is a high > chance your problem might already be fixed. > > i see 2 potential area that got significant improvements , Token Manager > recovery and Log Recovery, both are in latest 4.2.1 code enabled : > > 2 significant improvements on Token Recovery in 4.2.1 : > > 1. Extendible hashing for token hash table. This speeds up token lookup > and thereby reduce tcMutex hold times for configurations with a large ratio > of clients to token servers. > 2. 
Cleaning up tokens held by failed nodes was making multiple passes > over the whole token table, one for each failed node. The loops are now > inverted, so it makes a single pass over the able, and for each token > found, does cleanup for all failed nodes. > > there are multiple smaller enhancements beyond 4.2.1 but thats the minimum > level you want to be. i have seen token recovery of 10's of minutes similar > to what you described going down to a minute with this change. > > on Log Recovery - in case of an unclean unmount/shutdown of a node prior > 4.2.1 the Filesystem manager would only recover one Log file at a time, > using a single thread, with 4.2.1 this is now done with multiple threads > and multiple log files in parallel . > > Sven > > On Mon, Jan 23, 2017 at 4:22 AM Aaron Knister > wrote: > > It's at 4.1.1.10. > > On 1/22/17 11:12 PM, Sven Oehme wrote: > > What version of Scale/ GPFS code is this cluster on ? > > > > ------------------------------------------ > > Sven Oehme > > Scalable Storage Research > > email: oehmes at us.ibm.com > > Phone: +1 (408) 824-8904 <(408)%20824-8904> > > IBM Almaden Research Lab > > ------------------------------------------ > > > > Inactive hide details for Aaron Knister ---01/23/2017 01:31:29 AM---I > > was afraid someone would ask :) One possible use would beAaron Knister > > ---01/23/2017 01:31:29 AM---I was afraid someone would ask :) One > > possible use would be testing how monitoring reacts to and/or > > > > From: Aaron Knister > > To: > > Date: 01/23/2017 01:31 AM > > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere? > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > ------------------------------------------------------------------------ > > > > > > > > I was afraid someone would ask :) > > > > One possible use would be testing how monitoring reacts to and/or > > corrects stale filesystems. > > > > The use in my case is there's an issue we see quite often where a > > filesystem won't unmount when trying to shut down gpfs. Linux insists > > its still busy despite every process being killed on the node just about > > except init. It's a real pain because it complicates maintenance, > > requiring a reboot of some nodes prior to patching for example. > > > > I dug into it and it appears as though when this happens the > > filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm > > trying to debug it further but I need to actually be able to make the > > condition happen a few more times to debug it. A stripegroup panic isn't > > a surefire way but it's the only way I've found so far to trigger this > > behavior somewhat on demand. > > > > One way I've found to trigger a mass stripegroup panic is to induce what > > I call a "301 error": > > > > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted > > by the system with return code 301 reason code 0 > > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument > > > > and tickle a known race condition between nodes being expelled from the > > cluster and a manager node joining the cluster. When this happens it > > seems to cause a mass stripe group panic that's over in a few minutes. > > The trick there is that it doesn't happen every time I go through the > > exercise and when it does there's no guarantee the filesystem that > > panics is the one in use. If it's not an fs in use then it doesn't help > > me reproduce the error condition. I was trying to use the "mmfsadm test > > panic" command to try a more direct approach. 
> > > > Hope that helps shed some light. > > > > -Aaron > > > > On 1/22/17 8:16 PM, Andrew Beattie wrote: > >> Out of curiosity -- why would you want to? > >> Andrew Beattie > >> Software Defined Storage - IT Specialist > >> Phone: 614-2133-7927 > >> E-mail: abeattie at au1.ibm.com > >> > >> > >> > >> ----- Original message ----- > >> From: Aaron Knister > >> Sent by: gpfsug-discuss-bounces at spectrumscale.org > >> To: gpfsug main discussion list > >> Cc: > >> Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere? > >> Date: Mon, Jan 23, 2017 11:11 AM > >> > >> This is going to sound like a ridiculous request, but, is there a > way to > >> cause a filesystem to panic everywhere in one "swell foop"? I'm > assuming > >> the answer will come with an appropriate disclaimer of "don't ever > do > >> this, we don't support it, it might eat your data, summon cthulu, > etc.". > >> I swear I've seen the fs manager initiate this type of operation > before. > >> > >> I can seem to do it on a per-node basis with "mmfsadm test panic > > >> " but if I do that over all 1k nodes in my test cluster > at > >> once it results in about 45 minutes of almost total deadlock while > each > >> panic is processed by the fs manager. > >> > >> -Aaron > >> > >> -- > >> Aaron Knister > >> NASA Center for Climate Simulation (Code 606.2) > >> Goddard Space Flight Center > >> (301) 286-2776 > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Jan 24 10:00:42 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 24 Jan 2017 10:00:42 +0000 Subject: [gpfsug-discuss] Manager nodes Message-ID: We are looking at moving manager processes off our NSD nodes and on to dedicated quorum/manager nodes. Are there some broad recommended hardware specs for the function of these nodes. I assume they benefit from having high memory (for some value of high, probably a function of number of clients, files, expected open files?, and probably completely incalculable, so some empirical evidence may be useful here?) (I'm going to ignore the docs that say you should have twice as much swap as RAM!) What about cores, do they benefit from high core counts or high clock rates? 
For example would I benefit more form a high core count, low clock speed, or going for higher clock speeds and reducing core count? Or is memory bandwidth more important for manager nodes? Connectivity, does token management run over IB or only over Ethernet/admin network? I.e. Should I bother adding IB cards, or just have fast Ethernet on them (my clients/NSDs all have IB). I'm looking for some hints on what I would most benefit in investing in vs keeping to budget. Thanks Simon From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jan 24 15:18:09 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 24 Jan 2017 15:18:09 +0000 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: References: Message-ID: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu> Hi Simon, FWIW, we have two servers dedicated to cluster and filesystem management functions (and 8 NSD servers). I guess you would describe our cluster as small to medium sized ? ~700 nodes and a little over 1 PB of storage. Our two managers have 2 quad core (3 GHz) CPU?s and 64 GB RAM. They?ve got 10 GbE, but we don?t use IB anywhere. We have an 8 Gb FC SAN and we do have them connected in to the SAN so that they don?t have to ask the NSD servers to do any I/O for them. I do collect statistics on all the servers and plunk them into an RRDtool database. Looking at the last 30 days the load average on the two managers is in the 5-10 range. Memory utilization seems to be almost entirely dependent on how parameters like the pagepool are set on them. HTHAL? Kevin > On Jan 24, 2017, at 4:00 AM, Simon Thompson (Research Computing - IT Services) wrote: > > We are looking at moving manager processes off our NSD nodes and on to > dedicated quorum/manager nodes. > > Are there some broad recommended hardware specs for the function of these > nodes. > > I assume they benefit from having high memory (for some value of high, > probably a function of number of clients, files, expected open files?, and > probably completely incalculable, so some empirical evidence may be useful > here?) (I'm going to ignore the docs that say you should have twice as > much swap as RAM!) > > What about cores, do they benefit from high core counts or high clock > rates? For example would I benefit more form a high core count, low clock > speed, or going for higher clock speeds and reducing core count? Or is > memory bandwidth more important for manager nodes? > > Connectivity, does token management run over IB or only over > Ethernet/admin network? I.e. Should I bother adding IB cards, or just have > fast Ethernet on them (my clients/NSDs all have IB). > > I'm looking for some hints on what I would most benefit in investing in vs > keeping to budget. > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From janfrode at tanso.net Tue Jan 24 15:51:05 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 24 Jan 2017 15:51:05 +0000 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu> References: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu> Message-ID: Just some datapoints, in hope that it helps.. I've seen metadata performance improvements by turning down hyperthreading from 8/core to 4/core on Power8. Also it helped distributing the token managers over multiple nodes (6+) instead of fewer. I would expect this to flow over IP, not IB. -jf tir. 24. 
jan. 2017 kl. 16.18 skrev Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu>: Hi Simon, FWIW, we have two servers dedicated to cluster and filesystem management functions (and 8 NSD servers). I guess you would describe our cluster as small to medium sized ? ~700 nodes and a little over 1 PB of storage. Our two managers have 2 quad core (3 GHz) CPU?s and 64 GB RAM. They?ve got 10 GbE, but we don?t use IB anywhere. We have an 8 Gb FC SAN and we do have them connected in to the SAN so that they don?t have to ask the NSD servers to do any I/O for them. I do collect statistics on all the servers and plunk them into an RRDtool database. Looking at the last 30 days the load average on the two managers is in the 5-10 range. Memory utilization seems to be almost entirely dependent on how parameters like the pagepool are set on them. HTHAL? Kevin > On Jan 24, 2017, at 4:00 AM, Simon Thompson (Research Computing - IT Services) wrote: > > We are looking at moving manager processes off our NSD nodes and on to > dedicated quorum/manager nodes. > > Are there some broad recommended hardware specs for the function of these > nodes. > > I assume they benefit from having high memory (for some value of high, > probably a function of number of clients, files, expected open files?, and > probably completely incalculable, so some empirical evidence may be useful > here?) (I'm going to ignore the docs that say you should have twice as > much swap as RAM!) > > What about cores, do they benefit from high core counts or high clock > rates? For example would I benefit more form a high core count, low clock > speed, or going for higher clock speeds and reducing core count? Or is > memory bandwidth more important for manager nodes? > > Connectivity, does token management run over IB or only over > Ethernet/admin network? I.e. Should I bother adding IB cards, or just have > fast Ethernet on them (my clients/NSDs all have IB). > > I'm looking for some hints on what I would most benefit in investing in vs > keeping to budget. > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Jan 24 16:34:16 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 24 Jan 2017 16:34:16 +0000 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: References: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu>, Message-ID: Thanks both. I was thinking of adding 4 (we have a storage cluster over two DC's, so was planning to put two in each and use them as quorum nodes as well plus one floating VM to guarantee only one sitr is quorate in the event of someone cutting a fibre...) We pretty much start at 128GB ram and go from there, so this sounds fine. Would be good if someone could comment on if token traffic goes via IB or Ethernet, maybe I can save myself a few EDR cards... 
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode Myklebust [janfrode at tanso.net] Sent: 24 January 2017 15:51 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Manager nodes Just some datapoints, in hope that it helps.. I've seen metadata performance improvements by turning down hyperthreading from 8/core to 4/core on Power8. Also it helped distributing the token managers over multiple nodes (6+) instead of fewer. I would expect this to flow over IP, not IB. -jf tir. 24. jan. 2017 kl. 16.18 skrev Buterbaugh, Kevin L >: Hi Simon, FWIW, we have two servers dedicated to cluster and filesystem management functions (and 8 NSD servers). I guess you would describe our cluster as small to medium sized ? ~700 nodes and a little over 1 PB of storage. Our two managers have 2 quad core (3 GHz) CPU?s and 64 GB RAM. They?ve got 10 GbE, but we don?t use IB anywhere. We have an 8 Gb FC SAN and we do have them connected in to the SAN so that they don?t have to ask the NSD servers to do any I/O for them. I do collect statistics on all the servers and plunk them into an RRDtool database. Looking at the last 30 days the load average on the two managers is in the 5-10 range. Memory utilization seems to be almost entirely dependent on how parameters like the pagepool are set on them. HTHAL? Kevin > On Jan 24, 2017, at 4:00 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > We are looking at moving manager processes off our NSD nodes and on to > dedicated quorum/manager nodes. > > Are there some broad recommended hardware specs for the function of these > nodes. > > I assume they benefit from having high memory (for some value of high, > probably a function of number of clients, files, expected open files?, and > probably completely incalculable, so some empirical evidence may be useful > here?) (I'm going to ignore the docs that say you should have twice as > much swap as RAM!) > > What about cores, do they benefit from high core counts or high clock > rates? For example would I benefit more form a high core count, low clock > speed, or going for higher clock speeds and reducing core count? Or is > memory bandwidth more important for manager nodes? > > Connectivity, does token management run over IB or only over > Ethernet/admin network? I.e. Should I bother adding IB cards, or just have > fast Ethernet on them (my clients/NSDs all have IB). > > I'm looking for some hints on what I would most benefit in investing in vs > keeping to budget. > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Tue Jan 24 16:53:24 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 24 Jan 2017 16:53:24 +0000 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: References: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu>, Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB06544544@CHI-EXCHANGEW1.w2k.jumptrading.com> It goes over IP, and that could be IPoIB if you have the daemon interface or subnets configured that way, but it will go over native IB VERBS if you have rdmaVerbsSend enabled (not recommended for large clusters). 
verbsRdmaSend Enables or disables the use of InfiniBand RDMA rather than TCP for most GPFS daemon-to-daemon communication. When disabled, only data transfers between an NSD client and NSD server are eligible for RDMA. Valid values are enable or disable. The default value is disable. The verbsRdma option must be enabled for verbsRdmaSend to have any effect. HTH, -B -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: Tuesday, January 24, 2017 10:34 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Manager nodes Thanks both. I was thinking of adding 4 (we have a storage cluster over two DC's, so was planning to put two in each and use them as quorum nodes as well plus one floating VM to guarantee only one sitr is quorate in the event of someone cutting a fibre...) We pretty much start at 128GB ram and go from there, so this sounds fine. Would be good if someone could comment on if token traffic goes via IB or Ethernet, maybe I can save myself a few EDR cards... Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode Myklebust [janfrode at tanso.net] Sent: 24 January 2017 15:51 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Manager nodes Just some datapoints, in hope that it helps.. I've seen metadata performance improvements by turning down hyperthreading from 8/core to 4/core on Power8. Also it helped distributing the token managers over multiple nodes (6+) instead of fewer. I would expect this to flow over IP, not IB. -jf tir. 24. jan. 2017 kl. 16.18 skrev Buterbaugh, Kevin L >: Hi Simon, FWIW, we have two servers dedicated to cluster and filesystem management functions (and 8 NSD servers). I guess you would describe our cluster as small to medium sized ... ~700 nodes and a little over 1 PB of storage. Our two managers have 2 quad core (3 GHz) CPU's and 64 GB RAM. They've got 10 GbE, but we don't use IB anywhere. We have an 8 Gb FC SAN and we do have them connected in to the SAN so that they don't have to ask the NSD servers to do any I/O for them. I do collect statistics on all the servers and plunk them into an RRDtool database. Looking at the last 30 days the load average on the two managers is in the 5-10 range. Memory utilization seems to be almost entirely dependent on how parameters like the pagepool are set on them. HTHAL... Kevin > On Jan 24, 2017, at 4:00 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > We are looking at moving manager processes off our NSD nodes and on to > dedicated quorum/manager nodes. > > Are there some broad recommended hardware specs for the function of these > nodes. > > I assume they benefit from having high memory (for some value of high, > probably a function of number of clients, files, expected open files?, and > probably completely incalculable, so some empirical evidence may be useful > here?) (I'm going to ignore the docs that say you should have twice as > much swap as RAM!) > > What about cores, do they benefit from high core counts or high clock > rates? For example would I benefit more form a high core count, low clock > speed, or going for higher clock speeds and reducing core count? Or is > memory bandwidth more important for manager nodes? > > Connectivity, does token management run over IB or only over > Ethernet/admin network? I.e. 
Should I bother adding IB cards, or just have > fast Ethernet on them (my clients/NSDs all have IB). > > I'm looking for some hints on what I would most benefit in investing in vs > keeping to budget. > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From UWEFALKE at de.ibm.com Tue Jan 24 17:36:22 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 24 Jan 2017 18:36:22 +0100 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu> References: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu> Message-ID: Hi, Kevin, I'd look for more cores on the expense of clock speed. You send data over routes involving much higher latencies than your CPU-memory combination has even in the slowest available clock rate, but GPFS with its multi-threaded appoach is surely happy if it can start a few more threads. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 01/24/2017 04:18 PM Subject: Re: [gpfsug-discuss] Manager nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Simon, FWIW, we have two servers dedicated to cluster and filesystem management functions (and 8 NSD servers). I guess you would describe our cluster as small to medium sized ? ~700 nodes and a little over 1 PB of storage. Our two managers have 2 quad core (3 GHz) CPU?s and 64 GB RAM. They?ve got 10 GbE, but we don?t use IB anywhere. 
We have an 8 Gb FC SAN and we do have them connected in to the SAN so that they don?t have to ask the NSD servers to do any I/O for them. I do collect statistics on all the servers and plunk them into an RRDtool database. Looking at the last 30 days the load average on the two managers is in the 5-10 range. Memory utilization seems to be almost entirely dependent on how parameters like the pagepool are set on them. HTHAL? Kevin > On Jan 24, 2017, at 4:00 AM, Simon Thompson (Research Computing - IT Services) wrote: > > We are looking at moving manager processes off our NSD nodes and on to > dedicated quorum/manager nodes. > > Are there some broad recommended hardware specs for the function of these > nodes. > > I assume they benefit from having high memory (for some value of high, > probably a function of number of clients, files, expected open files?, and > probably completely incalculable, so some empirical evidence may be useful > here?) (I'm going to ignore the docs that say you should have twice as > much swap as RAM!) > > What about cores, do they benefit from high core counts or high clock > rates? For example would I benefit more form a high core count, low clock > speed, or going for higher clock speeds and reducing core count? Or is > memory bandwidth more important for manager nodes? > > Connectivity, does token management run over IB or only over > Ethernet/admin network? I.e. Should I bother adding IB cards, or just have > fast Ethernet on them (my clients/NSDs all have IB). > > I'm looking for some hints on what I would most benefit in investing in vs > keeping to budget. > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Tue Jan 24 19:48:02 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 24 Jan 2017 19:48:02 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes Message-ID: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. From Achim.Rehor at de.ibm.com Wed Jan 25 08:58:58 2017 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Wed, 25 Jan 2017 09:58:58 +0100 Subject: [gpfsug-discuss] Manager nodes In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06544544@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <2EF90E19-F0DB-45B9-8AEE-0213C87FA3AF@vanderbilt.edu>, <21BC488F0AEA2245B2C3E83FC0B33DBB06544544@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Wed Jan 25 11:30:00 2017 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 25 Jan 2017 12:30:00 +0100 Subject: [gpfsug-discuss] snapshots Message-ID: <20170125113000.lwvzpekzjsjvghx5@ics.muni.cz> Hello, is there a way to get number of inodes consumed by a particular snapshot? I have a fileset with separate inodespace: Filesets in file system 'vol1': Name Status Path InodeSpace MaxInodes AllocInodes UsedInodes export Linked /gpfs/vol1/export 1 300000256 300000256 157515747 and it reports no space left on device. It seems that inodes consumed by fileset snapshots are not accounted under usedinodes. So can I somehow check how many inodes are consumed by snapshots? The 'no space left on device' IS caused by exhausted inodes, I can store more data into existing files and if I increase the inode limit, I can create new files. -- Luk?? 
Hejtm?nek From r.sobey at imperial.ac.uk Wed Jan 25 16:08:27 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 25 Jan 2017 16:08:27 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors Message-ID: Hoping someone can show me what should be obvious. I've got an LROC device configured but I want to see stats for it in the GUI: 1) On the CES node itself I've modified ZIMonSensors.cfg and under the GPFSLROC section changed it to 10: { name = "GPFSLROC" period = 10 }, 2) On the CES node restarted pmsensors. 3) On the collector node restarted pmcollector. But I can't find anywhere in the GUI that lets me look at anything LROC related. Anyone got this working? Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Jan 25 20:25:19 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 25 Jan 2017 20:25:19 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors In-Reply-To: References: Message-ID: Richard, there are no exposures of LROC counters in the Scale GUI. you need to use the grafana bridge to get graphs or the command line tools to query the data in text format. Sven On Wed, Jan 25, 2017 at 5:08 PM Sobey, Richard A wrote: > Hoping someone can show me what should be obvious. I?ve got an LROC device > configured but I want to see stats for it in the GUI: > > > > 1) On the CES node itself I?ve modified ZIMonSensors.cfg and under > the GPFSLROC section changed it to 10: > > > > { > > name = "GPFSLROC" > > period = 10 > > }, > > > > 2) On the CES node restarted pmsensors. > > 3) On the collector node restarted pmcollector. > > > > But I can?t find anywhere in the GUI that lets me look at anything LROC > related. > > > > Anyone got this working? > > > > Cheers > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jan 25 20:45:05 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 25 Jan 2017 20:45:05 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors Message-ID: <0CDC969E-7CB9-4B4E-9AAA-1BF9193BF7E2@nuance.com> For the Zimon ?GPFSLROC?, what metrics can Grafana query, I don?t see them documented or exposed anywhere: http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adv_listofmetricsPMT.htm Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Wednesday, January 25, 2017 at 2:25 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] LROC Zimon sensors Richard, there are no exposures of LROC counters in the Scale GUI. you need to use the grafana bridge to get graphs or the command line tools to query the data in text format. Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Jan 25 20:50:28 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 25 Jan 2017 14:50:28 -0600 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: References: Message-ID: Hello all, We are having an issue where the LROC on a CES node gets overrun 100% utilized. Processes then start to backup waiting for the LROC to return data. Any way to have the GPFS client go direct if LROC gets to busy? 
Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From oehmes at gmail.com Wed Jan 25 21:00:03 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 25 Jan 2017 21:00:03 +0000 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: References: Message-ID: Matt, the assumption was that the remote devices are slower than LROC. there is some attempts in the code to not schedule more than a maximum numbers of outstanding i/os to the LROC device, but this doesn't help in all cases and is depending on what kernel level parameters for the device are set. the best way is to reduce the max size of data to be cached into lroc. sven On Wed, Jan 25, 2017 at 9:50 PM Matt Weil wrote: > Hello all, > > We are having an issue where the LROC on a CES node gets overrun 100% > utilized. Processes then start to backup waiting for the LROC to > return data. Any way to have the GPFS client go direct if LROC gets to > busy? > > Thanks > Matt > > ________________________________ > The materials in this message are private and may contain Protected > Healthcare Information or other information of a sensitive nature. If you > are not the intended recipient, be advised that any unauthorized use, > disclosure, copying or the taking of any action in reliance on the contents > of this information is strictly prohibited. If you have received this email > in error, please immediately notify the sender via telephone or return mail. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jan 25 21:01:11 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 25 Jan 2017 21:01:11 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors In-Reply-To: References: , Message-ID: Ok Sven thanks, looks like I'll be checking out grafana. Richard ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: 25 January 2017 20:25 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] LROC Zimon sensors Richard, there are no exposures of LROC counters in the Scale GUI. you need to use the grafana bridge to get graphs or the command line tools to query the data in text format. Sven On Wed, Jan 25, 2017 at 5:08 PM Sobey, Richard A > wrote: Hoping someone can show me what should be obvious. I've got an LROC device configured but I want to see stats for it in the GUI: 1) On the CES node itself I've modified ZIMonSensors.cfg and under the GPFSLROC section changed it to 10: { name = "GPFSLROC" period = 10 }, 2) On the CES node restarted pmsensors. 3) On the collector node restarted pmcollector. But I can't find anywhere in the GUI that lets me look at anything LROC related. Anyone got this working? 
Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Jan 25 21:06:12 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 25 Jan 2017 21:06:12 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors In-Reply-To: <0CDC969E-7CB9-4B4E-9AAA-1BF9193BF7E2@nuance.com> References: <0CDC969E-7CB9-4B4E-9AAA-1BF9193BF7E2@nuance.com> Message-ID: Hi, i guess thats a docu gap, i will send a email trying to get this fixed. here is the list of sensors : [image: pasted1] i hope most of them are self explaining given the others are documented , if not let me know and i clarify . sven On Wed, Jan 25, 2017 at 9:45 PM Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > For the Zimon ?GPFSLROC?, what metrics can Grafana query, I don?t see them > documented or exposed anywhere: > > > > > http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adv_listofmetricsPMT.htm > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > > > *From: * on behalf of Sven > Oehme > *Reply-To: *gpfsug main discussion list > *Date: *Wednesday, January 25, 2017 at 2:25 PM > *To: *"gpfsug-discuss at spectrumscale.org" > > *Subject: *[EXTERNAL] Re: [gpfsug-discuss] LROC Zimon sensors > > > > Richard, > > > > there are no exposures of LROC counters in the Scale GUI. you need to use > the grafana bridge to get graphs or the command line tools to query the > data in text format. > > > > Sven > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pasted1 Type: image/png Size: 283191 bytes Desc: not available URL: From oehmes at gmail.com Wed Jan 25 21:08:02 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 25 Jan 2017 21:08:02 +0000 Subject: [gpfsug-discuss] LROC Zimon sensors In-Reply-To: References: Message-ID: start here : https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/IBM%20Spectrum%20Scale%20Performance%20Monitoring%20Bridge On Wed, Jan 25, 2017 at 10:01 PM Sobey, Richard A wrote: > Ok Sven thanks, looks like I'll be checking out grafana. > > > Richard > > > ------------------------------ > *From:* gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Sven Oehme < > oehmes at gmail.com> > *Sent:* 25 January 2017 20:25 > *To:* gpfsug-discuss at spectrumscale.org > *Subject:* Re: [gpfsug-discuss] LROC Zimon sensors > > Richard, > > there are no exposures of LROC counters in the Scale GUI. you need to use > the grafana bridge to get graphs or the command line tools to query the > data in text format. > > Sven > > > On Wed, Jan 25, 2017 at 5:08 PM Sobey, Richard A > wrote: > > Hoping someone can show me what should be obvious. I?ve got an LROC device > configured but I want to see stats for it in the GUI: > > > > 1) On the CES node itself I?ve modified ZIMonSensors.cfg and under > the GPFSLROC section changed it to 10: > > > > { > > name = "GPFSLROC" > > period = 10 > > }, > > > > 2) On the CES node restarted pmsensors. 
> > 3) On the collector node restarted pmcollector. > > > > But I can?t find anywhere in the GUI that lets me look at anything LROC > related. > > > > Anyone got this working? > > > > Cheers > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Jan 25 21:20:21 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 25 Jan 2017 15:20:21 -0600 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: References: Message-ID: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> On 1/25/17 3:00 PM, Sven Oehme wrote: Matt, the assumption was that the remote devices are slower than LROC. there is some attempts in the code to not schedule more than a maximum numbers of outstanding i/os to the LROC device, but this doesn't help in all cases and is depending on what kernel level parameters for the device are set. the best way is to reduce the max size of data to be cached into lroc. I just turned LROC file caching completely off. most if not all of the IO is metadata. Which is what I wanted to keep fast. It is amazing once you drop the latency the IO's go up way more than they ever where before. I guess we will need another nvme. sven On Wed, Jan 25, 2017 at 9:50 PM Matt Weil > wrote: Hello all, We are having an issue where the LROC on a CES node gets overrun 100% utilized. Processes then start to backup waiting for the LROC to return data. Any way to have the GPFS client go direct if LROC gets to busy? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... 
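A quick way to watch what the LROC device is actually serving once a change like this goes in is the lroc section of mmdiag, run as root on the CES node that owns the device. A sketch only -- the path assumes a default install:

/usr/lpp/mmfs/bin/mmdiag --lroc               # snapshot of the LROC device state and object/hit counters
watch -n 10 /usr/lpp/mmfs/bin/mmdiag --lroc   # or poll it while the NFS load is on the node

Comparing the inode/directory/data object counters before and after disabling data caching should show whether the device is now spending its IOs on metadata.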
URL: From oehmes at gmail.com Wed Jan 25 21:29:50 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 25 Jan 2017 21:29:50 +0000 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> References: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> Message-ID: have you tried to just leave lrocInodes and lrocDirectories on and turn data off ? also did you increase maxstatcache so LROC actually has some compact objects to use ? if you send value for maxfilestocache,maxfilestocache,workerthreads and available memory of the node i can provide a start point. On Wed, Jan 25, 2017 at 10:20 PM Matt Weil wrote: > > > On 1/25/17 3:00 PM, Sven Oehme wrote: > > Matt, > > the assumption was that the remote devices are slower than LROC. there is > some attempts in the code to not schedule more than a maximum numbers of > outstanding i/os to the LROC device, but this doesn't help in all cases and > is depending on what kernel level parameters for the device are set. the > best way is to reduce the max size of data to be cached into lroc. > > I just turned LROC file caching completely off. most if not all of the IO > is metadata. Which is what I wanted to keep fast. It is amazing once you > drop the latency the IO's go up way more than they ever where before. I > guess we will need another nvme. > > > sven > > > On Wed, Jan 25, 2017 at 9:50 PM Matt Weil wrote: > > Hello all, > > We are having an issue where the LROC on a CES node gets overrun 100% > utilized. Processes then start to backup waiting for the LROC to > return data. Any way to have the GPFS client go direct if LROC gets to > busy? > > Thanks > Matt > > ________________________________ > The materials in this message are private and may contain Protected > Healthcare Information or other information of a sensitive nature. If you > are not the intended recipient, be advised that any unauthorized use, > disclosure, copying or the taking of any action in reliance on the contents > of this information is strictly prohibited. If you have received this email > in error, please immediately notify the sender via telephone or return mail. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > The materials in this message are private and may contain Protected > Healthcare Information or other information of a sensitive nature. If you > are not the intended recipient, be advised that any unauthorized use, > disclosure, copying or the taking of any action in reliance on the contents > of this information is strictly prohibited. If you have received this email > in error, please immediately notify the sender via telephone or return mail. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
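For anyone following along, the change Sven is describing would look roughly like the following. This is a sketch only -- the "-N ces" node class and the maxStatCache value are placeholders for whatever covers your CES nodes, not tuned recommendations:

mmchconfig lrocData=no -N ces                          # stop caching file data in LROC
mmchconfig lrocInodes=yes,lrocDirectories=yes -N ces   # keep caching inode and directory objects
mmchconfig maxStatCache=250000 -N ces                  # give LROC more compact stat cache objects to spill
mmshutdown -N ces && mmstartup -N ces                  # restart GPFS on those nodes so the new values take effect

Afterwards, "mmdiag --config" on one of the nodes confirms what the daemon is actually running with.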
URL: From mweil at wustl.edu Wed Jan 25 21:51:43 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 25 Jan 2017 15:51:43 -0600 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: References: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> Message-ID: [ces1,ces2,ces3] maxStatCache 80000 worker1Threads 2000 maxFilesToCache 500000 pagepool 100G maxStatCache 80000 lrocData no 378G system memory. On 1/25/17 3:29 PM, Sven Oehme wrote: have you tried to just leave lrocInodes and lrocDirectories on and turn data off ? yes data I just turned off also did you increase maxstatcache so LROC actually has some compact objects to use ? if you send value for maxfilestocache,maxfilestocache,workerthreads and available memory of the node i can provide a start point. On Wed, Jan 25, 2017 at 10:20 PM Matt Weil > wrote: On 1/25/17 3:00 PM, Sven Oehme wrote: Matt, the assumption was that the remote devices are slower than LROC. there is some attempts in the code to not schedule more than a maximum numbers of outstanding i/os to the LROC device, but this doesn't help in all cases and is depending on what kernel level parameters for the device are set. the best way is to reduce the max size of data to be cached into lroc. I just turned LROC file caching completely off. most if not all of the IO is metadata. Which is what I wanted to keep fast. It is amazing once you drop the latency the IO's go up way more than they ever where before. I guess we will need another nvme. sven On Wed, Jan 25, 2017 at 9:50 PM Matt Weil > wrote: Hello all, We are having an issue where the LROC on a CES node gets overrun 100% utilized. Processes then start to backup waiting for the LROC to return data. Any way to have the GPFS client go direct if LROC gets to busy? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Thu Jan 26 15:37:54 2017 From: mweil at wustl.edu (Matt Weil) Date: Thu, 26 Jan 2017 09:37:54 -0600 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: References: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> Message-ID: <55747bf9-b4c1-4523-8d8b-94e8f35f22f9@wustl.edu> 100% utilized are bursts above 200,000 IO's. Any way to tell ganesha.nfsd to cache more? On 1/25/17 3:51 PM, Matt Weil wrote: [ces1,ces2,ces3] maxStatCache 80000 worker1Threads 2000 maxFilesToCache 500000 pagepool 100G maxStatCache 80000 lrocData no 378G system memory. On 1/25/17 3:29 PM, Sven Oehme wrote: have you tried to just leave lrocInodes and lrocDirectories on and turn data off ? yes data I just turned off also did you increase maxstatcache so LROC actually has some compact objects to use ? if you send value for maxfilestocache,maxfilestocache,workerthreads and available memory of the node i can provide a start point. On Wed, Jan 25, 2017 at 10:20 PM Matt Weil > wrote: On 1/25/17 3:00 PM, Sven Oehme wrote: Matt, the assumption was that the remote devices are slower than LROC. there is some attempts in the code to not schedule more than a maximum numbers of outstanding i/os to the LROC device, but this doesn't help in all cases and is depending on what kernel level parameters for the device are set. the best way is to reduce the max size of data to be cached into lroc. I just turned LROC file caching completely off. most if not all of the IO is metadata. Which is what I wanted to keep fast. It is amazing once you drop the latency the IO's go up way more than they ever where before. I guess we will need another nvme. sven On Wed, Jan 25, 2017 at 9:50 PM Matt Weil > wrote: Hello all, We are having an issue where the LROC on a CES node gets overrun 100% utilized. Processes then start to backup waiting for the LROC to return data. Any way to have the GPFS client go direct if LROC gets to busy? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. 
If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 26 17:15:56 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 26 Jan 2017 17:15:56 +0000 Subject: [gpfsug-discuss] mmlsquota output question Message-ID: <73AC6907-90BD-447F-9F72-4B7CBBFE2321@vanderbilt.edu> Hi All, We had 3 local GPFS filesystems on our cluster ? let?s call them gpfs0, gpfs1, and gpfs2. gpfs0 is for project space (i.e. groups can buy quota in 1 TB increments there). gpfs1 is scratch and gpfs2 is home. We are combining gpfs0 and gpfs1 into one new filesystem (gpfs3) ? we?re doing this for multiple reasons that aren?t really pertinent to my question here, but suffice it to say I have discussed our plan with some of IBM?s GPFS people and they agree that it?s the thing for us to do. gpfs3 will have a scratch fileset with no fileset quota, but user and group quotas (just like the gpfs1 filesystem currently has). We will also move all the filesets from gpfs0 over to gpfs3 - those use fileset quotas only - no user or group quotas. I have created the new gpfs3 filesystem, the scratch fileset within it, and one of the project filesets coming over from gpfs0. I?ve also moved my scratch directory to the gpfs3 scratch fileset. 
When I run mmlsquota I see (please note, I?ve changed names of things to protect the guilty): kevin at gateway: mmlsquota -u kevin --block-size auto Block Limits | File Limits Filesystem type blocks quota limit in_doubt grace | files quota limit in_doubt grace Remarks gpfs0 USR no limits Block Limits | File Limits Filesystem type blocks quota limit in_doubt grace | files quota limit in_doubt grace Remarks gpfs1 USR 2.008G 50G 200G 0 none | 3 100000 1000000 0 none Block Limits | File Limits Filesystem type blocks quota limit in_doubt grace | files quota limit in_doubt grace Remarks gpfs2 USR 11.69G 25G 35G 0 none | 8453 100000 200000 0 none Block Limits | File Limits Filesystem Fileset type blocks quota limit in_doubt grace | files quota limit in_doubt grace Remarks gpfs3 root USR no limits gpfs3 scratch USR 31.04G 50G 200G 0 none | 2134 200000 1000000 0 none gpfs3 fakegroup USR no limits kevin at gateway: My question is this ? why am I seeing the ?root? and ?fakegroup? filesets listed in the output for gpfs3? They don?t show up for gpfs0 and the also exist there. Is it possibly because there are no user quotas whatsoever for gpfs0 and there are user quotas on the gpfs3:scratch fileset? If so, that still doesn?t make sense as to why mmlsquota would think it needs to show the filesets within that filesystem that don?t have user quotas. In fact, we don?t *want* that to happen, as we have certain groups that deal with various types of restricted data and we?d prefer that their existence not be advertised to everyone on the cluster. Oh, we?re still in the process of upgrading clients on our cluster, but this output is from a client running 4.2.2.1, in case that matters. Thanks all... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Thu Jan 26 20:20:00 2017 From: mweil at wustl.edu (Matt Weil) Date: Thu, 26 Jan 2017 14:20:00 -0600 Subject: [gpfsug-discuss] LROC nvme small IO size 4 k In-Reply-To: <55747bf9-b4c1-4523-8d8b-94e8f35f22f9@wustl.edu> References: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> <55747bf9-b4c1-4523-8d8b-94e8f35f22f9@wustl.edu> Message-ID: I still see small 4k IO's going to the nvme device after changing the max_sectors_kb. Writes did increase from 64 to 512. Is that a nvme limitation. > [root at ces1 system]# cat /sys/block/nvme0n1/queue/read_ahead_kb > 8192 > [root at ces1 system]# cat /sys/block/nvme0n1/queue/nr_requests > 512 > [root at ces1 system]# cat /sys/block/nvme0n1/queue/max_sectors_kb > 8192 > [root at ces1 system]# collectl -sD --dskfilt=nvme0n1 > waiting for 1 second sample... > > # DISK STATISTICS (/sec) > # > <---------reads---------><---------writes---------><--------averages--------> > Pct > #Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize > QLen Wait SvcTim Util > nvme0n1 47187 0 11K 4 30238 0 59 512 > 6 8 0 0 34 > nvme0n1 61730 0 15K 4 14321 0 28 512 > 4 9 0 0 45 ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. 
If you have received this email in error, please immediately notify the sender via telephone or return mail. From Robert.Oesterlin at nuance.com Fri Jan 27 00:57:05 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 27 Jan 2017 00:57:05 +0000 Subject: [gpfsug-discuss] Waiter identification help - Quota related Message-ID: OK, I have a sick cluster, and it seems to be tied up with quota related RPCs like this. Any help in narrowing down what the issue is? Waiting 3.8729 sec since 19:54:09, monitored, thread 32786 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.3158 sec since 19:54:08, monitored, thread 32771 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.3173 sec since 19:54:08, monitored, thread 35829 Msg handler quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.4619 sec since 19:54:08, monitored, thread 9694 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.4967 sec since 19:54:08, monitored, thread 32357 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.6885 sec since 19:54:08, monitored, thread 32305 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.7123 sec since 19:54:08, monitored, thread 32261 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 4.7932 sec since 19:54:08, monitored, thread 53409 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.2954 sec since 19:54:07, monitored, thread 32905 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3058 sec since 19:54:07, monitored, thread 32573 Msg handler quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3207 sec since 19:54:07, monitored, thread 32397 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3274 sec since 19:54:07, monitored, thread 32897 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3343 sec since 19:54:07, monitored, thread 32691 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3347 sec since 19:54:07, monitored, thread 32364 Msg handler quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Waiting 5.3348 sec since 19:54:07, monitored, thread 32522 Msg handler quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason 'waiting for WA lock' Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Jan 27 01:26:49 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 26 Jan 2017 20:26:49 -0500 Subject: [gpfsug-discuss] Waiter identification help - Quota related In-Reply-To: References: Message-ID: <49f984fc-4881-60fd-88a0-29701ce4ea73@nasa.gov> This might be a stretch but do you happen to have a user/fileset/group over it's hard quota or soft quota + grace period? We've had this really upset our cluster before. 
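A quick way to check is mmrepquota -- a rough sketch, substitute your file system name for "gpfs0"; anything whose grace column shows a time remaining or "expired" instead of "none" is past a soft limit, and "expired" effectively behaves like a hard limit:

mmrepquota -u -g -j gpfs0 | grep -E 'expired|days|hours|minutes'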
At least with 3.5 each op that's done against an over quota user/group/fileset results in at least one rpc from the fs manager to every node in the cluster. Are those waiters from an fs manager node? If so perhaps briefly fire up tracing (/usr/lpp/mmfs/bin/mmtrace start) let it run for ~10 seconds then stop it (/usr/lpp/mmfs/bin/mmtrace stop) then grep for "TRACE_QUOTA" out of the resulting trcrpt file. If you see a bunch of lines that contain: TRACE_QUOTA: qu.server revoke reply type that might be what's going on. You can also see the behavior if you look at the output of mmdiag --network on your fs manager nodes and see a bunch of RPC's with all of your cluster node listed as the recipients. Can't recall what the RPC is called that you're looking for, though. Hope that helps! -Aaron On 1/26/17 7:57 PM, Oesterlin, Robert wrote: > OK, I have a sick cluster, and it seems to be tied up with quota related > RPCs like this. Any help in narrowing down what the issue is? > > > > Waiting 3.8729 sec since 19:54:09, monitored, thread 32786 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.3158 sec since 19:54:08, monitored, thread 32771 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.3173 sec since 19:54:08, monitored, thread 35829 Msg handler > quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.4619 sec since 19:54:08, monitored, thread 9694 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.4967 sec since 19:54:08, monitored, thread 32357 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.6885 sec since 19:54:08, monitored, thread 32305 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.7123 sec since 19:54:08, monitored, thread 32261 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 4.7932 sec since 19:54:08, monitored, thread 53409 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.2954 sec since 19:54:07, monitored, thread 32905 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3058 sec since 19:54:07, monitored, thread 32573 Msg handler > quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3207 sec since 19:54:07, monitored, thread 32397 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3274 sec since 19:54:07, monitored, thread 32897 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3343 sec since 19:54:07, monitored, thread 32691 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3347 sec since 19:54:07, monitored, thread 32364 Msg handler > quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > Waiting 5.3348 sec since 19:54:07, monitored, thread 32522 Msg handler > quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason > 'waiting for WA lock' > > > > Bob Oesterlin > Sr 
Principal Storage Engineer, Nuance > 507-269-0413 > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From r.sobey at imperial.ac.uk Fri Jan 27 11:12:25 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 27 Jan 2017 11:12:25 +0000 Subject: [gpfsug-discuss] Nodeclasses question Message-ID: All, Can it be clarified whether specifying "-N ces" (for example, I have a custom nodeclass called ces containing CES nodes of course) will then apply changes to future nodes that join the same nodeclass? For example, "mmchconfig maxFilesToCache=100000 -N ces" will give existing nodes that new config. I then add a 5th node to the nodeclass. Will it inherit the cache value or will I need to set it again? Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jan 27 12:43:40 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 27 Jan 2017 12:43:40 +0000 Subject: [gpfsug-discuss] ?spam? Nodeclasses question Message-ID: I think this depends on you FS min version. We had some issues where ours was still set to 3.5 I think even though we have 4.x clients. The nodeclasses in mmlsconfig were expanded to individual nodes. But adding a node to a node class would apply the config to the node, though I'd expect you to have to stop/restart GPFS on the node and not expect it to work like "mmchconfig -I" Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 27 January 2017 at 11:12 To: "gpfsug-discuss at spectrumscale.org" > Subject: ?spam? [gpfsug-discuss] Nodeclasses question All, Can it be clarified whether specifying ?-N ces? (for example, I have a custom nodeclass called ces containing CES nodes of course) will then apply changes to future nodes that join the same nodeclass? For example, ?mmchconfig maxFilesToCache=100000 ?N ces? will give existing nodes that new config. I then add a 5th node to the nodeclass. Will it inherit the cache value or will I need to set it again? Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From gil at us.ibm.com Fri Jan 27 13:08:06 2017 From: gil at us.ibm.com (Gil Sharon) Date: Fri, 27 Jan 2017 08:08:06 -0500 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 60, Issue 72 In-Reply-To: References: Message-ID: yes, node-classes are updated across all nodes, so if you add a node to an existing class it will be included from then on. But for CES nodes there is already a 'built-in' system class: cesNodes. why not use that? 
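Either way, it is easy to confirm that a newly added member really does pick the value up -- a rough sketch reusing the user-defined "ces" class from the question (add the node, restart GPFS on it, then look at what the daemon actually uses):

mmchnodeclass ces add -N newnode                         # "newnode" is a placeholder
mmlsconfig maxFilesToCache                               # shows which nodes/classes the value is recorded against
mmdsh -N ces "mmdiag --config | grep maxFilesToCache"    # effective value on each member after restart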
you can see all system nodeclasses by: mmlsnodeclass --system Regards, GIL SHARON Spectrum Scale (GPFS) Development Mobile: 978-302-9355 E-mail: gil at us.ibm.com From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/27/2017 07:00 AM Subject: gpfsug-discuss Digest, Vol 60, Issue 72 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Nodeclasses question (Sobey, Richard A) ---------------------------------------------------------------------- Message: 1 Date: Fri, 27 Jan 2017 11:12:25 +0000 From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Subject: [gpfsug-discuss] Nodeclasses question Message-ID: Content-Type: text/plain; charset="us-ascii" All, Can it be clarified whether specifying "-N ces" (for example, I have a custom nodeclass called ces containing CES nodes of course) will then apply changes to future nodes that join the same nodeclass? For example, "mmchconfig maxFilesToCache=100000 -N ces" will give existing nodes that new config. I then add a 5th node to the nodeclass. Will it inherit the cache value or will I need to set it again? Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170127/0d841ddb/attachment-0001.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 60, Issue 72 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mweil at wustl.edu Fri Jan 27 15:49:12 2017 From: mweil at wustl.edu (Matt Weil) Date: Fri, 27 Jan 2017 09:49:12 -0600 Subject: [gpfsug-discuss] LROC 100% utilized in terms of IOs In-Reply-To: <55747bf9-b4c1-4523-8d8b-94e8f35f22f9@wustl.edu> References: <657cc9a0-1588-18e6-a647-5d48e1802695@wustl.edu> <55747bf9-b4c1-4523-8d8b-94e8f35f22f9@wustl.edu> Message-ID: <0ad3735a-77d4-6d98-6e8a-135479f3f594@wustl.edu> turning off data seems to have helped this issue Thanks all On 1/26/17 9:37 AM, Matt Weil wrote: 100% utilized are bursts above 200,000 IO's. Any way to tell ganesha.nfsd to cache more? On 1/25/17 3:51 PM, Matt Weil wrote: [ces1,ces2,ces3] maxStatCache 80000 worker1Threads 2000 maxFilesToCache 500000 pagepool 100G maxStatCache 80000 lrocData no 378G system memory. On 1/25/17 3:29 PM, Sven Oehme wrote: have you tried to just leave lrocInodes and lrocDirectories on and turn data off ? yes data I just turned off also did you increase maxstatcache so LROC actually has some compact objects to use ? if you send value for maxfilestocache,maxfilestocache,workerthreads and available memory of the node i can provide a start point. 
On Wed, Jan 25, 2017 at 10:20 PM Matt Weil > wrote: On 1/25/17 3:00 PM, Sven Oehme wrote: Matt, the assumption was that the remote devices are slower than LROC. there is some attempts in the code to not schedule more than a maximum numbers of outstanding i/os to the LROC device, but this doesn't help in all cases and is depending on what kernel level parameters for the device are set. the best way is to reduce the max size of data to be cached into lroc. I just turned LROC file caching completely off. most if not all of the IO is metadata. Which is what I wanted to keep fast. It is amazing once you drop the latency the IO's go up way more than they ever where before. I guess we will need another nvme. sven On Wed, Jan 25, 2017 at 9:50 PM Matt Weil > wrote: Hello all, We are having an issue where the LROC on a CES node gets overrun 100% utilized. Processes then start to backup waiting for the LROC to return data. Any way to have the GPFS client go direct if LROC gets to busy? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From laurence at qsplace.co.uk Fri Jan 27 17:17:53 2017 From: laurence at qsplace.co.uk (laurence at qsplace.co.uk) Date: Fri, 27 Jan 2017 17:17:53 +0000 Subject: [gpfsug-discuss] ?spam? Nodeclasses question In-Reply-To: References: Message-ID: Richard, As Simon notes in 3.5 they were expanded and where a pain; however this has since been tidied up and now works as it "should". So any further node added to a group will inherit the relevant parts of the config. i.e. (I've snipped the boring bits out) mmlsnodeclass Node Class Name Members --------------------- ----------------------------------------------------------- site2 s2gpfs1.site2,s2gpfs2.site2 mmchconfig pagepool=2G -N site2 mmshutdown -a mmstartup -a mmdsh -N nsdnodes "mmdiag --config | grep page" s2gpfs3.site2: pagepool 1073741824 s2gpfs3.site2: pagepoolMaxPhysMemPct 75 s2gpfs2.site2: ! pagepool 2147483648 s2gpfs2.site2: pagepoolMaxPhysMemPct 75 s2gpfs1.site2: ! pagepool 2147483648 s2gpfs1.site2: pagepoolMaxPhysMemPct 75 mmchnodeclass site2 add -N s2gpfs3.site2 mmshutdown -N s2gpfs3.site2 mmstartup -N s2gpfs3.site2 mmdsh -N nsdnodes "mmdiag --config | grep page" s2gpfs2.site2: ! pagepool 2147483648 s2gpfs2.site2: pagepoolMaxPhysMemPct 75 s2gpfs1.site2: ! pagepool 2147483648 s2gpfs1.site2: pagepoolMaxPhysMemPct 75 s2gpfs3.site2: ! pagepool 2147483648 s2gpfs3.site2: pagepoolMaxPhysMemPct 75 -- Lauz On 2017-01-27 12:43, Simon Thompson (Research Computing - IT Services) wrote: > I think this depends on you FS min version. > > We had some issues where ours was still set to 3.5 I think even though > we have 4.x clients. The nodeclasses in mmlsconfig were expanded to > individual nodes. But adding a node to a node class would apply the > config to the node, though I'd expect you to have to stop/restart GPFS > on the node and not expect it to work like "mmchconfig -I" > > Simon > > From: on behalf of "Sobey, > Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > > Date: Friday, 27 January 2017 at 11:12 > To: "gpfsug-discuss at spectrumscale.org" > > Subject: ?spam? [gpfsug-discuss] Nodeclasses question > > All, > > Can it be clarified whether specifying ?-N ces? (for example, I > have a custom nodeclass called ces containing CES nodes of course) > will then apply changes to future nodes that join the same nodeclass? > > For example, ?mmchconfig maxFilesToCache=100000 ?N ces? will > give existing nodes that new config. I then add a 5th node to the > nodeclass. Will it inherit the cache value or will I need to set it > again? > > Thanks > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From r.sobey at imperial.ac.uk Fri Jan 27 21:13:28 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 27 Jan 2017 21:13:28 +0000 Subject: [gpfsug-discuss] ?spam? Nodeclasses question In-Reply-To: References: , Message-ID: Thanks Lauz and Simon. 
Next question and I presume the answer is "yes": if you specify a node explicitly that already has a certain config applied through a nodeclass, the value that has been set specific to that node should override the nodeclass setting. Correct? Richard ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of laurence at qsplace.co.uk Sent: 27 January 2017 17:17 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] ?spam? Nodeclasses question Richard, As Simon notes in 3.5 they were expanded and where a pain; however this has since been tidied up and now works as it "should". So any further node added to a group will inherit the relevant parts of the config. i.e. (I've snipped the boring bits out) mmlsnodeclass Node Class Name Members --------------------- ----------------------------------------------------------- site2 s2gpfs1.site2,s2gpfs2.site2 mmchconfig pagepool=2G -N site2 mmshutdown -a mmstartup -a mmdsh -N nsdnodes "mmdiag --config | grep page" s2gpfs3.site2: pagepool 1073741824 s2gpfs3.site2: pagepoolMaxPhysMemPct 75 s2gpfs2.site2: ! pagepool 2147483648 s2gpfs2.site2: pagepoolMaxPhysMemPct 75 s2gpfs1.site2: ! pagepool 2147483648 s2gpfs1.site2: pagepoolMaxPhysMemPct 75 mmchnodeclass site2 add -N s2gpfs3.site2 mmshutdown -N s2gpfs3.site2 mmstartup -N s2gpfs3.site2 mmdsh -N nsdnodes "mmdiag --config | grep page" s2gpfs2.site2: ! pagepool 2147483648 s2gpfs2.site2: pagepoolMaxPhysMemPct 75 s2gpfs1.site2: ! pagepool 2147483648 s2gpfs1.site2: pagepoolMaxPhysMemPct 75 s2gpfs3.site2: ! pagepool 2147483648 s2gpfs3.site2: pagepoolMaxPhysMemPct 75 -- Lauz On 2017-01-27 12:43, Simon Thompson (Research Computing - IT Services) wrote: > I think this depends on you FS min version. > > We had some issues where ours was still set to 3.5 I think even though > we have 4.x clients. The nodeclasses in mmlsconfig were expanded to > individual nodes. But adding a node to a node class would apply the > config to the node, though I'd expect you to have to stop/restart GPFS > on the node and not expect it to work like "mmchconfig -I" > > Simon > > From: on behalf of "Sobey, > Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > > Date: Friday, 27 January 2017 at 11:12 > To: "gpfsug-discuss at spectrumscale.org" > > Subject: ?spam? [gpfsug-discuss] Nodeclasses question > > All, > > Can it be clarified whether specifying "-N ces" (for example, I > have a custom nodeclass called ces containing CES nodes of course) > will then apply changes to future nodes that join the same nodeclass? > > For example, "mmchconfig maxFilesToCache=100000 -N ces" will > give existing nodes that new config. I then add a 5th node to the > nodeclass. Will it inherit the cache value or will I need to set it > again? > > Thanks > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
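Rather than rely on memory of the precedence rules, this is easy enough to verify on a test node -- a sketch reusing the node and class names from Lauz's example (the values themselves are arbitrary):

mmchconfig pagepool=2G -N site2              # class-wide value
mmchconfig pagepool=4G -N s2gpfs3.site2      # node-specific value for one member of the class
mmlsconfig pagepool                          # shows which stanza each node now falls under
mmshutdown -N s2gpfs3.site2 && mmstartup -N s2gpfs3.site2
mmdsh -N s2gpfs3.site2 "mmdiag --config | grep pagepool"   # the value the daemon actually uses

Whatever mmlsconfig records against the individual node after the second mmchconfig should make it obvious which assignment won.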
URL: From aaron.s.knister at nasa.gov Fri Jan 27 22:54:51 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 27 Jan 2017 17:54:51 -0500 Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs In-Reply-To: References: <061a15b7-f5e9-7c16-2e79-3236665a9368@nasa.gov> Message-ID: <239473a0-a8b7-0f13-f55d-a9e85948ce19@nasa.gov> This is rather disconcerting. We just finished upgrading our nsd servers from 3.5.0.31 to 4.1.1.10 (All clients were previously migrated from 3.5.0.31 to 4.1.1.10). After finishing that upgrade I'm now seeing these errors with some frequency (a couple every few minutes). Anyone have insight? On 1/18/17 11:58 AM, Brian Marshall wrote: > As background, we recently upgraded GPFS from 4.2.0 to 4.2.1 and > updated the Mellanox OFED on our compute cluster to allow it to move > from CentOS 7.1 to 7.2 > > We do some transient warnings from the Mellanox switch gear about > various port counters that we are tracking down with them. > > Jobs and filesystem seem stable, but the logs are concerning. > > On Wed, Jan 18, 2017 at 10:22 AM, Aaron Knister > > wrote: > > I'm curious about this too. We see these messages sometimes when > things have gone horribly wrong but also sometimes during recovery > events. Here's a recent one: > > loremds20 (manager/nsd node): > Mon Jan 16 14:19:02.048 2017: [E] VERBS RDMA rdma read error > IBV_WC_REM_ACCESS_ERR to 10.101.11.6 (lorej006) on mlx5_0 port 1 > fabnum 3 vendor_err 136 > Mon Jan 16 14:19:02.049 2017: [E] VERBS RDMA closed connection to > 10.101.11.6 (lorej006) on mlx5_0 port 1 fabnum 3 due to RDMA read > error IBV_WC_REM_ACCESS_ERR index 11 > > lorej006 (client): > Mon Jan 16 14:19:01.990 2017: [N] VERBS RDMA closed connection to > 10.101.53.18 (loremds18) on mlx5_0 port 1 fabnum 3 index 2 > Mon Jan 16 14:19:01.995 2017: [N] VERBS RDMA closed connection to > 10.101.53.19 (loremds19) on mlx5_0 port 1 fabnum 3 index 0 > Mon Jan 16 14:19:01.997 2017: [I] Recovering nodes: 10.101.53.18 > 10.101.53.19 > Mon Jan 16 14:19:02.047 2017: [W] VERBS RDMA async event > IBV_EVENT_QP_ACCESS_ERR on mlx5_0 qp 0x7fffe550f1c8. > Mon Jan 16 14:19:02.051 2017: [E] VERBS RDMA closed connection to > 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 error 733 index 1 > Mon Jan 16 14:19:02.071 2017: [I] Recovered 2 nodes for file system > tnb32. > Mon Jan 16 14:19:02.140 2017: [I] VERBS RDMA connecting to > 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 index 0 > Mon Jan 16 14:19:02.160 2017: [I] VERBS RDMA connected to > 10.101.53.20 (loremds20) on mlx5_0 port 1 fabnum 3 sl 0 index 0 > > I had just shut down loremds18 and loremds19 so there was certainly > recovery taking place and during that time is when the error seems > to have occurred. > > I looked up the meaning of IBV_WC_REM_ACCESS_ERR here > (http://www.rdmamojo.com/2013/02/15/ibv_poll_cq/ > ) and see this: > > IBV_WC_REM_ACCESS_ERR (10) - Remote Access Error: a protection error > occurred on a remote data buffer to be read by an RDMA Read, written > by an RDMA Write or accessed by an atomic operation. This error is > reported only on RDMA operations or atomic operations. Relevant for > RC QPs. > > my take on it during recovery it seems like one end of the > connection more or less hanging up on the other end (e.g. Connection > reset by peer > /ECONNRESET). > > But like I said at the start, we also see this when there something > has gone awfully wrong. 
> > -Aaron > > On 1/18/17 3:59 AM, Simon Thompson (Research Computing - IT > Services) wrote: > > I'd be inclined to look at something like: > > ibqueryerrors -s > PortXmitWait,LinkDownedCounter,PortXmitDiscards,PortRcvRemotePhysicalErrors > -c > > And see if you have a high number of symbol errors, might be a cable > needs replugging or replacing. > > Simon > > From: > >> on behalf of > "J. Eric > Wonderley" > >> > Reply-To: "gpfsug-discuss at spectrumscale.org > > >" > > >> > Date: Tuesday, 17 January 2017 at 21:16 > To: "gpfsug-discuss at spectrumscale.org > > >" > > >> > Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs > > I have messages like these frequent my logs: > Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error > IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 > fabnum 0 > vendor_err 136 > Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to > 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error > IBV_WC_REM_ACCESS_ERR index 23 > > Any ideas on cause..? > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From jonathon.anderson at colorado.edu Mon Jan 30 22:10:25 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Mon, 30 Jan 2017 22:10:25 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? 
by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. From olaf.weiser at de.ibm.com Tue Jan 31 08:30:19 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 31 Jan 2017 09:30:19 +0100 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jan 31 15:13:34 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 31 Jan 2017 15:13:34 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. 
[root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? 
by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
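A quick way to see where CES actually placed the addresses after running a procedure like the above (a sketch; it assumes the standard mmces address/node/state subcommands are available on the protocol nodes):
---
# show each CES address and the node that currently holds it
mmces address list
# show CES node and service state; a node that is suspended or marked down will not receive addresses
mmces node list
mmces state show -a
---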
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Jan 31 15:42:33 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 31 Jan 2017 16:42:33 +0100 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jan 31 16:32:18 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 31 Jan 2017 16:32:18 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. 
~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? 
Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jan 31 16:35:23 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 31 Jan 2017 16:35:23 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <33ECA51F-9169-4F8B-AAA9-1C6E494B8534@colorado.edu> Message-ID: <1515B2FC-1B1B-4A8B-BB7B-CD7C815B662A@colorado.edu> > [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa Just to head-off any concerns that this problem is a result of the ces-ip in this command not being one of the ces ips added in my earlier examples, this is just an artifact of changing configuration during the troubleshooting process. I realized that while 10.225.71.{104,105} were allocated to this node, they were to be used for something else, and shouldn?t be under CES control; so I changed our CES addresses to 10.225.71.{102,103}. 
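For reference, a swap like that is normally just a remove and add of the CES IPs (a sketch only; the exact commands used are not shown above, and it assumes mmces address remove accepts the same --ces-ip list form as mmces address add):
---
# release the addresses that are needed elsewhere
mmces address remove --ces-ip 10.225.71.104,10.225.71.105
# register the replacement addresses and confirm where they land
mmces address add --ces-ip 10.225.71.102,10.225.71.103
mmces address list
---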
On 1/30/17, 3:10 PM, "Jonathon A Anderson" wrote: In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. 
Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Jan 31 17:45:17 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 31 Jan 2017 17:45:17 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: Message-ID: I'll open a PMR here for my env ... the issue may hurt you in a CES env only ... but needs to be fixed in core gpfs.base, I think. Sent from IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- From: "Jonathon A Anderson" To: "gpfsug main discussion list" Date: Tue 31.01.2017 17:32 Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes No, I'm having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don't have 'protocol node' support, so they've pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804.
If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? 
Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Tue Jan 31 17:47:12 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 31 Jan 2017 17:47:12 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: Message-ID: <9A756F92-C3CF-42DF-983C-BD83334B37EB@colorado.edu> Yeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it?s only in CES. I suspect there just haven?t been that many people exporting CES out of an HPC cluster environment. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 10:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes I ll open a pmr here for my env ... the issue may hurt you in a ces env. only... 
but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von: "Jonathon A Anderson" An: "gpfsug main discussion list" Datum: Di. 31.01.2017 17:32 Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________ No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. 
which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. 
Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Jan 31 20:07:14 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 31 Jan 2017 20:07:14 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: <9A756F92-C3CF-42DF-983C-BD83334B37EB@colorado.edu> References: , <9A756F92-C3CF-42DF-983C-BD83334B37EB@colorado.edu> Message-ID: We use multicluster for our environment, storage systems in a separate cluster to hpc nodes on a separate cluster from protocol nodes. 
According to the docs, this isn't supported, but we haven't seen any issues. Note unsupported as opposed to broken. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 31 January 2017 17:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Yeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it?s only in CES. I suspect there just haven?t been that many people exporting CES out of an HPC cluster environment. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 10:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes I ll open a pmr here for my env ... the issue may hurt you in a ces env. only... but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von: "Jonathon A Anderson" An: "gpfsug main discussion list" Datum: Di. 31.01.2017 17:32 Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________ No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. 
:) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? 
by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Tue Jan 31 20:11:31 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 31 Jan 2017 20:11:31 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <9A756F92-C3CF-42DF-983C-BD83334B37EB@colorado.edu> Message-ID: Simon, This is what I?d usually do, and I?m pretty sure it?d fix the problem; but we only have two protocol nodes, so no good way to do quorum in a separate cluster of just those two. Plus, I?d just like to see the bug fixed. I suppose we could move the compute nodes to a separate cluster, and keep the protocol nodes together with the NSD servers; but then I?m back to the age-old question of ?do I technically violate the GPFS license in order to do the right thing architecturally?? (Since you have to nominate GPFS servers in the client-only cluster to manage quorum, for nodes that only have client licenses.) So far, we?re 100% legit, and it?d be better to stay that way. 
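For anyone weighing that layout, the usual multicluster flow is roughly the following (a sketch only: the cluster names, contact nodes, key paths, and the "summit" file system device name are assumptions, and the exact flags should be checked against the mmauth, mmremotecluster, and mmremotefs man pages):
---
# on the owning cluster (NSD servers plus protocol nodes): authorize the compute cluster
mmauth genkey new
mmauth add compute.example.edu -k /tmp/compute_id_rsa.pub    # key file copied over from the compute cluster
mmauth grant compute.example.edu -f summit                   # device name assumed from the /gpfs/summit mount point

# on the accessing (compute-only) cluster: define the remote cluster and mount its file system
mmremotecluster add storage.example.edu -n snsd1,snsd2 -k /tmp/storage_id_rsa.pub
mmremotefs add summit -f summit -C storage.example.edu -T /gpfs/summit
mmmount summit -a
---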
~jonathon On 1/31/17, 1:07 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (Research Computing - IT Services)" wrote: We use multicluster for our environment, storage systems in a separate cluster to hpc nodes on a separate cluster from protocol nodes. According to the docs, this isn't supported, but we haven't seen any issues. Note unsupported as opposed to broken. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 31 January 2017 17:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Yeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it?s only in CES. I suspect there just haven?t been that many people exporting CES out of an HPC cluster environment. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 10:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes I ll open a pmr here for my env ... the issue may hurt you in a ces env. only... but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von: "Jonathon A Anderson" An: "gpfsug main discussion list" Datum: Di. 31.01.2017 17:32 Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________ No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. 
[root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? 
by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
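(A side note for anyone following along: before digging into the monitor scripts, it can help to capture how the CES layer itself currently sees membership and address assignment. A minimal, mostly read-only check sequence is sketched below; these commands exist in Spectrum Scale 4.2.x, but confirm option spellings against the release you actually run.)

---
#!/bin/bash
# Snapshot of the CES view, run on one protocol node.

# CES membership as recorded in the cluster configuration.
mmlscluster --ces

# The same list from the CES layer, including any suspended/failed flags.
mmces node list

# The CES address pool and which node, if any, currently holds each IP.
mmces address list

# If the IPs still show as unassigned while the nodes look healthy, retry an
# explicit rebalance and watch mmfs.log.latest on the protocol nodes.
mmces address move --rebalance
---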
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Tue Jan 31 20:21:10 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 31 Jan 2017 20:21:10 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: <9A756F92-C3CF-42DF-983C-BD83334B37EB@colorado.edu> , Message-ID: Ah we have separate server licensed nodes in the hpc cluster (typically we have some stuff for config management, monitoring etc, so we license those as servers). Agreed the bug should be fixed, I was meaning that we probably don't see it as the CES cluster is 4 nodes serving protocols (plus some other data access boxes). 
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 31 January 2017 20:11 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Simon, This is what I?d usually do, and I?m pretty sure it?d fix the problem; but we only have two protocol nodes, so no good way to do quorum in a separate cluster of just those two. Plus, I?d just like to see the bug fixed. I suppose we could move the compute nodes to a separate cluster, and keep the protocol nodes together with the NSD servers; but then I?m back to the age-old question of ?do I technically violate the GPFS license in order to do the right thing architecturally?? (Since you have to nominate GPFS servers in the client-only cluster to manage quorum, for nodes that only have client licenses.) So far, we?re 100% legit, and it?d be better to stay that way. ~jonathon On 1/31/17, 1:07 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (Research Computing - IT Services)" wrote: We use multicluster for our environment, storage systems in a separate cluster to hpc nodes on a separate cluster from protocol nodes. According to the docs, this isn't supported, but we haven't seen any issues. Note unsupported as opposed to broken. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 31 January 2017 17:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Yeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it?s only in CES. I suspect there just haven?t been that many people exporting CES out of an HPC cluster environment. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 10:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes I ll open a pmr here for my env ... the issue may hurt you in a ces env. only... but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von: "Jonathon A Anderson" An: "gpfsug main discussion list" Datum: Di. 31.01.2017 17:32 Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________ No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? 
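For anyone wondering whether their own cluster is anywhere near the same limit, the check needs nothing beyond the commands already quoted in this thread. A rough sketch, assuming the default column layout of mmgetstate -a shown above (node number, node name, GPFS state):

---
#!/bin/bash
# Compare the node list tsctl claims is up with what mmgetstate reports,
# and show how many bytes tsctl actually emitted.

tsctl_bytes=$(tsctl shownodes up | wc -c)
tsctl_nodes=$(tsctl shownodes up | tr ',' '\n' | sed '/^$/d' | wc -l)

# Header and separator rows don't have "active" in the third column, so this
# awk keeps only the data rows for nodes in the active state.
mm_nodes=$(mmgetstate -a | awk '$3 == "active"' | wc -l)

echo "tsctl shownodes up: ${tsctl_nodes} nodes in ${tsctl_bytes} bytes"
echo "mmgetstate -a:      ${mm_nodes} active nodes"
---

If the two counts differ (and the byte count sits suspiciously close to 3983), you are most likely looking at the same defect. Note that tsctl prints daemon node names while mmgetstate prints the short admin names in the output above, so compare counts rather than trying to diff the lists directly.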
From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." 
/usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] && set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
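Since the getDownCesNodeList function quoted above simply does a comm -23 between the CES node file and the (possibly truncated) tsctl list, the misclassification can be reproduced by hand without touching any GPFS scripts. A purely diagnostic sketch; it assumes the daemon node names are fully qualified while mmgetstate shows short admin names, as in the output earlier in this thread, and strips the domain before comparing:

---
#!/bin/bash
# List active nodes that are missing from "tsctl shownodes up". Any CES node
# that shows up here will be treated as down by getDownCesNodeList, even
# though GPFS is actually up on it.

tmp=$(mktemp -d)

tsctl shownodes up | tr ',' '\n' | sed -e '/^$/d' -e 's/\..*$//' | sort -u > "$tmp/up.tsctl"
mmgetstate -a | awk '$3 == "active" {print $2}' | sed 's/\..*$//' | sort -u > "$tmp/up.mmgetstate"

echo "Active nodes missing from tsctl shownodes up:"
comm -13 "$tmp/up.tsctl" "$tmp/up.mmgetstate"

rm -rf "$tmp"
---

The real fix still has to come through the defect/PMR; hacking mmcesfuncs locally to read from mmgetstate instead of tsctl is not supported and would be overwritten by the next update.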
Things I've tried:

* disabling ces on the sgate nodes and re-running the above procedure
* moving the cluster and filesystem managers to different snsd nodes
* deleting and re-creating the cesSharedRoot directory

Meanwhile, the following log entry appears in mmfs.log.latest every ~30s:

---
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+
Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+
---

Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries):

---
2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275
2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333
---

For the record, here's the interface I expect to get the address on sgate1:

---
11: bond0: mtu 9000 qdisc noqueue state UP
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::3efd:feff:fe08:a7c0/64 scope link
       valid_lft forever preferred_lft forever
---

which is a bond of p2p1 and p2p2.

---
6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000
    link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff
---

A similar bond0 exists on sgate2.

I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From olaf.weiser at de.ibm.com Tue Jan 31 22:47:23 2017
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Tue, 31 Jan 2017 22:47:23 +0000
Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes
In-Reply-To: 
Message-ID: 

Yeah... depending on the #nodes you're affected or not. So if your remote CES cluster is small enough in terms of the #nodes, you'll never hit this issue.

Gesendet von IBM Verse

Simon Thompson (Research Computing - IT Services) --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ---
Von: "Simon Thompson (Research Computing - IT Services)"
An: "gpfsug main discussion list"
Datum: Di.
31.01.2017 21:07
Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes

[quoted thread trimmed; the full text appears in the messages above]

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
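One last aside, unrelated to the truncation defect but possibly relevant to the "No CES relevant NICs detected" messages earlier in the thread: as far as I know, CES only adds its floating IPs as aliases on an interface that already carries a static address in the matching subnet, so it is worth confirming that on each protocol node. A small sketch, using the addresses from this thread; substitute your own values:

---
#!/bin/bash
# Check that each CES IP is directly reachable on a locally configured subnet,
# i.e. the route resolves to a local device such as bond0, not to a gateway.

ces_ips="10.225.71.104 10.225.71.105"

# Addresses currently configured on this node; any CES aliases would appear here too.
ip -o -4 addr show

for ces_ip in $ces_ips; do
    ip route get "$ces_ip"
done
---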