From kkr at lbl.gov Thu Feb 1 18:10:46 2018
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Thu, 1 Feb 2018 10:10:46 -0800
Subject: [gpfsug-discuss] Grafana Bridge/OpenTSDB-related question
Message-ID: <00D3A984-5CAE-4A17-8948-A3063901701C@lbl.gov>

Sorry this is slightly OT from GPFS, but it is an issue I'm bumping up against trying to use Grafana with the IBM-provided OpenTSDB bridge for Zimon stats. My issue is very similar to the one posted here, which comes to a dead end (https://community.grafana.com/t/one-alert-for-group-of-hosts/2090).

I'd like to use the Grafana alert functionality to monitor for thresholds on individual nodes, NSDs etc. The ugly way to do this would be to add a metric and alert for each node, NSD or whatever I want to watch for threshold crossing. The better way would be to let a query report back the node, NSD or whatever, so I can generate an alert such as "CPU approaching 100% on ...".

So my question is: does anyone have a clever workaround or alternate approach to achieve this goal?

Thanks,
Kristy

From r.sobey at imperial.ac.uk Fri Feb 2 16:43:51 2018
From: r.sobey at imperial.ac.uk (Sobey, Richard A)
Date: Fri, 2 Feb 2018 16:43:51 +0000
Subject: [gpfsug-discuss] FW: FLASH: IBM Spectrum Scale (GPFS): Undetected corruption of archived sparse files (Linux) (2018.02.02)
In-Reply-To: <-2142026518.24060.1517589526829.JavaMail.webinst@w30112>
References: <-2142026518.24060.1517589526829.JavaMail.webinst@w30112>

The link goes nowhere - can anyone point us in the right direction?

Thanks
Richard

From: IBM My Notifications [mailto:mynotify at stg.events.ihost.com]
Sent: 02 February 2018 16:39
To: Sobey, Richard A
Subject: FLASH: IBM Spectrum Scale (GPFS): Undetected corruption of archived sparse files (Linux) (2018.02.02)

IBM Spectrum Scale (GPFS): Undetected corruption of archived sparse files (Linux)

IBM has identified an issue with IBM GPFS and IBM Spectrum Scale for Linux environments, in which a sparse file may be silently corrupted during archival, resulting in the file being restored incorrectly.
From jfosburg at mdanderson.org Fri Feb 2 16:49:36 2018
From: jfosburg at mdanderson.org (Fosburgh,Jonathan)
Date: Fri, 2 Feb 2018 16:49:36 +0000
Subject: Re: [gpfsug-discuss] FW: FLASH: IBM Spectrum Scale (GPFS): Undetected corruption of archived sparse files (Linux) (2018.02.02)
References: <-2142026518.24060.1517589526829.JavaMail.webinst@w30112>

I've just reached out to our GPFS architect at IBM.

On Friday, 2 February 2018, "Sobey, Richard A" wrote:
> The link goes nowhere - can anyone point us in the right direction?

From Robert.Oesterlin at nuance.com Fri Feb 2 17:04:14 2018
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Fri, 2 Feb 2018 17:04:14 +0000
Subject: [gpfsug-discuss] FW: FLASH: IBM Spectrum Scale (GPFS): Undetected corruption of archived sparse files (Linux) (2018.02.02)
Message-ID: <90EF00A9-E89D-48EA-A04B-B069BF81E188@nuance.com>

Link takes a bit to be active -
it's there now.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

On Friday, 2 February 2018, "Sobey, Richard A" wrote:
> The link goes nowhere - can anyone point us in the right direction?

From jfosburg at mdanderson.org Fri Feb 2 17:03:00 2018
From: jfosburg at mdanderson.org (Fosburgh,Jonathan)
Date: Fri, 2 Feb 2018 17:03:00 +0000
Subject: Re: [gpfsug-discuss] FW: FLASH: IBM Spectrum Scale (GPFS): Undetected corruption of archived sparse files (Linux) (2018.02.02)
References: <-2142026518.24060.1517589526829.JavaMail.webinst@w30112>
Message-ID: <36B1FD9C-90CF-4C49-8C21-051F7A826E41@mdanderson.org>

The document is now up.

On Friday, 2 February 2018, Jonathan Fosburgh wrote:
> I've just reached out to our GPFS architect at IBM.
From r.sobey at imperial.ac.uk Fri Feb 2 17:45:36 2018
From: r.sobey at imperial.ac.uk (Sobey, Richard A)
Date: Fri, 2 Feb 2018 17:45:36 +0000
Subject: Re: [gpfsug-discuss] FW: FLASH: IBM Spectrum Scale (GPFS): Undetected corruption of archived sparse files (Linux) (2018.02.02)
In-Reply-To: <36B1FD9C-90CF-4C49-8C21-051F7A826E41@mdanderson.org>
References: <-2142026518.24060.1517589526829.JavaMail.webinst@w30112>, <36B1FD9C-90CF-4C49-8C21-051F7A826E41@mdanderson.org>

Good stuff. Thanks all.

Get Outlook for Android

On Friday, 2 February 2018, Fosburgh, Jonathan wrote:
> The document is now up.
From SAnderson at convergeone.com Fri Feb 2 19:59:14 2018
From: SAnderson at convergeone.com (Shaun Anderson)
Date: Fri, 2 Feb 2018 19:59:14 +0000
Subject: [gpfsug-discuss] In place upgrade of ESS?
Message-ID: <1517601554597.83665@convergeone.com>

I haven't found a firm answer yet. Is it possible to in place upgrade say, a GL2 to a GL4 and subsequently a GL6?

Do we know if this feature is coming?

SHAUN ANDERSON
STORAGE ARCHITECT
O 208.577.2112
M 214.263.7014

From ewahl at osc.edu Fri Feb 2 20:23:36 2018
From: ewahl at osc.edu (Edward Wahl)
Date: Fri, 2 Feb 2018 15:23:36 -0500
Subject: Re: [gpfsug-discuss] policy ilm features?
References: <20180119163803.79fddbeb@osc.edu>
Message-ID: <20180202152336.03e8bab7@osc.edu>

Thanks John, this was the path I was HOPING to go down as I do similar things already, but there appears to be no extended attribute in ILM for what I want. Data block replication flag exists in the ILM, but not MetaData, or balance. Yet these states ARE reported by mmlsattr, so there must be a flag somewhere.

Bad MD replication & balance example:

mmlsattr -L /fs/scratch/sysp/ed/180days.pol
file name:            /fs/scratch/sysp/ed/180days.pol
metadata replication: 1 max 2
data replication:     1 max 2
flags:                illreplicated,unbalanced
Encrypted:            yes

File next to it for comparison. Note proper MD replication and balance.
mmlsattr -L /fs/scratch/sysp/ed/120days.pol
file name:            /fs/scratch/sysp/ed/120days.pol
metadata replication: 2 max 2
data replication:     1 max 2
flags:
Encrypted:            yes

misc_attributes flags from a policy run showing no difference in status:

FJAEu -- /fs/scratch/sysp/ed/180days.pol
FJAEu -- /fs/scratch/sysp/ed/120days.pol

The file system has MD replication enabled, but not data, so ALL files show the "J" ilm flag.

mmlsfs scratch -m
flag                value                    description
------------------- ------------------------ -----------------------------------
 -m                 2                        Default number of metadata replicas

mmlsfs scratch -r
flag                value                    description
------------------- ------------------------ -----------------------------------
 -r                 1                        Default number of data replicas

I poked around a little trying to find out if perhaps using GetXattr would work and show me what I wanted; it does not. All I seem to be able to get is the File Encryption Key. I was hoping perhaps someone had found a cheaper way for this to work rather than hundreds of millions of 'mmlsattr' execs. :-(

On the plus side, I've only run across a few of these and all appear to be from before we did the MD replication and re-striping. On the minus, I have NO idea where they are, and they appear to be on both of our filesystems. So several hundred million files to check.

Ed

On Mon, 22 Jan 2018 08:29:42 +0000 John Hearns wrote:
> Ed,
> This is not a perfect answer. You need to look at policies for this. I have been doing something similar recently.
>
> Something like:
>
> RULE 'list_file' EXTERNAL LIST 'all-files' EXEC '/var/mmfs/etc/mmpolicyExec-list'
> RULE 'listall' list 'all-files'
>   SHOW( varchar(kb_allocated) || ' ' || varchar(file_size) || ' ' || varchar(misc_attributes) || ' ' || name || ' ' || fileset_name )
>   WHERE REGEX(misc_attributes,'[J]')
>
> So this policy shows the kbytes allocated, file size, the miscellaneous attributes, name and fileset name for all files with miscellaneous attributes of 'J', which means 'Some data blocks might be ill replicated'.
>
> -----Original Message-----
> From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Edward Wahl
> Sent: Friday, January 19, 2018 10:38 PM
> To: gpfsug-discuss at spectrumscale.org
> Subject: [gpfsug-discuss] policy ilm features?
>
> This one has been on my list a long time so I figured I'd ask here first before I open an apar or request an enhancement (most likely).
>
> Is there a way using the policy engine to determine the following?
>
> -metadata replication total/current
> -unbalanced file
>
> Looking to catch things like this that stand out on my filesystem without having to run several hundred million 'mmlsattr's.
>
> metadata replication: 1 max 2
> flags: unbalanced
>
> Ed
>
> --
> Ed Wahl
> Ohio Supercomputer Center
> 614-292-9302
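A rough way to combine the two approaches above - use a policy LIST rule to narrow the candidate set, then run mmlsattr only on that subset - is sketched below. This is untested and the specifics are placeholders: the paths, the list name and the 365-day age cut-off (standing in for "files from before the MD restripe") are assumptions, and the exact format of the mmapplypolicy list file should be verified on your release.

    #!/bin/bash
    # Sketch only: narrow candidates with a policy LIST rule, then check just
    # those files with mmlsattr instead of running it against every inode.
    FS=/fs/scratch                 # placeholder mount point
    PREFIX=/tmp/mdcheck            # where mmapplypolicy writes its list files
    POL=/tmp/mdcheck.pol

    cat > "$POL" <<'EOF'
    RULE 'x1' EXTERNAL LIST 'old-files' EXEC ''
    RULE 'old' LIST 'old-files'
      WHERE (CURRENT_TIMESTAMP - CREATION_TIME) > INTERVAL '365' DAYS
    EOF

    # -I defer only writes the candidate list; nothing is migrated or deleted.
    mmapplypolicy "$FS" -P "$POL" -I defer -f "$PREFIX"

    # Default list records end in " -- <pathname>"; verify the format on your release.
    sed 's/^.* -- //' "${PREFIX}.list.old-files" | while IFS= read -r f; do
        mmlsattr -L "$f" | grep -qE 'illreplicated|unbalanced' && echo "$f"
    done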
--
Ed Wahl
Ohio Supercomputer Center
614-292-9302

From S.J.Thompson at bham.ac.uk Fri Feb 2 20:41:42 2018
From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support))
Date: Fri, 2 Feb 2018 20:41:42 +0000
Subject: Re: [gpfsug-discuss] In place upgrade of ESS?
In-Reply-To: <1517601554597.83665@convergeone.com>
References: <1517601554597.83665@convergeone.com>

If you mean adding storage shelves to increase capacity to an ESS, then no, I don't believe it is supported. I think it is supported on the Lenovo DSS-G models, though you have to have a separate DA for each shelf increment, so the performance may differ between an upgraded vs complete solution.

Simon

________________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of SAnderson at convergeone.com [SAnderson at convergeone.com]
Sent: 02 February 2018 19:59
To: gpfsug main discussion list
Subject: [gpfsug-discuss] In place upgrade of ESS?

I haven't found a firm answer yet. Is it possible to in place upgrade say, a GL2 to a GL4 and subsequently a GL6?

Do we know if this feature is coming?

From aaron.s.knister at nasa.gov Fri Feb 2 20:46:27 2018
From: aaron.s.knister at nasa.gov (Aaron Knister)
Date: Fri, 2 Feb 2018 15:46:27 -0500
Subject: Re: [gpfsug-discuss] FW: FLASH: IBM Spectrum Scale (GPFS): Undetected corruption of archived sparse files (Linux) (2018.02.02)
References: <-2142026518.24060.1517589526829.JavaMail.webinst@w30112> <36B1FD9C-90CF-4C49-8C21-051F7A826E41@mdanderson.org>

Has anyone asked for the efix and gotten it? I'm not having much luck so far.

-Aaron

On 2/2/18 12:45 PM, Sobey, Richard A wrote:
> Good stuff. Thanks all.
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776

From ewahl at osc.edu Fri Feb 2 22:17:47 2018
From: ewahl at osc.edu (Edward Wahl)
Date: Fri, 2 Feb 2018 17:17:47 -0500
Subject: Re: [gpfsug-discuss] FW: FLASH: IBM Spectrum Scale (GPFS): Undetected corruption of archived sparse files (Linux) (2018.02.02)
In-Reply-To: <90EF00A9-E89D-48EA-A04B-B069BF81E188@nuance.com>
References: <90EF00A9-E89D-48EA-A04B-B069BF81E188@nuance.com>
Message-ID: <20180202171747.5e7adeb2@osc.edu>

Should we even ask if Spectrum Protect (TSM) is affected?

Ed

On Fri, 2 Feb 2018 17:04:14 +0000 "Oesterlin, Robert" wrote:
> Link takes a bit to be active - it's there now.
>
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance
--
Ed Wahl
Ohio Supercomputer Center
614-292-9302

From duersch at us.ibm.com Sat Feb 3 02:32:49 2018
From: duersch at us.ibm.com (Steve Duersch)
Date: Fri, 2 Feb 2018 21:32:49 -0500
Subject: Re: [gpfsug-discuss] In place upgrade of ESS?

This has been on our to-do list for quite some time. We hope to have in-place hardware upgrade in 2H2018.

Steve Duersch
Spectrum Scale
IBM Poughkeepsie, New York

gpfsug-discuss-bounces at spectrumscale.org wrote on 02/02/2018 03:15:33 PM:
> Is it possible to in place upgrade say, a GL2 to a GL4 and subsequently a GL6?
>
> Do we know if this feature is coming?

From pinto at scinet.utoronto.ca Sun Feb 4 19:58:39 2018
From: pinto at scinet.utoronto.ca (Jaime Pinto)
Date: Sun, 04 Feb 2018 14:58:39 -0500
Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5?
Message-ID: <20180204145839.77101pngtlr3qacv@support.scinet.utoronto.ca>

Here is what I found for versions 4 & 3.5:
* Maximum Number of Dependent Filesets: 10,000
* Maximum Number of Independent Filesets: 1,000
https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets

I'm having some difficulty finding published documentation on limitations for version 5:
https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm
https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm

Any hints?

Thanks
Jaime

---
Jaime Pinto
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto

This message was sent using IMP at SciNet Consortium, University of Toronto.

From truongv at us.ibm.com Mon Feb 5 13:20:16 2018
From: truongv at us.ibm.com (Truong Vu)
Date: Mon, 5 Feb 2018 08:20:16 -0500
Subject: Re: [gpfsug-discuss] Maximum Number of filesets on GPFS v5?

Hi Jamie,

The limits are the same in 5.0.0. We'll look into the FAQ.

Thanks,
Tru.
From pinto at scinet.utoronto.ca Mon Feb 5 13:50:51 2018
From: pinto at scinet.utoronto.ca (Jaime Pinto)
Date: Mon, 05 Feb 2018 08:50:51 -0500
Subject: Re: [gpfsug-discuss] Maximum Number of filesets on GPFS v5?
Message-ID: <20180205085051.15436lim3xaw49iz@support.scinet.utoronto.ca>

Thanks Truong
Jaime

Quoting "Truong Vu":
> Hi Jamie,
>
> The limits are the same in 5.0.0. We'll look into the FAQ.
>
> Thanks,
> Tru.
---
Jaime Pinto
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto

This message was sent using IMP at SciNet Consortium, University of Toronto.
From daniel.kidger at uk.ibm.com Mon Feb 5 14:19:39 2018
From: daniel.kidger at uk.ibm.com (Daniel Kidger)
Date: Mon, 5 Feb 2018 14:19:39 +0000
Subject: Re: [gpfsug-discuss] Maximum Number of filesets on GPFS v5?
In-Reply-To: <20180205085051.15436lim3xaw49iz@support.scinet.utoronto.ca>

An HTML attachment was scrubbed...

From pinto at scinet.utoronto.ca Mon Feb 5 15:02:17 2018
From: pinto at scinet.utoronto.ca (Jaime Pinto)
Date: Mon, 05 Feb 2018 10:02:17 -0500
Subject: Re: [gpfsug-discuss] Maximum Number of filesets on GPFS v5?
References: <20180205085051.15436lim3xaw49iz@support.scinet.utoronto.ca>
Message-ID: <20180205100217.46131a75yav2wi61@support.scinet.utoronto.ca>

We are considering moving from user/group based quotas to path based quotas with nested filesets. We are also facing challenges traversing 'Dependent Filesets' for daily TSM backups of projects and for purging the scratch area.

We're about to deploy a new GPFS storage cluster, some 12-15PB, with 13K+ users and 5K+ groups as the baseline, and expected substantial scaling up within the next 3-5 years in all dimensions. Therefore, decisions we make now under GPFS v4.x through v5.x will have consequences in the very near future, if they are not the proper ones.

Thanks
Jaime

Quoting "Daniel Kidger":
> Jamie, I believe at least one of those limits is 'maximum supported' rather than an architectural limit. Is your use case one which would push these boundaries? If so care to describe what you would wish to do?
---
Jaime Pinto
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto

This message was sent using IMP at SciNet Consortium, University of Toronto.

From jtucker at pixitmedia.com Mon Feb 5 16:11:58 2018
From: jtucker at pixitmedia.com (Jez Tucker)
Date: Mon, 5 Feb 2018 16:11:58 +0000
Subject: Re: [gpfsug-discuss] Maximum Number of filesets on GPFS v5?
References: <20180205085051.15436lim3xaw49iz@support.scinet.utoronto.ca>

Hi,

IIRC these are hard limits - at least they were a year or so ago. I have customers with ~7500 dependent filesets, knocking on the door of the 1,000 independent fileset limit.

Before independent filesets were 'a thing', projects were created with dependent filesets. However the arrival of independent filesets, per-fileset snapshotting etc. and improved workflow makes these a per-project primary choice - but with 10x less to operate with :-/

If someone @ IBM fancied upping the #defines x10 and confirming the testing limit, that would be appreciated :-)

If you need testing kit, happy to facilitate.

Best,

Jez

On 05/02/18 14:19, Daniel Kidger wrote:
> Jamie,
> I believe at least one of those limits is 'maximum supported' rather than an architectural limit.
> Is your use case one which would push these boundaries? If so care to describe what you would wish to do?
> Daniel
--
Jez Tucker
Head of Research and Development, Pixit Media
07764193820 | jtucker at pixitmedia.com
www.pixitmedia.com | Tw:@pixitmedia.com

From aaron.s.knister at nasa.gov Wed Feb 7 21:28:46 2018
From: aaron.s.knister at nasa.gov (Aaron Knister)
Date: Wed, 7 Feb 2018 16:28:46 -0500
Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1?

I noticed something curious after migrating some nodes from 4.1 to 4.2 which is that mounts now can take foorrreeevverrr. It seems to boil down to the point in the mount process where getEFOptions is called.
To highlight the difference-- 4.1: # /usr/bin/time /usr/lpp/mmfs/bin/mmcommon getEFOptions dnb02 skipMountPointCheck >/dev/null 0.16user 0.04system 0:00.43elapsed 45%CPU (0avgtext+0avgdata 9108maxresident)k 0inputs+2768outputs (0major+15404minor)pagefaults 0swaps 4.2: /usr/bin/time /usr/lpp/mmfs/bin/mmcommon getEFOptions dnb02 skipMountPointCheck >/dev/null 9.75user 3.79system 0:23.35elapsed 58%CPU (0avgtext+0avgdata 10832maxresident)k 0inputs+38104outputs (0major+3135097minor)pagefaults 0swaps that's uh...a 543x increase. Which, if you have 25+ filesystems and 3500 nodes that time really starts to add up. It looks like under 4.2 this getEFOptions function triggers a bunch of mmsdrfs parsing happens and node lists get generated whereas on 4.1 that doesn't happen. Digging in a little deeper it looks to me like the big difference is in gpfsClusterInit after the node fetches the "shadow" mmsdrs file. Here's a 4.1 node: gpfsClusterInit:mmsdrfsdef.sh[2827]> loginPrefix='' gpfsClusterInit:mmsdrfsdef.sh[2828]> [[ -n '' ]] gpfsClusterInit:mmsdrfsdef.sh[2829]> /usr/bin/scp supersecrethost:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs.25326 gpfsClusterInit:mmsdrfsdef.sh[2830]> rc=0 gpfsClusterInit:mmsdrfsdef.sh[2831]> [[ 0 -ne 0 ]] gpfsClusterInit:mmsdrfsdef.sh[2863]> [[ -f /var/mmfs/gen/mmsdrfs.25326 ]] gpfsClusterInit:mmsdrfsdef.sh[2867]> /usr/bin/diff /var/mmfs/gen/mmsdrfs.25326 /var/mmfs/gen/mmsdrfs gpfsClusterInit:mmsdrfsdef.sh[2867]> 1> /dev/null 2> /dev/null gpfsClusterInit:mmsdrfsdef.sh[2868]> rc=0 gpfsClusterInit:mmsdrfsdef.sh[2869]> [[ 0 -ne 0 ]] gpfsClusterInit:mmsdrfsdef.sh[2874]> sdrfsFile=/var/mmfs/gen/mmsdrfs gpfsClusterInit:mmsdrfsdef.sh[2875]> /bin/rm -f /var/mmfs/gen/mmsdrfs.25326 Here's a 4.2 node: gpfsClusterInit:mmsdrfsdef.sh[2938]> loginPrefix='' gpfsClusterInit:mmsdrfsdef.sh[2939]> [[ -n '' ]] gpfsClusterInit:mmsdrfsdef.sh[2940]> /usr/bin/scp supersecrethost:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs.8534 gpfsClusterInit:mmsdrfsdef.sh[2941]> rc=0 gpfsClusterInit:mmsdrfsdef.sh[2942]> [[ 0 -ne 0 ]] gpfsClusterInit:mmsdrfsdef.sh[2974]> /bin/rm -f /var/mmfs/tmp/cmdTmpDir.mmcommon.8534/tmpsdrfs.gpfsClusterInit gpfsClusterInit:mmsdrfsdef.sh[2975]> [[ -f /var/mmfs/gen/mmsdrfs.8534 ]] gpfsClusterInit:mmsdrfsdef.sh[2979]> /usr/bin/diff /var/mmfs/gen/mmsdrfs.8534 /var/mmfs/gen/mmsdrfs gpfsClusterInit:mmsdrfsdef.sh[2979]> 1> /dev/null 2> /dev/null gpfsClusterInit:mmsdrfsdef.sh[2980]> rc=0 gpfsClusterInit:mmsdrfsdef.sh[2981]> [[ 0 -ne 0 ]] gpfsClusterInit:mmsdrfsdef.sh[2986]> sdrfsFile=/var/mmfs/gen/mmsdrfs it looks like the 4.1 code deletes the shadow mmsdrfs file is it's not different from what's locally on the node where as 4.2 does *not* do that. This seems to cause a problem when checkMmfsEnvironment is called because it will return 1 if the shadow file exists which according to the function comments indicates "something is not right", triggering the environment update where the slowdown is incurred. On 4.1 checkMmfsEnvironment returned 0 because the shadow mmsdrfs file had been removed, whereas on 4.2 it returned 1 because the shadow mmsdrfs file still existed despite it being identical to the mmsdrfs on the node. I've looked at 4.2.3.6 (efix12) and it doesn't look like 4.2.3.7 has dropped yet so it may be this has been fixed there. Maybe it's time for a PMR... 
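For anyone who wants to check their own nodes, a rough sketch of the same measurement (only a sketch; the filesystem device names below are placeholders to replace with your own):

#!/bin/bash
# Time getEFOptions per filesystem on this node -- the step of the mount
# process this thread is about.  Replace the list with your own devices.
for fs in dnb02 dnb03; do
    echo "=== $fs ==="
    /usr/bin/time -f "%e elapsed, %U user, %S sys" \
        /usr/lpp/mmfs/bin/mmcommon getEFOptions $fs skipMountPointCheck >/dev/null
done

# Leftover shadow copies of mmsdrfs are a hint that the 4.2 code path
# described above (shadow file kept, environment update triggered) was taken:
ls -l /var/mmfs/gen/mmsdrfs.* 2>/dev/null
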
-Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From tortay at cc.in2p3.fr Thu Feb 8 07:08:50 2018 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Thu, 8 Feb 2018 08:08:50 +0100 Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? In-Reply-To: References: Message-ID: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr> On 07/02/2018 22:28, Aaron Knister wrote: > I noticed something curious after migrating some nodes from 4.1 to 4.2 > which is that mounts now can take foorrreeevverrr. It seems to boil down > to the point in the mount process where getEFOptions is called. > > To highlight the difference-- > [...] > Hello, I have had this (or a very similar) issue after migrating from 4.1.1.8 to 4.2.3. There are 37 filesystems in our main cluster, which made the problem really noticeable. A PMR has been opened. I have tested the fixes included in 4.2.3.7, (which, I'm told, should be released today) actually resolve my problems (APAR IJ03192 & IJ03235). Lo?c. -- | Lo?c Tortay - IN2P3 Computing Centre | From Tomasz.Wolski at ts.fujitsu.com Thu Feb 8 10:35:54 2018 From: Tomasz.Wolski at ts.fujitsu.com (Tomasz.Wolski at ts.fujitsu.com) Date: Thu, 8 Feb 2018 10:35:54 +0000 Subject: [gpfsug-discuss] Inode scan optimization Message-ID: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> Hello All, A full backup of an 2 billion inodes spectrum scale file system on V4.1.1.16 takes 60 days. We try to optimize and using inode scans seems to improve, even when we are using a directory scan and the inode scan just for having a better performance concerning stat (using gpfs_stat_inode_with_xattrs64). With 20 processes in parallel doing dir scans (+ inode scans for stat info) we have decreased the time to 40 days. All NSDs are dataAndMetadata type. I have the following questions: ? Is there a way to increase the inode scan cache (we may use 32 GByte)? o Can we us the "hidden" config parameters ? iscanPrefetchAggressiveness 2 ? iscanPrefetchDepth 0 ? iscanPrefetchThreadsPerNode 0 ? Is there a documentation concerning cache behavior? o if no, is the inode scan cache process or node specific? o Is there a suggestion to optimize the termIno parameter in the gpfs_stat_inode_with_xattrs64() in such a use case? Thanks! Best regards, Tomasz Wolski -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Feb 8 12:44:35 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 8 Feb 2018 07:44:35 -0500 Subject: [gpfsug-discuss] Inode scan optimization In-Reply-To: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> References: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> Message-ID: You mention that all the NSDs are metadata and data but you do not say how many NSDs are defined or the type of storage used, that is are these on SAS or NL-SAS storage? I'm assuming they are not on SSDs/flash storage. Have you considered moving the metadata to separate NSDs, preferably SSD/flash storage? This is likely to give you a significant performance boost. You state that using the inode scan API you reduced the time to 40 days. Did you analyze your backup application to determine where the time was being spent for the backup? If the inode scan is a small percentage of your backup time then optimizing it will not provide much benefit. 
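If it helps while profiling, a quick way to see how the metadata is currently laid out (a sketch; "fsdev" is a placeholder for the file system device name):

# Which NSDs hold metadata vs. data, and how full each one is:
mmlsdisk fsdev        # check the "holds metadata" / "holds data" columns
mmdf fsdev            # free-space breakdown per disk and per storage pool
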
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Tomasz.Wolski at ts.fujitsu.com" To: "gpfsug-discuss at spectrumscale.org" Date: 02/08/2018 05:50 AM Subject: [gpfsug-discuss] Inode scan optimization Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, A full backup of an 2 billion inodes spectrum scale file system on V4.1.1.16 takes 60 days. We try to optimize and using inode scans seems to improve, even when we are using a directory scan and the inode scan just for having a better performance concerning stat (using gpfs_stat_inode_with_xattrs64). With 20 processes in parallel doing dir scans (+ inode scans for stat info) we have decreased the time to 40 days. All NSDs are dataAndMetadata type. I have the following questions: ? Is there a way to increase the inode scan cache (we may use 32 GByte)? o Can we us the ?hidden? config parameters ? iscanPrefetchAggressiveness 2 ? iscanPrefetchDepth 0 ? iscanPrefetchThreadsPerNode 0 ? Is there a documentation concerning cache behavior? o if no, is the inode scan cache process or node specific? o Is there a suggestion to optimize the termIno parameter in the gpfs_stat_inode_with_xattrs64() in such a use case? Thanks! Best regards, Tomasz Wolski_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p_1XEUyoJ7-VJxF_w8h9gJh8_Wj0Pey73LCLLoxodpw&m=y2y22xZuqjpkKfO2WSdcJsBXMaM8hOedaB_AlgFlIb0&s=DL0ZnBuH9KpvKN6XQNvoYmvwfZDbbwMlM-4rCbsAgWo&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Feb 8 13:56:42 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 8 Feb 2018 08:56:42 -0500 Subject: [gpfsug-discuss] Inode scan optimization In-Reply-To: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> References: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> Message-ID: Recall that many years ago we demonstrated a Billion files scanned with mmapplypolicy in under 20 minutes... And that was on ordinary at the time, spinning disks (not SSD!)... Granted we packed about 1000 files per directory and made some other choices that might not be typical usage.... OTOH storage and nodes have improved since then... SO when you say it takes 60 days to backup 2 billion files and that's a problem.... Like any large computing job, one has to do some analysis to find out what parts of the job are taking how much time... So... what commands are you using to do the backup...? What timing statistics or measurements have you collected? If you are using mmbackup and/or mmapplypolicy, those commands can show you how much time they spend scanning the file system looking for files to backup AND then how much time they spend copying the data to backup media. In fact they operate in distinct phases... directory scan, inode scan, THEN data copying ... so it's straightforward to see which phases are taking how much time. OH... I see you also say you are using gpfs_stat_inode_with_xattrs64 -- These APIs are tricky and not a panacea.... That's why we provide you with mmapplypolicy which in fact uses those APIs in clever, patented ways -- optimized and honed with years of work.... 
And more recently, we provided you with samples/ilm/mmfind -- which has the functionality of the classic unix find command -- but runs in parallel - using mmapplypolicy. TRY IT on you file system! From: "Tomasz.Wolski at ts.fujitsu.com" To: "gpfsug-discuss at spectrumscale.org" Date: 02/08/2018 05:50 AM Subject: [gpfsug-discuss] Inode scan optimization Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, A full backup of an 2 billion inodes spectrum scale file system on V4.1.1.16 takes 60 days. We try to optimize and using inode scans seems to improve, even when we are using a directory scan and the inode scan just for having a better performance concerning stat (using gpfs_stat_inode_with_xattrs64). With 20 processes in parallel doing dir scans (+ inode scans for stat info) we have decreased the time to 40 days. All NSDs are dataAndMetadata type. I have the following questions: ? Is there a way to increase the inode scan cache (we may use 32 GByte)? o Can we us the ?hidden? config parameters ? iscanPrefetchAggressiveness 2 ? iscanPrefetchDepth 0 ? iscanPrefetchThreadsPerNode 0 ? Is there a documentation concerning cache behavior? o if no, is the inode scan cache process or node specific? o Is there a suggestion to optimize the termIno parameter in the gpfs_stat_inode_with_xattrs64() in such a use case? Thanks! Best regards, Tomasz Wolski_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=mWxVB2lS_snDiYR4E348tnzbQTSuuWSrRiBDhJPjyh8&s=FG9fDxbmiCuSh0cvt4hsQS0bKdGHjI7loVGEKO0eTf0&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Feb 8 15:33:13 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 8 Feb 2018 10:33:13 -0500 Subject: [gpfsug-discuss] Inode scan optimization - (Tomasz.Wolski@ts.fujitsu.com ) In-Reply-To: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> References: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> Message-ID: Please clarify and elaborate .... When you write "a full backup ... takes 60 days" - that seems very poor indeed. BUT you haven't stated how much data is being copied to what kind of backup media nor how much equipment or what types you are using... Nor which backup software... We have Spectrum Scale installation doing nightly backups of huge file systems using the mmbackup command with TivoliStorageManager backup, using IBM branded or approved equipment and software. From: "Tomasz.Wolski at ts.fujitsu.com" To: "gpfsug-discuss at spectrumscale.org" Date: 02/08/2018 05:50 AM Subject: [gpfsug-discuss] Inode scan optimization Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, A full backup of an 2 billion inodes spectrum scale file system on V4.1.1.16 takes 60 days. We try to optimize and using inode scans seems to improve, even when we are using a directory scan and the inode scan just for having a better performance concerning stat (using gpfs_stat_inode_with_xattrs64). With 20 processes in parallel doing dir scans (+ inode scans for stat info) we have decreased the time to 40 days. All NSDs are dataAndMetadata type. I have the following questions: ? Is there a way to increase the inode scan cache (we may use 32 GByte)? o Can we us the ?hidden? config parameters ? iscanPrefetchAggressiveness 2 ? 
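For reference, a minimal policy-scan sketch along the lines suggested above (untested here; the filesystem path, node class and work directory are placeholders to adapt to your cluster):

# /tmp/listall.pol -- list every file with a little metadata
RULE 'listall' LIST 'allfiles'
     SHOW(VARCHAR(FILE_SIZE) || ' ' || VARCHAR(MODIFICATION_TIME))

# Scan only, no data movement: the run statistics report how long the
# directory and inode scan phases take, independent of any copying.
mmapplypolicy /gpfs/fs1 -P /tmp/listall.pol -I defer -f /tmp/fs1.scan \
    -N managerNodes -g /gpfs/fs1/.policytmp

mmfind in samples/ilm wraps much the same machinery behind a find-like front end, so it is another easy way to see what the parallel scan can do on your file system.
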
iscanPrefetchDepth 0 ? iscanPrefetchThreadsPerNode 0 ? Is there a documentation concerning cache behavior? o if no, is the inode scan cache process or node specific? o Is there a suggestion to optimize the termIno parameter in the gpfs_stat_inode_with_xattrs64() in such a use case? Thanks! Best regards, Tomasz Wolski_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=mWxVB2lS_snDiYR4E348tnzbQTSuuWSrRiBDhJPjyh8&s=FG9fDxbmiCuSh0cvt4hsQS0bKdGHjI7loVGEKO0eTf0&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Thu Feb 8 15:52:22 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Thu, 08 Feb 2018 10:52:22 -0500 Subject: [gpfsug-discuss] Inode scan optimization - (Tomasz.Wolski@ts.fujitsu.com ) In-Reply-To: References: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> Message-ID: <9124.1518105142@turing-police.cc.vt.edu> On Thu, 08 Feb 2018 10:33:13 -0500, "Marc A Kaplan" said: > Please clarify and elaborate .... When you write "a full backup ... takes > 60 days" - that seems very poor indeed. > BUT you haven't stated how much data is being copied to what kind of > backup media nor how much equipment or what types you are using... Nor > which backup software... > > We have Spectrum Scale installation doing nightly backups of huge file > systems using the mmbackup command with TivoliStorageManager backup, using > IBM branded or approved equipment and software. How long did the *first* TSM backup take? Remember that TSM does the moral equivalent of a 'full' backup at first, and incrementals thereafter. So it's quite possible to have a very large filesystem with little data churn to do incrementals in 5-6 hours, even though the first one took several weeks. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Feb 8 15:59:44 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 8 Feb 2018 15:59:44 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> Hi All, We are in a bit of a difficult situation right now with one of our non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! ) and are looking for some advice on how to deal with this unfortunate situation. We have a non-IBM FC storage array with dual-?redundant? controllers. One of those controllers is dead and the vendor is sending us a replacement. However, the replacement controller will have mis-matched firmware with the surviving controller and - long story short - the vendor says there is no way to resolve that without taking the storage array down for firmware upgrades. Needless to say there?s more to that story than what I?ve included here, but I won?t bore everyone with unnecessary details. The storage array has 5 NSDs on it, but fortunately enough they are part of our ?capacity? pool ? i.e. the only way a file lands here is if an mmapplypolicy scan moved it there because the *access* time is greater than 90 days. Filesystem data replication is set to one. So ? 
what I was wondering if I could do is to use mmchdisk to either suspend or (preferably) stop those NSDs, do the firmware upgrade, and resume the NSDs? The problem I see is that suspend doesn?t stop I/O, it only prevents the allocation of new blocks ? so, in theory, if a user suddenly decided to start using a file they hadn?t needed for 3 months then I?ve got a problem. Stopping all I/O to the disks is what I really want to do. However, according to the mmchdisk man page stop cannot be used on a filesystem with replication set to one. There?s over 250 TB of data on those 5 NSDs, so restriping off of them or setting replication to two are not options. It is very unlikely that anyone would try to access a file on those NSDs during the hour or so I?d need to do the firmware upgrades, but how would GPFS itself react to those (suspended) disks going away for a while? I?m thinking I could be OK if there was just a way to actually stop them rather than suspend them. Any undocumented options to mmchdisk that I?m not aware of??? Are there other options - besides buying IBM hardware - that I am overlooking? Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu Feb 8 16:23:33 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 8 Feb 2018 16:23:33 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> Message-ID: Sorry I can?t help? the only thing going round and round my head right now is why on earth the existing controller cannot push the required firmware to the new one when it comes online. Never heard of anything else! Feel free to name and shame so I can avoid ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 08 February 2018 16:00 To: gpfsug main discussion list Subject: [gpfsug-discuss] mmchdisk suspend / stop Hi All, We are in a bit of a difficult situation right now with one of our non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! ) and are looking for some advice on how to deal with this unfortunate situation. We have a non-IBM FC storage array with dual-?redundant? controllers. One of those controllers is dead and the vendor is sending us a replacement. However, the replacement controller will have mis-matched firmware with the surviving controller and - long story short - the vendor says there is no way to resolve that without taking the storage array down for firmware upgrades. Needless to say there?s more to that story than what I?ve included here, but I won?t bore everyone with unnecessary details. The storage array has 5 NSDs on it, but fortunately enough they are part of our ?capacity? pool ? i.e. the only way a file lands here is if an mmapplypolicy scan moved it there because the *access* time is greater than 90 days. Filesystem data replication is set to one. So ? what I was wondering if I could do is to use mmchdisk to either suspend or (preferably) stop those NSDs, do the firmware upgrade, and resume the NSDs? The problem I see is that suspend doesn?t stop I/O, it only prevents the allocation of new blocks ? 
so, in theory, if a user suddenly decided to start using a file they hadn?t needed for 3 months then I?ve got a problem. Stopping all I/O to the disks is what I really want to do. However, according to the mmchdisk man page stop cannot be used on a filesystem with replication set to one. There?s over 250 TB of data on those 5 NSDs, so restriping off of them or setting replication to two are not options. It is very unlikely that anyone would try to access a file on those NSDs during the hour or so I?d need to do the firmware upgrades, but how would GPFS itself react to those (suspended) disks going away for a while? I?m thinking I could be OK if there was just a way to actually stop them rather than suspend them. Any undocumented options to mmchdisk that I?m not aware of??? Are there other options - besides buying IBM hardware - that I am overlooking? Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Feb 8 16:25:33 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 8 Feb 2018 16:25:33 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop Message-ID: Check out ?unmountOnDiskFail? config parameter perhaps? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_tuningguide.htm unmountOnDiskFail The unmountOnDiskFail specifies how the GPFS daemon responds when a disk failure is detected. The valid values of this parameter are yes, no, and meta. The default value is no. I have it set to ?meta? which prevents the file system from unmounting if an NSD fails and the metadata is still available. I have 2 replicas of metadata and one data. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Thursday, February 8, 2018 at 10:15 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] mmchdisk suspend / stop So ? what I was wondering if I could do is to use mmchdisk to either suspend or (preferably) stop those NSDs, do the firmware upgrade, and resume the NSDs? The problem I see is that suspend doesn?t stop I/O, it only prevents the allocation of new blocks ? so, in theory, if a user suddenly decided to start using a file they hadn?t needed for 3 months then I?ve got a problem. Stopping all I/O to the disks is what I really want to do. However, according to the mmchdisk man page stop cannot be used on a filesystem with replication set to one. -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Thu Feb 8 16:31:25 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Thu, 08 Feb 2018 11:31:25 -0500 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: References: Message-ID: <14127.1518107485@turing-police.cc.vt.edu> On Thu, 08 Feb 2018 16:25:33 +0000, "Oesterlin, Robert" said: > unmountOnDiskFail > The unmountOnDiskFail specifies how the GPFS daemon responds when a disk > failure is detected. The valid values of this parameter are yes, no, and meta. > The default value is no. 
I suspect that the only relevant setting there is the default 'no' - it sounds like these 5 NSD's are just one storage pool in a much larger filesystem, and Kevin doesn't want the entire thing to unmount if GPFS notices that the NSDs went walkies. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From makaplan at us.ibm.com Thu Feb 8 17:10:39 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 8 Feb 2018 12:10:39 -0500 Subject: [gpfsug-discuss] Inode scan optimization - (Tomasz.Wolski@ts.fujitsu.com ) In-Reply-To: <9124.1518105142@turing-police.cc.vt.edu> References: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> <9124.1518105142@turing-police.cc.vt.edu> Message-ID: Let's give Fujitsu an opportunity to answer with some facts and re-pose their questions. When I first read the complaint, I kinda assumed they were using mmbackup and TSM -- but then I noticed words about some gpfs_XXX apis.... So it looks like this Fujitsu fellow is "rolling his own"... NOT using mmapplypolicy. And we don't know if he is backing up to an old paper tape punch device or what ! He's just saying that whatever it is that he did took 60 days... Can you get from here to there faster? Sure, take an airplane instead of walking! My other remark which had a typo was and is: There have many satisfied customers and installations of Spectrum Scale File System using mmbackup and/or Tivoli Storage Manager. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sxiao at us.ibm.com Thu Feb 8 17:17:45 2018 From: sxiao at us.ibm.com (Steve Xiao) Date: Thu, 8 Feb 2018 12:17:45 -0500 Subject: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) In-Reply-To: References: Message-ID: You can change the cluster configuration to online unmount the file system when there is error accessing metadata. This can be done run the following command: mmchconfig unmountOnDiskFail=meta -i After this configuration change, you should be able to stop all 5 NSDs with mmchdisk stop command. While these NSDs are in down state, any user IO to files resides on these disks will fail but your file system should state mounted and usable. Steve Y. Xiao > Date: Thu, 8 Feb 2018 15:59:44 +0000 > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Subject: [gpfsug-discuss] mmchdisk suspend / stop > Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109 at vanderbilt.edu> > Content-Type: text/plain; charset="utf-8" > > Hi All, > > We are in a bit of a difficult situation right now with one of our > non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! > ) and are looking for some advice on how to deal with this > unfortunate situation. > > We have a non-IBM FC storage array with dual-?redundant? > controllers. One of those controllers is dead and the vendor is > sending us a replacement. However, the replacement controller will > have mis-matched firmware with the surviving controller and - long > story short - the vendor says there is no way to resolve that > without taking the storage array down for firmware upgrades. > Needless to say there?s more to that story than what I?ve included > here, but I won?t bore everyone with unnecessary details. > > The storage array has 5 NSDs on it, but fortunately enough they are > part of our ?capacity? pool ? i.e. 
the only way a file lands here is > if an mmapplypolicy scan moved it there because the *access* time is > greater than 90 days. Filesystem data replication is set to one. > > So ? what I was wondering if I could do is to use mmchdisk to either > suspend or (preferably) stop those NSDs, do the firmware upgrade, > and resume the NSDs? The problem I see is that suspend doesn?t stop > I/O, it only prevents the allocation of new blocks ? so, in theory, > if a user suddenly decided to start using a file they hadn?t needed > for 3 months then I?ve got a problem. Stopping all I/O to the disks > is what I really want to do. However, according to the mmchdisk man > page stop cannot be used on a filesystem with replication set to one. > > There?s over 250 TB of data on those 5 NSDs, so restriping off of > them or setting replication to two are not options. > > It is very unlikely that anyone would try to access a file on those > NSDs during the hour or so I?d need to do the firmware upgrades, but > how would GPFS itself react to those (suspended) disks going away > for a while? I?m thinking I could be OK if there was just a way to > actually stop them rather than suspend them. Any undocumented > options to mmchdisk that I?m not aware of??? > > Are there other options - besides buying IBM hardware - that I am > overlooking? Thanks... > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Feb 8 19:38:33 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 8 Feb 2018 19:38:33 +0000 Subject: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) In-Reply-To: References: Message-ID: <550b2cc6552f4e669d2cfee72b1a244a@jumptrading.com> I don't know or care who the hardware vendor is, but they can DEFINITELY ship you a controller with the right firmware! Just demand it, which is what I do and they have basically always complied with the request. There is the risk associated with running even longer with a single point of failure, only using the surviving controller, but if this storage system has been in production a long time (e.g. a year or so) and is generally reliable, then they should be able to get you a new, factory tested controller with the right FW versions in a couple of days. The choice is yours of course, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Steve Xiao Sent: Thursday, February 08, 2018 11:18 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) Note: External Email ________________________________ You can change the cluster configuration to online unmount the file system when there is error accessing metadata. This can be done run the following command: mmchconfig unmountOnDiskFail=meta -i After this configuration change, you should be able to stop all 5 NSDs with mmchdisk stop command. While these NSDs are in down state, any user IO to files resides on these disks will fail but your file system should state mounted and usable. Steve Y. 
Xiao > Date: Thu, 8 Feb 2018 15:59:44 +0000 > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] mmchdisk suspend / stop > Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109 at vanderbilt.edu> > Content-Type: text/plain; charset="utf-8" > > Hi All, > > We are in a bit of a difficult situation right now with one of our > non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! > ) and are looking for some advice on how to deal with this > unfortunate situation. > > We have a non-IBM FC storage array with dual-?redundant? > controllers. One of those controllers is dead and the vendor is > sending us a replacement. However, the replacement controller will > have mis-matched firmware with the surviving controller and - long > story short - the vendor says there is no way to resolve that > without taking the storage array down for firmware upgrades. > Needless to say there?s more to that story than what I?ve included > here, but I won?t bore everyone with unnecessary details. > > The storage array has 5 NSDs on it, but fortunately enough they are > part of our ?capacity? pool ? i.e. the only way a file lands here is > if an mmapplypolicy scan moved it there because the *access* time is > greater than 90 days. Filesystem data replication is set to one. > > So ? what I was wondering if I could do is to use mmchdisk to either > suspend or (preferably) stop those NSDs, do the firmware upgrade, > and resume the NSDs? The problem I see is that suspend doesn?t stop > I/O, it only prevents the allocation of new blocks ? so, in theory, > if a user suddenly decided to start using a file they hadn?t needed > for 3 months then I?ve got a problem. Stopping all I/O to the disks > is what I really want to do. However, according to the mmchdisk man > page stop cannot be used on a filesystem with replication set to one. > > There?s over 250 TB of data on those 5 NSDs, so restriping off of > them or setting replication to two are not options. > > It is very unlikely that anyone would try to access a file on those > NSDs during the hour or so I?d need to do the firmware upgrades, but > how would GPFS itself react to those (suspended) disks going away > for a while? I?m thinking I could be OK if there was just a way to > actually stop them rather than suspend them. Any undocumented > options to mmchdisk that I?m not aware of??? > > Are there other options - besides buying IBM hardware - that I am > overlooking? Thanks... > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
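Pulling the suggestions in this thread together, the maintenance window might look roughly like the following. This is only a sketch: "fsdev" and the NSD names are placeholders, and it assumes unmountOnDiskFail is currently at its default of no.

# Keep the filesystem mounted while the array is offline:
mmchconfig unmountOnDiskFail=meta -i

# Stop I/O to the five NSDs on the affected array:
mmchdisk fsdev stop -d "nsd21;nsd22;nsd23;nsd24;nsd25"

#   ... replace the controller / run the firmware upgrades ...

# Bring the disks back, verify, and restore the original setting:
mmchdisk fsdev start -d "nsd21;nsd22;nsd23;nsd24;nsd25"
mmlsdisk fsdev            # all disks should show status "ready", availability "up"
mmchconfig unmountOnDiskFail=no -i
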
-------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Feb 8 19:48:54 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 8 Feb 2018 19:48:54 +0000 Subject: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) In-Reply-To: References: Message-ID: <769B6E06-BAB5-4EDB-A5A3-54E1063A8A6D@vanderbilt.edu> Hi again all, It sounds like doing the ?mmchconfig unmountOnDiskFail=meta -i? suggested by Steve and Bob followed by using mmchdisk to stop the disks temporarily is the way we need to go. We will, as an aside, also run a mmapplypolicy first to pull any files users have started accessing again back to the ?regular? pool before doing any of this. Given that this is our ?capacity? pool and files have to have an atime > 90 days to get migrated there in the 1st place I think this is reasonable. Especially since users will get an I/O error if they happen to try to access one of those NSDs during the brief maintenance window. As to naming and shaming the vendor ? I?m not going to do that at this point in time. We?ve been using their stuff for well over a decade at this point and have had a generally positive experience with them. In fact, I have spoken with them via phone since my original post today and they have clarified that the problem with the mismatched firmware is only an issue because we are a major version off of what is current due to us choosing to not have a downtime and therefore not having done any firmware upgrades in well over 18 months. Thanks, all... Kevin On Feb 8, 2018, at 11:17 AM, Steve Xiao > wrote: You can change the cluster configuration to online unmount the file system when there is error accessing metadata. This can be done run the following command: mmchconfig unmountOnDiskFail=meta -i After this configuration change, you should be able to stop all 5 NSDs with mmchdisk stop command. While these NSDs are in down state, any user IO to files resides on these disks will fail but your file system should state mounted and usable. Steve Y. Xiao > Date: Thu, 8 Feb 2018 15:59:44 +0000 > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] mmchdisk suspend / stop > Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109 at vanderbilt.edu> > Content-Type: text/plain; charset="utf-8" > > Hi All, > > We are in a bit of a difficult situation right now with one of our > non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! > ) and are looking for some advice on how to deal with this > unfortunate situation. > > We have a non-IBM FC storage array with dual-?redundant? > controllers. One of those controllers is dead and the vendor is > sending us a replacement. However, the replacement controller will > have mis-matched firmware with the surviving controller and - long > story short - the vendor says there is no way to resolve that > without taking the storage array down for firmware upgrades. > Needless to say there?s more to that story than what I?ve included > here, but I won?t bore everyone with unnecessary details. > > The storage array has 5 NSDs on it, but fortunately enough they are > part of our ?capacity? pool ? i.e. the only way a file lands here is > if an mmapplypolicy scan moved it there because the *access* time is > greater than 90 days. Filesystem data replication is set to one. > > So ? 
what I was wondering if I could do is to use mmchdisk to either > suspend or (preferably) stop those NSDs, do the firmware upgrade, > and resume the NSDs? The problem I see is that suspend doesn?t stop > I/O, it only prevents the allocation of new blocks ? so, in theory, > if a user suddenly decided to start using a file they hadn?t needed > for 3 months then I?ve got a problem. Stopping all I/O to the disks > is what I really want to do. However, according to the mmchdisk man > page stop cannot be used on a filesystem with replication set to one. > > There?s over 250 TB of data on those 5 NSDs, so restriping off of > them or setting replication to two are not options. > > It is very unlikely that anyone would try to access a file on those > NSDs during the hour or so I?d need to do the firmware upgrades, but > how would GPFS itself react to those (suspended) disks going away > for a while? I?m thinking I could be OK if there was just a way to > actually stop them rather than suspend them. Any undocumented > options to mmchdisk that I?m not aware of??? > > Are there other options - besides buying IBM hardware - that I am > overlooking? Thanks... > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C435bd89b3fcc4a94ee5008d56f17e49e%7C5f88b91902e3490fb772327aa8177b95%7C0%7C0%7C636537070783260582&sdata=AbY7rJQecb76rMC%2FlxrthyzHfueQDJTT%2FJuuRCac5g8%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Thu Feb 8 18:33:32 2018 From: ewahl at osc.edu (Edward Wahl) Date: Thu, 8 Feb 2018 13:33:32 -0500 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> Message-ID: <20180208133332.30440b89@osc.edu> I'm with Richard on this one. Sounds dubious to me. Even older style stuff could start a new controller in a 'failed' or 'service' state and push firmware back in the 20th century... ;) Ed On Thu, 8 Feb 2018 16:23:33 +0000 "Sobey, Richard A" wrote: > Sorry I can?t help? the only thing going round and round my head right now is > why on earth the existing controller cannot push the required firmware to the > new one when it comes online. Never heard of anything else! Feel free to name > and shame so I can avoid ? > > Richard > > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, > Kevin L Sent: 08 February 2018 16:00 To: gpfsug main discussion list > Subject: [gpfsug-discuss] mmchdisk > suspend / stop > > Hi All, > > We are in a bit of a difficult situation right now with one of our non-IBM > hardware vendors (I know, I know, I KNOW - buy IBM hardware! ) and are > looking for some advice on how to deal with this unfortunate situation. > > We have a non-IBM FC storage array with dual-?redundant? controllers. One of > those controllers is dead and the vendor is sending us a replacement. 
> However, the replacement controller will have mis-matched firmware with the > surviving controller and - long story short - the vendor says there is no way > to resolve that without taking the storage array down for firmware upgrades. > Needless to say there?s more to that story than what I?ve included here, but > I won?t bore everyone with unnecessary details. > > The storage array has 5 NSDs on it, but fortunately enough they are part of > our ?capacity? pool ? i.e. the only way a file lands here is if an > mmapplypolicy scan moved it there because the *access* time is greater than > 90 days. Filesystem data replication is set to one. > > So ? what I was wondering if I could do is to use mmchdisk to either suspend > or (preferably) stop those NSDs, do the firmware upgrade, and resume the > NSDs? The problem I see is that suspend doesn?t stop I/O, it only prevents > the allocation of new blocks ? so, in theory, if a user suddenly decided to > start using a file they hadn?t needed for 3 months then I?ve got a problem. > Stopping all I/O to the disks is what I really want to do. However, > according to the mmchdisk man page stop cannot be used on a filesystem with > replication set to one. > > There?s over 250 TB of data on those 5 NSDs, so restriping off of them or > setting replication to two are not options. > > It is very unlikely that anyone would try to access a file on those NSDs > during the hour or so I?d need to do the firmware upgrades, but how would > GPFS itself react to those (suspended) disks going away for a while? I?m > thinking I could be OK if there was just a way to actually stop them rather > than suspend them. Any undocumented options to mmchdisk that I?m not aware > of??? > > Are there other options - besides buying IBM hardware - that I am > overlooking? Thanks... ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From aaron.s.knister at nasa.gov Thu Feb 8 20:22:52 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 8 Feb 2018 15:22:52 -0500 (EST) Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? In-Reply-To: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr> References: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr> Message-ID: Hi Loic, Thank you for that information! I have two follow up questions-- 1. Are you using ccr? 2. Do you happen to have mmsdrserv disabled in your environment? (e.g. what's the output of "mmlsconfig mmsdrservPort" on your cluster?). -Aaron On Thu, 8 Feb 2018, Loic Tortay wrote: > On 07/02/2018 22:28, Aaron Knister wrote: >> I noticed something curious after migrating some nodes from 4.1 to 4.2 >> which is that mounts now can take foorrreeevverrr. It seems to boil down >> to the point in the mount process where getEFOptions is called. >> >> To highlight the difference-- >> > [...] >> > Hello, > I have had this (or a very similar) issue after migrating from 4.1.1.8 to > 4.2.3. There are 37 filesystems in our main cluster, which made the problem > really noticeable. > > A PMR has been opened. I have tested the fixes included in 4.2.3.7, (which, > I'm told, should be released today) actually resolve my problems (APAR > IJ03192 & IJ03235). > > > Lo?c. 
> -- > | Lo?c Tortay - IN2P3 Computing Centre | > From Robert.Oesterlin at nuance.com Thu Feb 8 20:34:35 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 8 Feb 2018 20:34:35 +0000 Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th Message-ID: We?re finalizing the details for the Spring 2018 User Group meeting, and we need your help! I?ve you?re interested in presenting at this meeting (it will be a full 2 days), then contact me and let me know what?s you?d like to talk about. We?re always looking for presentations on how you are using Scale (GPFS) in your business or project, tools that help you do your job, performance challenges/solutions ? or anything else. Also looking for ideas on breakout sessions. We?re probably looking at talks of about 30 mins each. Drop me a note if you?d like to present. Exact details on the event location will be available in a few weeks. We?re hoping to keep it as close to BioIT World in downtown Boston. Bob Oesterlin Sr Principal Storage Engineer, Nuance SSUG Co-principal -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Feb 8 21:11:34 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 8 Feb 2018 21:11:34 +0000 Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? In-Reply-To: References: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr> Message-ID: <2dbcc01f542d40698a7ad6cc10d2dbd1@jumptrading.com> It may be related to this issue of using root squashed file system option, here are some edited comments from my colleague who stumbled upon this while chatting with a friend at a CUG: " Something I learned last week: apparently the libmount code from util-linux (used by /bin/mount) will call utimensat() on new mountpoints if access() fails (for example, on root-squashed filesystems). This is done "just to be sure" that the filesystem is really read-only. This operation can be quite expensive and (anecdotally) may cause huge slowdowns when mounting root-squashed parallel filesystems on thousands of clients. Here is the relevant code: https://github.com/karelzak/util-linux/blame/1ea4e7bd8d9d0f0ef317558c627e6fa069950e8d/libmount/src/utils.c#L222 This code has been in util-linux for years. It's not clear exactly what the impact is in our environment, but this certainly can't be helping, especially since we've grown the size of the cluster considerably. Mounting GPFS has recently really become a slow and disruptive operation ? if you try to mount many clients at once, the FS will hang for a considerable period of time. The timing varies, but here is one example from an isolated mounting operation: 12:09:11.222513 mount("", "", "gpfs", MS_MGC_VAL, "dev="...) = 0 <1.590217> 12:09:12.812777 access("", W_OK) = -1 EACCES (Permission denied) <0.000022> 12:09:12.812841 utimensat(AT_FDCWD, "", \{UTIME_NOW, \{93824994378048, 1073741822}}, 0) = -1 EPERM (Operation not permitted) <2.993689> Here, the utimensat() took ~3 seconds, almost twice as long as the mount operation! I also suspect it will slow down other clients trying to mount the filesystem since the sgmgr has to process this write attempt to the mountpoint. (Hilariously, it still returns the "wrong" answer, because this filesystem is not read-only, just squashed.) As of today, the person who originally brought the issue to my attention at CUG has raised it for discussion on the util-linux mailing list. 
https://marc.info/?l=util-linux-ng&m=151075932824688&w=2 " We ended up putting facls on our mountpoints like such, which hacked around this stupidity: for fs in gpfs_mnt_point ; do chmod 1755 $fs setfacl -m u:99:rwx $fs # 99 is the "nobody" uid to which root is mapped--see "mmauth" output done Hope that helps, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Aaron Knister Sent: Thursday, February 08, 2018 2:23 PM To: Loic Tortay Cc: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? Note: External Email ------------------------------------------------- Hi Loic, Thank you for that information! I have two follow up questions-- 1. Are you using ccr? 2. Do you happen to have mmsdrserv disabled in your environment? (e.g. what's the output of "mmlsconfig mmsdrservPort" on your cluster?). -Aaron On Thu, 8 Feb 2018, Loic Tortay wrote: > On 07/02/2018 22:28, Aaron Knister wrote: >> I noticed something curious after migrating some nodes from 4.1 to 4.2 >> which is that mounts now can take foorrreeevverrr. It seems to boil down >> to the point in the mount process where getEFOptions is called. >> >> To highlight the difference-- >> > [...] >> > Hello, > I have had this (or a very similar) issue after migrating from 4.1.1.8 to > 4.2.3. There are 37 filesystems in our main cluster, which made the problem > really noticeable. > > A PMR has been opened. I have tested the fixes included in 4.2.3.7, (which, > I'm told, should be released today) actually resolve my problems (APAR > IJ03192 & IJ03235). > > > Lo?c. > -- > | Lo?c Tortay > - IN2P3 Computing Centre | > ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tortay at cc.in2p3.fr Fri Feb 9 08:59:12 2018 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Fri, 9 Feb 2018 09:59:12 +0100 Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? In-Reply-To: References: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr> Message-ID: <969a0c4b-a3b0-2fdb-80f4-2913bc9b0a67@cc.in2p3.fr> On 02/08/2018 09:22 PM, Aaron Knister wrote: > Hi Loic, > > Thank you for that information! > > I have two follow up questions-- > 1. Are you using ccr? > 2. Do you happen to have mmsdrserv disabled in your environment? (e.g. > what's the output of "mmlsconfig mmsdrservPort" on your cluster?). > Hello, We do not use CCR on this cluster (yet). We use the default port for mmsdrserv: # mmlsconfig mmsdrservPort mmsdrservPort 1191 Lo?c. 
-- | Lo?c Tortay - IN2P3 Computing Centre | From Renar.Grunenberg at huk-coburg.de Fri Feb 9 09:06:32 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 9 Feb 2018 09:06:32 +0000 Subject: [gpfsug-discuss] V5 Experience Message-ID: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From frankli at us.ibm.com Fri Feb 9 11:29:17 2018 From: frankli at us.ibm.com (Frank N Lee) Date: Fri, 9 Feb 2018 05:29:17 -0600 Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th In-Reply-To: References: Message-ID: Bob, Can you provide your email or shall I just reply here? 
Frank Frank Lee, PhD IBM Systems 314-482-5329 | @drfranknlee From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 02/08/2018 02:35 PM Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th Sent by: gpfsug-discuss-bounces at spectrumscale.org We?re finalizing the details for the Spring 2018 User Group meeting, and we need your help! I?ve you?re interested in presenting at this meeting (it will be a full 2 days), then contact me and let me know what?s you?d like to talk about. We?re always looking for presentations on how you are using Scale (GPFS) in your business or project, tools that help you do your job, performance challenges/solutions ? or anything else. Also looking for ideas on breakout sessions. We?re probably looking at talks of about 30 mins each. Drop me a note if you?d like to present. Exact details on the event location will be available in a few weeks. We?re hoping to keep it as close to BioIT World in downtown Boston. Bob Oesterlin Sr Principal Storage Engineer, Nuance SSUG Co-principal _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HIs14G9Qcs5MqpsAFL5E0TH5hqFD-KbquYdQ_mTmTnI&m=_7q7xOAgpDoLwznJe069elHn1thk8KmxGLgXM6zuST0&s=1aWP0EJWxIsAycMNiVX7v4FWC5BsSzyx566RyllXCCM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Fri Feb 9 11:53:30 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Fri, 9 Feb 2018 12:53:30 +0100 Subject: [gpfsug-discuss] V5 Experience In-Reply-To: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. I'd see no need to change anything. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 02/09/2018 10:16 AM Subject: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... 
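Should the value ever need to change, the error output above shows it can only be altered with GPFS down on every node, so it has to be scheduled as a full cluster outage. A sketch (the 4M value is just an example, not a recommendation):

mmlsconfig maxblocksize      # what mmchconfig release=LATEST left behind
mmshutdown -a                # daemon must be down cluster-wide for this change
mmchconfig maxblocksize=4M
mmstartup -a
mmlsconfig maxblocksize

# Existing filesystems keep their block size; maxblocksize is only an
# upper bound on what can be created or mounted.
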
mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Fri Feb 9 12:30:10 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 9 Feb 2018 12:30:10 +0000 Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th Message-ID: <1AC64CE4-BEE8-4C4B-BB7D-02A39C176621@nuance.com> Replied to Frank directly. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Frank N Lee Reply-To: gpfsug main discussion list Date: Friday, February 9, 2018 at 5:30 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th Bob, Can you provide your email or shall I just reply here? Frank Frank Lee, PhD IBM Systems 314-482-5329 | @drfranknlee From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 02/08/2018 02:35 PM Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th Sent by: gpfsug-discuss-bounces at spectrumscale.org We?re finalizing the details for the Spring 2018 User Group meeting, and we need your help! I?ve you?re interested in presenting at this meeting (it will be a full 2 days), then contact me and let me know what?s you?d like to talk about. 
We?re always looking for presentations on how you are using Scale (GPFS) in your business or project, tools that help you do your job, performance challenges/solutions ? or anything else. Also looking for ideas on breakout sessions. We?re probably looking at talks of about 30 mins each. Drop me a note if you?d like to present. Exact details on the event location will be available in a few weeks. We?re hoping to keep it as close to BioIT World in downtown Boston. Bob Oesterlin Sr Principal Storage Engineer, Nuance SSUG Co-principal _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HIs14G9Qcs5MqpsAFL5E0TH5hqFD-KbquYdQ_mTmTnI&m=_7q7xOAgpDoLwznJe069elHn1thk8KmxGLgXM6zuST0&s=1aWP0EJWxIsAycMNiVX7v4FWC5BsSzyx566RyllXCCM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 138 bytes Desc: image001.png URL: From YARD at il.ibm.com Fri Feb 9 13:28:49 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Fri, 9 Feb 2018 15:28:49 +0200 Subject: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) In-Reply-To: <769B6E06-BAB5-4EDB-A5A3-54E1063A8A6D@vanderbilt.edu> References: <769B6E06-BAB5-4EDB-A5A3-54E1063A8A6D@vanderbilt.edu> Message-ID: Hi Just make sure you have a backup, just in case ... Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage architect Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 02/08/2018 09:49 PM Subject: Re: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi again all, It sounds like doing the ?mmchconfig unmountOnDiskFail=meta -i? suggested by Steve and Bob followed by using mmchdisk to stop the disks temporarily is the way we need to go. We will, as an aside, also run a mmapplypolicy first to pull any files users have started accessing again back to the ?regular? pool before doing any of this. Given that this is our ?capacity? pool and files have to have an atime > 90 days to get migrated there in the 1st place I think this is reasonable. Especially since users will get an I/O error if they happen to try to access one of those NSDs during the brief maintenance window. As to naming and shaming the vendor ? I?m not going to do that at this point in time. We?ve been using their stuff for well over a decade at this point and have had a generally positive experience with them. In fact, I have spoken with them via phone since my original post today and they have clarified that the problem with the mismatched firmware is only an issue because we are a major version off of what is current due to us choosing to not have a downtime and therefore not having done any firmware upgrades in well over 18 months. Thanks, all... Kevin On Feb 8, 2018, at 11:17 AM, Steve Xiao wrote: You can change the cluster configuration to online unmount the file system when there is error accessing metadata. This can be done run the following command: mmchconfig unmountOnDiskFail=meta -i After this configuration change, you should be able to stop all 5 NSDs with mmchdisk stop command. 
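(A rough sketch of the full sequence -- the filesystem name gpfs1 and the NSD names below are placeholders for your own:)

# keep the filesystem mounted even if data on these disks becomes unreachable;
# -i makes the change effective immediately and persistent
mmchconfig unmountOnDiskFail=meta -i

# stop I/O to the affected NSDs before the controller/firmware work
mmchdisk gpfs1 stop -d "nsd10;nsd11;nsd12;nsd13;nsd14"

# ... perform the firmware upgrade on the array ...

# bring the disks back and verify their state
mmchdisk gpfs1 start -d "nsd10;nsd11;nsd12;nsd13;nsd14"
mmlsdisk gpfs1

# revert the setting afterwards if you do not want it permanently (the shipped default is unmountOnDiskFail=no)
mmchconfig unmountOnDiskFail=no -i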
While these NSDs are in down state, any user IO to files resides on these disks will fail but your file system should state mounted and usable. Steve Y. Xiao > Date: Thu, 8 Feb 2018 15:59:44 +0000 > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Subject: [gpfsug-discuss] mmchdisk suspend / stop > Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109 at vanderbilt.edu> > Content-Type: text/plain; charset="utf-8" > > Hi All, > > We are in a bit of a difficult situation right now with one of our > non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! > ) and are looking for some advice on how to deal with this > unfortunate situation. > > We have a non-IBM FC storage array with dual-?redundant? > controllers. One of those controllers is dead and the vendor is > sending us a replacement. However, the replacement controller will > have mis-matched firmware with the surviving controller and - long > story short - the vendor says there is no way to resolve that > without taking the storage array down for firmware upgrades. > Needless to say there?s more to that story than what I?ve included > here, but I won?t bore everyone with unnecessary details. > > The storage array has 5 NSDs on it, but fortunately enough they are > part of our ?capacity? pool ? i.e. the only way a file lands here is > if an mmapplypolicy scan moved it there because the *access* time is > greater than 90 days. Filesystem data replication is set to one. > > So ? what I was wondering if I could do is to use mmchdisk to either > suspend or (preferably) stop those NSDs, do the firmware upgrade, > and resume the NSDs? The problem I see is that suspend doesn?t stop > I/O, it only prevents the allocation of new blocks ? so, in theory, > if a user suddenly decided to start using a file they hadn?t needed > for 3 months then I?ve got a problem. Stopping all I/O to the disks > is what I really want to do. However, according to the mmchdisk man > page stop cannot be used on a filesystem with replication set to one. > > There?s over 250 TB of data on those 5 NSDs, so restriping off of > them or setting replication to two are not options. > > It is very unlikely that anyone would try to access a file on those > NSDs during the hour or so I?d need to do the firmware upgrades, but > how would GPFS itself react to those (suspended) disks going away > for a while? I?m thinking I could be OK if there was just a way to > actually stop them rather than suspend them. Any undocumented > options to mmchdisk that I?m not aware of??? > > Are there other options - besides buying IBM hardware - that I am > overlooking? Thanks... > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C435bd89b3fcc4a94ee5008d56f17e49e%7C5f88b91902e3490fb772327aa8177b95%7C0%7C0%7C636537070783260582&sdata=AbY7rJQecb76rMC%2FlxrthyzHfueQDJTT%2FJuuRCac5g8%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=3yfKUCiWGXtAEPiwlmQNFGTjLx5h3PlCYfUXDBMGJpQ&s=-pkjeFOUVSDUGgwtKkoYbmGLADk2UHfDbUPiuWSw4gQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From knop at us.ibm.com Fri Feb 9 13:32:30 2018 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 9 Feb 2018 08:32:30 -0500 Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? In-Reply-To: <2dbcc01f542d40698a7ad6cc10d2dbd1@jumptrading.com> References: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr> <2dbcc01f542d40698a7ad6cc10d2dbd1@jumptrading.com> Message-ID: All, For at least one of the instances reported by this group, a PMR has been opened, and a fix is being developed. For folks that are getting affected by the problem: Please contact the service team to confirm your problem is the same as the one previously reported, and for an outlook for the availability of the fix. Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Bryan Banister To: gpfsug main discussion list , "Loic Tortay" Date: 02/08/2018 04:11 PM Subject: Re: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? Sent by: gpfsug-discuss-bounces at spectrumscale.org It may be related to this issue of using root squashed file system option, here are some edited comments from my colleague who stumbled upon this while chatting with a friend at a CUG: " Something I learned last week: apparently the libmount code from util-linux (used by /bin/mount) will call utimensat() on new mountpoints if access() fails (for example, on root-squashed filesystems). 
This is done "just to be sure" that the filesystem is really read-only. This operation can be quite expensive and (anecdotally) may cause huge slowdowns when mounting root-squashed parallel filesystems on thousands of clients. Here is the relevant code:
https://github.com/karelzak/util-linux/blame/1ea4e7bd8d9d0f0ef317558c627e6fa069950e8d/libmount/src/utils.c#L222
This code has been in util-linux for years. It's not clear exactly what the impact is in our environment, but this certainly can't be helping, especially since we've grown the size of the cluster considerably. Mounting GPFS has recently really become a slow and disruptive operation -- if you try to mount many clients at once, the FS will hang for a considerable period of time. The timing varies, but here is one example from an isolated mounting operation:

12:09:11.222513 mount("", "", "gpfs", MS_MGC_VAL, "dev="...) = 0 <1.590217>
12:09:12.812777 access("", W_OK) = -1 EACCES (Permission denied) <0.000022>
12:09:12.812841 utimensat(AT_FDCWD, "", {UTIME_NOW, {93824994378048, 1073741822}}, 0) = -1 EPERM (Operation not permitted) <2.993689>

Here, the utimensat() took ~3 seconds, almost twice as long as the mount operation! I also suspect it will slow down other clients trying to mount the filesystem since the sgmgr has to process this write attempt to the mountpoint. (Hilariously, it still returns the "wrong" answer, because this filesystem is not read-only, just squashed.) As of today, the person who originally brought the issue to my attention at CUG has raised it for discussion on the util-linux mailing list.
https://marc.info/?l=util-linux-ng&m=151075932824688&w=2
"

We ended up putting facls on our mountpoints like such, which hacked around this stupidity:

for fs in gpfs_mnt_point ; do
  chmod 1755 $fs
  setfacl -m u:99:rwx $fs # 99 is the "nobody" uid to which root is mapped--see "mmauth" output
done

Hope that helps,
-Bryan

-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Aaron Knister
Sent: Thursday, February 08, 2018 2:23 PM
To: Loic Tortay
Cc: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1?

Note: External Email
-------------------------------------------------

Hi Loic,

Thank you for that information! I have two follow up questions--

1. Are you using ccr?

2. Do you happen to have mmsdrserv disabled in your environment? (e.g. what's the output of "mmlsconfig mmsdrservPort" on your cluster?).

-Aaron

On Thu, 8 Feb 2018, Loic Tortay wrote:

> On 07/02/2018 22:28, Aaron Knister wrote:
>> I noticed something curious after migrating some nodes from 4.1 to 4.2
>> which is that mounts now can take foorrreeevverrr. It seems to boil down
>> to the point in the mount process where getEFOptions is called.
>>
>> To highlight the difference--
>>
> [...]
>>
> Hello,
> I have had this (or a very similar) issue after migrating from 4.1.1.8 to
> 4.2.3. There are 37 filesystems in our main cluster, which made the problem
> really noticeable.
>
> A PMR has been opened. I have tested the fixes included in 4.2.3.7, (which,
> I'm told, should be released today) actually resolve my problems (APAR
> IJ03192 & IJ03235).
>
>
> Loïc.
> --
> | Loïc Tortay - IN2P3 Computing Centre |
> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information.
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=C0S8WTufrOCvXbHUegB8zS9jk_1SLczALa-4aVEubu4&s=VTWKI-xcUiJ_LeMhJ-xOPmnz0Zm9IspKsU3bsxA4BNo&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From carlz at us.ibm.com Fri Feb 9 13:46:51 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Fri, 9 Feb 2018 13:46:51 +0000 Subject: [gpfsug-discuss] V5 Experience In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Feb 9 13:58:58 2018 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 9 Feb 2018 08:58:58 -0500 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: All, Correct. There is no need to change the value of 'maxblocksize' for existing clusters which are upgraded to the 5.0.0 level. If a new file system needs to be created with a block size which exceeds the value of maxblocksize then the mmchconfig needs to be issued to increase the value of maxblocksize (which requires the entire cluster to be stopped). For clusters newly created with 5.0.0, the value of maxblocksize is set to 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man pages in 5.0.0 . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Uwe Falke" To: gpfsug main discussion list Date: 02/09/2018 06:54 AM Subject: Re: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. I'd see no need to change anything. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? 
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 02/09/2018 10:16 AM Subject: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From p.ward at nhm.ac.uk Thu Feb 8 16:46:25 2018 From: p.ward at nhm.ac.uk (Paul Ward) Date: Thu, 8 Feb 2018 16:46:25 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> Message-ID: We tend to get the maintenance company to down-grade the firmware to match what we have for our aging hardware, before sending it to us. I assume this isn?t an option? Paul Ward Technical Solutions Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 08 February 2018 16:00 To: gpfsug main discussion list Subject: [gpfsug-discuss] mmchdisk suspend / stop Hi All, We are in a bit of a difficult situation right now with one of our non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! ) and are looking for some advice on how to deal with this unfortunate situation. We have a non-IBM FC storage array with dual-?redundant? controllers. One of those controllers is dead and the vendor is sending us a replacement. However, the replacement controller will have mis-matched firmware with the surviving controller and - long story short - the vendor says there is no way to resolve that without taking the storage array down for firmware upgrades. Needless to say there?s more to that story than what I?ve included here, but I won?t bore everyone with unnecessary details. The storage array has 5 NSDs on it, but fortunately enough they are part of our ?capacity? pool ? i.e. the only way a file lands here is if an mmapplypolicy scan moved it there because the *access* time is greater than 90 days. Filesystem data replication is set to one. So ? what I was wondering if I could do is to use mmchdisk to either suspend or (preferably) stop those NSDs, do the firmware upgrade, and resume the NSDs? The problem I see is that suspend doesn?t stop I/O, it only prevents the allocation of new blocks ? so, in theory, if a user suddenly decided to start using a file they hadn?t needed for 3 months then I?ve got a problem. Stopping all I/O to the disks is what I really want to do. However, according to the mmchdisk man page stop cannot be used on a filesystem with replication set to one. There?s over 250 TB of data on those 5 NSDs, so restriping off of them or setting replication to two are not options. It is very unlikely that anyone would try to access a file on those NSDs during the hour or so I?d need to do the firmware upgrades, but how would GPFS itself react to those (suspended) disks going away for a while? I?m thinking I could be OK if there was just a way to actually stop them rather than suspend them. Any undocumented options to mmchdisk that I?m not aware of??? Are there other options - besides buying IBM hardware - that I am overlooking? Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Renar.Grunenberg at huk-coburg.de Fri Feb 9 14:30:34 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 9 Feb 2018 14:30:34 +0000 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: Felipe, all, first thanks for clarification, but what was the reason for this logic? If i upgrade to Version 5 and want to create new filesystems, and the maxblocksize is on 1M, we must shutdown the hole cluster to change this to the defaults to use the new one default. I had no understanding for that decision. We are at 7 x 24h availability with our cluster today, we had no real maintenance window here! Any circumvention are welcome. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Freitag, 9. Februar 2018 14:59 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize All, Correct. There is no need to change the value of 'maxblocksize' for existing clusters which are upgraded to the 5.0.0 level. If a new file system needs to be created with a block size which exceeds the value of maxblocksize then the mmchconfig needs to be issued to increase the value of maxblocksize (which requires the entire cluster to be stopped). For clusters newly created with 5.0.0, the value of maxblocksize is set to 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man pages in 5.0.0 . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for "Uwe Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default is <>1MB, so your conf]"Uwe Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. 
From: "Uwe Falke" > To: gpfsug main discussion list > Date: 02/09/2018 06:54 AM Subject: Re: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. I'd see no need to change anything. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 02/09/2018 10:16 AM Subject: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. 
If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From oehmes at gmail.com Fri Feb 9 14:47:54 2018 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 09 Feb 2018 14:47:54 +0000 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: Renar, if you specify the filesystem blocksize of 1M during mmcr you don't have to restart anything. scale 5 didn't change anything on the behaviour of maxblocksize change while the cluster is online, it only changed the default passed to the blocksize parameter for create a new filesystem. one thing we might consider doing is changing the command to use the current active maxblocksize as input for mmcrfs if maxblocksize is below current default. Sven On Fri, Feb 9, 2018 at 6:30 AM Grunenberg, Renar < Renar.Grunenberg at huk-coburg.de> wrote: > Felipe, all, > > first thanks for clarification, but what was the reason for this logic? If > i upgrade to Version 5 and want to create new filesystems, and the > maxblocksize is on 1M, we must shutdown the hole cluster to change this to > the defaults to use the new one default. I had no understanding for that > decision. We are at 7 x 24h availability with our cluster today, we had no > real maintenance window here! Any circumvention are welcome. > > > > Regards Renar > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. 
> > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > *Von:* gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] *Im Auftrag von *Felipe Knop > *Gesendet:* Freitag, 9. Februar 2018 14:59 > *An:* gpfsug main discussion list > *Betreff:* Re: [gpfsug-discuss] V5 Experience -- maxblocksize > > > > All, > > Correct. There is no need to change the value of 'maxblocksize' for > existing clusters which are upgraded to the 5.0.0 level. If a new file > system needs to be created with a block size which exceeds the value of > maxblocksize then the mmchconfig needs to be issued to increase the value > of maxblocksize (which requires the entire cluster to be stopped). > > For clusters newly created with 5.0.0, the value of maxblocksize is set to > 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man > pages in 5.0.0 . > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > [image: Inactive hide details for "Uwe Falke" ---02/09/2018 06:54:10 > AM---I suppose the new maxBlockSize default is <>1MB, so your conf]"Uwe > Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default > is <>1MB, so your config parameter was properly translated. > > From: "Uwe Falke" > To: gpfsug main discussion list > Date: 02/09/2018 06:54 AM > Subject: Re: [gpfsug-discuss] V5 Experience > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > > I suppose the new maxBlockSize default is <>1MB, so your config parameter > was properly translated. I'd see no need to change anything. > > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 <+49%20371%2069782165> > Mobile: +49 175 575 2877 <+49%20175%205752877> > E-Mail: uwefalke at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Thomas Wolter, Sven Schoo? > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > > Date: 02/09/2018 10:16 AM > Subject: [gpfsug-discuss] V5 Experience > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hallo All, > we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but > I see after the mmchconfig release=LATEST a new common parameter > ?maxblocksize 1M? > (our fs are on these blocksizes) is happening. > Ok, but if I will change this parameter the hole cluster was requestet > that: > > root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT > Verifying GPFS is stopped on all nodes ... 
> mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de > mmchconfig: Command failed. Examine previous error messages to determine > cause. > Can someone explain the behavior here, and same clarification in an update > plan what can we do to go to the defaults without clusterdown. > Is this a bug or a feature;-) > > Regards Renar > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: > 09561 96-44110 > Telefax: > 09561 96-44104 > E-Mail: > Renar.Grunenberg at huk-coburg.de > Internet: > www.huk.de > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Fri Feb 9 14:59:31 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 9 Feb 2018 14:59:31 +0000 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: Hallo Sven, that stated a mmcrfs ?newfs? -B 4M is possible if the maxblocksize is 1M (from the upgrade) without the requirement to change this parameter before?? Correct or not? Regards Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sven Oehme Gesendet: Freitag, 9. Februar 2018 15:48 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize Renar, if you specify the filesystem blocksize of 1M during mmcr you don't have to restart anything. scale 5 didn't change anything on the behaviour of maxblocksize change while the cluster is online, it only changed the default passed to the blocksize parameter for create a new filesystem. one thing we might consider doing is changing the command to use the current active maxblocksize as input for mmcrfs if maxblocksize is below current default. Sven On Fri, Feb 9, 2018 at 6:30 AM Grunenberg, Renar > wrote: Felipe, all, first thanks for clarification, but what was the reason for this logic? If i upgrade to Version 5 and want to create new filesystems, and the maxblocksize is on 1M, we must shutdown the hole cluster to change this to the defaults to use the new one default. I had no understanding for that decision. We are at 7 x 24h availability with our cluster today, we had no real maintenance window here! Any circumvention are welcome. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 
9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Freitag, 9. Februar 2018 14:59 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize All, Correct. There is no need to change the value of 'maxblocksize' for existing clusters which are upgraded to the 5.0.0 level. If a new file system needs to be created with a block size which exceeds the value of maxblocksize then the mmchconfig needs to be issued to increase the value of maxblocksize (which requires the entire cluster to be stopped). For clusters newly created with 5.0.0, the value of maxblocksize is set to 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man pages in 5.0.0 . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for "Uwe Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default is <>1MB, so your conf]"Uwe Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. From: "Uwe Falke" > To: gpfsug main discussion list > Date: 02/09/2018 06:54 AM Subject: Re: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. I'd see no need to change anything. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? 
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 02/09/2018 10:16 AM Subject: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Fri Feb 9 15:08:38 2018 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 09 Feb 2018 15:08:38 +0000 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: you can only create a filesystem with a blocksize of what ever current maxblocksize is set. let me discuss with felipe what//if we can share here to solve this. sven On Fri, Feb 9, 2018 at 6:59 AM Grunenberg, Renar < Renar.Grunenberg at huk-coburg.de> wrote: > Hallo Sven, > > that stated a mmcrfs ?newfs? -B 4M is possible if the maxblocksize is 1M > (from the upgrade) without the requirement to change this parameter > before?? Correct or not? > > Regards > > > > > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > *Von:* gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] *Im Auftrag von *Sven Oehme > *Gesendet:* Freitag, 9. Februar 2018 15:48 > > > *An:* gpfsug main discussion list > *Betreff:* Re: [gpfsug-discuss] V5 Experience -- maxblocksize > > > > Renar, > > > > if you specify the filesystem blocksize of 1M during mmcr you don't have > to restart anything. 
scale 5 didn't change anything on the behaviour of > maxblocksize change while the cluster is online, it only changed the > default passed to the blocksize parameter for create a new filesystem. one > thing we might consider doing is changing the command to use the current > active maxblocksize as input for mmcrfs if maxblocksize is below current > default. > > > > Sven > > > > > > On Fri, Feb 9, 2018 at 6:30 AM Grunenberg, Renar < > Renar.Grunenberg at huk-coburg.de> wrote: > > Felipe, all, > > first thanks for clarification, but what was the reason for this logic? If > i upgrade to Version 5 and want to create new filesystems, and the > maxblocksize is on 1M, we must shutdown the hole cluster to change this to > the defaults to use the new one default. I had no understanding for that > decision. We are at 7 x 24h availability with our cluster today, we had no > real maintenance window here! Any circumvention are welcome. > > > > Regards Renar > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > > Telefon: > > 09561 96-44110 > > Telefax: > > 09561 96-44104 > > E-Mail: > > Renar.Grunenberg at huk-coburg.de > > Internet: > > www.huk.de > > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > *Von:* gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] *Im Auftrag von *Felipe Knop > *Gesendet:* Freitag, 9. Februar 2018 14:59 > *An:* gpfsug main discussion list > *Betreff:* Re: [gpfsug-discuss] V5 Experience -- maxblocksize > > > > All, > > Correct. There is no need to change the value of 'maxblocksize' for > existing clusters which are upgraded to the 5.0.0 level. If a new file > system needs to be created with a block size which exceeds the value of > maxblocksize then the mmchconfig needs to be issued to increase the value > of maxblocksize (which requires the entire cluster to be stopped). > > For clusters newly created with 5.0.0, the value of maxblocksize is set to > 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man > pages in 5.0.0 . 
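Sven's and Felipe's point above amounts to a workaround: create the new file system with a block size no larger than the current maxblocksize and no cluster-wide shutdown is needed at all. A short sketch (the stanza file name is only an example; the command names are the ones used in this thread):

    mmlsconfig maxblocksize                     # e.g. 1M on an upgraded 4.2.x cluster
    mmcrfs newfs -F newfs_disks.stanza -B 1M    # stays within the current limit, no outage

    # only a block size above the reported maxblocksize (for instance the new
    # 4M default) would force the cluster-wide shutdown shown earlier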
> > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > [image: Inactive hide details for "Uwe Falke" ---02/09/2018 06:54:10 > AM---I suppose the new maxBlockSize default is <>1MB, so your conf]"Uwe > Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default > is <>1MB, so your config parameter was properly translated. > > From: "Uwe Falke" > To: gpfsug main discussion list > Date: 02/09/2018 06:54 AM > Subject: Re: [gpfsug-discuss] V5 Experience > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > > I suppose the new maxBlockSize default is <>1MB, so your config parameter > was properly translated. I'd see no need to change anything. > > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 <+49%20371%2069782165> > Mobile: +49 175 575 2877 <+49%20175%205752877> > E-Mail: uwefalke at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Thomas Wolter, Sven Schoo? > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > > Date: 02/09/2018 10:16 AM > Subject: [gpfsug-discuss] V5 Experience > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hallo All, > we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but > I see after the mmchconfig release=LATEST a new common parameter > ?maxblocksize 1M? > (our fs are on these blocksizes) is happening. > Ok, but if I will change this parameter the hole cluster was requestet > that: > > root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT > Verifying GPFS is stopped on all nodes ... > mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de > mmchconfig: Command failed. Examine previous error messages to determine > cause. > Can someone explain the behavior here, and same clarification in an update > plan what can we do to go to the defaults without clusterdown. > Is this a bug or a feature;-) > > Regards Renar > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: > 09561 96-44110 > Telefax: > 09561 96-44104 > E-Mail: > Renar.Grunenberg at huk-coburg.de > Internet: > www.huk.de > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 
9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Feb 9 15:07:32 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 9 Feb 2018 15:07:32 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> Message-ID: <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> Hi All, Since several people have made this same suggestion, let me respond to that. We did ask the vendor - twice - to do that. Their response boils down to, ?No, the older version has bugs and we won?t send you a controller with firmware that we know has bugs in it.? We have not had a full cluster downtime since the summer of 2016 - and then it was only a one day downtime to allow the cleaning of our core network switches after an electrical fire in our data center! So the firmware on not only our storage arrays, but our SAN switches as well, it a bit out of date, shall we say? That is an issue we need to address internally ? our users love us not having regularly scheduled downtimes quarterly, yearly, or whatever, but there is a cost to doing business that way... Kevin On Feb 8, 2018, at 10:46 AM, Paul Ward > wrote: We tend to get the maintenance company to down-grade the firmware to match what we have for our aging hardware, before sending it to us. I assume this isn?t an option? 
Paul Ward Technical Solutions Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Fri Feb 9 15:12:13 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 9 Feb 2018 15:12:13 +0000 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: <8388dda58d064620908b9aa62ca86da5@SMXRF105.msg.hukrf.de> Hallo Sven, thanks, it?s clear now. You have work now ;-) Happy Weekend from Coburg. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sven Oehme Gesendet: Freitag, 9. Februar 2018 16:09 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize you can only create a filesystem with a blocksize of what ever current maxblocksize is set. let me discuss with felipe what//if we can share here to solve this. sven On Fri, Feb 9, 2018 at 6:59 AM Grunenberg, Renar > wrote: Hallo Sven, that stated a mmcrfs ?newfs? -B 4M is possible if the maxblocksize is 1M (from the upgrade) without the requirement to change this parameter before?? Correct or not? Regards Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. 
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sven Oehme Gesendet: Freitag, 9. Februar 2018 15:48 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize Renar, if you specify the filesystem blocksize of 1M during mmcr you don't have to restart anything. scale 5 didn't change anything on the behaviour of maxblocksize change while the cluster is online, it only changed the default passed to the blocksize parameter for create a new filesystem. one thing we might consider doing is changing the command to use the current active maxblocksize as input for mmcrfs if maxblocksize is below current default. Sven On Fri, Feb 9, 2018 at 6:30 AM Grunenberg, Renar > wrote: Felipe, all, first thanks for clarification, but what was the reason for this logic? If i upgrade to Version 5 and want to create new filesystems, and the maxblocksize is on 1M, we must shutdown the hole cluster to change this to the defaults to use the new one default. I had no understanding for that decision. We are at 7 x 24h availability with our cluster today, we had no real maintenance window here! Any circumvention are welcome. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Freitag, 9. Februar 2018 14:59 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize All, Correct. 
There is no need to change the value of 'maxblocksize' for existing clusters which are upgraded to the 5.0.0 level. If a new file system needs to be created with a block size which exceeds the value of maxblocksize then the mmchconfig needs to be issued to increase the value of maxblocksize (which requires the entire cluster to be stopped). For clusters newly created with 5.0.0, the value of maxblocksize is set to 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man pages in 5.0.0 . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 "Uwe Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. From: "Uwe Falke" > To: gpfsug main discussion list > Date: 02/09/2018 06:54 AM Subject: Re: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. I'd see no need to change anything. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 02/09/2018 10:16 AM Subject: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. 
in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.ward at nhm.ac.uk Fri Feb 9 15:25:25 2018 From: p.ward at nhm.ac.uk (Paul Ward) Date: Fri, 9 Feb 2018 15:25:25 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> Message-ID: Not sure why it took over a day for my message to be sent out by the list? If it?s the firmware you currently have, I would still prefer to have it sent to me then I am able to do a controller firmware update online during an at risk period rather than a downtime, all the time you are running on one controller is at risk! Seems you have an alternative. Paul Ward Technical Solutions Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 09 February 2018 15:08 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmchdisk suspend / stop Hi All, Since several people have made this same suggestion, let me respond to that. We did ask the vendor - twice - to do that. Their response boils down to, ?No, the older version has bugs and we won?t send you a controller with firmware that we know has bugs in it.? 
We have not had a full cluster downtime since the summer of 2016 - and then it was only a one day downtime to allow the cleaning of our core network switches after an electrical fire in our data center! So the firmware on not only our storage arrays, but our SAN switches as well, it a bit out of date, shall we say? That is an issue we need to address internally ? our users love us not having regularly scheduled downtimes quarterly, yearly, or whatever, but there is a cost to doing business that way... Kevin On Feb 8, 2018, at 10:46 AM, Paul Ward > wrote: We tend to get the maintenance company to down-grade the firmware to match what we have for our aging hardware, before sending it to us. I assume this isn?t an option? Paul Ward Technical Solutions Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From dzieko at wcss.pl Mon Feb 12 15:11:55 2018 From: dzieko at wcss.pl (Pawel Dziekonski) Date: Mon, 12 Feb 2018 16:11:55 +0100 Subject: [gpfsug-discuss] Configuration advice Message-ID: <20180212151155.GD23944@cefeid.wcss.wroc.pl> Hi All, I inherited from previous admin 2 separate gpfs machines. All hardware+software is old so I want to switch to new servers, new disk arrays, new gpfs version and new gpfs "design". Each machine has 4 gpfs filesystems and runs a TSM HSM client that migrates data to tapes using separate TSM servers: GPFS+HSM no 1 -> TSM server no 1 -> tapes GPFS+HSM no 2 -> TSM server no 2 -> tapes Migration is done by HSM (not GPFS policies). All filesystems are used for archiving results from HPC system and other files (a kind of backup - don't ask...). Data is written by users via nfs shares. There are 8 nfs mount points corresponding to 8 gpfs filesystems, but there is no real reason for that. 4 filesystems are large and heavily used, 4 remaining are almost not used. The question is how to configure new gpfs infrastructure? My initial impression is that I should create a GPFS cluster of 2+ nodes and export NFS using CES. The most important question is how many filesystem do I need? Maybe just 2 and 8 filesets? Or how to do that in a flexible way and not to lock myself in stupid configuration? any hints? thanks, Pawel ps. I will recall all data and copy it to new infrastructure. Yes, that's the way I want to do that. :) -- Pawel Dziekonski , http://www.wcss.pl Wroclaw Centre for Networking & Supercomputing, HPC Department From jonathan.buzzard at strath.ac.uk Tue Feb 13 13:43:01 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 13 Feb 2018 13:43:01 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> Message-ID: <1518529381.3326.93.camel@strath.ac.uk> On Fri, 2018-02-09 at 15:07 +0000, Buterbaugh, Kevin L wrote: > Hi All, > > Since several people have made this same suggestion, let me respond > to that. ?We did ask the vendor - twice - to do that. ?Their response > boils down to, ?No, the older version has bugs and we won?t send you > a controller with firmware that we know has bugs in it.? > > We have not had a full cluster downtime since the summer of 2016 - > and then it was only a one day downtime to allow the cleaning of our > core network switches after an electrical fire in our data center! 
> So the firmware on not only our storage arrays, but our SAN switches as well, is a bit out of date, shall we say?
>
> That is an issue we need to address internally - our users love us not having regularly scheduled downtimes quarterly, yearly, or whatever, but there is a cost to doing business that way...

What sort of storage arrays are you using that don't allow you to do a live update of the controller firmware? Heck these days even cheapy Dell MD3 series storage arrays allow you to do live drive firmware updates.

Similarly with SAN switches surely you have separate A/B fabrics and can upgrade them one at a time live.

In a properly designed system one should not need to schedule downtime for firmware updates. He says as he plans a firmware update on his routers for next Tuesday morning, with no scheduled downtime and no interruption to service.

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From Kevin.Buterbaugh at Vanderbilt.Edu Tue Feb 13 15:56:00 2018
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Tue, 13 Feb 2018 15:56:00 +0000
Subject: [gpfsug-discuss] mmchdisk suspend / stop
In-Reply-To: <1518529381.3326.93.camel@strath.ac.uk>
References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> <1518529381.3326.93.camel@strath.ac.uk>
Message-ID:

Hi JAB,

OK, let me try one more time to clarify. I'm not naming the vendor - they're a small maker of commodity storage and we've been using their stuff for years and, overall, it's been very solid. The problem in this specific case is that a major version firmware upgrade is required - if the controllers were only a minor version apart we could do it live.

And yes, we can upgrade our QLogic SAN switches firmware live - in fact, we've done that in the past. Should've been more clear there - we just try to do that as infrequently as possible.

So the bottom line here is that we were unaware that "major version" firmware upgrades could not be done live on our storage, but we've got a plan to work around this this time.

Kevin

> On Feb 13, 2018, at 7:43 AM, Jonathan Buzzard wrote:
>
> On Fri, 2018-02-09 at 15:07 +0000, Buterbaugh, Kevin L wrote:
>> Hi All,
>>
>> Since several people have made this same suggestion, let me respond to that. We did ask the vendor - twice - to do that. Their response boils down to, "No, the older version has bugs and we won't send you a controller with firmware that we know has bugs in it."
>>
>> We have not had a full cluster downtime since the summer of 2016 - and then it was only a one day downtime to allow the cleaning of our core network switches after an electrical fire in our data center! So the firmware on not only our storage arrays, but our SAN switches as well, is a bit out of date, shall we say?
>>
>> That is an issue we need to address internally - our users love us not having regularly scheduled downtimes quarterly, yearly, or whatever, but there is a cost to doing business that way...
>>
>
> What sort of storage arrays are you using that don't allow you to do a live update of the controller firmware? Heck these days even cheapy Dell MD3 series storage arrays allow you to do live drive firmware updates.
>
> Similarly with SAN switches surely you have separate A/B fabrics and can upgrade them one at a time live.
> > In a properly designed system one should not need to schedule downtime > for firmware updates. He says as he plans a firmware update on his > routers for next Tuesday morning, with no scheduled downtime and no > interruption to service. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C16b7c1eca3d846afc65208d572e7b6f1%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636541261898197334&sdata=fY66HEDEia55g2x18VETOmE755IH7lXAfoznAewCe5A%3D&reserved=0 From griznog at gmail.com Wed Feb 14 05:32:39 2018 From: griznog at gmail.com (John Hanks) Date: Tue, 13 Feb 2018 21:32:39 -0800 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. Message-ID: Hi, We have a GPFS filesystem mounted on CentOS 7.4 as type gpfs, pretty straightforward run of the mill stuff. But are seeing this odd behavior. If I do this in a shell script, given a file called "a" cat a a a a a a a a a a > /path/to/gpfs/mount/test grep ATAG /path/to/gpfs/mount/test | wc -l sleep 4 grep ATAG /path/to/gpfs/mount/test | wc -l The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/gpfs/mount/test matches" The second grep | wc -l returns the correct count of ATAG in the file. Why does it take 4 seconds (3 isn't enough) for that file to be properly recognized as a text file and/or why is it seen as a binary file in the first place since a is a plain text file? Note that I have the same filesystem mounted via NFS and over an NFS mount it works as expected. Any illumination is appreciated, jbh -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Feb 14 06:49:42 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 14 Feb 2018 08:49:42 +0200 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: Message-ID: Hi This seems to be setup specific Care to explain a bit more of the setup. Number of nodes GPFS versions, number of FS, Networking, running from admin node, server / client, number of NSD, separated meta and data, etc? I got interested and run a quick test on a gpfs far from powerful cluster of 3 nodes on KVM [root at specscale01 IBM_REPO]# echo "a a a a a a a a a a" > test && grep ATAG test | wc -l && sleep 4 && grep ATAG test | wc -l 0 0 [root at specscale01 IBM_REPO]# -- Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous From: John Hanks To: gpfsug-discuss Date: 14/02/2018 07:33 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We have a GPFS filesystem mounted on CentOS 7.4 as type gpfs, pretty straightforward run of the mill stuff. But are seeing this odd behavior. 
If I do this in a shell script, given a file called "a" cat a a a a a a a a a a > /path/to/gpfs/mount/test grep ATAG /path/to/gpfs/mount/test | wc -l sleep 4 grep ATAG /path/to/gpfs/mount/test | wc -l The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/gpfs/mount/test matches" The second grep | wc -l returns the correct count of ATAG in the file. Why does it take 4 seconds (3 isn't enough) for that file to be properly recognized as a text file and/or why is it seen as a binary file in the first place since a is a plain text file? Note that I have the same filesystem mounted via NFS and over an NFS mount it works as expected. Any illumination is appreciated, jbh_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=ut35qIIMxjZMX3obFJ2xtUMng4MtGtKz4YHxpkgQbak&s=cNt66GjRD6rVhq7nGcvT76l-0_u2C3UTz9SfwzHf1xw&e= Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Feb 14 06:53:20 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 14 Feb 2018 08:53:20 +0200 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: Message-ID: Sorry With cat [root at specscale01 IBM_REPO]# cp test a [root at specscale01 IBM_REPO]# cat a a a a > test && grep ATAG test | wc -l && sleep 4 && grep ATAG test | wc -l 0 0 -- Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous From: Luis Bolinches To: gpfsug main discussion list Date: 14/02/2018 08:49 Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by grep. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi This seems to be setup specific Care to explain a bit more of the setup. Number of nodes GPFS versions, number of FS, Networking, running from admin node, server / client, number of NSD, separated meta and data, etc? I got interested and run a quick test on a gpfs far from powerful cluster of 3 nodes on KVM [root at specscale01 IBM_REPO]# echo "a a a a a a a a a a" > test && grep ATAG test | wc -l && sleep 4 && grep ATAG test | wc -l 0 0 [root at specscale01 IBM_REPO]# -- Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous From: John Hanks To: gpfsug-discuss Date: 14/02/2018 07:33 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We have a GPFS filesystem mounted on CentOS 7.4 as type gpfs, pretty straightforward run of the mill stuff. But are seeing this odd behavior. 
If I do this in a shell script, given a file called "a" cat a a a a a a a a a a > /path/to/gpfs/mount/test grep ATAG /path/to/gpfs/mount/test | wc -l sleep 4 grep ATAG /path/to/gpfs/mount/test | wc -l The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/gpfs/mount/test matches" The second grep | wc -l returns the correct count of ATAG in the file. Why does it take 4 seconds (3 isn't enough) for that file to be properly recognized as a text file and/or why is it seen as a binary file in the first place since a is a plain text file? Note that I have the same filesystem mounted via NFS and over an NFS mount it works as expected. Any illumination is appreciated, jbh_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=ut35qIIMxjZMX3obFJ2xtUMng4MtGtKz4YHxpkgQbak&s=cNt66GjRD6rVhq7nGcvT76l-0_u2C3UTz9SfwzHf1xw&e= Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=HrR-mBJ82ubcbtBin7NGVl2VenLj726Fcah6-3XFvDs&s=d5YiAyXz4el9bF0zjGL9gVjnTfbX4z-qelZodxRqlz0&e= Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From griznog at gmail.com Wed Feb 14 14:20:32 2018 From: griznog at gmail.com (John Hanks) Date: Wed, 14 Feb 2018 06:20:32 -0800 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: Message-ID: Hi Luis, GPFS is 4.2.3 (gpfs.base-4.2.3-6.x86_64), All servers (8 in front of a DDN SFA12K) are RHEL 7.3 (stock DDN setup). All 47 clients are CentOS 7.4. GPFS mount: # mount | grep gpfs gsfs0 on /srv/gsfs0 type gpfs (rw,relatime) NFS mount: mount | grep $HOME 10.210.15.57:/srv/gsfs0/home/griznog on /home/griznog type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.210.15.57,mountvers=3,mountport=20048,mountproto=tcp,local_lock=none,addr=10.210.15.57) Example script: #!/bin/bash cat pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt > /srv/gsfs0/projects/pipetest.tmp.txt grep L1 /srv/gsfs0/projects/pipetest.tmp.txt | wc -l cat pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt > $HOME/pipetest.tmp.txt grep L1 $HOME/pipetest.tmp.txt | wc -l Example output: # ./pipetest.sh 1 1836 # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 /srv/gsfs0/projects/pipetest.tmp.txt We can "fix" the user case that exposed this by not using a temp file or inserting a sleep, but I'd still like to know why GPFS is behaving this way and make it stop. mmlsconfig below. 
Thanks, jbh mmlsconfig Configuration data for cluster SCG-GS.scg-gs0: ---------------------------------------------- clusterName SCG-GS.scg-gs0 clusterId 8456032987852400706 dmapiFileHandleSize 32 maxblocksize 4096K cnfsSharedRoot /srv/gsfs0/GS-NFS cnfsMountdPort 597 socketMaxListenConnections 1024 fileHeatPeriodMinutes 1440 fileHeatLossPercent 1 pingPeriod 5 minMissedPingTimeout 30 afmHashVersion 1 minReleaseLevel 4.2.0.1 [scg-gs0,scg-gs1,scg-gs2,scg-gs3,scg-gs4,scg-gs5,scg-gs6,scg-gs7] nsdbufspace 70 [common] healthCheckInterval 20 maxStatCache 512 maxFilesToCache 50000 nsdMinWorkerThreads 512 nsdMaxWorkerThreads 1024 deadlockDetectionThreshold 0 deadlockOverloadThreshold 0 prefetchThreads 288 worker1Threads 320 maxMBpS 2000 [scg-gs0,scg-gs1,scg-gs2,scg-gs3,scg-gs4,scg-gs5,scg-gs6,scg-gs7] maxMBpS 24000 [common] atimeDeferredSeconds 300 pitWorkerThreadsPerNode 2 cipherList AUTHONLY pagepool 1G [scg-gs0,scg-gs1,scg-gs2,scg-gs3,scg-gs4,scg-gs5,scg-gs6,scg-gs7] pagepool 8G [common] cnfsNFSDprocs 256 nfsPrefetchStrategy 1 autoload yes adminMode central File systems in cluster SCG-GS.scg-gs0: --------------------------------------- /dev/gsfs0 On Tue, Feb 13, 2018 at 10:53 PM, Luis Bolinches wrote: > Sorry > > With cat > > [root at specscale01 IBM_REPO]# cp test a > [root at specscale01 IBM_REPO]# cat a a a a > test && grep ATAG test | wc -l > && sleep 4 && grep ATAG test | wc -l > 0 > 0 > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > Luis Bolinches > Consultant IT Specialist > Mobile Phone: +358503112585 <+358%2050%203112585> > https://www.youracclaim.com/user/luis-bolinches > > "If you always give you will always have" -- Anonymous > > > > From: Luis Bolinches > To: gpfsug main discussion list > Date: 14/02/2018 08:49 > Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by > grep. > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi > > This seems to be setup specific > > Care to explain a bit more of the setup. Number of nodes GPFS versions, > number of FS, Networking, running from admin node, server / client, number > of NSD, separated meta and data, etc? > > I got interested and run a quick test on a gpfs far from powerful cluster > of 3 nodes on KVM > > [root at specscale01 IBM_REPO]# echo "a a a a a a a a a a" > test && grep > ATAG test | wc -l && sleep 4 && grep ATAG test | wc -l > 0 > 0 > [root at specscale01 IBM_REPO]# > > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > Luis Bolinches > Consultant IT Specialist > Mobile Phone: +358503112585 <+358%2050%203112585> > *https://www.youracclaim.com/user/luis-bolinches* > > > "If you always give you will always have" -- Anonymous > > > > From: John Hanks > To: gpfsug-discuss > Date: 14/02/2018 07:33 > Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi, > > We have a GPFS filesystem mounted on CentOS 7.4 as type gpfs, pretty > straightforward run of the mill stuff. But are seeing this odd behavior. If > I do this in a shell script, given a file called "a" > > cat a a a a a a a a a a > /path/to/gpfs/mount/test > grep ATAG /path/to/gpfs/mount/test | wc -l > sleep 4 > grep ATAG /path/to/gpfs/mount/test | wc -l > > The first grep | wc -l returns 1, because grep outputs "Binary file > /path/to/gpfs/mount/test matches" > > The second grep | wc -l returns the correct count of ATAG in the file. 
> > Why does it take 4 seconds (3 isn't enough) for that file to be properly > recognized as a text file and/or why is it seen as a binary file in the > first place since a is a plain text file? > > Note that I have the same filesystem mounted via NFS and over an NFS mount > it works as expected. > > Any illumination is appreciated, > > jbh_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > *https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=ut35qIIMxjZMX3obFJ2xtUMng4MtGtKz4YHxpkgQbak&s=cNt66GjRD6rVhq7nGcvT76l-0_u2C3UTz9SfwzHf1xw&e=* > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug. > org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r= > 1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=HrR- > mBJ82ubcbtBin7NGVl2VenLj726Fcah6-3XFvDs&s=d5YiAyXz4el9bF0zjGL9gVjnTfbX4z > -qelZodxRqlz0&e= > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Feb 14 15:08:10 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 14 Feb 2018 10:08:10 -0500 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: Message-ID: <11815.1518620890@turing-police.cc.vt.edu> On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > /srv/gsfs0/projects/pipetest.tmp.txt > > We can "fix" the user case that exposed this by not using a temp file or > inserting a sleep, but I'd still like to know why GPFS is behaving this way > and make it stop. May be related to replication, or other behind-the-scenes behavior. Consider this example - 4.2.3.6, data and metadata replication both set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with a full fiberchannel mesh to 3 Dell MD34something arrays. % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test 4096+0 records in 4096+0 records out 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test Notice that the first /bin/ls shouldn't be starting until after the dd has completed - at which point it's only allocated half the blocks needed to hold the 4M of data at one site. 5 seconds later, it's allocated the blocks at both sites and thus shows the full 8M needed for 2 copies. 
I've also seen (but haven't replicated it as I write this) a small file (4-8K or so) showing first one full-sized block, then a second full-sized block, and then dropping back to what's needed for 2 1/32nd fragments. That had me scratching my head Having said that, that's all metadata fun and games, while your case appears to have some problems with data integrity (which is a whole lot scarier). It would be *really* nice if we understood the problem here. The scariest part is: > The first grep | wc -l returns 1, because grep outputs ?"Binary file /path/to/ > gpfs/mount/test matches" which seems to be implying that we're failing on semantic consistency. Basically, your 'cat' command is completing and closing the file, but then a temporally later open of the same find is reading something other that only the just-written data. My first guess is that it's a race condition similar to the following: The cat command is causing a write on one NSD server, and the first grep results in a read from a *different* NSD server, returning the data that *used* to be in the block because the read actually happens before the first NSD server actually completes the write. It may be interesting to replace the grep's with pairs of 'ls -ls / dd' commands to grab the raw data and its size, and check the following: 1) does the size (both blocks allocated and logical length) reported by ls match the amount of data actually read by the dd? 2) Is the file length as actually read equal to the written length, or does it overshoot and read all the way to the next block boundary? 3) If the length is correct, what's wrong with the data that's telling grep that it's a binary file? ( od -cx is your friend here). 4) If it overshoots, is the remainder all-zeros (good) or does it return semi-random "what used to be there" data (bad, due to data exposure issues)? (It's certainly not the most perplexing data consistency issue I've hit in 4 decades - the winner *has* to be a intermittent data read corruption on a GPFS 3.5 cluster that had us, IBM, SGI, DDN, and at least one vendor of networking gear all chasing our tails for 18 months before we finally tracked it down. :) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From griznog at gmail.com Wed Feb 14 15:21:52 2018 From: griznog at gmail.com (John Hanks) Date: Wed, 14 Feb 2018 07:21:52 -0800 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: <11815.1518620890@turing-police.cc.vt.edu> References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Hi Valdis, I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think this is a data integrity issue, thankfully: $ ./pipetestls.sh 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 /srv/gsfs0/projects/pipetest.tmp.txt 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt $ ./pipetestmd5.sh 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt And replacing grep with 'file' even properly sees the files as ASCII: $ ./pipetestfile.sh /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines I'll poke a little harder at grep next and see what the difference in strace of each reveals. 
Thanks, jbh On Wed, Feb 14, 2018 at 7:08 AM, wrote: > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 > /home/griznog/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > We can "fix" the user case that exposed this by not using a temp file or > > inserting a sleep, but I'd still like to know why GPFS is behaving this > way > > and make it stop. > > May be related to replication, or other behind-the-scenes behavior. > > Consider this example - 4.2.3.6, data and metadata replication both > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with a full > fiberchannel mesh to 3 Dell MD34something arrays. > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; sleep > 5; ls -ls sync.test; sleep 5; ls -ls sync.test > 4096+0 records in > 4096+0 records out > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > Notice that the first /bin/ls shouldn't be starting until after the dd has > completed - at which point it's only allocated half the blocks needed to > hold > the 4M of data at one site. 5 seconds later, it's allocated the blocks at > both > sites and thus shows the full 8M needed for 2 copies. > > I've also seen (but haven't replicated it as I write this) a small file > (4-8K > or so) showing first one full-sized block, then a second full-sized block, > and > then dropping back to what's needed for 2 1/32nd fragments. That had me > scratching my head > > Having said that, that's all metadata fun and games, while your case > appears to have some problems with data integrity (which is a whole lot > scarier). It would be *really* nice if we understood the problem here. > > The scariest part is: > > > The first grep | wc -l returns 1, because grep outputs "Binary file > /path/to/ > > gpfs/mount/test matches" > > which seems to be implying that we're failing on semantic consistency. > Basically, your 'cat' command is completing and closing the file, but then > a > temporally later open of the same find is reading something other that > only the > just-written data. My first guess is that it's a race condition similar > to the > following: The cat command is causing a write on one NSD server, and the > first > grep results in a read from a *different* NSD server, returning the data > that > *used* to be in the block because the read actually happens before the > first > NSD server actually completes the write. > > It may be interesting to replace the grep's with pairs of 'ls -ls / dd' > commands to grab the > raw data and its size, and check the following: > > 1) does the size (both blocks allocated and logical length) reported by > ls match the amount of data actually read by the dd? > > 2) Is the file length as actually read equal to the written length, or > does it > overshoot and read all the way to the next block boundary? > > 3) If the length is correct, what's wrong with the data that's telling > grep that > it's a binary file? ( od -cx is your friend here). > > 4) If it overshoots, is the remainder all-zeros (good) or does it return > semi-random > "what used to be there" data (bad, due to data exposure issues)? 
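A rough sketch of those four checks, using only ls, dd and od as suggested above (the file name is the test file from earlier in the thread; this is illustrative rather than a verified recipe):

    f=/srv/gsfs0/projects/pipetest.tmp.txt
    ls -ls "$f"                         # (1) blocks allocated and logical length
    dd if="$f" bs=1M of=/tmp/raw.copy   # (2) how many bytes a read actually returns
    ls -l /tmp/raw.copy                 #     compare with the length reported above
    od -cx /tmp/raw.copy | tail         # (3)/(4) look for NULs or stale data at the end of the file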
> > (It's certainly not the most perplexing data consistency issue I've hit in > 4 decades - the > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > cluster that > had us, IBM, SGI, DDN, and at least one vendor of networking gear all > chasing our > tails for 18 months before we finally tracked it down. :) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Feb 14 15:33:24 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 14 Feb 2018 17:33:24 +0200 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Hi not going to mention much on DDN setups but first thing that makes my eyes blurry a bit is minReleaseLevel 4.2.0.1 when you mention your whole cluster is already on 4.2.3 -- Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous From: John Hanks To: gpfsug main discussion list Date: 14/02/2018 17:22 Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by grep. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Valdis, I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think this is a data integrity issue, thankfully: $ ./pipetestls.sh 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 /srv/gsfs0/projects/pipetest.tmp.txt 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt $ ./pipetestmd5.sh 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt And replacing grep with 'file' even properly sees the files as ASCII: $ ./pipetestfile.sh /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines I'll poke a little harder at grep next and see what the difference in strace of each reveals. Thanks, jbh On Wed, Feb 14, 2018 at 7:08 AM, wrote: On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > /srv/gsfs0/projects/pipetest.tmp.txt > > We can "fix" the user case that exposed this by not using a temp file or > inserting a sleep, but I'd still like to know why GPFS is behaving this way > and make it stop. May be related to replication, or other behind-the-scenes behavior. Consider this example - 4.2.3.6, data and metadata replication both set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with a full fiberchannel mesh to 3 Dell MD34something arrays. 
% dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test 4096+0 records in 4096+0 records out 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test Notice that the first /bin/ls shouldn't be starting until after the dd has completed - at which point it's only allocated half the blocks needed to hold the 4M of data at one site. 5 seconds later, it's allocated the blocks at both sites and thus shows the full 8M needed for 2 copies. I've also seen (but haven't replicated it as I write this) a small file (4-8K or so) showing first one full-sized block, then a second full-sized block, and then dropping back to what's needed for 2 1/32nd fragments. That had me scratching my head Having said that, that's all metadata fun and games, while your case appears to have some problems with data integrity (which is a whole lot scarier). It would be *really* nice if we understood the problem here. The scariest part is: > The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/ > gpfs/mount/test matches" which seems to be implying that we're failing on semantic consistency. Basically, your 'cat' command is completing and closing the file, but then a temporally later open of the same find is reading something other that only the just-written data. My first guess is that it's a race condition similar to the following: The cat command is causing a write on one NSD server, and the first grep results in a read from a *different* NSD server, returning the data that *used* to be in the block because the read actually happens before the first NSD server actually completes the write. It may be interesting to replace the grep's with pairs of 'ls -ls / dd' commands to grab the raw data and its size, and check the following: 1) does the size (both blocks allocated and logical length) reported by ls match the amount of data actually read by the dd? 2) Is the file length as actually read equal to the written length, or does it overshoot and read all the way to the next block boundary? 3) If the length is correct, what's wrong with the data that's telling grep that it's a binary file? ( od -cx is your friend here). 4) If it overshoots, is the remainder all-zeros (good) or does it return semi-random "what used to be there" data (bad, due to data exposure issues)? (It's certainly not the most perplexing data consistency issue I've hit in 4 decades - the winner *has* to be a intermittent data read corruption on a GPFS 3.5 cluster that had us, IBM, SGI, DDN, and at least one vendor of networking gear all chasing our tails for 18 months before we finally tracked it down. :) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_UFKMxNklx_00YDdSlmEr9lCvnUC9AWFsTVbTn6yAr4&s=JUVyUiTIfln67di06lb-hvwpA8207JNkioGxY1ayAlE&e= Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Feb 14 17:51:04 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 14 Feb 2018 12:51:04 -0500 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Just speculating here (also known as making things up) but I wonder if grep is somehow using the file's size in its determination of binary status. I also see mmap in the strace so maybe there's some issue with mmap where some internal GPFS buffer is getting truncated inappropriately but leaving a bunch of null values which gets returned to grep. -Aaron On 2/14/18 10:21 AM, John Hanks wrote: > Hi Valdis, > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think > this is a data integrity issue, thankfully: > > $ ./pipetestls.sh? > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 > /srv/gsfs0/projects/pipetest.tmp.txt > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt > > $ ./pipetestmd5.sh? > 15cb81a85c9e450bdac8230309453a0a? /srv/gsfs0/projects/pipetest.tmp.txt > 15cb81a85c9e450bdac8230309453a0a? /home/griznog/pipetest.tmp.txt > > And replacing grep with 'file' even properly sees the files as ASCII: > $ ./pipetestfile.sh? > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines > > I'll poke a little harder at grep next and see what the difference in > strace of each reveals. > > Thanks, > > jbh > > > > > On Wed, Feb 14, 2018 at 7:08 AM, > wrote: > > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > #? ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > We can "fix" the user case that exposed this by not using a temp file or > > inserting a sleep, but I'd still like to know why GPFS is behaving this way > > and make it stop. > > May be related to replication, or other behind-the-scenes behavior. > > Consider this example - 4.2.3.6, data and metadata replication both > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with > a full > fiberchannel mesh to 3 Dell MD34something arrays. > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; > sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test > 4096+0 records in > 4096+0 records out > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > Notice that the first /bin/ls shouldn't be starting until after the > dd has > completed - at which point it's only allocated half the blocks > needed to hold > the 4M of data at one site.? 5 seconds later, it's allocated the > blocks at both > sites and thus shows the full 8M needed for 2 copies. > > I've also seen (but haven't replicated it as I write this) a small > file (4-8K > or so) showing first one full-sized block, then a second full-sized > block, and > then dropping back to what's needed for 2 1/32nd fragments.? 
That had me > scratching my head > > Having said that, that's all metadata fun and games, while your case > appears to have some problems with data integrity (which is a whole lot > scarier).? It would be *really* nice if we understood the problem here. > > The scariest part is: > > > The first grep | wc -l returns 1, because grep outputs ?"Binary file /path/to/ > > gpfs/mount/test matches" > > which seems to be implying that we're failing on semantic consistency. > Basically, your 'cat' command is completing and closing the file, > but then a > temporally later open of the same find is reading something other > that only the > just-written data.? My first guess is that it's a race condition > similar to the > following: The cat command is causing a write on one NSD server, and > the first > grep results in a read from a *different* NSD server, returning the > data that > *used* to be in the block because the read actually happens before > the first > NSD server actually completes the write. > > It may be interesting to replace the grep's with pairs of 'ls -ls / > dd' commands to grab the > raw data and its size, and check the following: > > 1) does the size (both blocks allocated and logical length) reported by > ls match the amount of data actually read by the dd? > > 2) Is the file length as actually read equal to the written length, > or does it > overshoot and read all the way to the next block boundary? > > 3) If the length is correct, what's wrong with the data that's > telling grep that > it's a binary file?? ( od -cx is your friend here). > > 4) If it overshoots, is the remainder all-zeros (good) or does it > return semi-random > "what used to be there" data (bad, due to data exposure issues)? > > (It's certainly not the most perplexing data consistency issue I've > hit in 4 decades - the > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > cluster that > had us, IBM, SGI, DDN, and at least one vendor of networking gear > all chasing our > tails for 18 months before we finally tracked it down. :) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From griznog at gmail.com Wed Feb 14 18:30:39 2018 From: griznog at gmail.com (John Hanks) Date: Wed, 14 Feb 2018 10:30:39 -0800 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. 
In-Reply-To: References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Straces are interesting, but don't immediately open my eyes: strace of grep on NFS (works as expected) openat(AT_FDCWD, "/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 ioctl(3, TCGETS, 0x7ffe2c26b0b0) = -1 ENOTTY (Inappropriate ioctl for device) read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 lseek(3, 32768, SEEK_HOLE) = 530721 lseek(3, 32768, SEEK_SET) = 32768 fstat(1, {st_mode=S_IFREG|0644, st_size=5977, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3bf6c43000 write(1, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 8192chr1 strace on GPFS (thinks file is binary) openat(AT_FDCWD, "/srv/gsfs0/projects/pipetest.tmp.txt", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 ioctl(3, TCGETS, 0x7ffc9b52caa0) = -1 ENOTTY (Inappropriate ioctl for device) read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 lseek(3, 32768, SEEK_HOLE) = 262144 lseek(3, 32768, SEEK_SET) = 32768 fstat(1, {st_mode=S_IFREG|0644, st_size=6011, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd45ee88000 close(3) = 0 write(1, "Binary file /srv/gsfs0/projects/"..., 72Binary file /srv/gsfs0/projects/levinson/xwzhu/pipetest.tmp.txt matches ) = 72 Do the lseek() results indicate that the grep on the GPFS mounted version thinks the file is a sparse file? For comparison I strace'd md5sum in place of the grep and it does not lseek() with SEEK_HOLE, it's access in both cases look identical, like: open("/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb7d2c2b000 read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 ...[reads clipped]... read(3, "", 24576) = 0 lseek(3, 0, SEEK_CUR) = 530721 close(3) = 0 jbh On Wed, Feb 14, 2018 at 9:51 AM, Aaron Knister wrote: > Just speculating here (also known as making things up) but I wonder if > grep is somehow using the file's size in its determination of binary > status. I also see mmap in the strace so maybe there's some issue with > mmap where some internal GPFS buffer is getting truncated > inappropriately but leaving a bunch of null values which gets returned > to grep. > > -Aaron > > On 2/14/18 10:21 AM, John Hanks wrote: > > Hi Valdis, > > > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think > > this is a data integrity issue, thankfully: > > > > $ ./pipetestls.sh > > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 > > /srv/gsfs0/projects/pipetest.tmp.txt > > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 > /home/griznog/pipetest.tmp.txt > > > > $ ./pipetestmd5.sh > > 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt > > 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt > > > > And replacing grep with 'file' even properly sees the files as ASCII: > > $ ./pipetestfile.sh > > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines > > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines > > > > I'll poke a little harder at grep next and see what the difference in > > strace of each reveals. 
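The SEEK_HOLE numbers in the straces above look like the interesting part: on the NFS copy lseek(fd, 32768, SEEK_HOLE) lands at end of file (530721), while on the GPFS copy it lands at 262144, i.e. the filesystem is reporting a hole inside data that was just written, and grep's hole-skipping heuristic then sees what it takes to be NUL bytes and declares the file binary. The probe below checks for that condition without involving grep at all; it is a sketch (Python 3.3 or later on Linux, path is a placeholder), not anything authoritative about the GPFS internals.

#!/usr/bin/env python3
# Ask the filesystem where the first hole is and compare it with the file
# size.  On a freshly written, fully populated file the two should match;
# if SEEK_HOLE comes back short, tools that trust it (like grep) will treat
# the "missing" region as NUL bytes.  The path below is a placeholder.
import os
import sys

path = sys.argv[1] if len(sys.argv) > 1 else "/srv/gsfs0/projects/pipetest.tmp.txt"

fd = os.open(path, os.O_RDONLY)
try:
    size = os.fstat(fd).st_size
    try:
        hole = os.lseek(fd, 0, os.SEEK_HOLE)
    except OSError as err:
        sys.exit("SEEK_HOLE not supported on this mount: %s" % err)
    print("file size    : %d" % size)
    print("first hole at: %d" % hole)
    if hole < size:
        print("hole reported inside written data (what trips grep's heuristic)")
    else:
        print("no hole reported before EOF")
finally:
    os.close(fd)

Running it immediately after the cat and again a few seconds later should show the reported hole moving out to EOF once the data is flushed, which would match the observation that inserting a sleep makes the problem go away.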
> > > > Thanks, > > > > jbh > > > > > > > > > > On Wed, Feb 14, 2018 at 7:08 AM, > > wrote: > > > > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt > $HOME/pipetest.tmp.txt > > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 > /home/griznog/pipetest.tmp.txt > > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > > > We can "fix" the user case that exposed this by not using a temp > file or > > > inserting a sleep, but I'd still like to know why GPFS is behaving > this way > > > and make it stop. > > > > May be related to replication, or other behind-the-scenes behavior. > > > > Consider this example - 4.2.3.6, data and metadata replication both > > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with > > a full > > fiberchannel mesh to 3 Dell MD34something arrays. > > > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; > > sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test > > 4096+0 records in > > 4096+0 records out > > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > > > Notice that the first /bin/ls shouldn't be starting until after the > > dd has > > completed - at which point it's only allocated half the blocks > > needed to hold > > the 4M of data at one site. 5 seconds later, it's allocated the > > blocks at both > > sites and thus shows the full 8M needed for 2 copies. > > > > I've also seen (but haven't replicated it as I write this) a small > > file (4-8K > > or so) showing first one full-sized block, then a second full-sized > > block, and > > then dropping back to what's needed for 2 1/32nd fragments. That > had me > > scratching my head > > > > Having said that, that's all metadata fun and games, while your case > > appears to have some problems with data integrity (which is a whole > lot > > scarier). It would be *really* nice if we understood the problem > here. > > > > The scariest part is: > > > > > The first grep | wc -l returns 1, because grep outputs "Binary > file /path/to/ > > > gpfs/mount/test matches" > > > > which seems to be implying that we're failing on semantic > consistency. > > Basically, your 'cat' command is completing and closing the file, > > but then a > > temporally later open of the same find is reading something other > > that only the > > just-written data. My first guess is that it's a race condition > > similar to the > > following: The cat command is causing a write on one NSD server, and > > the first > > grep results in a read from a *different* NSD server, returning the > > data that > > *used* to be in the block because the read actually happens before > > the first > > NSD server actually completes the write. > > > > It may be interesting to replace the grep's with pairs of 'ls -ls / > > dd' commands to grab the > > raw data and its size, and check the following: > > > > 1) does the size (both blocks allocated and logical length) reported > by > > ls match the amount of data actually read by the dd? > > > > 2) Is the file length as actually read equal to the written length, > > or does it > > overshoot and read all the way to the next block boundary? > > > > 3) If the length is correct, what's wrong with the data that's > > telling grep that > > it's a binary file? ( od -cx is your friend here). 
> > > > 4) If it overshoots, is the remainder all-zeros (good) or does it > > return semi-random > > "what used to be there" data (bad, due to data exposure issues)? > > > > (It's certainly not the most perplexing data consistency issue I've > > hit in 4 decades - the > > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > > cluster that > > had us, IBM, SGI, DDN, and at least one vendor of networking gear > > all chasing our > > tails for 18 months before we finally tracked it down. :) > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Wed Feb 14 09:00:10 2018 From: john.hearns at asml.com (John Hearns) Date: Wed, 14 Feb 2018 09:00:10 +0000 Subject: [gpfsug-discuss] Odd d????????? permissions Message-ID: I am sure this is a known behavior and I am going to feel very foolish in a few minutes... We often see this behavior on a GPFS filesystem. I log into a client. [jhearns at pn715 test]$ ls -la ../ ls: cannot access ../..: Permission denied total 160 drwx------ 4 jhearns root 4096 Feb 14 09:46 . d????????? ? ? ? ? ? .. drwxr-xr-x 2 jhearns users 4096 Feb 9 11:13 gpfsperf -rw-r--r-- 1 jhearns users 27336 Feb 9 22:24 iozone.out -rw-r--r-- 1 jhearns users 6083 Feb 9 10:55 IozoneResults.py -rw-r--r-- 1 jhearns users 22959 Feb 9 11:17 iozone.txt -rw-r--r-- 1 jhearns users 2977 Feb 9 10:55 iozone.txtvi -rwxr-xr-x 1 jhearns users 102 Feb 9 10:55 run-iozone.sh drwxr-xr-x 2 jhearns users 4096 Feb 14 09:46 test -r-x------ 1 jhearns users 51504 Feb 9 11:02 tsqosperf This behavior changes after a certain number of minutes, and the .. directory looks normal. For information this filesystem has nfsv4 file locking semantics and ACL semantics set to all -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
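On the question-mark listing above: ls prints that d????????? row whenever readdir() returns the entry name but the follow-up lstat() on it fails, so the errno from the failing lstat() is usually the real clue. The sketch below (Python 3; point it at the directory that shows the odd listing) separates the two calls and prints the errno, which helps distinguish an ordinary permission problem from the daemon-side failures discussed later in this thread.

#!/usr/bin/env python3
# ls shows "d????????? ? ? ..." when readdir() returns a name but the
# follow-up lstat() on it fails.  This sketch separates the two calls so
# the errno of the failing lstat() is visible.  Directory is a placeholder.
import os
import sys

dirpath = sys.argv[1] if len(sys.argv) > 1 else "."

with os.scandir(dirpath) as entries:
    names = [e.name for e in entries]
names += [".", ".."]                 # scandir omits these, ls -la shows them

for name in sorted(names):
    full = os.path.join(dirpath, name)
    try:
        st = os.lstat(full)
        print("%-20s mode=%o uid=%d gid=%d" % (name, st.st_mode, st.st_uid, st.st_gid))
    except OSError as err:
        # This is the case where ls falls back to question marks.
        print("%-20s lstat failed: errno=%d (%s)" % (name, err.errno, err.strerror))

If the lstat() on .. succeeds a few minutes later with nothing else changed, that points at a transient daemon-side condition rather than the directory's actual mode bits.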
URL: From S.J.Thompson at bham.ac.uk Wed Feb 14 18:38:41 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 14 Feb 2018 18:38:41 +0000 Subject: [gpfsug-discuss] Odd d????????? permissions In-Reply-To: References: Message-ID: Is it an AFM cache? We see this sort of behaviour occasionally where the cache has an "old" view of the directory. Doing an ls, it evidently goes back to home but by then you already have weird stuff. The next ls is usually fine. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of john.hearns at asml.com [john.hearns at asml.com] Sent: 14 February 2018 09:00 To: gpfsug main discussion list Subject: [gpfsug-discuss] Odd d????????? permissions I am sure this is a known behavior and I am going to feel very foolish in a few minutes? We often see this behavior on a GPFS filesystem. I log into a client. [jhearns at pn715 test]$ ls -la ../ ls: cannot access ../..: Permission denied total 160 drwx------ 4 jhearns root 4096 Feb 14 09:46 . d????????? ? ? ? ? ? .. drwxr-xr-x 2 jhearns users 4096 Feb 9 11:13 gpfsperf -rw-r--r-- 1 jhearns users 27336 Feb 9 22:24 iozone.out -rw-r--r-- 1 jhearns users 6083 Feb 9 10:55 IozoneResults.py -rw-r--r-- 1 jhearns users 22959 Feb 9 11:17 iozone.txt -rw-r--r-- 1 jhearns users 2977 Feb 9 10:55 iozone.txtvi -rwxr-xr-x 1 jhearns users 102 Feb 9 10:55 run-iozone.sh drwxr-xr-x 2 jhearns users 4096 Feb 14 09:46 test -r-x------ 1 jhearns users 51504 Feb 9 11:02 tsqosperf This behavior changes after a certain number of minutes, and the .. directory looks normal. For information this filesystem has nfsv4 file locking semantics and ACL semantics set to all -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From bbanister at jumptrading.com Wed Feb 14 18:48:32 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 14 Feb 2018 18:48:32 +0000 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Hi all, We found this a while back and IBM fixed it. Here?s your answer: http://www-01.ibm.com/support/docview.wss?uid=isg1IV87385 Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hanks Sent: Wednesday, February 14, 2018 12:31 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by grep. 
Note: External Email ________________________________ Straces are interesting, but don't immediately open my eyes: strace of grep on NFS (works as expected) openat(AT_FDCWD, "/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 ioctl(3, TCGETS, 0x7ffe2c26b0b0) = -1 ENOTTY (Inappropriate ioctl for device) read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 lseek(3, 32768, SEEK_HOLE) = 530721 lseek(3, 32768, SEEK_SET) = 32768 fstat(1, {st_mode=S_IFREG|0644, st_size=5977, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3bf6c43000 write(1, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 8192chr1 strace on GPFS (thinks file is binary) openat(AT_FDCWD, "/srv/gsfs0/projects/pipetest.tmp.txt", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 ioctl(3, TCGETS, 0x7ffc9b52caa0) = -1 ENOTTY (Inappropriate ioctl for device) read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 lseek(3, 32768, SEEK_HOLE) = 262144 lseek(3, 32768, SEEK_SET) = 32768 fstat(1, {st_mode=S_IFREG|0644, st_size=6011, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd45ee88000 close(3) = 0 write(1, "Binary file /srv/gsfs0/projects/"..., 72Binary file /srv/gsfs0/projects/levinson/xwzhu/pipetest.tmp.txt matches ) = 72 Do the lseek() results indicate that the grep on the GPFS mounted version thinks the file is a sparse file? For comparison I strace'd md5sum in place of the grep and it does not lseek() with SEEK_HOLE, it's access in both cases look identical, like: open("/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb7d2c2b000 read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 ...[reads clipped]... read(3, "", 24576) = 0 lseek(3, 0, SEEK_CUR) = 530721 close(3) = 0 jbh On Wed, Feb 14, 2018 at 9:51 AM, Aaron Knister > wrote: Just speculating here (also known as making things up) but I wonder if grep is somehow using the file's size in its determination of binary status. I also see mmap in the strace so maybe there's some issue with mmap where some internal GPFS buffer is getting truncated inappropriately but leaving a bunch of null values which gets returned to grep. -Aaron On 2/14/18 10:21 AM, John Hanks wrote: > Hi Valdis, > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think > this is a data integrity issue, thankfully: > > $ ./pipetestls.sh > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 > /srv/gsfs0/projects/pipetest.tmp.txt > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt > > $ ./pipetestmd5.sh > 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt > 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt > > And replacing grep with 'file' even properly sees the files as ASCII: > $ ./pipetestfile.sh > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines > > I'll poke a little harder at grep next and see what the difference in > strace of each reveals. 
> > Thanks, > > jbh > > > > > On Wed, Feb 14, 2018 at 7:08 AM, > >> wrote: > > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > We can "fix" the user case that exposed this by not using a temp file or > > inserting a sleep, but I'd still like to know why GPFS is behaving this way > > and make it stop. > > May be related to replication, or other behind-the-scenes behavior. > > Consider this example - 4.2.3.6, data and metadata replication both > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with > a full > fiberchannel mesh to 3 Dell MD34something arrays. > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; > sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test > 4096+0 records in > 4096+0 records out > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > Notice that the first /bin/ls shouldn't be starting until after the > dd has > completed - at which point it's only allocated half the blocks > needed to hold > the 4M of data at one site. 5 seconds later, it's allocated the > blocks at both > sites and thus shows the full 8M needed for 2 copies. > > I've also seen (but haven't replicated it as I write this) a small > file (4-8K > or so) showing first one full-sized block, then a second full-sized > block, and > then dropping back to what's needed for 2 1/32nd fragments. That had me > scratching my head > > Having said that, that's all metadata fun and games, while your case > appears to have some problems with data integrity (which is a whole lot > scarier). It would be *really* nice if we understood the problem here. > > The scariest part is: > > > The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/ > > gpfs/mount/test matches" > > which seems to be implying that we're failing on semantic consistency. > Basically, your 'cat' command is completing and closing the file, > but then a > temporally later open of the same find is reading something other > that only the > just-written data. My first guess is that it's a race condition > similar to the > following: The cat command is causing a write on one NSD server, and > the first > grep results in a read from a *different* NSD server, returning the > data that > *used* to be in the block because the read actually happens before > the first > NSD server actually completes the write. > > It may be interesting to replace the grep's with pairs of 'ls -ls / > dd' commands to grab the > raw data and its size, and check the following: > > 1) does the size (both blocks allocated and logical length) reported by > ls match the amount of data actually read by the dd? > > 2) Is the file length as actually read equal to the written length, > or does it > overshoot and read all the way to the next block boundary? > > 3) If the length is correct, what's wrong with the data that's > telling grep that > it's a binary file? ( od -cx is your friend here). > > 4) If it overshoots, is the remainder all-zeros (good) or does it > return semi-random > "what used to be there" data (bad, due to data exposure issues)? 
> > (It's certainly not the most perplexing data consistency issue I've > hit in 4 decades - the > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > cluster that > had us, IBM, SGI, DDN, and at least one vendor of networking gear > all chasing our > tails for 18 months before we finally tracked it down. :) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From griznog at gmail.com Wed Feb 14 19:17:19 2018 From: griznog at gmail.com (John Hanks) Date: Wed, 14 Feb 2018 11:17:19 -0800 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Thanks Bryan, mystery solved :) We also stumbled across these related items, in case anyone else wanders into this thread. http://bug-grep.gnu.narkive.com/Y8cfvWDt/bug-27666-grep-on-gpfs-filesystem-seek-hole-problem https://www.ibm.com/developerworks/community/forums/html/topic?id=c2a94433-9ec0-4a4b-abfe-d0a1e721d630 GPFS, the gift that keeps on giving ... me more things to do instead of doing the things I want to be doing. Thanks all, jbh On Wed, Feb 14, 2018 at 10:48 AM, Bryan Banister wrote: > Hi all, > > > > We found this a while back and IBM fixed it. Here?s your answer: > http://www-01.ibm.com/support/docview.wss?uid=isg1IV87385 > > > > Cheers, > > -Bryan > > > > *From:* gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss- > bounces at spectrumscale.org] *On Behalf Of *John Hanks > *Sent:* Wednesday, February 14, 2018 12:31 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Odd behavior with cat followed by grep. 
> > > > *Note: External Email* > ------------------------------ > > Straces are interesting, but don't immediately open my eyes: > > > > strace of grep on NFS (works as expected) > > > > openat(AT_FDCWD, "/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > ioctl(3, TCGETS, 0x7ffe2c26b0b0) = -1 ENOTTY (Inappropriate ioctl > for device) > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > lseek(3, 32768, SEEK_HOLE) = 530721 > > lseek(3, 32768, SEEK_SET) = 32768 > > fstat(1, {st_mode=S_IFREG|0644, st_size=5977, ...}) = 0 > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > 0x7f3bf6c43000 > > write(1, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 8192chr1 > > > > strace on GPFS (thinks file is binary) > > > > openat(AT_FDCWD, "/srv/gsfs0/projects/pipetest.tmp.txt", O_RDONLY) = 3 > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > ioctl(3, TCGETS, 0x7ffc9b52caa0) = -1 ENOTTY (Inappropriate ioctl > for device) > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > lseek(3, 32768, SEEK_HOLE) = 262144 > > lseek(3, 32768, SEEK_SET) = 32768 > > fstat(1, {st_mode=S_IFREG|0644, st_size=6011, ...}) = 0 > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > 0x7fd45ee88000 > > close(3) = 0 > > write(1, "Binary file /srv/gsfs0/projects/"..., 72Binary file > /srv/gsfs0/projects/levinson/xwzhu/pipetest.tmp.txt matches > > ) = 72 > > > > Do the lseek() results indicate that the grep on the GPFS mounted version > thinks the file is a sparse file? For comparison I strace'd md5sum in place > of the grep and it does not lseek() with SEEK_HOLE, it's access in both > cases look identical, like: > > > > open("/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 > > fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > 0x7fb7d2c2b000 > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > ...[reads clipped]... > > read(3, "", 24576) = 0 > > lseek(3, 0, SEEK_CUR) = 530721 > > close(3) = 0 > > > > > > jbh > > > > > > On Wed, Feb 14, 2018 at 9:51 AM, Aaron Knister > wrote: > > Just speculating here (also known as making things up) but I wonder if > grep is somehow using the file's size in its determination of binary > status. I also see mmap in the strace so maybe there's some issue with > mmap where some internal GPFS buffer is getting truncated > inappropriately but leaving a bunch of null values which gets returned > to grep. 
> > -Aaron > > On 2/14/18 10:21 AM, John Hanks wrote: > > Hi Valdis, > > > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think > > this is a data integrity issue, thankfully: > > > > $ ./pipetestls.sh > > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 > > /srv/gsfs0/projects/pipetest.tmp.txt > > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 > /home/griznog/pipetest.tmp.txt > > > > $ ./pipetestmd5.sh > > 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt > > 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt > > > > And replacing grep with 'file' even properly sees the files as ASCII: > > $ ./pipetestfile.sh > > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines > > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines > > > > I'll poke a little harder at grep next and see what the difference in > > strace of each reveals. > > > > Thanks, > > > > jbh > > > > > > > > > > On Wed, Feb 14, 2018 at 7:08 AM, > > > wrote: > > > > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt > $HOME/pipetest.tmp.txt > > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 > /home/griznog/pipetest.tmp.txt > > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > > > We can "fix" the user case that exposed this by not using a temp > file or > > > inserting a sleep, but I'd still like to know why GPFS is behaving > this way > > > and make it stop. > > > > May be related to replication, or other behind-the-scenes behavior. > > > > Consider this example - 4.2.3.6, data and metadata replication both > > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with > > a full > > fiberchannel mesh to 3 Dell MD34something arrays. > > > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; > > sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test > > 4096+0 records in > > 4096+0 records out > > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > > > Notice that the first /bin/ls shouldn't be starting until after the > > dd has > > completed - at which point it's only allocated half the blocks > > needed to hold > > the 4M of data at one site. 5 seconds later, it's allocated the > > blocks at both > > sites and thus shows the full 8M needed for 2 copies. > > > > I've also seen (but haven't replicated it as I write this) a small > > file (4-8K > > or so) showing first one full-sized block, then a second full-sized > > block, and > > then dropping back to what's needed for 2 1/32nd fragments. That > had me > > scratching my head > > > > Having said that, that's all metadata fun and games, while your case > > appears to have some problems with data integrity (which is a whole > lot > > scarier). It would be *really* nice if we understood the problem > here. > > > > The scariest part is: > > > > > The first grep | wc -l returns 1, because grep outputs "Binary > file /path/to/ > > > gpfs/mount/test matches" > > > > which seems to be implying that we're failing on semantic > consistency. > > Basically, your 'cat' command is completing and closing the file, > > but then a > > temporally later open of the same find is reading something other > > that only the > > just-written data. 
My first guess is that it's a race condition > > similar to the > > following: The cat command is causing a write on one NSD server, and > > the first > > grep results in a read from a *different* NSD server, returning the > > data that > > *used* to be in the block because the read actually happens before > > the first > > NSD server actually completes the write. > > > > It may be interesting to replace the grep's with pairs of 'ls -ls / > > dd' commands to grab the > > raw data and its size, and check the following: > > > > 1) does the size (both blocks allocated and logical length) reported > by > > ls match the amount of data actually read by the dd? > > > > 2) Is the file length as actually read equal to the written length, > > or does it > > overshoot and read all the way to the next block boundary? > > > > 3) If the length is correct, what's wrong with the data that's > > telling grep that > > it's a binary file? ( od -cx is your friend here). > > > > 4) If it overshoots, is the remainder all-zeros (good) or does it > > return semi-random > > "what used to be there" data (bad, due to data exposure issues)? > > > > (It's certainly not the most perplexing data consistency issue I've > > hit in 4 decades - the > > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > > cluster that > > had us, IBM, SGI, DDN, and at least one vendor of networking gear > > all chasing our > > tails for 18 months before we finally tracked it down. :) > > > > _______________________________________________ > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Feb 14 20:54:04 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 14 Feb 2018 20:54:04 +0000 Subject: [gpfsug-discuss] Odd d????????? 
permissions In-Reply-To: References: Message-ID: <9C726F78-D870-4E1E-92B6-96F495F53D54@vanderbilt.edu> Hi John, We had a similar incident happen just a week or so ago here, although in our case it was that certain files within a directory showed up with the question marks, while others didn?t. The problem was simply that the node had been run out of RAM and the GPFS daemon couldn?t allocate memory. Killing the offending process(es) and restarting GPFS fixed the issue. We saw hundreds of messages like: 2018-02-07_16:35:13.267-0600: [E] Failed to allocate 92274688 bytes in memory pool, err -1 In the GPFS log when this was happening. HTHAL? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Feb 14, 2018, at 12:38 PM, Simon Thompson (IT Research Support) > wrote: Is it an AFM cache? We see this sort of behaviour occasionally where the cache has an "old" view of the directory. Doing an ls, it evidently goes back to home but by then you already have weird stuff. The next ls is usually fine. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of john.hearns at asml.com [john.hearns at asml.com] Sent: 14 February 2018 09:00 To: gpfsug main discussion list Subject: [gpfsug-discuss] Odd d????????? permissions I am sure this is a known behavior and I am going to feel very foolish in a few minutes? We often see this behavior on a GPFS filesystem. I log into a client. [jhearns at pn715 test]$ ls -la ../ ls: cannot access ../..: Permission denied total 160 drwx------ 4 jhearns root 4096 Feb 14 09:46 . d????????? ? ? ? ? ? .. drwxr-xr-x 2 jhearns users 4096 Feb 9 11:13 gpfsperf -rw-r--r-- 1 jhearns users 27336 Feb 9 22:24 iozone.out -rw-r--r-- 1 jhearns users 6083 Feb 9 10:55 IozoneResults.py -rw-r--r-- 1 jhearns users 22959 Feb 9 11:17 iozone.txt -rw-r--r-- 1 jhearns users 2977 Feb 9 10:55 iozone.txtvi -rwxr-xr-x 1 jhearns users 102 Feb 9 10:55 run-iozone.sh drwxr-xr-x 2 jhearns users 4096 Feb 14 09:46 test -r-x------ 1 jhearns users 51504 Feb 9 11:02 tsqosperf This behavior changes after a certain number of minutes, and the .. directory looks normal. For information this filesystem has nfsv4 file locking semantics and ACL semantics set to all -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
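A cheap way to check whether a node showing those listings is hitting the same allocation failures Kevin describes is to scan the GPFS daemon log for them. The sketch below assumes Python 3 and the usual log location (/var/adm/ras/mmfs.log.latest); adjust the path for your install.

#!/usr/bin/env python3
# Count "Failed to allocate ... in memory pool" errors in the GPFS daemon
# log, bucketed per hour, so a memory-exhaustion episode like the one
# described above stands out.  Log path is the usual default; adjust it.
import re
import sys
from collections import Counter

logfile = sys.argv[1] if len(sys.argv) > 1 else "/var/adm/ras/mmfs.log.latest"
pattern = re.compile(r"^(\d{4}-\d{2}-\d{2}_\d{2}).*Failed to allocate .* in memory pool")

hits = Counter()
with open(logfile, errors="replace") as log:
    for line in log:
        m = pattern.search(line)
        if m:
            hits[m.group(1)] += 1            # bucket by date_hour

for hour, count in sorted(hits.items()):
    print("%s:xx  %d allocation failures" % (hour, count))
if not hits:
    print("no memory pool allocation failures found in", logfile)
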
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9df4b4d88544447ac29608d573da2d51%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636542303262503651&sdata=v6pnBIEvu6lyP3mGkkRX7hSj58H8vvkUl6R%2FCsq6gmc%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From coetzee.ray at gmail.com Wed Feb 14 20:59:52 2018 From: coetzee.ray at gmail.com (Ray Coetzee) Date: Wed, 14 Feb 2018 20:59:52 +0000 Subject: [gpfsug-discuss] Scale 5, filesystem guidelines Message-ID: Since Scale 5.0 was released I've not seen much guidelines provided on how to make the best of the new filesystem layout. For example, is dedicated metadata SSD's still recommended or does the Scale 5 improvements mean we can just do metadata and data pools now? I'd be interested to hear of anyone's experience so far. Kind regards Ray Coetzee -------------- next part -------------- An HTML attachment was scrubbed... URL: From sxiao at us.ibm.com Wed Feb 14 21:53:17 2018 From: sxiao at us.ibm.com (Steve Xiao) Date: Wed, 14 Feb 2018 16:53:17 -0500 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. (John Hanks) In-Reply-To: References: Message-ID: This could be related to the following flash: http://www-01.ibm.com/support/docview.wss?uid=ssg1S1012054 You should contact IBM service to obtain the fix for your release. Steve Y. Xiao gpfsug-discuss-bounces at spectrumscale.org wrote on 02/14/2018 02:18:02 PM: > From: gpfsug-discuss-request at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Date: 02/14/2018 02:18 PM > Subject: gpfsug-discuss Digest, Vol 73, Issue 36 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Odd behavior with cat followed by grep. (John Hanks) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 14 Feb 2018 11:17:19 -0800 > From: John Hanks > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by grep. > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > Thanks Bryan, mystery solved :) > > We also stumbled across these related items, in case anyone else wanders > into this thread. > > https://urldefense.proofpoint.com/v2/url? > u=http-3A__bug-2Dgrep.gnu.narkive.com_Y8cfvWDt_bug-2D27666-2Dgrep-2Don-2Dgpfs-2Dfilesystem-2Dseek-2Dhole-2Dproblem&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=FgxYBxqHZ0bHdWirEs1U_B3oDpeHJe8iRd- > TYrXh6FI&e= > > https://www.ibm.com/developerworks/community/forums/html/topic? 
> id=c2a94433-9ec0-4a4b-abfe-d0a1e721d630 > > GPFS, the gift that keeps on giving ... me more things to do instead of > doing the things I want to be doing. > > Thanks all, > > jbh > > On Wed, Feb 14, 2018 at 10:48 AM, Bryan Banister > wrote: > > > Hi all, > > > > > > > > We found this a while back and IBM fixed it. Here?s your answer: > > http://www-01.ibm.com/support/docview.wss?uid=isg1IV87385 > > > > > > > > Cheers, > > > > -Bryan > > > > > > > > *From:* gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss- > > bounces at spectrumscale.org] *On Behalf Of *John Hanks > > *Sent:* Wednesday, February 14, 2018 12:31 PM > > *To:* gpfsug main discussion list > > *Subject:* Re: [gpfsug-discuss] Odd behavior with cat followed by grep. > > > > > > > > *Note: External Email* > > ------------------------------ > > > > Straces are interesting, but don't immediately open my eyes: > > > > > > > > strace of grep on NFS (works as expected) > > > > > > > > openat(AT_FDCWD, "/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > > > ioctl(3, TCGETS, 0x7ffe2c26b0b0) = -1 ENOTTY (Inappropriate ioctl > > for device) > > > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > > > lseek(3, 32768, SEEK_HOLE) = 530721 > > > > lseek(3, 32768, SEEK_SET) = 32768 > > > > fstat(1, {st_mode=S_IFREG|0644, st_size=5977, ...}) = 0 > > > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > > 0x7f3bf6c43000 > > > > write(1, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 8192chr1 > > > > > > > > strace on GPFS (thinks file is binary) > > > > > > > > openat(AT_FDCWD, "/srv/gsfs0/projects/pipetest.tmp.txt", O_RDONLY) = 3 > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > > > ioctl(3, TCGETS, 0x7ffc9b52caa0) = -1 ENOTTY (Inappropriate ioctl > > for device) > > > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > > > lseek(3, 32768, SEEK_HOLE) = 262144 > > > > lseek(3, 32768, SEEK_SET) = 32768 > > > > fstat(1, {st_mode=S_IFREG|0644, st_size=6011, ...}) = 0 > > > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > > 0x7fd45ee88000 > > > > close(3) = 0 > > > > write(1, "Binary file /srv/gsfs0/projects/"..., 72Binary file > > /srv/gsfs0/projects/levinson/xwzhu/pipetest.tmp.txt matches > > > > ) = 72 > > > > > > > > Do the lseek() results indicate that the grep on the GPFS mounted version > > thinks the file is a sparse file? For comparison I strace'd md5sum in place > > of the grep and it does not lseek() with SEEK_HOLE, it's access in both > > cases look identical, like: > > > > > > > > open("/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 > > > > fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > > 0x7fb7d2c2b000 > > > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > > > ...[reads clipped]... > > > > read(3, "", 24576) = 0 > > > > lseek(3, 0, SEEK_CUR) = 530721 > > > > close(3) = 0 > > > > > > > > > > > > jbh > > > > > > > > > > > > On Wed, Feb 14, 2018 at 9:51 AM, Aaron Knister > > wrote: > > > > Just speculating here (also known as making things up) but I wonder if > > grep is somehow using the file's size in its determination of binary > > status. 
I also see mmap in the strace so maybe there's some issue with > > mmap where some internal GPFS buffer is getting truncated > > inappropriately but leaving a bunch of null values which gets returned > > to grep. > > > > -Aaron > > > > On 2/14/18 10:21 AM, John Hanks wrote: > > > Hi Valdis, > > > > > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think > > > this is a data integrity issue, thankfully: > > > > > > $ ./pipetestls.sh > > > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 > > > /srv/gsfs0/projects/pipetest.tmp.txt > > > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 > > /home/griznog/pipetest.tmp.txt > > > > > > $ ./pipetestmd5.sh > > > 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt > > > 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt > > > > > > And replacing grep with 'file' even properly sees the files as ASCII: > > > $ ./pipetestfile.sh > > > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines > > > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines > > > > > > I'll poke a little harder at grep next and see what the difference in > > > strace of each reveals. > > > > > > Thanks, > > > > > > jbh > > > > > > > > > > > > > > > On Wed, Feb 14, 2018 at 7:08 AM, > > > > > wrote: > > > > > > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > > > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt > > $HOME/pipetest.tmp.txt > > > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 > > /home/griznog/pipetest.tmp.txt > > > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > > > > > We can "fix" the user case that exposed this by not using a temp > > file or > > > > inserting a sleep, but I'd still like to know why GPFS is behaving > > this way > > > > and make it stop. > > > > > > May be related to replication, or other behind-the-scenes behavior. > > > > > > Consider this example - 4.2.3.6, data and metadata replication both > > > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with > > > a full > > > fiberchannel mesh to 3 Dell MD34something arrays. > > > > > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; > > > sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test > > > 4096+0 records in > > > 4096+0 records out > > > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > > > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > > > > > Notice that the first /bin/ls shouldn't be starting until after the > > > dd has > > > completed - at which point it's only allocated half the blocks > > > needed to hold > > > the 4M of data at one site. 5 seconds later, it's allocated the > > > blocks at both > > > sites and thus shows the full 8M needed for 2 copies. > > > > > > I've also seen (but haven't replicated it as I write this) a small > > > file (4-8K > > > or so) showing first one full-sized block, then a second full-sized > > > block, and > > > then dropping back to what's needed for 2 1/32nd fragments. That > > had me > > > scratching my head > > > > > > Having said that, that's all metadata fun and games, while your case > > > appears to have some problems with data integrity (which is a whole > > lot > > > scarier). It would be *really* nice if we understood the problem > > here. 
> > > > > > The scariest part is: > > > > > > > The first grep | wc -l returns 1, because grep outputs "Binary > > file /path/to/ > > > > gpfs/mount/test matches" > > > > > > which seems to be implying that we're failing on semantic > > consistency. > > > Basically, your 'cat' command is completing and closing the file, > > > but then a > > > temporally later open of the same find is reading something other > > > that only the > > > just-written data. My first guess is that it's a race condition > > > similar to the > > > following: The cat command is causing a write on one NSD server, and > > > the first > > > grep results in a read from a *different* NSD server, returning the > > > data that > > > *used* to be in the block because the read actually happens before > > > the first > > > NSD server actually completes the write. > > > > > > It may be interesting to replace the grep's with pairs of 'ls -ls / > > > dd' commands to grab the > > > raw data and its size, and check the following: > > > > > > 1) does the size (both blocks allocated and logical length) reported > > by > > > ls match the amount of data actually read by the dd? > > > > > > 2) Is the file length as actually read equal to the written length, > > > or does it > > > overshoot and read all the way to the next block boundary? > > > > > > 3) If the length is correct, what's wrong with the data that's > > > telling grep that > > > it's a binary file? ( od -cx is your friend here). > > > > > > 4) If it overshoots, is the remainder all-zeros (good) or does it > > > return semi-random > > > "what used to be there" data (bad, due to data exposure issues)? > > > > > > (It's certainly not the most perplexing data consistency issue I've > > > hit in 4 decades - the > > > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > > > cluster that > > > had us, IBM, SGI, DDN, and at least one vendor of networking gear > > > all chasing our > > > tails for 18 months before we finally tracked it down. :) > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > > > gpfsug-discuss at spectrumscale.org urldefense.proofpoint.com/v2/url? > u=http-3A__spectrumscale.org&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=jUBFb8C9yai1TUTu1BVnNTNcOnJXGxupWiEKkEjT4pM&e= > > > > > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > > > > > > > > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > https://urldefense.proofpoint.com/v2/url? 
> u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > > > > > > ------------------------------ > > > > Note: This email is for the confidential use of the named addressee(s) > > only and may contain proprietary, confidential or privileged information. > > If you are not the intended recipient, you are hereby notified that any > > review, dissemination or copying of this email is strictly prohibited, and > > to please notify the sender immediately and destroy this email and any > > attachments. Email transmission cannot be guaranteed to be secure or > > error-free. The Company, therefore, does not make any guarantees as to the > > completeness or accuracy of this email or any attachments. This email is > > for informational purposes only and does not constitute a recommendation, > > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > > or perform any type of transaction of a financial product. > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20180214_d62fc203_attachment.html&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=nUcKIKr84CRhS0EbxV5vwjSlEr4p3Wf6Is3EDKvOjJg&e= > > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > End of gpfsug-discuss Digest, Vol 73, Issue 36 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Feb 14 21:54:36 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 14 Feb 2018 21:54:36 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> <1518529381.3326.93.camel@strath.ac.uk> Message-ID: <90827aa7-e03c-7f2c-229a-c9db4c7dc8be@strath.ac.uk> On 13/02/18 15:56, Buterbaugh, Kevin L wrote: > Hi JAB, > > OK, let me try one more time to clarify. I?m not naming the vendor ? 
> they?re a small maker of commodity storage and we?ve been using their > stuff for years and, overall, it?s been very solid. The problem in > this specific case is that a major version firmware upgrade is > required ? if the controllers were only a minor version apart we > could do it live. > That makes more sense, but still do tell which vendor so I can avoid them. It's 2018 I expect never to need to take my storage down for *ANY* firmware upgrade *EVER* - period. Any vendor that falls short of that needs to go on my naughty list, for specific checking that this is no longer the case before I ever purchase any of their kit. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan at buzzard.me.uk Wed Feb 14 21:47:38 2018 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 14 Feb 2018 21:47:38 +0000 Subject: [gpfsug-discuss] Scale 5, filesystem guidelines In-Reply-To: References: Message-ID: On 14/02/18 20:59, Ray Coetzee wrote: > Since Scale 5.0 was released I've not seen much guidelines provided on > how to make the best of the new filesystem layout. > > For example, is dedicated metadata SSD's still recommended or does the > Scale 5 improvements mean we can just do metadata and data?pools now? > > I'd be interested to?hear of anyone's experience so far. > Well given metadata performance is heavily related to random IO performance I would suspect that dedicated metadata SSD's are still recommended. That is unless you have an all SSD based file system :-) JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From kkr at lbl.gov Thu Feb 15 01:47:26 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 14 Feb 2018 17:47:26 -0800 Subject: [gpfsug-discuss] RDMA data from Zimon Message-ID: Hi, Can one of the IBMers tell me if port_xmit_data and port_rcv_data from Zimon can be interpreted as RDMA Bytes/sec? Ideally, also how this data is being collected? I?m looking here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1hlp_monnetworksmetrics.htm But then I also look here: https://community.mellanox.com/docs/DOC-2751 and see "Total number of data octets, divided by 4 (lanes), received on all VLs. This is 64 bit counter.? So I wasn?t sure if some multiplication by 4 was in order. Please advise. Cheers, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Thu Feb 15 09:28:42 2018 From: john.hearns at asml.com (John Hearns) Date: Thu, 15 Feb 2018 09:28:42 +0000 Subject: [gpfsug-discuss] Odd d????????? permissions In-Reply-To: <9C726F78-D870-4E1E-92B6-96F495F53D54@vanderbilt.edu> References: <9C726F78-D870-4E1E-92B6-96F495F53D54@vanderbilt.edu> Message-ID: Simon, Kevin Thankyou for your responses. Simon, indeed we do see this behavior on AFM filesets which have an ?old? view ? and we can watch the AFM fileset change as the information is updated. In this case, this filesystem is not involved with AFM. I Changed the locking semantics from NFSv4 to Posix and the report is that this has solved the problem. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: Wednesday, February 14, 2018 9:54 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Odd d????????? 
permissions Hi John, We had a similar incident happen just a week or so ago here, although in our case it was that certain files within a directory showed up with the question marks, while others didn?t. The problem was simply that the node had been run out of RAM and the GPFS daemon couldn?t allocate memory. Killing the offending process(es) and restarting GPFS fixed the issue. We saw hundreds of messages like: 2018-02-07_16:35:13.267-0600: [E] Failed to allocate 92274688 bytes in memory pool, err -1 In the GPFS log when this was happening. HTHAL? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Feb 14, 2018, at 12:38 PM, Simon Thompson (IT Research Support) > wrote: Is it an AFM cache? We see this sort of behaviour occasionally where the cache has an "old" view of the directory. Doing an ls, it evidently goes back to home but by then you already have weird stuff. The next ls is usually fine. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of john.hearns at asml.com [john.hearns at asml.com] Sent: 14 February 2018 09:00 To: gpfsug main discussion list Subject: [gpfsug-discuss] Odd d????????? permissions I am sure this is a known behavior and I am going to feel very foolish in a few minutes? We often see this behavior on a GPFS filesystem. I log into a client. [jhearns at pn715 test]$ ls -la ../ ls: cannot access ../..: Permission denied total 160 drwx------ 4 jhearns root 4096 Feb 14 09:46 . d????????? ? ? ? ? ? .. drwxr-xr-x 2 jhearns users 4096 Feb 9 11:13 gpfsperf -rw-r--r-- 1 jhearns users 27336 Feb 9 22:24 iozone.out -rw-r--r-- 1 jhearns users 6083 Feb 9 10:55 IozoneResults.py -rw-r--r-- 1 jhearns users 22959 Feb 9 11:17 iozone.txt -rw-r--r-- 1 jhearns users 2977 Feb 9 10:55 iozone.txtvi -rwxr-xr-x 1 jhearns users 102 Feb 9 10:55 run-iozone.sh drwxr-xr-x 2 jhearns users 4096 Feb 14 09:46 test -r-x------ 1 jhearns users 51504 Feb 9 11:02 tsqosperf This behavior changes after a certain number of minutes, and the .. directory looks normal. For information this filesystem has nfsv4 file locking semantics and ACL semantics set to all -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
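For anyone chasing the same symptom, a short sketch of how to check the settings mentioned above (gpfs0 is a placeholder for the real device name; the log path assumes a default installation):

# Show the current file locking (-D) and ACL (-k) semantics
mmlsfs gpfs0 -D -k

# Switch locking semantics from nfs4 to posix, which is what resolved it here
mmchfs gpfs0 -D posix

# Look for the memory-pool allocation errors mentioned earlier in the thread
grep "Failed to allocate" /var/adm/ras/mmfs.log.latest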
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9df4b4d88544447ac29608d573da2d51%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636542303262503651&sdata=v6pnBIEvu6lyP3mGkkRX7hSj58H8vvkUl6R%2FCsq6gmc%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Thu Feb 15 09:31:34 2018 From: john.hearns at asml.com (John Hearns) Date: Thu, 15 Feb 2018 09:31:34 +0000 Subject: [gpfsug-discuss] Thankyou - d?????? issue Message-ID: Simon, Kevin Thankyou for your responses. Simon, indeed we do see this behavior on AFM filesets which have an 'old' view - and we can watch the AFM fileset change as the information is updated. In this case, this filesystem is not involved with AFM. I changed the locking semantics from NFSv4 to Posix and the report is that this has solved the problem. Sorry for not replying on the thread. The mailing list software reckons I am not who I say I am. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From secretary at gpfsug.org Thu Feb 15 11:58:05 2018 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Thu, 15 Feb 2018 11:58:05 +0000 Subject: [gpfsug-discuss] Registration open for UK SSUG Message-ID: <8f1e98c75e688acf894fc8bb11fe0335@webmail.gpfsug.org> Dear members, The registration page for the next UK Spectrum Scale user group meeting is now live. 
We're looking forward to seeing you in London on 18th and 19th April where you will have the opportunity to hear the latest Spectrum Scale updates from filesystem experts as well as hear from other users on their experiences. Similar to previous years, we're also holding smaller interactive workshops to allow for more detailed discussion. Thank you for the kind sponsorship from all our sponsors IBM, DDN, E8, Ellexus, Lenovo, NEC, and OCF without which the event would not be possible. To register, please visit the Eventbrite registration page: https://www.eventbrite.com/e/spectrum-scale-gpfs-user-group-2018-registration-41489952565?aff=MailingList [1] We look forward to seeing you in London! -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org Wireless? Wired connection for presenter (for live demo/webcasting?) Are there cameras in the rooms for webcasting at all? Links: ------ [1] https://www.eventbrite.com/e/spectrum-scale-gpfs-user-group-2018-registration-41489952565?aff=MailingList
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From agar at us.ibm.com Thu Feb 15 17:08:08 2018 From: agar at us.ibm.com (Eric Agar) Date: Thu, 15 Feb 2018 12:08:08 -0500 Subject: [gpfsug-discuss] RDMA data from Zimon In-Reply-To: References: Message-ID: Kristy, I experimented a bit with this some months ago and looked at the ZIMon source code. I came to the conclusion that ZIMon is reporting values obtained from the IB counters (actually, delta values adjusted for time) and that yes, for port_xmit_data and port_rcv_data, one would need to multiply the values by 4 to make sense of them. To obtain a port_xmit_data value, the ZIMon sensor first looks for /sys/class/infiniband/<device>/ports/<port>/counters_ext/port_xmit_data_64, and if that is not found then looks for /sys/class/infiniband/<device>/ports/<port>/counters/port_xmit_data. Similarly for other counters/metrics. Full disclosure: I am not an IB expert nor a ZIMon developer. I hope this helps. Eric M. Agar agar at us.ibm.com
From: Kristy Kallback-Rose To: gpfsug main discussion list Date: 02/14/2018 08:47 PM Subject: [gpfsug-discuss] RDMA data from Zimon Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Can one of the IBMers tell me if port_xmit_data and port_rcv_data from Zimon can be interpreted as RDMA Bytes/sec? Ideally, also how this data is being collected? I'm looking here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1hlp_monnetworksmetrics.htm But then I also look here: https://community.mellanox.com/docs/DOC-2751 and see "Total number of data octets, divided by 4 (lanes), received on all VLs. This is 64 bit counter." So I wasn't sure if some multiplication by 4 was in order. Please advise. Cheers, Kristy
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=zIRb70L9sx_FvvC9IcWVKLOSOOFnx-hIGfjw0kUN7bw&s=D1g4YTG5WeUiHI3rCPr_kkPxbG9V9E-18UGXBeCvfB8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From G.Horton at bham.ac.uk Fri Feb 16 10:28:48 2018 From: G.Horton at bham.ac.uk (Gareth Horton) Date: Fri, 16 Feb 2018 10:28:48 +0000 Subject: [gpfsug-discuss] Hello Message-ID: <85BF558D-7F13-4059-834E-7D655BD17107@bham.ac.uk> Hi All, A short note to introduce myself to all members My name is Gareth Horton and I work at Birmingham University within the Research Computing 'Architecture, Infrastructure and Systems? team I am new to GPFS and HPC, coming from a general Windows / Unix / Linux sys admin background, before moving into VMware server virtualisation and SAN & NAS storage admin. We use GPFS to provide storage and archiving services to researchers for both traditional HPC and cloud (Openstack) environments I?m currently a GPFS novice and I?m hoping to learn a lot from the experience and knowledge of the group and its members Regards Gareth Horton Architecture, Infrastructure and Systems Research Computing- IT Services Computer Centre G5, Elms Road, University of Birmingham B15 2TT g.horton at bham.ac.uk| www.bear.bham.ac.uk| -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Feb 16 18:17:18 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 16 Feb 2018 10:17:18 -0800 Subject: [gpfsug-discuss] Hello In-Reply-To: <85BF558D-7F13-4059-834E-7D655BD17107@bham.ac.uk> References: <85BF558D-7F13-4059-834E-7D655BD17107@bham.ac.uk> Message-ID: <256528BD-CAAC-4D8B-9DD4-B90992D7EFBC@lbl.gov> Welcome Gareth. As a person coming in with fresh eyes, it would be helpful if you let us know if you run into anything that makes you think ?it would be great if there were ?? ?particular documentation, information about UG events, etc. Thanks, Kristy > On Feb 16, 2018, at 2:28 AM, Gareth Horton wrote: > > Hi All, > > A short note to introduce myself to all members > > My name is Gareth Horton and I work at Birmingham University within the Research Computing 'Architecture, Infrastructure and Systems? team > > I am new to GPFS and HPC, coming from a general Windows / Unix / Linux sys admin background, before moving into VMware server virtualisation and SAN & NAS storage admin. > > We use GPFS to provide storage and archiving services to researchers for both traditional HPC and cloud (Openstack) environments > > I?m currently a GPFS novice and I?m hoping to learn a lot from the experience and knowledge of the group and its members > > Regards > > Gareth Horton > > Architecture, Infrastructure and Systems > Research Computing- IT Services > Computer Centre G5, > Elms Road, University of Birmingham > B15 2TT > g.horton at bham.ac.uk | www.bear.bham.ac.uk | > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From luke.raimbach at googlemail.com Mon Feb 19 12:16:43 2018 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Mon, 19 Feb 2018 12:16:43 +0000 Subject: [gpfsug-discuss] GUI reports erroneous NIC errors Message-ID: Hi GUI whizzes, I have a couple of AFM nodes in my cluster with dual-port MLX cards for RDMA. 
Only the first port on the card is connected to the fabric and the cluster configuration seems correct to me: # mmlsconfig ---8<--- [nsdNodes] verbsPorts mlx5_1/1 [afm] verbsPorts mlx4_1/1 [afm,nsdNodes] verbsRdma enable --->8--- The cluster is working fine, and the mmlfs.log shows me what I expect, i.e. RDMA connections being made over the correct interfaces. Nevertheless the GUI tells me such lies as "Node Degraded" and "ib_rdma_nic_unrecognised" for the second port on the card (which is not explicitly used). Event details are: Event name: ib_rdma_nic_unrecognized Component: Network Entity type: Node Entity name: afm01 Event time: 19/02/18 12:53:39 Message: IB RDMA NIC mlx4_1/2 was not recognized Description: The specified IB RDMA NIC was not correctly recognized for usage by Spectrum Scale Cause: The specified IB RDMA NIC is not reported in 'mmfsadm dump verbs' User action: N/A Reporting node: afm01 Event type: Active health state of an entity which is monitored by the system. Naturally the GUI is for those who like to see reports and this incorrect entry would likely generate a high volume of unwanted questions from such report viewers. How can I bring the GUI reporting back in line with reality? Thanks, Luke. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Feb 19 14:00:49 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 19 Feb 2018 09:00:49 -0500 Subject: [gpfsug-discuss] Configuration advice In-Reply-To: <20180212151155.GD23944@cefeid.wcss.wroc.pl> References: <20180212151155.GD23944@cefeid.wcss.wroc.pl> Message-ID: As I think you understand we can only provide general guidance as regards your questions. If you want a detailed examination of your requirements and a proposal for a solution you will need to engage the appropriate IBM services team. My personal recommendation is to use as few file systems as possible, preferably just one. The reason is that makes general administration, and storage management, easier. If you do use filesets I suggest you use independent filesets because they offer more administrative control than dependent filesets. As for the number of nodes in the cluster that depends on your requirements for performance and availability. If you do have only 2 then you will need a tiebreaker disk to resolve quorum issues should the network between the nodes have problems. If you intend to continue to use HSM I would suggest you use the GPFS policy engine to drive the migrations because it should be more efficient than using HSM directly. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
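To illustrate the policy-engine suggestion above, a deliberately simplified sketch of threshold-driven migration to an HSM external pool; the pool names, thresholds and the path of the HSM interface script are assumptions and will differ per site:

# Write a minimal policy file and run it against the filesystem
# (gpfs0 is a placeholder device name).
cat > /tmp/hsm-migrate.policy <<'EOF'
RULE EXTERNAL POOL 'hsm' EXEC '/usr/lpp/mmfs/samples/ilm/mmpolicyExec-hsm.sample' OPTS '-v'
RULE 'MigrateCold' MIGRATE FROM POOL 'system'
     THRESHOLD(80,60)
     WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
     TO POOL 'hsm'
     WHERE FILE_SIZE > 0
EOF

mmapplypolicy gpfs0 -P /tmp/hsm-migrate.policy -I test   # dry run, report only
mmapplypolicy gpfs0 -P /tmp/hsm-migrate.policy -I yes    # actually migrate

In practice the THRESHOLD rule is usually wired to the lowDiskSpace event with mmaddcallback so migrations start automatically, rather than being run by hand.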
From: Pawel Dziekonski To: gpfsug-discuss at spectrumscale.org Date: 02/12/2018 10:18 AM Subject: [gpfsug-discuss] Configuration advice Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I inherited from previous admin 2 separate gpfs machines. All hardware+software is old so I want to switch to new servers, new disk arrays, new gpfs version and new gpfs "design". Each machine has 4 gpfs filesystems and runs a TSM HSM client that migrates data to tapes using separate TSM servers: GPFS+HSM no 1 -> TSM server no 1 -> tapes GPFS+HSM no 2 -> TSM server no 2 -> tapes Migration is done by HSM (not GPFS policies). All filesystems are used for archiving results from HPC system and other files (a kind of backup - don't ask...). Data is written by users via nfs shares. There are 8 nfs mount points corresponding to 8 gpfs filesystems, but there is no real reason for that. 4 filesystems are large and heavily used, 4 remaining are almost not used. The question is how to configure new gpfs infrastructure? My initial impression is that I should create a GPFS cluster of 2+ nodes and export NFS using CES. The most important question is how many filesystem do I need? Maybe just 2 and 8 filesets? Or how to do that in a flexible way and not to lock myself in stupid configuration? any hints? thanks, Pawel ps. I will recall all data and copy it to new infrastructure. Yes, that's the way I want to do that. :) -- Pawel Dziekonski , https://urldefense.proofpoint.com/v2/url?u=http-3A__www.wcss.pl&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=-wyO42O-5SDJQfYoGpqeObZNSlFzduC9mlXhsZb65HI&s=__3QSrBGRtG4Rja-QzbpqALX2o8l-67gtrqePi0NrfE&e= Wroclaw Centre for Networking & Supercomputing, HPC Department _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=-wyO42O-5SDJQfYoGpqeObZNSlFzduC9mlXhsZb65HI&s=32gAuk8HDIPkjMjY4L7DB1tFqmJxeaP4ZWIYA_Ya3ts&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Wed Feb 21 09:01:39 2018 From: john.hearns at asml.com (John Hearns) Date: Wed, 21 Feb 2018 09:01:39 +0000 Subject: [gpfsug-discuss] GPFS Downloads Message-ID: Would someone else kindly go to this webpage: https://www.ibm.com/support/home/product/10000060/IBM%20Spectrum%20Scale Click on Downloads then confirm you get a choice of two identical Spectrum Scale products. Neither of which has a version fix level you can select on the check box below. I have tried this in Internet Explorer and Chrome. My apology if this is stupidity on my part, but I really would like to download the latest 4.2.3 version with the APAR we need. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. 
Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Feb 21 09:23:10 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 21 Feb 2018 09:23:10 +0000 Subject: [gpfsug-discuss] GPFS Downloads In-Reply-To: References: Message-ID: Same for me. What I normally do is just go straight to Fix Central and navigate from there. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 21 February 2018 09:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] GPFS Downloads Would someone else kindly go to this webpage: https://www.ibm.com/support/home/product/10000060/IBM%20Spectrum%20Scale Click on Downloads then confirm you get a choice of two identical Spectrum Scale products. Neither of which has a version fix level you can select on the check box below. I have tried this in Internet Explorer and Chrome. My apology if this is stupidity on my part, but I really would like to download the latest 4.2.3 version with the APAR we need. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Wed Feb 21 08:54:41 2018 From: john.hearns at asml.com (John Hearns) Date: Wed, 21 Feb 2018 08:54:41 +0000 Subject: [gpfsug-discuss] Finding all bulletins and APARs Message-ID: Firstly, let me apologise for not thanking people who hav ereplied to me on this list with help. I have indeed replied and thanked you - however the list software has taken a dislike to my email address. I am currently on the myibm support site. I am looking for a specific APAR on Spectrum Scale. However I want to be able to get a list of all APARs and bulletins for Spectrum Scale, right up to date. I do get email alerts but somehow I suspect I am not getting them all, and it is a pain to search back in your email. Thanks John H -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. 
To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Wed Feb 21 09:31:25 2018 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 21 Feb 2018 09:31:25 +0000 Subject: [gpfsug-discuss] GPFS Downloads In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From anencizo at us.ibm.com Wed Feb 21 17:19:09 2018 From: anencizo at us.ibm.com (Angela Encizo) Date: Wed, 21 Feb 2018 17:19:09 +0000 Subject: [gpfsug-discuss] GPFS Downloads In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14571047317701.png Type: image/png Size: 6645 bytes Desc: not available URL: From carlz at us.ibm.com Wed Feb 21 19:54:31 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 21 Feb 2018 19:54:31 +0000 Subject: [gpfsug-discuss] GPFS Downloads In-Reply-To: References: Message-ID: It does look like that link is broken, thanks for letting us know. If you click on the Menu dropdown at the top of the page that says "Downloads" you'll see a link to Fix Central that takes you to the right place. Carl Zetie Offering Manager for Spectrum Scale, IBM (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From valdis.kletnieks at vt.edu Wed Feb 21 20:20:16 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 21 Feb 2018 15:20:16 -0500 Subject: [gpfsug-discuss] GPFS and Wireshark.. Message-ID: <51481.1519244416@turing-police.cc.vt.edu> Has anybody out there done a Wireshark protocol filter for GPFS? Or know where to find enough documentation of the on-the-wire data formats to write even a basic one? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From juantellez at mx1.ibm.com Wed Feb 21 21:20:44 2018 From: juantellez at mx1.ibm.com (Juan Ignacio Tellez Vilchis) Date: Wed, 21 Feb 2018 21:20:44 +0000 Subject: [gpfsug-discuss] SOBAR restore Message-ID: An HTML attachment was scrubbed... URL: From lgayne at us.ibm.com Wed Feb 21 21:23:50 2018 From: lgayne at us.ibm.com (Lyle Gayne) Date: Wed, 21 Feb 2018 16:23:50 -0500 Subject: [gpfsug-discuss] SOBAR restore In-Reply-To: References: Message-ID: April Brown should be able to assist. Lyle From: "Juan Ignacio Tellez Vilchis" To: gpfsug-discuss at spectrumscale.org Date: 02/21/2018 04:21 PM Subject: [gpfsug-discuss] SOBAR restore Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, Is there anybody that has some experience with GPFS filesystem restore using SOBAR? I already back filesystem out using SOBAR, but having some troubles with dsmc restore command. Any help would be appreciated! Juan Ignacio Tellez Vilchis Storage Consultant Lab. 
Services IBM Systems Hardware Phone: 52-55-5270-3218 | Mobile: 52-55-10160692 IBM E-mail: juantellez at mx1.ibm.com Find me on: LinkedIn: http://mx.linkedin.com/in/Ignaciotellez1and within IBM on: IBM Connections: Alfonso Napoles Gandara https://w3-connections.ibm.com/profiles/html/profileView.do?key=2ce9da3f-33ae-4262-9e22-50433170ea46 3111 Mexico City, DIF 01210 Mexico _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=F5mU6o96aI7N9_U21xmoWIM5YmGNLLIi66Drt1r75UY&s=C_BZnOZwvJjElYiXC-xlyQLCNkoD3tUr4qZ2SdPfxok&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 57328677.jpg Type: image/jpeg Size: 518 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 57450745.jpg Type: image/jpeg Size: 1208 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 57307813.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From john.hearns at asml.com Wed Feb 21 16:11:54 2018 From: john.hearns at asml.com (John Hearns) Date: Wed, 21 Feb 2018 16:11:54 +0000 Subject: [gpfsug-discuss] mmfind will not exec Message-ID: I would dearly like to use mmfind in a project I am working on (version 4.2.3.4 at the moment) mmfind /hpc/bscratch -type f work fine mmfind /hpc/bscratch -type f -exec /bin/ls {}\ ; crashes and burns I know there are supposed to be problems with exec and mmfind, and this is sample software shipped without warranty etc. But why let me waste hours on this when it won't work? There is even an example in the README for mmfind ./mmfind /encFS -type f -exec /bin/readMyFile {} \; But in the help for mmfind: -exec COMMANDs are terminated by a standalone ';' or by the string '{} +' So which is it? The normal find version {} \; or {} + -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From scale at us.ibm.com Thu Feb 22 01:26:22 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 21 Feb 2018 20:26:22 -0500 Subject: [gpfsug-discuss] mmfind will not exec In-Reply-To: References: Message-ID: Looking at the mmfind.README it indicates that it only supports the format you used with the semi-colon. Did you capture any output of the problem? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: John Hearns To: gpfsug main discussion list Date: 02/21/2018 06:45 PM Subject: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org I would dearly like to use mmfind in a project I am working on (version 4.2.3.4 at the moment) mmfind /hpc/bscratch -type f work fine mmfind /hpc/bscratch -type f -exec /bin/ls {}\ ; crashes and burns I know there are supposed to be problems with exec and mmfind, and this is sample software shipped without warranty etc. But why let me waste hours on this when it won?t work? There is even an example in the README for mmfind ./mmfind /encFS -type f -exec /bin/readMyFile {} \; But in the help for mmfind: -exec COMMANDs are terminated by a standalone ';' or by the string '{} +? So which is it? The normal find version {} \; or {} + -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OC7XNZeulP0vmS8Fq-RJuun5wOqFPootm0QHxBXUfKg&s=LUvpk53AaNcHSGQgDgH8FAiOOsH1H0OPOV9MFGMIi9E&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Wed Feb 21 16:22:07 2018 From: john.hearns at asml.com (John Hearns) Date: Wed, 21 Feb 2018 16:22:07 +0000 Subject: [gpfsug-discuss] mmfind - a ps. Message-ID: Ps. 
Here is how to get mmfind to run some operation on the files it finds. (I installed mmfind in /usr/local/bin) I find this very hacky, though I suppose it is idiomatic bash

#!/bin/bash
while read filename
do
    echo -n "$filename" " "
done <<< "`/usr/local/bin/mmfind /hpc/bscratch -type f`"

-- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From Ola.Pontusson at kb.se Thu Feb 22 06:23:37 2018 From: Ola.Pontusson at kb.se (Ola Pontusson) Date: Thu, 22 Feb 2018 06:23:37 +0000 Subject: [gpfsug-discuss] SOBAR restore In-Reply-To: References: Message-ID: Hi SOBAR is documented with Spectrum Scale on IBM's website and if you follow those instructions there should be no problem (unless you bump into some of the errors in SOBAR). Have you done your mmimgbackup with TSM and sent the image to TSM, and is that why you try the dsmc restore? The only time I used dsmc restore is if I send the image to TSM. If you don't send to TSM the image is where you put it and can be moved where you want it. The whole point of SOBAR is to use dsmmigrate so all files are migrated out to TSM via HSM rather than backed up. Just one question: if you do a mmlsfs filesystem -V, which version is your filesystem created with, and what level is your Spectrum Scale running where you try to perform the restore? Sincerely, Ola Pontusson IT-Specialist National Library of Sweden Från: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] För Juan Ignacio Tellez Vilchis Skickat: den 21 februari 2018 22:21 Till: gpfsug-discuss at spectrumscale.org Ämne: [gpfsug-discuss] SOBAR restore Hello, Is there anybody that has some experience with GPFS filesystem restore using SOBAR? I already backed the filesystem up using SOBAR, but I am having some troubles with the dsmc restore command. Any help would be appreciated! Juan Ignacio Tellez Vilchis Storage Consultant Lab. Services IBM Systems Hardware ________________________________ Phone: 52-55-5270-3218 | Mobile: 52-55-10160692 E-mail: juantellez at mx1.ibm.com Find me on: [LinkedIn: http://mx.linkedin.com/in/Ignaciotellez1] and within IBM on: [IBM Connections: https://w3-connections.ibm.com/profiles/html/profileView.do?key=2ce9da3f-33ae-4262-9e22-50433170ea46] [IBM] Alfonso Napoles Gandara 3111 Mexico City, DIF 01210 Mexico -------------- next part -------------- An HTML attachment was scrubbed... URL:
From john.hearns at asml.com Thu Feb 22 09:01:43 2018 From: john.hearns at asml.com (John Hearns) Date: Thu, 22 Feb 2018 09:01:43 +0000 Subject: [gpfsug-discuss] mmfind will not exec In-Reply-To: References: Message-ID: Stupid me.
The space between the {} and \; is significant. /usr/local/bin/mmfind /hpc/bscratch -type f -exec /bin/ls {} \; Still would be nice to have the documentation clarified please. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Thursday, February 22, 2018 2:26 AM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Subject: Re: [gpfsug-discuss] mmfind will not exec Looking at the mmfind.README it indicates that it only supports the format you used with the semi-colon. Did you capture any output of the problem? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: John Hearns > To: gpfsug main discussion list > Date: 02/21/2018 06:45 PM Subject: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I would dearly like to use mmfind in a project I am working on (version 4.2.3.4 at the moment) mmfind /hpc/bscratch -type f work fine mmfind /hpc/bscratch -type f -exec /bin/ls {}\ ; crashes and burns I know there are supposed to be problems with exec and mmfind, and this is sample software shipped without warranty etc. But why let me waste hours on this when it won?t work? There is even an example in the README for mmfind ./mmfind /encFS -type f -exec /bin/readMyFile {} \; But in the help for mmfind: -exec COMMANDs are terminated by a standalone ';' or by the string '{} +? So which is it? The normal find version {} \; or {} + -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
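The difference is pure shell tokenization rather than anything mmfind-specific. A quick way to see what each spelling actually hands to the command, using printf as a stand-in:

# "{}\ ;"  - the backslash escapes the space, so the tool receives the single
#            argument "{} ", and the bare ";" just terminates the shell command,
#            leaving -exec with no ";" terminator at all.
printf '[%s] ' {}\ ;      # prints: [{} ]

# "{} \;"  - two arguments, "{}" and ";", which is what -exec expects.
printf '[%s] ' {} \;      # prints: [{}] [;]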
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OC7XNZeulP0vmS8Fq-RJuun5wOqFPootm0QHxBXUfKg&s=LUvpk53AaNcHSGQgDgH8FAiOOsH1H0OPOV9MFGMIi9E&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Feb 22 14:20:32 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 22 Feb 2018 09:20:32 -0500 Subject: [gpfsug-discuss] mmfind -ls In-Reply-To: References: Message-ID: Leaving aside the -exec option, and whether you choose classic find or mmfind, why not just use the -ls option - same output, less overhead... mmfind pathname -type f -ls From: John Hearns To: gpfsug main discussion list Cc: "gpfsug-discuss-bounces at spectrumscale.org" Date: 02/22/2018 04:03 AM Subject: Re: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org Stupid me. The space between the {} and \; is significant. /usr/local/bin/mmfind /hpc/bscratch -type f -exec /bin/ls {} \; Still would be nice to have the documentation clarified please. From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Thursday, February 22, 2018 2:26 AM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Subject: Re: [gpfsug-discuss] mmfind will not exec Looking at the mmfind.README it indicates that it only supports the format you used with the semi-colon. Did you capture any output of the problem? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: John Hearns To: gpfsug main discussion list Date: 02/21/2018 06:45 PM Subject: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org I would dearly like to use mmfind in a project I am working on (version 4.2.3.4 at the moment) mmfind /hpc/bscratch -type f work fine mmfind /hpc/bscratch -type f -exec /bin/ls {}\ ; crashes and burns I know there are supposed to be problems with exec and mmfind, and this is sample software shipped without warranty etc. But why let me waste hours on this when it won?t work? There is even an example in the README for mmfind ./mmfind /encFS -type f -exec /bin/readMyFile {} \; But in the help for mmfind: -exec COMMANDs are terminated by a standalone ';' or by the string '{} +? So which is it? The normal find version {} \; or {} + -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OC7XNZeulP0vmS8Fq-RJuun5wOqFPootm0QHxBXUfKg&s=LUvpk53AaNcHSGQgDgH8FAiOOsH1H0OPOV9MFGMIi9E&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=77Whh54a5VWNFaaczlMhEzn7B802MGX9m-C2xj4sP1k&s=L4bZlOcrZLwkyth7maRTEmms7Ftarchh_DkBvdTEF7w&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From makaplan at us.ibm.com Thu Feb 22 14:27:28 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 22 Feb 2018 09:27:28 -0500 Subject: [gpfsug-discuss] mmfind - Use mmfind ... -xargs In-Reply-To: References: Message-ID: More recent versions of mmfind support an -xargs option... Run mmfind --help and see: -xargs [-L maxlines] [-I rplstr] COMMAND Similar to find ... | xargs [-L x] [-I r] COMMAND but COMMAND executions may run in parallel. This is preferred to -exec. With -xargs mmfind will run the COMMANDs in phase subject to mmapplypolicy options -m, -B, -N. Must be the last option to mmfind This gives you the fully parallelized power of mmapplypolicy without having to write SQL rules nor scripts. From: John Hearns To: gpfsug main discussion list Date: 02/21/2018 11:00 PM Subject: [gpfsug-discuss] mmfind - a ps. Sent by: gpfsug-discuss-bounces at spectrumscale.org Ps. Her is how to get mmfind to run some operation on the files it finds. (I installed mmfind in /usr/local/bin) I find this very hacky, though I suppose it is idiomatic bash #!/bin/bash while read filename do echo -n $filename " " done <<< "`/usr/local/bin/mmfind /hpc/bscratch -type f`" -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=vbcae5NoH6gMQCovOqRVJVgj9jJ2USmq47GHxVn6En8&s=F_GqjJRzSzubUSXpcjysWCwCjhVKO9YrbUdzjusY0SY&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Thu Feb 22 19:58:48 2018 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Thu, 22 Feb 2018 14:58:48 -0500 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage Message-ID: Hi All, I am trying to figure out a GPFS tiering architecture with flash storage in front end and near line storage as backend, for Supercomputing The Backend storage will be a GPFS storage on near line of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough. I was thinking if it would be possible to use a GPFS flash cluster or GPFS SSD cluster in front end that uses AFM and acts as a cache cluster with the backend GPFS cluster. At the end of this .. the workflow that i am targeting is where: ? 
If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring up all of the files from the backend to the faster SSD/Flash GPFS cluster. The working set might be about 100T, at a time which i want to be on a faster/low latency tier, and the rest of the files to be in slower tier until they are read by the compute nodes. ? I do not want to use GPFS policies to achieve the above, is because i am not sure - if policies could be written in a way, that files are moved from the slower tier to faster tier depending on how the jobs interact with the files. I know that the policies could be written depending on the heat, and size/format but i don?t think thes policies work in a similar way as above. I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster before the near line storage. However the AFM cluster was really really slow, It took it about few hours to copy the files from near line storage to AFM cache cluster. I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should. I have tried LROC too, but it does not behave the same way as i guess AFM works. Has anyone tried or know if GPFS supports an architecture - where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes reads few blocks from these files? I understand that with respect to hardware - the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster. Please do also let me know, if the above workflow can be done using GPFS policies and be as fast as it is needed to be. Regards, Lohit -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Thu Feb 22 20:26:58 2018 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Thu, 22 Feb 2018 15:26:58 -0500 Subject: [gpfsug-discuss] GPFS, MMAP and Pagepool Message-ID: Hi all, I wanted to know, how does mmap interact with GPFS pagepool with respect to filesystem block-size? Does the efficiency depend on the mmap read size and the block-size of the filesystem even if all the data is cached in pagepool? GPFS 4.2.3.2 and CentOS7. Here is what i observed: I was testing a user script that uses mmap to read from 100M to 500MB files. The above files are stored on 3 different filesystems. Compute nodes - 10G pagepool and 5G seqdiscardthreshold. 1. 4M block size GPFS filesystem, with separate metadata and data. Data on Near line and metadata on SSDs 2. 1M block size GPFS filesystem as a AFM cache cluster, "with all the required files fully cached" from the above GPFS cluster as home. Data and Metadata together on SSDs 3. 16M block size GPFS filesystem, with separate metadata and data. Data on Near line and metadata on SSDs When i run the script first time for ?each" filesystem: I see that GPFS reads from the files, and caches into the pagepool as it reads, from mmdiag -- iohist When i run the second time, i see that there are no IO requests from the compute node to GPFS NSD servers, which is expected since all the data from the 3 filesystems is cached. However - the time taken for the script to run for the files in the 3 different filesystems is different - although i know that they are just "mmapping"/reading from pagepool/cache and not from disk. 
Here is the difference in time, for IO just from pagepool:
20s 4M block size
15s 1M block size
40s 16M block size.
Why do i see a difference when trying to mmap reads from different block-size filesystems, although i see that the IO requests are not hitting disks and just the pagepool? I am willing to share the strace output and mmdiag outputs if needed. Thanks, Lohit -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Thu Feb 22 20:59:27 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 22 Feb 2018 20:59:27 +0000 Subject: [gpfsug-discuss] GPFS, MMAP and Pagepool In-Reply-To: References: Message-ID: Hi Lohit, i am working with ray on a mmap performance improvement right now, which most likely has the same root cause as yours , see --> http://gpfsug.org/pipermail/gpfsug-discuss/2018-January/004411.html the thread above is silent after a couple of back and forth, but ray and i have active communication in the background and will repost as soon as there is something new to share. i am happy to look at this issue after we finish with ray's workload if there is something missing, but first let's finish his, get you to try the same fix and see if there is something missing. btw. if people would share their use of MMAP , what applications they use (home grown, just use lmdb which uses mmap under the cover, etc) please let me know so i get a better picture on how wide the usage is with GPFS. i know a lot of the ML/DL workloads are using it, but i would like to know what else is out there i might not think about. feel free to drop me a personal note, i might not reply to it right away, but eventually. thx. sven On Thu, Feb 22, 2018 at 12:33 PM wrote: > Hi all, > > I wanted to know, how does mmap interact with GPFS pagepool with respect > to filesystem block-size? > Does the efficiency depend on the mmap read size and the block-size of the > filesystem even if all the data is cached in pagepool? > > GPFS 4.2.3.2 and CentOS7. > > Here is what i observed: > > I was testing a user script that uses mmap to read from 100M to 500MB > files. > > The above files are stored on 3 different filesystems. > > Compute nodes - 10G pagepool and 5G seqdiscardthreshold. > > 1. 4M block size GPFS filesystem, with separate metadata and data. Data on > Near line and metadata on SSDs > 2. 1M block size GPFS filesystem as a AFM cache cluster, "with all the > required files fully cached" from the above GPFS cluster as home. Data and > Metadata together on SSDs > 3. 16M block size GPFS filesystem, with separate metadata and data. Data > on Near line and metadata on SSDs > > When i run the script first time for "each" filesystem: > I see that GPFS reads from the files, and caches into the pagepool as it > reads, from mmdiag --iohist > > When i run the second time, i see that there are no IO requests from the > compute node to GPFS NSD servers, which is expected since all the data from > the 3 filesystems is cached. > > However - the time taken for the script to run for the files in the 3 > different filesystems is different - although i know that they are just > "mmapping"/reading from pagepool/cache and not from disk. > > Here is the difference in time, for IO just from pagepool: > > 20s 4M block size > 15s 1M block size > 40s 16M block size. > > Why do i see a difference when trying to mmap reads from different > block-size filesystems, although i see that the IO requests are not hitting > disks and just the pagepool?
> > I am willing to share the strace output and mmdiag outputs if needed. > > Thanks, > Lohit > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Feb 22 21:08:06 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 22 Feb 2018 16:08:06 -0500 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: References: Message-ID: I do not think AFM is intended to solve the problem you are trying to solve. If I understand your scenario correctly you state that you are placing metadata on NL-SAS storage. If that is true that would not be wise especially if you are going to do many metadata operations. I suspect your performance issues are partially due to the fact that metadata is being stored on NL-SAS storage. You stated that you did not think the file heat feature would do what you intended but have you tried to use it to see if it could solve your problem? I would think having metadata on SSD/flash storage combined with a all flash storage pool for your heavily used files would perform well. If you expect IO usage will be such that there will be far more reads than writes then LROC should be beneficial to your overall performance. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: valleru at cbio.mskcc.org To: gpfsug main discussion list Date: 02/22/2018 03:11 PM Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I am trying to figure out a GPFS tiering architecture with flash storage in front end and near line storage as backend, for Supercomputing The Backend storage will be a GPFS storage on near line of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough. I was thinking if it would be possible to use a GPFS flash cluster or GPFS SSD cluster in front end that uses AFM and acts as a cache cluster with the backend GPFS cluster. At the end of this .. the workflow that i am targeting is where: ? If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring up all of the files from the backend to the faster SSD/Flash GPFS cluster. The working set might be about 100T, at a time which i want to be on a faster/low latency tier, and the rest of the files to be in slower tier until they are read by the compute nodes. ? 
I do not want to use GPFS policies to achieve the above, is because i am not sure - if policies could be written in a way, that files are moved from the slower tier to faster tier depending on how the jobs interact with the files. I know that the policies could be written depending on the heat, and size/format but i don?t think thes policies work in a similar way as above. I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster before the near line storage. However the AFM cluster was really really slow, It took it about few hours to copy the files from near line storage to AFM cache cluster. I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should. I have tried LROC too, but it does not behave the same way as i guess AFM works. Has anyone tried or know if GPFS supports an architecture - where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes reads few blocks from these files? I understand that with respect to hardware - the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster. Please do also let me know, if the above workflow can be done using GPFS policies and be as fast as it is needed to be. Regards, Lohit _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s=AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Thu Feb 22 21:19:08 2018 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Thu, 22 Feb 2018 16:19:08 -0500 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: References: Message-ID: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> Thank you. I am sorry if i was not clear, but the metadata pool is all on SSDs in the GPFS clusters that we use. Its just the data pool that is on Near-Line Rotating disks. I understand that AFM might not be able to solve the issue, and I will try and see if file heat works for migrating the files to flash tier. You mentioned an all flash storage pool for heavily used files - so you mean a different GPFS cluster just with flash storage, and to manually copy the files to flash storage whenever needed? The IO performance that i am talking is prominently for reads, so you mention that LROC can work in the way i want it to? that is prefetch all the files into LROC cache, after only few headers/stubs of data are read from those files? I thought LROC only keeps that block of data that is prefetched from the disk, and will not prefetch the whole file if a stub of data is read. Please do let me know, if i understood it wrong. On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale , wrote: > I do not think AFM is intended to solve the problem you are trying to solve. ?If I understand your scenario correctly you state that you are placing metadata on NL-SAS storage. ?If that is true that would not be wise especially if you are going to do many metadata operations. ?I suspect your performance issues are partially due to the fact that metadata is being stored on NL-SAS storage. 
?You stated that you did not think the file heat feature would do what you intended but have you tried to use it to see if it could solve your problem? ?I would think having metadata on SSD/flash storage combined with a all flash storage pool for your heavily used files would perform well. ?If you expect IO usage will be such that there will be far more reads than writes then LROC should be beneficial to your overall performance. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of ?Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact ?1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: ? ? ? ?valleru at cbio.mskcc.org > To: ? ? ? ?gpfsug main discussion list > Date: ? ? ? ?02/22/2018 03:11 PM > Subject: ? ? ? ?[gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage > Sent by: ? ? ? ?gpfsug-discuss-bounces at spectrumscale.org > > > > Hi All, > > I am trying to figure out a GPFS tiering architecture with flash storage in front end and near line storage as backend, for Supercomputing > > The Backend storage will be a GPFS storage on near line of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough. > > I was thinking if it would be possible to use a GPFS flash cluster or GPFS SSD cluster in front end that uses AFM and acts as a cache cluster with the backend GPFS cluster. > > At the end of this .. the workflow that i am targeting is where: > > > ? > If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring up all of the files from the backend to the faster SSD/Flash GPFS cluster. > The working set might be about 100T, at a time which i want to be on a faster/low latency tier, and the rest of the files to be in slower tier until they are read by the compute nodes. > ? > > > I do not want to use GPFS policies to achieve the above, is because i am not sure - if policies could be written in a way, that files are moved from the slower tier to faster tier depending on how the jobs interact with the files. > I know that the policies could be written depending on the heat, and size/format but i don?t think thes policies work in a similar way as above. > > I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster before the near line storage. However the AFM cluster was really really slow, It took it about few hours to copy the files from near line storage to AFM cache cluster. > I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should. > > I have tried LROC too, but it does not behave the same way as i guess AFM works. 
> > Has anyone tried or know if GPFS supports an architecture - where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes reads few blocks from these files? > I understand that with respect to hardware - the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster. > > Please do also let me know, if the above workflow can be done using GPFS policies and be as fast as it is needed to be. > > Regards, > Lohit > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s=AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Feb 22 21:52:01 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 22 Feb 2018 16:52:01 -0500 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> References: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> Message-ID: My apologies for not being more clear on the flash storage pool. I meant that this would be just another GPFS storage pool in the same cluster, so no separate AFM cache cluster. You would then use the file heat feature to ensure more frequently accessed files are migrated to that all flash storage pool. As for LROC could you please clarify what you mean by a few headers/stubs of the file? In reading the LROC documentation and the LROC variables available in the mmchconfig command I think you might want to take a look a the lrocDataStubFileSize variable since it seems to apply to your situation. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: valleru at cbio.mskcc.org To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Date: 02/22/2018 04:21 PM Subject: Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage Sent by: gpfsug-discuss-bounces at spectrumscale.org Thank you. I am sorry if i was not clear, but the metadata pool is all on SSDs in the GPFS clusters that we use. Its just the data pool that is on Near-Line Rotating disks. I understand that AFM might not be able to solve the issue, and I will try and see if file heat works for migrating the files to flash tier. 
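[Editor's note: for readers who want to try the file heat route suggested here, a rough sketch of the moving parts follows. The file system name (gpfs0), pool names (flash, nlsas), node class (nsdNodes), and all numeric values are placeholders rather than details from the poster's cluster; treat this as an illustration of the shape of the configuration, not a tuned or verified setup.]

# Enable file-heat tracking (attribute names are real mmchconfig options;
# the values here are illustrative only)
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

# Example policy: pull the hottest files from the NL-SAS pool into the flash
# pool, capping flash occupancy at 90%. A companion rule in the other
# direction would be needed to demote files as they cool.
cat > /tmp/hot-to-flash.pol <<'EOF'
RULE 'to_flash' MIGRATE FROM POOL 'nlsas' WEIGHT(FILE_HEAT) TO POOL 'flash' LIMIT(90)
EOF

# Run periodically; -N selects the helper nodes and -m/-B control
# parallelism, as noted in the mmfind -xargs message earlier in this digest
mmapplypolicy gpfs0 -P /tmp/hot-to-flash.pol -I yes -N nsdNodes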
You mentioned an all flash storage pool for heavily used files - so you mean a different GPFS cluster just with flash storage, and to manually copy the files to flash storage whenever needed? The IO performance that i am talking is prominently for reads, so you mention that LROC can work in the way i want it to? that is prefetch all the files into LROC cache, after only few headers/stubs of data are read from those files? I thought LROC only keeps that block of data that is prefetched from the disk, and will not prefetch the whole file if a stub of data is read. Please do let me know, if i understood it wrong. On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale , wrote: I do not think AFM is intended to solve the problem you are trying to solve. If I understand your scenario correctly you state that you are placing metadata on NL-SAS storage. If that is true that would not be wise especially if you are going to do many metadata operations. I suspect your performance issues are partially due to the fact that metadata is being stored on NL-SAS storage. You stated that you did not think the file heat feature would do what you intended but have you tried to use it to see if it could solve your problem? I would think having metadata on SSD/flash storage combined with a all flash storage pool for your heavily used files would perform well. If you expect IO usage will be such that there will be far more reads than writes then LROC should be beneficial to your overall performance. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: valleru at cbio.mskcc.org To: gpfsug main discussion list Date: 02/22/2018 03:11 PM Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I am trying to figure out a GPFS tiering architecture with flash storage in front end and near line storage as backend, for Supercomputing The Backend storage will be a GPFS storage on near line of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough. I was thinking if it would be possible to use a GPFS flash cluster or GPFS SSD cluster in front end that uses AFM and acts as a cache cluster with the backend GPFS cluster. At the end of this .. the workflow that i am targeting is where: ? If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring up all of the files from the backend to the faster SSD/Flash GPFS cluster. The working set might be about 100T, at a time which i want to be on a faster/low latency tier, and the rest of the files to be in slower tier until they are read by the compute nodes. ? 
I do not want to use GPFS policies to achieve the above, is because i am not sure - if policies could be written in a way, that files are moved from the slower tier to faster tier depending on how the jobs interact with the files. I know that the policies could be written depending on the heat, and size/format but i don?t think thes policies work in a similar way as above. I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster before the near line storage. However the AFM cluster was really really slow, It took it about few hours to copy the files from near line storage to AFM cache cluster. I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should. I have tried LROC too, but it does not behave the same way as i guess AFM works. Has anyone tried or know if GPFS supports an architecture - where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes reads few blocks from these files? I understand that with respect to hardware - the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster. Please do also let me know, if the above workflow can be done using GPFS policies and be as fast as it is needed to be. Regards, Lohit _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s=AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=DuqESC-4ycoY5GoHpYeH1T8baq0JWY8QfkN8z6b8jPw&s=zNUAH3mFyzxcvXtrep_OroKiwR88QouIrcdN8TLJK8M&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Fri Feb 23 00:48:12 2018 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Thu, 22 Feb 2018 19:48:12 -0500 Subject: [gpfsug-discuss] GPFS, MMAP and Pagepool In-Reply-To: References: Message-ID: Thanks a lot Sven. I was trying out all the scenarios that Ray mentioned, with respect to lroc and all flash GPFS cluster and nothing seemed to be effective. As of now, we are deploying a new test cluster on GPFS 5.0 and it would be good to know the respective features that could be enabled and see if it improves anything. On the other side, i have seen various cases in my past 6 years with GPFS, where different tools do frequently use mmap. This dates back to 2013..?http://www.spectrumscale.org/pipermail/gpfsug-discuss/2013-May/000253.html?when one of my colleagues asked the same question. At that time, it was a homegrown application that was using mmap, along with few other genomic pipelines. An year ago, we had issue with mmap and lot of threads where GPFS would just hang without any traces or logs, which was fixed recently. 
That was related to relion : https://sbgrid.org/software/titles/relion The issue that we are seeing now is ML/DL workloads, and is related to implementing external tools such as openslide (http://openslide.org/), pytorch (http://pytorch.org/) with field of application being deep learning for thousands of image patches. The IO is really slow when accessed from hard disk, and thus i was trying out other options such as LROC and flash cluster/afm cluster. But everything has a limitation as Ray mentioned. Thanks, Lohit On Feb 22, 2018, 3:59 PM -0500, Sven Oehme , wrote: > Hi Lohit, > > i am working with ray on a mmap performance improvement right now, which most likely has the same root cause as yours , see -->??http://gpfsug.org/pipermail/gpfsug-discuss/2018-January/004411.html > the thread above is silent after a couple of back and rorth, but ray and i have active communication in the background and will repost as soon as there is something new to share. > i am happy to look at this issue after we finish with ray's workload if there is something missing, but first let's finish his, get you try the same fix and see if there is something missing. > > btw. if people would share their use of MMAP , what applications they use (home grown, just use lmdb which uses mmap under the cover, etc) please let me know so i get a better picture on how wide the usage is with GPFS. i know a lot of the ML/DL workloads are using it, but i would like to know what else is out there i might not think about. feel free to drop me a personal note, i might not reply to it right away, but eventually. > > thx. sven > > > > On Thu, Feb 22, 2018 at 12:33 PM wrote: > > > Hi all, > > > > > > I wanted to know, how does mmap interact with GPFS pagepool with respect to filesystem block-size? > > > Does the efficiency depend on the mmap read size and the block-size of the filesystem even if all the data is cached in pagepool? > > > > > > GPFS 4.2.3.2 and CentOS7. > > > > > > Here is what i observed: > > > > > > I was testing a user script that uses mmap to read from 100M to 500MB files. > > > > > > The above files are stored on 3 different filesystems. > > > > > > Compute nodes - 10G pagepool and 5G seqdiscardthreshold. > > > > > > 1. 4M block size GPFS filesystem, with separate metadata and data. Data on Near line and metadata on SSDs > > > 2. 1M block size GPFS filesystem as a AFM cache cluster, "with all the required files fully cached" from the above GPFS cluster as home. Data and Metadata together on SSDs > > > 3. 16M block size GPFS filesystem, with separate metadata and data. Data on Near line and metadata on SSDs > > > > > > When i run the script first time for ?each" filesystem: > > > I see that GPFS reads from the files, and caches into the pagepool as it reads, from mmdiag -- iohist > > > > > > When i run the second time, i see that there are no IO requests from the compute node to GPFS NSD servers, which is expected since all the data from the 3 filesystems is cached. > > > > > > However - the time taken for the script to run for the files in the 3 different filesystems is different - although i know that they are just "mmapping"/reading from pagepool/cache and not from disk. > > > > > > Here is the difference in time, for IO just from pagepool: > > > > > > 20s 4M block size > > > 15s 1M block size > > > 40S 16M block size. > > > > > > Why do i see a difference when trying to mmap reads from different block-size filesystems, although i see that the IO requests are not hitting disks and just the pagepool? 
> > > > > > I am willing to share the strace output and mmdiag outputs if needed. > > > > > > Thanks, > > > Lohit > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Fri Feb 23 01:27:58 2018 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Thu, 22 Feb 2018 20:27:58 -0500 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: References: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> Message-ID: <0c21b5b2-95ff-4cbf-9b07-e23594f58c87@Spark> Thanks, I will try the file heat feature but i am really not sure, if it would work - since the code can access cold files too, and not necessarily files recently accessed/hot files. With respect to LROC. Let me explain as below: The use case is that - The code initially reads headers (small region of data) from thousands of files as the first step. For example about 30,000 of them with each about 300MB to 500MB in size. After the first step, with the help of those headers - it mmaps/seeks across various regions of a set of files in parallel. Since its all small IOs and it was really slow at reading from GPFS over the network directly from disks - Our idea was to use AFM which i believe fetches all file data into flash/ssds, once the initial few blocks of the files are read. But again - AFM seems to not solve the problem, so i want to know if LROC behaves in the same way as AFM, where all of the file data is prefetched in full block size utilizing all the worker threads ?- if few blocks of the file is read initially. Thanks, Lohit On Feb 22, 2018, 4:52 PM -0500, IBM Spectrum Scale , wrote: > My apologies for not being more clear on the flash storage pool. ?I meant that this would be just another GPFS storage pool in the same cluster, so no separate AFM cache cluster. ?You would then use the file heat feature to ensure more frequently accessed files are migrated to that all flash storage pool. > > As for LROC could you please clarify what you mean by a few headers/stubs of the file? ?In reading the LROC documentation and the LROC variables available in the mmchconfig command I think you might want to take a look a the lrocDataStubFileSize variable since it seems to apply to your situation. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of ?Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact ?1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: ? ? ? ?valleru at cbio.mskcc.org > To: ? ? ? ?gpfsug main discussion list > Cc: ? ? ? 
?gpfsug-discuss-bounces at spectrumscale.org > Date: ? ? ? ?02/22/2018 04:21 PM > Subject: ? ? ? ?Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage > Sent by: ? ? ? ?gpfsug-discuss-bounces at spectrumscale.org > > > > Thank you. > > I am sorry if i was not clear, but the metadata pool is all on SSDs in the GPFS clusters that we use. Its just the data pool that is on Near-Line Rotating disks. > I understand that AFM might not be able to solve the issue, and I will try and see if file heat works for migrating the files to flash tier. > You mentioned an all flash storage pool for heavily used files - so you mean a different GPFS cluster just with flash storage, and to manually copy the files to flash storage whenever needed? > The IO performance that i am talking is prominently for reads, so you mention that LROC can work in the way i want it to? that is prefetch all the files into LROC cache, after only few headers/stubs of data are read from those files? > I thought LROC only keeps that block of data that is prefetched from the disk, and will not prefetch the whole file if a stub of data is read. > Please do let me know, if i understood it wrong. > > On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale , wrote: > I do not think AFM is intended to solve the problem you are trying to solve. ?If I understand your scenario correctly you state that you are placing metadata on NL-SAS storage. ?If that is true that would not be wise especially if you are going to do many metadata operations. ?I suspect your performance issues are partially due to the fact that metadata is being stored on NL-SAS storage. ?You stated that you did not think the file heat feature would do what you intended but have you tried to use it to see if it could solve your problem? ?I would think having metadata on SSD/flash storage combined with a all flash storage pool for your heavily used files would perform well. ?If you expect IO usage will be such that there will be far more reads than writes then LROC should be beneficial to your overall performance. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of ?Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact ?1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: ? ? ? ?valleru at cbio.mskcc.org > To: ? ? ? ?gpfsug main discussion list > Date: ? ? ? ?02/22/2018 03:11 PM > Subject: ? ? ? ?[gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage > Sent by: ? ? ? ?gpfsug-discuss-bounces at spectrumscale.org > > > > Hi All, > > I am trying to figure out a GPFS tiering architecture with flash storage in front end and near line storage as backend, for Supercomputing > > The Backend storage will be a GPFS storage on near line of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough. 
> > I was thinking if it would be possible to use a GPFS flash cluster or GPFS SSD cluster in front end that uses AFM and acts as a cache cluster with the backend GPFS cluster. > > At the end of this .. the workflow that i am targeting is where: > > > ? > If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring up all of the files from the backend to the faster SSD/Flash GPFS cluster. > The working set might be about 100T, at a time which i want to be on a faster/low latency tier, and the rest of the files to be in slower tier until they are read by the compute nodes. > ? > > > I do not want to use GPFS policies to achieve the above, is because i am not sure - if policies could be written in a way, that files are moved from the slower tier to faster tier depending on how the jobs interact with the files. > I know that the policies could be written depending on the heat, and size/format but i don?t think thes policies work in a similar way as above. > > I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster before the near line storage. However the AFM cluster was really really slow, It took it about few hours to copy the files from near line storage to AFM cache cluster. > I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should. > > I have tried LROC too, but it does not behave the same way as i guess AFM works. > > Has anyone tried or know if GPFS supports an architecture - where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes reads few blocks from these files? > I understand that with respect to hardware - the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster. > > Please do also let me know, if the above workflow can be done using GPFS policies and be as fast as it is needed to be. > > Regards, > Lohit > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s=AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=DuqESC-4ycoY5GoHpYeH1T8baq0JWY8QfkN8z6b8jPw&s=zNUAH3mFyzxcvXtrep_OroKiwR88QouIrcdN8TLJK8M&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
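[Editor's note: to round off this thread, the LROC variable referenced above and the AFM bulk-prefetch command are sketched below. The device name gpfs0, node class computeNodes, fileset name cacheFileset, file list path, and the stub size value are assumptions for illustration; exact semantics and units should be checked against the documentation for the release in use.]

# LROC tunables discussed above, applied per node (an NSD defined with
# usage=localCache on the local SSD is also required). Values are placeholders.
mmchconfig lrocData=yes -N computeNodes
mmchconfig lrocDataStubFileSize=32768 -N computeNodes   # cache only the initial portion (stub) of each file
mmdiag --lroc        # per-node LROC occupancy and hit/miss statistics

# If the AFM cache-cluster approach is revisited, the cache can be primed in
# bulk instead of waiting for on-demand fetches:
mmafmctl gpfs0 prefetch -j cacheFileset --list-file /tmp/files-to-warm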
URL: From aaron.s.knister at nasa.gov Fri Feb 23 03:17:26 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 22 Feb 2018 22:17:26 -0500 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory Message-ID: I've been exploring the idea for a while of writing a SLURM SPANK plugin to allow users to dynamically change the pagepool size on a node. Every now and then we have some users who would benefit significantly from a much larger pagepool on compute nodes but by default keep it on the smaller side to make as much physmem available as possible to batch work. In testing, though, it seems as though reducing the pagepool doesn't quite release all of the memory. I don't really understand it because I've never before seen memory that was previously resident become un-resident but still maintain the virtual memory allocation. Here's what I mean. Let's take a node with 128G and a 1G pagepool. If I do the following to simulate what might happen as various jobs tweak the pagepool: - tschpool 64G - tschpool 1G - tschpool 32G - tschpool 1G - tschpool 32G I end up with this: mmfsd thinks there's 32G resident but 64G virt # ps -o vsz,rss,comm -p 24397 VSZ RSS COMMAND 67589400 33723236 mmfsd however, linux thinks there's ~100G used # free -g total used free shared buffers cached Mem: 125 100 25 0 0 0 -/+ buffers/cache: 98 26 Swap: 7 0 7 I can jump back and forth between 1G and 32G *after* allocating 64G pagepool and the overall amount of memory in use doesn't balloon but I can't seem to shed that original 64G. I don't understand what's going on... :) Any ideas? This is with Scale 4.2.3.6. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Fri Feb 23 03:24:00 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 22 Feb 2018 22:24:00 -0500 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: References: Message-ID: This is also interesting (although I don't know what it really means). Looking at pmap run against mmfsd I can see what happens after each step: # baseline 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020000000000 1048576K 1048576K 1048576K 1048576K 0K rwxp [anon] Total: 1613580K 1191020K 1189650K 1171836K 0K # tschpool 64G 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020000000000 67108864K 67108864K 67108864K 67108864K 0K rwxp [anon] Total: 67706636K 67284108K 67282625K 67264920K 0K # tschpool 1G 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020001400000 139264K 139264K 139264K 139264K 0K rwxp [anon] 0000020fc9400000 897024K 897024K 897024K 897024K 0K rwxp [anon] 0000020009c00000 66052096K 0K 0K 0K 0K rwxp [anon] Total: 67706636K 1223820K 1222451K 1204632K 0K Even though mmfsd has that 64G chunk allocated there's none of it *used*. I wonder why Linux seems to be accounting it as allocated. -Aaron On 2/22/18 10:17 PM, Aaron Knister wrote: > I've been exploring the idea for a while of writing a SLURM SPANK plugin > to allow users to dynamically change the pagepool size on a node. Every > now and then we have some users who would benefit significantly from a > much larger pagepool on compute nodes but by default keep it on the > smaller side to make as much physmem available as possible to batch work. 
> > In testing, though, it seems as though reducing the pagepool doesn't > quite release all of the memory. I don't really understand it because > I've never before seen memory that was previously resident become > un-resident but still maintain the virtual memory allocation. > > Here's what I mean. Let's take a node with 128G and a 1G pagepool. > > If I do the following to simulate what might happen as various jobs > tweak the pagepool: > > - tschpool 64G > - tschpool 1G > - tschpool 32G > - tschpool 1G > - tschpool 32G > > I end up with this: > > mmfsd thinks there's 32G resident but 64G virt > # ps -o vsz,rss,comm -p 24397 > ?? VSZ?? RSS COMMAND > 67589400 33723236 mmfsd > > however, linux thinks there's ~100G used > > # free -g > ???????????? total?????? used?????? free???? shared??? buffers???? cached > Mem:?????????? 125??????? 100???????? 25????????? 0????????? 0????????? 0 > -/+ buffers/cache:???????? 98???????? 26 > Swap:??????????? 7????????? 0????????? 7 > > I can jump back and forth between 1G and 32G *after* allocating 64G > pagepool and the overall amount of memory in use doesn't balloon but I > can't seem to shed that original 64G. > > I don't understand what's going on... :) Any ideas? This is with Scale > 4.2.3.6. > > -Aaron > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From john.hearns at asml.com Fri Feb 23 09:37:08 2018 From: john.hearns at asml.com (John Hearns) Date: Fri, 23 Feb 2018 09:37:08 +0000 Subject: [gpfsug-discuss] mmfind -ls In-Reply-To: References: Message-ID: Hi. I hope this reply comes through. I often get bounced when replying here. In fact the reason is because I am not running ls. This was just an example. I am running mmgetlocation to get the chunks allocation on each NSD of a file. Secondly my problem is that a space is needed: mmfind /mountpoint -type f -exec mmgetlocation -D myproblemnsd -f {} \; Note the space before the \ TO my shame this is the same as in the normal find command From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, February 22, 2018 3:21 PM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Subject: Re: [gpfsug-discuss] mmfind -ls Leaving aside the -exec option, and whether you choose classic find or mmfind, why not just use the -ls option - same output, less overhead... mmfind pathname -type f -ls From: John Hearns > To: gpfsug main discussion list > Cc: "gpfsug-discuss-bounces at spectrumscale.org" > Date: 02/22/2018 04:03 AM Subject: Re: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Stupid me. The space between the {} and \; is significant. /usr/local/bin/mmfind /hpc/bscratch -type f -exec /bin/ls {} \; Still would be nice to have the documentation clarified please. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Thursday, February 22, 2018 2:26 AM To: gpfsug main discussion list > Cc: gpfsug-discuss-bounces at spectrumscale.org Subject: Re: [gpfsug-discuss] mmfind will not exec Looking at the mmfind.README it indicates that it only supports the format you used with the semi-colon. Did you capture any output of the problem? 
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: John Hearns > To: gpfsug main discussion list > Date: 02/21/2018 06:45 PM Subject: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I would dearly like to use mmfind in a project I am working on (version 4.2.3.4 at the moment) mmfind /hpc/bscratch -type f work fine mmfind /hpc/bscratch -type f -exec /bin/ls {}\ ; crashes and burns I know there are supposed to be problems with exec and mmfind, and this is sample software shipped without warranty etc. But why let me waste hours on this when it won?t work? There is even an example in the README for mmfind ./mmfind /encFS -type f -exec /bin/readMyFile {} \; But in the help for mmfind: -exec COMMANDs are terminated by a standalone ';' or by the string '{} +? So which is it? The normal find version {} \; or {} + -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OC7XNZeulP0vmS8Fq-RJuun5wOqFPootm0QHxBXUfKg&s=LUvpk53AaNcHSGQgDgH8FAiOOsH1H0OPOV9MFGMIi9E&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. 
If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=77Whh54a5VWNFaaczlMhEzn7B802MGX9m-C2xj4sP1k&s=L4bZlOcrZLwkyth7maRTEmms7Ftarchh_DkBvdTEF7w&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Fri Feb 23 14:35:41 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 23 Feb 2018 09:35:41 -0500 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: References: Message-ID: AFAIK you can increase the pagepool size dynamically but you cannot shrink it dynamically. To shrink it you must restart the GPFS daemon. Also, could you please provide the actual pmap commands you executed? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Aaron Knister To: Date: 02/22/2018 10:30 PM Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all memory Sent by: gpfsug-discuss-bounces at spectrumscale.org This is also interesting (although I don't know what it really means). 
Looking at pmap run against mmfsd I can see what happens after each step: # baseline 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020000000000 1048576K 1048576K 1048576K 1048576K 0K rwxp [anon] Total: 1613580K 1191020K 1189650K 1171836K 0K # tschpool 64G 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020000000000 67108864K 67108864K 67108864K 67108864K 0K rwxp [anon] Total: 67706636K 67284108K 67282625K 67264920K 0K # tschpool 1G 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020001400000 139264K 139264K 139264K 139264K 0K rwxp [anon] 0000020fc9400000 897024K 897024K 897024K 897024K 0K rwxp [anon] 0000020009c00000 66052096K 0K 0K 0K 0K rwxp [anon] Total: 67706636K 1223820K 1222451K 1204632K 0K Even though mmfsd has that 64G chunk allocated there's none of it *used*. I wonder why Linux seems to be accounting it as allocated. -Aaron On 2/22/18 10:17 PM, Aaron Knister wrote: > I've been exploring the idea for a while of writing a SLURM SPANK plugin > to allow users to dynamically change the pagepool size on a node. Every > now and then we have some users who would benefit significantly from a > much larger pagepool on compute nodes but by default keep it on the > smaller side to make as much physmem available as possible to batch work. > > In testing, though, it seems as though reducing the pagepool doesn't > quite release all of the memory. I don't really understand it because > I've never before seen memory that was previously resident become > un-resident but still maintain the virtual memory allocation. > > Here's what I mean. Let's take a node with 128G and a 1G pagepool. > > If I do the following to simulate what might happen as various jobs > tweak the pagepool: > > - tschpool 64G > - tschpool 1G > - tschpool 32G > - tschpool 1G > - tschpool 32G > > I end up with this: > > mmfsd thinks there's 32G resident but 64G virt > # ps -o vsz,rss,comm -p 24397 > VSZ RSS COMMAND > 67589400 33723236 mmfsd > > however, linux thinks there's ~100G used > > # free -g > total used free shared buffers cached > Mem: 125 100 25 0 0 0 > -/+ buffers/cache: 98 26 > Swap: 7 0 7 > > I can jump back and forth between 1G and 32G *after* allocating 64G > pagepool and the overall amount of memory in use doesn't balloon but I > can't seem to shed that original 64G. > > I don't understand what's going on... :) Any ideas? This is with Scale > 4.2.3.6. > > -Aaron > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OrZQeEmI6chBdguG-h4YPHsxXZ4gTU3CtIuN4e3ijdY&s=hvVIRG5kB1zom2Iql2_TOagchsgl99juKiZfJt5S1tM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Fri Feb 23 14:44:21 2018 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Fri, 23 Feb 2018 15:44:21 +0100 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: References: Message-ID: <6c9df2df-dbdd-e3c1-07c7-f9906b0d666d@ugent.be> hi all, we had the same idea long ago, afaik the issue we had was due to the pinned memory the pagepool uses when RDMA is enabled. 
at some point we restarted gpfs on the compute nodes for each job, similar to the way we do swapoff/swapon; but in certain scenarios gpfs really did not like it; so we gave up on it. the other issue that needs to be resolved is that the pagepool needs to be numa aware, so the pagepool is nicely allocated across all numa domains, instead of using the first ones available. otherwise compute jobs might start that only do non-local doamin memeory access. stijn On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote: > AFAIK you can increase the pagepool size dynamically but you cannot shrink > it dynamically. To shrink it you must restart the GPFS daemon. Also, > could you please provide the actual pmap commands you executed? > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 > . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Aaron Knister > To: > Date: 02/22/2018 10:30 PM > Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all > memory > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > This is also interesting (although I don't know what it really means). > Looking at pmap run against mmfsd I can see what happens after each step: > > # baseline > 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > 0000020000000000 1048576K 1048576K 1048576K 1048576K 0K rwxp [anon] > Total: 1613580K 1191020K 1189650K 1171836K 0K > > # tschpool 64G > 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > 0000020000000000 67108864K 67108864K 67108864K 67108864K 0K rwxp > [anon] > Total: 67706636K 67284108K 67282625K 67264920K 0K > > # tschpool 1G > 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > 0000020001400000 139264K 139264K 139264K 139264K 0K rwxp [anon] > 0000020fc9400000 897024K 897024K 897024K 897024K 0K rwxp [anon] > 0000020009c00000 66052096K 0K 0K 0K 0K rwxp [anon] > Total: 67706636K 1223820K 1222451K 1204632K 0K > > Even though mmfsd has that 64G chunk allocated there's none of it > *used*. I wonder why Linux seems to be accounting it as allocated. > > -Aaron > > On 2/22/18 10:17 PM, Aaron Knister wrote: >> I've been exploring the idea for a while of writing a SLURM SPANK plugin > >> to allow users to dynamically change the pagepool size on a node. Every >> now and then we have some users who would benefit significantly from a >> much larger pagepool on compute nodes but by default keep it on the >> smaller side to make as much physmem available as possible to batch > work. >> >> In testing, though, it seems as though reducing the pagepool doesn't >> quite release all of the memory. I don't really understand it because >> I've never before seen memory that was previously resident become >> un-resident but still maintain the virtual memory allocation. 
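(On the NUMA point above: numastat and numactl are enough for a first look at how the pagepool actually landed across the domains; this is plain Linux tooling and the interpretation is only a rough sketch.)

# per-NUMA-node memory of the GPFS daemon; a pagepool taken from one domain shows up as one heavily skewed column
numastat -p $(pidof mmfsd)

# the domains and sizes the node actually has, for comparison
numactl --hardware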
>> >> Here's what I mean. Let's take a node with 128G and a 1G pagepool. >> >> If I do the following to simulate what might happen as various jobs >> tweak the pagepool: >> >> - tschpool 64G >> - tschpool 1G >> - tschpool 32G >> - tschpool 1G >> - tschpool 32G >> >> I end up with this: >> >> mmfsd thinks there's 32G resident but 64G virt >> # ps -o vsz,rss,comm -p 24397 >> VSZ RSS COMMAND >> 67589400 33723236 mmfsd >> >> however, linux thinks there's ~100G used >> >> # free -g >> total used free shared buffers > cached >> Mem: 125 100 25 0 0 > 0 >> -/+ buffers/cache: 98 26 >> Swap: 7 0 7 >> >> I can jump back and forth between 1G and 32G *after* allocating 64G >> pagepool and the overall amount of memory in use doesn't balloon but I >> can't seem to shed that original 64G. >> >> I don't understand what's going on... :) Any ideas? This is with Scale >> 4.2.3.6. >> >> -Aaron >> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From makaplan at us.ibm.com Fri Feb 23 16:53:26 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 23 Feb 2018 11:53:26 -0500 Subject: [gpfsug-discuss] mmfind -ls, -exec but use -xargs wherever you can. In-Reply-To: References: Message-ID: So much the more reasons to use mmfind ... -xargs ... Which, for large number of files, gives you a very much more performant and parallelized execution of the classic find ... | xargs ... The difference is exec is run in line with the evaluation of the other find conditionals (like -type f) but spawns a new command shell for each evaluation of exec... Whereas -xargs is run after the pathnames of all of the (matching) files are discovered ... Like classic xargs, if your command can take a list of files, you save overhead there BUT -xargs also runs multiple instances of your command in multiple parallel processes on multiple nodes. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Feb 23 23:41:52 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 23 Feb 2018 15:41:52 -0800 Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th In-Reply-To: References: Message-ID: Agenda work for the US Spring meeting is still underway and in addition to Bob?s request below, let me ask you to comment on what you?d like to hear about from IBM developers, and/or other topics of interest. Even if you can?t attend the event, feel free to contribute ideas as the talks will be posted online after the event. Just reply to the list to generate any follow-on discussion or brainstorming about topics. Best, Kristy Kristy Kallback-Rose Sr HPC Storage Systems Analyst NERSC/LBL > On Feb 8, 2018, at 12:34 PM, Oesterlin, Robert wrote: > > We?re finalizing the details for the Spring 2018 User Group meeting, and we need your help! > > I?ve you?re interested in presenting at this meeting (it will be a full 2 days), then contact me and let me know what?s you?d like to talk about. We?re always looking for presentations on how you are using Scale (GPFS) in your business or project, tools that help you do your job, performance challenges/solutions ? or anything else. Also looking for ideas on breakout sessions. We?re probably looking at talks of about 30 mins each. > > Drop me a note if you?d like to present. Exact details on the event location will be available in a few weeks. 
We?re hoping to keep it as close to BioIT World in downtown Boston. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > SSUG Co-principal > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sat Feb 24 12:01:08 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 24 Feb 2018 12:01:08 +0000 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: <0c21b5b2-95ff-4cbf-9b07-e23594f58c87@Spark> References: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> <0c21b5b2-95ff-4cbf-9b07-e23594f58c87@Spark> Message-ID: On 23/02/18 01:27, valleru at cbio.mskcc.org wrote: > Thanks, I will try the file heat feature but i am really not sure, if it > would work - since the code can access cold files too, and not > necessarily files recently accessed/hot files. > > With respect to LROC. Let me explain as below: > > The use case is that - > The code initially reads headers (small region of data) from thousands > of files as the first step. For example about 30,000 of them with each > about 300MB to 500MB in size. > After the first step, with the help of those headers - it mmaps/seeks > across various regions of a set of files in parallel. > Since its all small IOs and it was really slow at reading from GPFS over > the network directly from disks - Our idea was to use AFM which i > believe fetches all file data into flash/ssds, once the initial few > blocks of the files are read. > But again - AFM seems to not solve the problem, so i want to know if > LROC behaves in the same way as AFM, where all of the file data is > prefetched in full block size utilizing all the worker threads ?- if few > blocks of the file is read initially. > Imagine a single GPFS file system, metadata in SSD, a fast data pool and a slow data pool (fast and slow being really good names to avoid the 8 character nonsense). Now if your fast data pool is appropriately sized then your slow data pool will normally be doing diddly squat. We are talking under 10 I/O's per second. Frankly under 5 I/O's per second is more like it from my experience. If your slow pool is 8-10PB in size, then it has thousands of spindles in it, and should be able to absorb the start of the job without breaking sweat. For numbers a 7.2K RPM disk can do around 120 random I/O's per second, so using RAID6 and 8TB disks that's 130 LUN's so around 15,000 random I/O's per second spare overhead, more if it's not random. It should take all of around 1-2s to read in those headers. Therefore unless these jobs only run for a few seconds or you have dozens of them starting every minute it should not be an issue. Finally if GPFS is taking ages to read the files over the network, then it sounds like your network needs an upgrade or GPFS needs tuning which may or may not require a larger fast storage pool. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From aaron.s.knister at nasa.gov Sun Feb 25 16:45:10 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 25 Feb 2018 11:45:10 -0500 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: References: Message-ID: <65453649-77df-2efa-8776-eb2775ca9efa@nasa.gov> Hmm...interesting. 
It sure seems to try :) The pmap command was this: pmap $(pidof mmfsd) | sort -n -k3 | tail -Aaron On 2/23/18 9:35 AM, IBM Spectrum Scale wrote: > AFAIK you can increase the pagepool size dynamically but you cannot > shrink it dynamically. ?To shrink it you must restart the GPFS daemon. > Also, could you please provide the actual pmap commands you executed? > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of ?Spectrum > Scale (GPFS), then please post it to the public IBM developerWroks Forum > at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) and you have an IBM software maintenance contract please contact > ?1-800-237-5511 in the United States or your local IBM Service Center > in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Aaron Knister > To: > Date: 02/22/2018 10:30 PM > Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all memory > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > This is also interesting (although I don't know what it really means). > Looking at pmap run against mmfsd I can see what happens after each step: > > # baseline > 00007fffe4639000 ?59164K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 00007fffd837e000 ?61960K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 0000020000000000 1048576K 1048576K 1048576K 1048576K ? ? ?0K rwxp [anon] > Total: ? ? ? ? ? 1613580K 1191020K 1189650K 1171836K ? ? ?0K > > # tschpool 64G > 00007fffe4639000 ?59164K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 00007fffd837e000 ?61960K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 0000020000000000 67108864K 67108864K 67108864K 67108864K ?0K rwxp [anon] > Total: ? ? ? ? ? 67706636K 67284108K 67282625K 67264920K ? ? ?0K > > # tschpool 1G > 00007fffe4639000 ?59164K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 00007fffd837e000 ?61960K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 0000020001400000 139264K 139264K 139264K 139264K ? ? ?0K rwxp [anon] > 0000020fc9400000 897024K 897024K 897024K 897024K ? ? ?0K rwxp [anon] > 0000020009c00000 66052096K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K rwxp [anon] > Total: ? ? ? ? ? 67706636K 1223820K 1222451K 1204632K ? ? ?0K > > Even though mmfsd has that 64G chunk allocated there's none of it > *used*. I wonder why Linux seems to be accounting it as allocated. > > -Aaron > > On 2/22/18 10:17 PM, Aaron Knister wrote: > > I've been exploring the idea for a while of writing a SLURM SPANK plugin > > to allow users to dynamically change the pagepool size on a node. Every > > now and then we have some users who would benefit significantly from a > > much larger pagepool on compute nodes but by default keep it on the > > smaller side to make as much physmem available as possible to batch work. > > > > In testing, though, it seems as though reducing the pagepool doesn't > > quite release all of the memory. I don't really understand it because > > I've never before seen memory that was previously resident become > > un-resident but still maintain the virtual memory allocation. > > > > Here's what I mean. Let's take a node with 128G and a 1G pagepool. 
> > > > If I do the following to simulate what might happen as various jobs > > tweak the pagepool: > > > > - tschpool 64G > > - tschpool 1G > > - tschpool 32G > > - tschpool 1G > > - tschpool 32G > > > > I end up with this: > > > > mmfsd thinks there's 32G resident but 64G virt > > # ps -o vsz,rss,comm -p 24397 > > ??? VSZ?? RSS COMMAND > > 67589400 33723236 mmfsd > > > > however, linux thinks there's ~100G used > > > > # free -g > > total?????? used free???? shared??? buffers cached > > Mem:?????????? 125 100???????? 25 0????????? 0 0 > > -/+ buffers/cache: 98???????? 26 > > Swap: 7????????? 0 7 > > > > I can jump back and forth between 1G and 32G *after* allocating 64G > > pagepool and the overall amount of memory in use doesn't balloon but I > > can't seem to shed that original 64G. > > > > I don't understand what's going on... :) Any ideas? This is with Scale > > 4.2.3.6. > > > > -Aaron > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OrZQeEmI6chBdguG-h4YPHsxXZ4gTU3CtIuN4e3ijdY&s=hvVIRG5kB1zom2Iql2_TOagchsgl99juKiZfJt5S1tM&e= > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Sun Feb 25 16:54:06 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 25 Feb 2018 11:54:06 -0500 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: <6c9df2df-dbdd-e3c1-07c7-f9906b0d666d@ugent.be> References: <6c9df2df-dbdd-e3c1-07c7-f9906b0d666d@ugent.be> Message-ID: Hi Stijn, Thanks for sharing your experiences-- I'm glad I'm not the only one whose had the idea (and come up empty handed). About the pagpool and numa awareness, I'd remembered seeing something about that somewhere and I did some googling and found there's a parameter called numaMemoryInterleave that "starts mmfsd with numactl --interleave=all". Do you think that provides the kind of numa awareness you're looking for? -Aaron On 2/23/18 9:44 AM, Stijn De Weirdt wrote: > hi all, > > we had the same idea long ago, afaik the issue we had was due to the > pinned memory the pagepool uses when RDMA is enabled. > > at some point we restarted gpfs on the compute nodes for each job, > similar to the way we do swapoff/swapon; but in certain scenarios gpfs > really did not like it; so we gave up on it. > > the other issue that needs to be resolved is that the pagepool needs to > be numa aware, so the pagepool is nicely allocated across all numa > domains, instead of using the first ones available. otherwise compute > jobs might start that only do non-local doamin memeory access. > > stijn > > On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote: >> AFAIK you can increase the pagepool size dynamically but you cannot shrink >> it dynamically. To shrink it you must restart the GPFS daemon. Also, >> could you please provide the actual pmap commands you executed? 
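(On the numaMemoryInterleave question above, a sketch of how the parameter is typically set; the yes value, the node class name and the need for a daemon restart are assumptions rather than anything confirmed in the thread.)

# interleave mmfsd, and with it the pagepool, across NUMA domains; placeholder node class
mmchconfig numaMemoryInterleave=yes -N computenodes
mmshutdown -N computenodes && mmstartup -N computenodes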
>> >> Regards, The Spectrum Scale (GPFS) team >> >> ------------------------------------------------------------------------------------------------------------------ >> If you feel that your question can benefit other users of Spectrum Scale >> (GPFS), then please post it to the public IBM developerWroks Forum at >> https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 >> . >> >> If your query concerns a potential software error in Spectrum Scale (GPFS) >> and you have an IBM software maintenance contract please contact >> 1-800-237-5511 in the United States or your local IBM Service Center in >> other countries. >> >> The forum is informally monitored as time permits and should not be used >> for priority messages to the Spectrum Scale (GPFS) team. >> >> >> >> From: Aaron Knister >> To: >> Date: 02/22/2018 10:30 PM >> Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all >> memory >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> This is also interesting (although I don't know what it really means). >> Looking at pmap run against mmfsd I can see what happens after each step: >> >> # baseline >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] >> 0000020000000000 1048576K 1048576K 1048576K 1048576K 0K rwxp [anon] >> Total: 1613580K 1191020K 1189650K 1171836K 0K >> >> # tschpool 64G >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] >> 0000020000000000 67108864K 67108864K 67108864K 67108864K 0K rwxp >> [anon] >> Total: 67706636K 67284108K 67282625K 67264920K 0K >> >> # tschpool 1G >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] >> 0000020001400000 139264K 139264K 139264K 139264K 0K rwxp [anon] >> 0000020fc9400000 897024K 897024K 897024K 897024K 0K rwxp [anon] >> 0000020009c00000 66052096K 0K 0K 0K 0K rwxp [anon] >> Total: 67706636K 1223820K 1222451K 1204632K 0K >> >> Even though mmfsd has that 64G chunk allocated there's none of it >> *used*. I wonder why Linux seems to be accounting it as allocated. >> >> -Aaron >> >> On 2/22/18 10:17 PM, Aaron Knister wrote: >>> I've been exploring the idea for a while of writing a SLURM SPANK plugin >> >>> to allow users to dynamically change the pagepool size on a node. Every >>> now and then we have some users who would benefit significantly from a >>> much larger pagepool on compute nodes but by default keep it on the >>> smaller side to make as much physmem available as possible to batch >> work. >>> >>> In testing, though, it seems as though reducing the pagepool doesn't >>> quite release all of the memory. I don't really understand it because >>> I've never before seen memory that was previously resident become >>> un-resident but still maintain the virtual memory allocation. >>> >>> Here's what I mean. Let's take a node with 128G and a 1G pagepool. 
>>> >>> If I do the following to simulate what might happen as various jobs >>> tweak the pagepool: >>> >>> - tschpool 64G >>> - tschpool 1G >>> - tschpool 32G >>> - tschpool 1G >>> - tschpool 32G >>> >>> I end up with this: >>> >>> mmfsd thinks there's 32G resident but 64G virt >>> # ps -o vsz,rss,comm -p 24397 >>> VSZ RSS COMMAND >>> 67589400 33723236 mmfsd >>> >>> however, linux thinks there's ~100G used >>> >>> # free -g >>> total used free shared buffers >> cached >>> Mem: 125 100 25 0 0 >> 0 >>> -/+ buffers/cache: 98 26 >>> Swap: 7 0 7 >>> >>> I can jump back and forth between 1G and 32G *after* allocating 64G >>> pagepool and the overall amount of memory in use doesn't balloon but I >>> can't seem to shed that original 64G. >>> >>> I don't understand what's going on... :) Any ideas? This is with Scale >>> 4.2.3.6. >>> >>> -Aaron >>> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Sun Feb 25 16:59:45 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 25 Feb 2018 11:59:45 -0500 Subject: [gpfsug-discuss] [non-nasa source] Re: pagepool shrink doesn't release all memory In-Reply-To: References: <6c9df2df-dbdd-e3c1-07c7-f9906b0d666d@ugent.be> Message-ID: <79885b2d-947d-4098-89bd-09b764635847@nasa.gov> Oh, and I think you're absolutely right about the rdma interaction. If I stop the infiniband service on a node and try the same exercise again, I can jump between 100G and 1G several times and the free'd memory is actually released. -Aaron On 2/25/18 11:54 AM, Aaron Knister wrote: > Hi Stijn, > > Thanks for sharing your experiences-- I'm glad I'm not the only one > whose had the idea (and come up empty handed). > > About the pagpool and numa awareness, I'd remembered seeing something > about that somewhere and I did some googling and found there's a > parameter called numaMemoryInterleave that "starts mmfsd with numactl > --interleave=all". Do you think that provides the kind of numa awareness > you're looking for? > > -Aaron > > On 2/23/18 9:44 AM, Stijn De Weirdt wrote: >> hi all, >> >> we had the same idea long ago, afaik the issue we had was due to the >> pinned memory the pagepool uses when RDMA is enabled. >> >> at some point we restarted gpfs on the compute nodes for each job, >> similar to the way we do swapoff/swapon; but in certain scenarios gpfs >> really did not like it; so we gave up on it. >> >> the other issue that needs to be resolved is that the pagepool needs to >> be numa aware, so the pagepool is nicely allocated across all numa >> domains, instead of using the first ones available. otherwise compute >> jobs might start that only do non-local doamin memeory access. >> >> stijn >> >> On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote: >>> AFAIK you can increase the pagepool size dynamically but you cannot >>> shrink >>> it dynamically.? To shrink it you must restart the GPFS daemon.?? Also, >>> could you please provide the actual pmap commands you executed? 
>>> >>> Regards, The Spectrum Scale (GPFS) team >>> >>> ------------------------------------------------------------------------------------------------------------------ >>> >>> If you feel that your question can benefit other users of? Spectrum >>> Scale >>> (GPFS), then please post it to the public IBM developerWroks Forum at >>> https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 >>> >>> . >>> >>> If your query concerns a potential software error in Spectrum Scale >>> (GPFS) >>> and you have an IBM software maintenance contract please contact >>> 1-800-237-5511 in the United States or your local IBM Service Center in >>> other countries. >>> >>> The forum is informally monitored as time permits and should not be used >>> for priority messages to the Spectrum Scale (GPFS) team. >>> >>> >>> >>> From:?? Aaron Knister >>> To:???? >>> Date:?? 02/22/2018 10:30 PM >>> Subject:??????? Re: [gpfsug-discuss] pagepool shrink doesn't release all >>> memory >>> Sent by:??????? gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> This is also interesting (although I don't know what it really means). >>> Looking at pmap run against mmfsd I can see what happens after each >>> step: >>> >>> # baseline >>> 00007fffe4639000? 59164K????? 0K????? 0K????? 0K????? 0K ---p [anon] >>> 00007fffd837e000? 61960K????? 0K????? 0K????? 0K????? 0K ---p [anon] >>> 0000020000000000 1048576K 1048576K 1048576K 1048576K????? 0K rwxp [anon] >>> Total:?????????? 1613580K 1191020K 1189650K 1171836K????? 0K >>> >>> # tschpool 64G >>> 00007fffe4639000? 59164K????? 0K????? 0K????? 0K????? 0K ---p [anon] >>> 00007fffd837e000? 61960K????? 0K????? 0K????? 0K????? 0K ---p [anon] >>> 0000020000000000 67108864K 67108864K 67108864K 67108864K????? 0K rwxp >>> [anon] >>> Total:?????????? 67706636K 67284108K 67282625K 67264920K????? 0K >>> >>> # tschpool 1G >>> 00007fffe4639000? 59164K????? 0K????? 0K????? 0K????? 0K ---p [anon] >>> 00007fffd837e000? 61960K????? 0K????? 0K????? 0K????? 0K ---p [anon] >>> 0000020001400000 139264K 139264K 139264K 139264K????? 0K rwxp [anon] >>> 0000020fc9400000 897024K 897024K 897024K 897024K????? 0K rwxp [anon] >>> 0000020009c00000 66052096K????? 0K????? 0K????? 0K????? 0K rwxp [anon] >>> Total:?????????? 67706636K 1223820K 1222451K 1204632K????? 0K >>> >>> Even though mmfsd has that 64G chunk allocated there's none of it >>> *used*. I wonder why Linux seems to be accounting it as allocated. >>> >>> -Aaron >>> >>> On 2/22/18 10:17 PM, Aaron Knister wrote: >>>> I've been exploring the idea for a while of writing a SLURM SPANK >>>> plugin >>> >>>> to allow users to dynamically change the pagepool size on a node. Every >>>> now and then we have some users who would benefit significantly from a >>>> much larger pagepool on compute nodes but by default keep it on the >>>> smaller side to make as much physmem available as possible to batch >>> work. >>>> >>>> In testing, though, it seems as though reducing the pagepool doesn't >>>> quite release all of the memory. I don't really understand it because >>>> I've never before seen memory that was previously resident become >>>> un-resident but still maintain the virtual memory allocation. >>>> >>>> Here's what I mean. Let's take a node with 128G and a 1G pagepool. 
>>>> >>>> If I do the following to simulate what might happen as various jobs >>>> tweak the pagepool: >>>> >>>> - tschpool 64G >>>> - tschpool 1G >>>> - tschpool 32G >>>> - tschpool 1G >>>> - tschpool 32G >>>> >>>> I end up with this: >>>> >>>> mmfsd thinks there's 32G resident but 64G virt >>>> # ps -o vsz,rss,comm -p 24397 >>>> ???? VSZ?? RSS COMMAND >>>> 67589400 33723236 mmfsd >>>> >>>> however, linux thinks there's ~100G used >>>> >>>> # free -g >>>> ?????????????? total?????? used?????? free???? shared??? buffers >>> cached >>>> Mem:?????????? 125??????? 100???????? 25????????? 0????????? 0 >>> 0 >>>> -/+ buffers/cache:???????? 98???????? 26 >>>> Swap:??????????? 7????????? 0????????? 7 >>>> >>>> I can jump back and forth between 1G and 32G *after* allocating 64G >>>> pagepool and the overall amount of memory in use doesn't balloon but I >>>> can't seem to shed that original 64G. >>>> >>>> I don't understand what's going on... :) Any ideas? This is with Scale >>>> 4.2.3.6. >>>> >>>> -Aaron >>>> >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Sun Feb 25 17:49:38 2018 From: oehmes at gmail.com (Sven Oehme) Date: Sun, 25 Feb 2018 17:49:38 +0000 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: References: <6c9df2df-dbdd-e3c1-07c7-f9906b0d666d@ugent.be> Message-ID: Hi, i guess you saw that in some of my presentations about communication code overhaul. we started in 4.2.X and since then added more and more numa awareness to GPFS. Version 5.0 also has enhancements in this space. sven On Sun, Feb 25, 2018 at 8:54 AM Aaron Knister wrote: > Hi Stijn, > > Thanks for sharing your experiences-- I'm glad I'm not the only one > whose had the idea (and come up empty handed). > > About the pagpool and numa awareness, I'd remembered seeing something > about that somewhere and I did some googling and found there's a > parameter called numaMemoryInterleave that "starts mmfsd with numactl > --interleave=all". Do you think that provides the kind of numa awareness > you're looking for? > > -Aaron > > On 2/23/18 9:44 AM, Stijn De Weirdt wrote: > > hi all, > > > > we had the same idea long ago, afaik the issue we had was due to the > > pinned memory the pagepool uses when RDMA is enabled. > > > > at some point we restarted gpfs on the compute nodes for each job, > > similar to the way we do swapoff/swapon; but in certain scenarios gpfs > > really did not like it; so we gave up on it. > > > > the other issue that needs to be resolved is that the pagepool needs to > > be numa aware, so the pagepool is nicely allocated across all numa > > domains, instead of using the first ones available. otherwise compute > > jobs might start that only do non-local doamin memeory access. > > > > stijn > > > > On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote: > >> AFAIK you can increase the pagepool size dynamically but you cannot > shrink > >> it dynamically. To shrink it you must restart the GPFS daemon. Also, > >> could you please provide the actual pmap commands you executed? 
> >> > >> Regards, The Spectrum Scale (GPFS) team > >> > >> > ------------------------------------------------------------------------------------------------------------------ > >> If you feel that your question can benefit other users of Spectrum > Scale > >> (GPFS), then please post it to the public IBM developerWroks Forum at > >> > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 > >> . > >> > >> If your query concerns a potential software error in Spectrum Scale > (GPFS) > >> and you have an IBM software maintenance contract please contact > >> 1-800-237-5511 <(800)%20237-5511> in the United States or your local > IBM Service Center in > >> other countries. > >> > >> The forum is informally monitored as time permits and should not be used > >> for priority messages to the Spectrum Scale (GPFS) team. > >> > >> > >> > >> From: Aaron Knister > >> To: > >> Date: 02/22/2018 10:30 PM > >> Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all > >> memory > >> Sent by: gpfsug-discuss-bounces at spectrumscale.org > >> > >> > >> > >> This is also interesting (although I don't know what it really means). > >> Looking at pmap run against mmfsd I can see what happens after each > step: > >> > >> # baseline > >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > >> 0000020000000000 1048576K 1048576K 1048576K 1048576K 0K rwxp [anon] > >> Total: 1613580K 1191020K 1189650K 1171836K 0K > >> > >> # tschpool 64G > >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > >> 0000020000000000 67108864K 67108864K 67108864K 67108864K 0K rwxp > >> [anon] > >> Total: 67706636K 67284108K 67282625K 67264920K 0K > >> > >> # tschpool 1G > >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > >> 0000020001400000 139264K 139264K 139264K 139264K 0K rwxp [anon] > >> 0000020fc9400000 897024K 897024K 897024K 897024K 0K rwxp [anon] > >> 0000020009c00000 66052096K 0K 0K 0K 0K rwxp [anon] > >> Total: 67706636K 1223820K 1222451K 1204632K 0K > >> > >> Even though mmfsd has that 64G chunk allocated there's none of it > >> *used*. I wonder why Linux seems to be accounting it as allocated. > >> > >> -Aaron > >> > >> On 2/22/18 10:17 PM, Aaron Knister wrote: > >>> I've been exploring the idea for a while of writing a SLURM SPANK > plugin > >> > >>> to allow users to dynamically change the pagepool size on a node. Every > >>> now and then we have some users who would benefit significantly from a > >>> much larger pagepool on compute nodes but by default keep it on the > >>> smaller side to make as much physmem available as possible to batch > >> work. > >>> > >>> In testing, though, it seems as though reducing the pagepool doesn't > >>> quite release all of the memory. I don't really understand it because > >>> I've never before seen memory that was previously resident become > >>> un-resident but still maintain the virtual memory allocation. > >>> > >>> Here's what I mean. Let's take a node with 128G and a 1G pagepool. 
> >>> > >>> If I do the following to simulate what might happen as various jobs > >>> tweak the pagepool: > >>> > >>> - tschpool 64G > >>> - tschpool 1G > >>> - tschpool 32G > >>> - tschpool 1G > >>> - tschpool 32G > >>> > >>> I end up with this: > >>> > >>> mmfsd thinks there's 32G resident but 64G virt > >>> # ps -o vsz,rss,comm -p 24397 > >>> VSZ RSS COMMAND > >>> 67589400 33723236 mmfsd > >>> > >>> however, linux thinks there's ~100G used > >>> > >>> # free -g > >>> total used free shared buffers > >> cached > >>> Mem: 125 100 25 0 0 > >> 0 > >>> -/+ buffers/cache: 98 26 > >>> Swap: 7 0 7 > >>> > >>> I can jump back and forth between 1G and 32G *after* allocating 64G > >>> pagepool and the overall amount of memory in use doesn't balloon but I > >>> can't seem to shed that original 64G. > >>> > >>> I don't understand what's going on... :) Any ideas? This is with Scale > >>> 4.2.3.6. > >>> > >>> -Aaron > >>> > >> > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Feb 26 12:20:52 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 26 Feb 2018 20:20:52 +0800 Subject: [gpfsug-discuss] Finding all bulletins and APARs In-Reply-To: References: Message-ID: Hi John, For all Flashes, alerts and bulletins for IBM Spectrum Scale, please check this link: https://www.ibm.com/support/home/search-results/10000060/system_storage/storage_software/software_defined_storage/ibm_spectrum_scale?filter=DC.Type_avl:CT792,CT555,CT755&sortby=-dcdate_sortrange&ct=fab For any other content which you got in the notification, please check this link: https://www.ibm.com/support/home/search-results/10000060/IBM_Spectrum_Scale?docOnly=true&sortby=-dcdate_sortrange&ct=rc Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: John Hearns To: gpfsug main discussion list Date: 02/21/2018 05:28 PM Subject: [gpfsug-discuss] Finding all bulletins and APARs Sent by: gpfsug-discuss-bounces at spectrumscale.org Firstly, let me apologise for not thanking people who hav ereplied to me on this list with help. I have indeed replied and thanked you ? 
however the list software has taken a dislike to my email address. I am currently on the myibm support site. I am looking for a specific APAR on Spectrum Scale. However I want to be able to get a list of all APARs and bulletins for Spectrum Scale, right up to date. I do get email alerts but somehow I suspect I am not getting them all, and it is a pain to search back in your email. Thanks John H -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=v0fVzSMP-N6VctcEcAQKTLJlrvu0WUry8rSo41ia-mY&s=_zoOdAst7NdP-PByM7WrniXyNLofARAf9hayK0BF5rU&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jan.sundermann at kit.edu Mon Feb 26 16:38:46 2018 From: jan.sundermann at kit.edu (Sundermann, Jan Erik (SCC)) Date: Mon, 26 Feb 2018 16:38:46 +0000 Subject: [gpfsug-discuss] Problems with remote mount via routed IB Message-ID: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> Dear all we are currently trying to remote mount a file system in a routed Infiniband test setup and face problems with dropped RDMA connections. The setup is the following: - Spectrum Scale Cluster 1 is setup on four servers which are connected to the same infiniband network. Additionally they are connected to a fast ethernet providing ip communication in the network 192.168.11.0/24. - Spectrum Scale Cluster 2 is setup on four additional servers which are connected to a second infiniband network. These servers have IPs on their IB interfaces in the network 192.168.12.0/24. - IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated machine. - We have a dedicated IB hardware router connected to both IB subnets. We tested that the routing, both IP and IB, is working between the two clusters without problems and that RDMA is working fine both for internal communication inside cluster 1 and cluster 2 When trying to remote mount a file system from cluster 1 in cluster 2, RDMA communication is not working as expected. 
Instead we see error messages on the remote host (cluster 2) 2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 1 2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1 2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 1 2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 0 2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0 2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 0 2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 2 2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 and in the cluster with the file system (cluster 1) 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to 
192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:35.398+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:55.600+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 Any advice on how to configure the setup in a way that would allow the remote mount via routed IB would be very appreciated. Thank you and best regards Jan Erik -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5252 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Mon Feb 26 19:16:34 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 26 Feb 2018 14:16:34 -0500 Subject: [gpfsug-discuss] Problems with remote mount via routed IB In-Reply-To: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> References: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> Message-ID: Hi Jan Erik, It was my understanding that the IB hardware router required RDMA CM to work. By default GPFS doesn't use the RDMA Connection Manager but it can be enabled (e.g. verbsRdmaCm=enable). I think this requires a restart on clients/servers (in both clusters) to take effect. Maybe someone else on the list can comment in more detail-- I've been told folks have successfully deployed IB routers with GPFS. -Aaron On 2/26/18 11:38 AM, Sundermann, Jan Erik (SCC) wrote: > > Dear all > > we are currently trying to remote mount a file system in a routed Infiniband test setup and face problems with dropped RDMA connections. The setup is the following: > > - Spectrum Scale Cluster 1 is setup on four servers which are connected to the same infiniband network. Additionally they are connected to a fast ethernet providing ip communication in the network 192.168.11.0/24. 
> > - Spectrum Scale Cluster 2 is setup on four additional servers which are connected to a second infiniband network. These servers have IPs on their IB interfaces in the network 192.168.12.0/24. > > - IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated machine. > > - We have a dedicated IB hardware router connected to both IB subnets. > > > We tested that the routing, both IP and IB, is working between the two clusters without problems and that RDMA is working fine both for internal communication inside cluster 1 and cluster 2 > > When trying to remote mount a file system from cluster 1 in cluster 2, RDMA communication is not working as expected. Instead we see error messages on the remote host (cluster 2) > > > 2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > 2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 > 2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 > 2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > 2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 1 > 2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > 2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1 > 2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 1 > 2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 0 > 2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0 > 2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 0 > 2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 2 > 2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > 2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 > 2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 > 2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > > > and in the cluster with the file system (cluster 1) > > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 
192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:35.398+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:55.600+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > > > > Any advice on how to configure the setup in a way that would allow the remote mount via routed IB would be very appreciated. 
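(Picking up the RDMA CM suggestion above, a minimal sketch of what enabling it could look like; the node class, applying it in both clusters and the restart step are assumptions, not a verified recipe for the routed-IB case.)

# enable the RDMA connection manager on the IB-attached nodes of both clusters, then restart them
mmchconfig verbsRdmaCm=enable -N ibnodes
mmshutdown -N ibnodes && mmstartup -N ibnodes

# afterwards, confirm the setting is active on a node
mmdiag --config | grep -i verbsRdmaCm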
> > > Thank you and best regards > Jan Erik > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From john.hearns at asml.com Tue Feb 27 09:17:36 2018 From: john.hearns at asml.com (John Hearns) Date: Tue, 27 Feb 2018 09:17:36 +0000 Subject: [gpfsug-discuss] Problems with remote mount via routed IB In-Reply-To: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> References: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> Message-ID: Jan Erik, Can you clarify if you are routing IP traffic between the two Infiniband networks. Or are you routing Infiniband traffic? If I can be of help I manage an Infiniband network which connects to other IP networks using Mellanox VPI gateways, which proxy arp between IB and Ethernet. But I am not running GPFS traffic over these. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sundermann, Jan Erik (SCC) Sent: Monday, February 26, 2018 5:39 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Problems with remote mount via routed IB Dear all we are currently trying to remote mount a file system in a routed Infiniband test setup and face problems with dropped RDMA connections. The setup is the following: - Spectrum Scale Cluster 1 is setup on four servers which are connected to the same infiniband network. Additionally they are connected to a fast ethernet providing ip communication in the network 192.168.11.0/24. - Spectrum Scale Cluster 2 is setup on four additional servers which are connected to a second infiniband network. These servers have IPs on their IB interfaces in the network 192.168.12.0/24. - IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated machine. - We have a dedicated IB hardware router connected to both IB subnets. We tested that the routing, both IP and IB, is working between the two clusters without problems and that RDMA is working fine both for internal communication inside cluster 1 and cluster 2 When trying to remote mount a file system from cluster 1 in cluster 2, RDMA communication is not working as expected. 
Instead we see error messages on the remote host (cluster 2) 2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 1 2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1 2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 1 2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 0 2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0 2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 0 2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 2 2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 and in the cluster with the file system (cluster 1) 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to 
192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:35.398+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:55.600+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 Any advice on how to configure the setup in a way that would allow the remote mount via routed IB would be very appreciated. Thank you and best regards Jan Erik -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From alex at calicolabs.com Tue Feb 27 22:25:30 2018 From: alex at calicolabs.com (Alex Chekholko) Date: Tue, 27 Feb 2018 14:25:30 -0800 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: <0c21b5b2-95ff-4cbf-9b07-e23594f58c87@Spark> References: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> <0c21b5b2-95ff-4cbf-9b07-e23594f58c87@Spark> Message-ID: Hi, My experience has been that you could spend the same money to just make your main pool more performant. 
Instead of doing two data transfers (one from cold pool to AFM or hot pools, one from AFM/hot to client), you can just make the direct access of the data faster by adding more resources to your main pool. Regards, Alex On Thu, Feb 22, 2018 at 5:27 PM, wrote: > Thanks, I will try the file heat feature but i am really not sure, if it > would work - since the code can access cold files too, and not necessarily > files recently accessed/hot files. > > With respect to LROC. Let me explain as below: > > The use case is that - > The code initially reads headers (small region of data) from thousands of > files as the first step. For example about 30,000 of them with each about > 300MB to 500MB in size. > After the first step, with the help of those headers - it mmaps/seeks > across various regions of a set of files in parallel. > Since its all small IOs and it was really slow at reading from GPFS over > the network directly from disks - Our idea was to use AFM which i believe > fetches all file data into flash/ssds, once the initial few blocks of the > files are read. > But again - AFM seems to not solve the problem, so i want to know if LROC > behaves in the same way as AFM, where all of the file data is prefetched in > full block size utilizing all the worker threads - if few blocks of the > file is read initially. > > Thanks, > Lohit > > On Feb 22, 2018, 4:52 PM -0500, IBM Spectrum Scale , > wrote: > > My apologies for not being more clear on the flash storage pool. I meant > that this would be just another GPFS storage pool in the same cluster, so > no separate AFM cache cluster. You would then use the file heat feature to > ensure more frequently accessed files are migrated to that all flash > storage pool. > > As for LROC could you please clarify what you mean by a few headers/stubs > of the file? In reading the LROC documentation and the LROC variables > available in the mmchconfig command I think you might want to take a look a > the lrocDataStubFileSize variable since it seems to apply to your situation. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: valleru at cbio.mskcc.org > To: gpfsug main discussion list > Cc: gpfsug-discuss-bounces at spectrumscale.org > Date: 02/22/2018 04:21 PM > Subject: Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered > storage > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Thank you. > > I am sorry if i was not clear, but the metadata pool is all on SSDs in the > GPFS clusters that we use. Its just the data pool that is on Near-Line > Rotating disks. > I understand that AFM might not be able to solve the issue, and I will try > and see if file heat works for migrating the files to flash tier. 
> You mentioned an all flash storage pool for heavily used files - so you > mean a different GPFS cluster just with flash storage, and to manually copy > the files to flash storage whenever needed? > The IO performance that i am talking is prominently for reads, so you > mention that LROC can work in the way i want it to? that is prefetch all > the files into LROC cache, after only few headers/stubs of data are read > from those files? > I thought LROC only keeps that block of data that is prefetched from the > disk, and will not prefetch the whole file if a stub of data is read. > Please do let me know, if i understood it wrong. > > On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale , > wrote: > I do not think AFM is intended to solve the problem you are trying to > solve. If I understand your scenario correctly you state that you are > placing metadata on NL-SAS storage. If that is true that would not be wise > especially if you are going to do many metadata operations. I suspect your > performance issues are partially due to the fact that metadata is being > stored on NL-SAS storage. You stated that you did not think the file heat > feature would do what you intended but have you tried to use it to see if > it could solve your problem? I would think having metadata on SSD/flash > storage combined with a all flash storage pool for your heavily used files > would perform well. If you expect IO usage will be such that there will be > far more reads than writes then LROC should be beneficial to your overall > performance. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > *https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479* > > . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: valleru at cbio.mskcc.org > To: gpfsug main discussion list > Date: 02/22/2018 03:11 PM > Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi All, > > I am trying to figure out a GPFS tiering architecture with flash storage > in front end and near line storage as backend, for Supercomputing > > The Backend storage will be a GPFS storage on near line of about 8-10PB. > The backend storage will/can be tuned to give out large streaming bandwidth > and enough metadata disks to make the stat of all these files fast enough. > > I was thinking if it would be possible to use a GPFS flash cluster or GPFS > SSD cluster in front end that uses AFM and acts as a cache cluster with the > backend GPFS cluster. > > At the end of this .. the workflow that i am targeting is where: > > > ? > If the compute nodes read headers of thousands of large files ranging from > 100MB to 1GB, the AFM cluster should be able to bring up enough threads to > bring up all of the files from the backend to the faster SSD/Flash GPFS > cluster. 
> The working set might be about 100T, at a time which i want to be on a > faster/low latency tier, and the rest of the files to be in slower tier > until they are read by the compute nodes. > ? > > > I do not want to use GPFS policies to achieve the above, is because i am > not sure - if policies could be written in a way, that files are moved from > the slower tier to faster tier depending on how the jobs interact with the > files. > I know that the policies could be written depending on the heat, and > size/format but i don?t think thes policies work in a similar way as above. > > I did try the above architecture, where an SSD GPFS cluster acts as an AFM > cache cluster before the near line storage. However the AFM cluster was > really really slow, It took it about few hours to copy the files from near > line storage to AFM cache cluster. > I am not sure if AFM is not designed to work this way, or if AFM is not > tuned to work as fast as it should. > > I have tried LROC too, but it does not behave the same way as i guess AFM > works. > > Has anyone tried or know if GPFS supports an architecture - where the fast > tier can bring up thousands of threads and copy the files almost > instantly/asynchronously from the slow tier, whenever the jobs from compute > nodes reads few blocks from these files? > I understand that with respect to hardware - the AFM cluster should be > really fast, as well as the network between the AFM cluster and the backend > cluster. > > Please do also let me know, if the above workflow can be done using GPFS > policies and be as fast as it is needed to be. > > Regards, > Lohit > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > *https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s=AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e=* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______ > ________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug. > org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r= > IbxtjdkPAM2Sbon4Lbbi4w&m=DuqESC-4ycoY5GoHpYeH1T8baq0JWY8QfkN8z > 6b8jPw&s=zNUAH3mFyzxcvXtrep_OroKiwR88QouIrcdN8TLJK8M&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From coetzee.ray at gmail.com Tue Feb 27 23:54:17 2018 From: coetzee.ray at gmail.com (Ray Coetzee) Date: Tue, 27 Feb 2018 23:54:17 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 73, Issue 60 In-Reply-To: References: Message-ID: Hi Lohit Using mmap based applications against GPFS has a number of challenges. For me the main challenge is that mmap threads can fragment the IO into multiple strided reads at random offsets which defeats GPFS's attempts in prefetching the file contents. 
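If you want to confirm that this is what is happening, you can sample the I/O history on the client while the mmap job runs - an mmap-heavy reader shows up as many small reads at scattered offsets rather than a few large sequential ones. Something along these lines (a rough sketch only; the iohist output layout varies a little between releases, and the settings shown are just the ones worth inspecting, not recommended values):

# Sample the recent I/O history on the client while the mmap workload runs.
# Many small reads at scattered offsets is the pattern that defeats prefetch.
mmdiag --iohist | head -40

# Prefetch/cache related client settings worth checking (display only):
mmlsconfig pagepool
mmlsconfig maxMBpS
mmlsconfig prefetchThreads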
LROC, as the name implies, is only a Local Read Only Cache and functions as an extension of your local page pool on the client. You would only see a performance improvement if the file(s) have been read into the local pagepool on a previous occasion. Depending on the dataset size & the NVMe/SSDs you have for LROC, you could look at using a pre-job to read the file(s) in their entirety on the compute node before the mmap process starts, as this would ensure the relevant data blocks are in the local pagepool or LROC. Another solution I've seen is to stage the dataset into tmpfs. Sven is working on improvements for mmap on GPFS that may make it into a production release so keep an eye out for an update. Kind regards Ray Coetzee On Tue, Feb 27, 2018 at 10:25 PM, wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Problems with remote mount via routed IB (John Hearns) > 2. Re: GPFS and Flash/SSD Storage tiered storage (Alex Chekholko) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 27 Feb 2018 09:17:36 +0000 > From: John Hearns > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Problems with remote mount via routed IB > Message-ID: > eurprd02.prod.outlook.com> > > Content-Type: text/plain; charset="us-ascii" > > Jan Erik, > Can you clarify if you are routing IP traffic between the two > Infiniband networks. > Or are you routing Infiniband traffic? > > > If I can be of help I manage an Infiniband network which connects to other > IP networks using Mellanox VPI gateways, which proxy arp between IB and > Ethernet. But I am not running GPFS traffic over these. > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss- > bounces at spectrumscale.org] On Behalf Of Sundermann, Jan Erik (SCC) > Sent: Monday, February 26, 2018 5:39 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] Problems with remote mount via routed IB > > > Dear all > > we are currently trying to remote mount a file system in a routed > Infiniband test setup and face problems with dropped RDMA connections. The > setup is the following: > > - Spectrum Scale Cluster 1 is setup on four servers which are connected to > the same infiniband network. Additionally they are connected to a fast > ethernet providing ip communication in the network 192.168.11.0/24. > > - Spectrum Scale Cluster 2 is setup on four additional servers which are > connected to a second infiniband network. These servers have IPs on their > IB interfaces in the network 192.168.12.0/24. > > - IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated > machine. > > - We have a dedicated IB hardware router connected to both IB subnets. 
> > > We tested that the routing, both IP and IB, is working between the two > clusters without problems and that RDMA is working fine both for internal > communication inside cluster 1 and cluster 2 > > When trying to remote mount a file system from cluster 1 in cluster 2, > RDMA communication is not working as expected. Instead we see error > messages on the remote host (cluster 2) > > > 2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > 2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 2 > 2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to > 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > fabnum 0 error 733 index 3 > 2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > 2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to > 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > fabnum 0 error 733 index 1 > 2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 3 > 2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 > (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1 > 2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 > (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 1 > 2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to > 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > fabnum 0 error 733 index 0 > 2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 > (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0 > 2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 > (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 0 > 2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to > 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > fabnum 0 error 733 index 2 > 2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > 2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 2 > 2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to > 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > fabnum 0 error 733 index 3 > 2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 3 > > > and in the cluster with the file system (cluster 1) > > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 due to RDMA read error 
IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 sl 0 index 3 > 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 sl 0 index 3 > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA rdma read error > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA closed connection to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:35.398+0100: [I] VERBS RDMA accepted and connected to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 sl 0 index 3 > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA rdma read error > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA closed connection to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:55.600+0100: [I] VERBS RDMA accepted and connected to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 sl 0 index 3 > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA rdma read error > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA closed connection to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA accepted and connected to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 sl 0 index 3 > > > > Any advice on how to configure the setup in a way that would allow the > remote mount via routed IB would be very appreciated. > > > Thank you and best regards > Jan Erik > > > -- The information contained in this communication and any attachments is > confidential and may be privileged, and is for the sole use of the intended > recipient(s). Any unauthorized review, use, disclosure or distribution is > prohibited. Unless explicitly stated otherwise in the body of this > communication or the attachment thereto (if any), the information is > provided on an AS-IS basis without any express or implied warranties or > liabilities. To the extent you are relying on this information, you are > doing so at your own risk. If you are not the intended recipient, please > notify the sender immediately by replying to this message and destroy all > copies of this message and any attachments. 
Neither the sender nor the > company/group of companies he or she represents shall be liable for the > proper and complete transmission of the information contained in this > communication, or for any delay in its receipt. > > > ------------------------------ > > Message: 2 > Date: Tue, 27 Feb 2018 14:25:30 -0800 > From: Alex Chekholko > To: gpfsug main discussion list > Cc: gpfsug-discuss-bounces at spectrumscale.org > Subject: Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered > storage > Message-ID: > mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > Hi, > > My experience has been that you could spend the same money to just make > your main pool more performant. Instead of doing two data transfers (one > from cold pool to AFM or hot pools, one from AFM/hot to client), you can > just make the direct access of the data faster by adding more resources to > your main pool. > > Regards, > Alex > > On Thu, Feb 22, 2018 at 5:27 PM, wrote: > > > Thanks, I will try the file heat feature but i am really not sure, if it > > would work - since the code can access cold files too, and not > necessarily > > files recently accessed/hot files. > > > > With respect to LROC. Let me explain as below: > > > > The use case is that - > > The code initially reads headers (small region of data) from thousands of > > files as the first step. For example about 30,000 of them with each about > > 300MB to 500MB in size. > > After the first step, with the help of those headers - it mmaps/seeks > > across various regions of a set of files in parallel. > > Since its all small IOs and it was really slow at reading from GPFS over > > the network directly from disks - Our idea was to use AFM which i believe > > fetches all file data into flash/ssds, once the initial few blocks of the > > files are read. > > But again - AFM seems to not solve the problem, so i want to know if LROC > > behaves in the same way as AFM, where all of the file data is prefetched > in > > full block size utilizing all the worker threads - if few blocks of the > > file is read initially. > > > > Thanks, > > Lohit > > > > On Feb 22, 2018, 4:52 PM -0500, IBM Spectrum Scale , > > wrote: > > > > My apologies for not being more clear on the flash storage pool. I meant > > that this would be just another GPFS storage pool in the same cluster, so > > no separate AFM cache cluster. You would then use the file heat feature > to > > ensure more frequently accessed files are migrated to that all flash > > storage pool. > > > > As for LROC could you please clarify what you mean by a few headers/stubs > > of the file? In reading the LROC documentation and the LROC variables > > available in the mmchconfig command I think you might want to take a > look a > > the lrocDataStubFileSize variable since it seems to apply to your > situation. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------ > > ------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale > > (GPFS), then please post it to the public IBM developerWroks Forum at > > https://www.ibm.com/developerworks/community/ > > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) > > and you have an IBM software maintenance contract please contact > > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > > Service Center in other countries. 
> > > > The forum is informally monitored as time permits and should not be used > > for priority messages to the Spectrum Scale (GPFS) team. > > > > > > > > From: valleru at cbio.mskcc.org > > To: gpfsug main discussion list > > > Cc: gpfsug-discuss-bounces at spectrumscale.org > > Date: 02/22/2018 04:21 PM > > Subject: Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered > > storage > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------ > > > > > > > > Thank you. > > > > I am sorry if i was not clear, but the metadata pool is all on SSDs in > the > > GPFS clusters that we use. Its just the data pool that is on Near-Line > > Rotating disks. > > I understand that AFM might not be able to solve the issue, and I will > try > > and see if file heat works for migrating the files to flash tier. > > You mentioned an all flash storage pool for heavily used files - so you > > mean a different GPFS cluster just with flash storage, and to manually > copy > > the files to flash storage whenever needed? > > The IO performance that i am talking is prominently for reads, so you > > mention that LROC can work in the way i want it to? that is prefetch all > > the files into LROC cache, after only few headers/stubs of data are read > > from those files? > > I thought LROC only keeps that block of data that is prefetched from the > > disk, and will not prefetch the whole file if a stub of data is read. > > Please do let me know, if i understood it wrong. > > > > On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale , > > wrote: > > I do not think AFM is intended to solve the problem you are trying to > > solve. If I understand your scenario correctly you state that you are > > placing metadata on NL-SAS storage. If that is true that would not be > wise > > especially if you are going to do many metadata operations. I suspect > your > > performance issues are partially due to the fact that metadata is being > > stored on NL-SAS storage. You stated that you did not think the file > heat > > feature would do what you intended but have you tried to use it to see if > > it could solve your problem? I would think having metadata on SSD/flash > > storage combined with a all flash storage pool for your heavily used > files > > would perform well. If you expect IO usage will be such that there will > be > > far more reads than writes then LROC should be beneficial to your overall > > performance. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------ > > ------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale > > (GPFS), then please post it to the public IBM developerWroks Forum at > > *https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479* > > forums/html/forum?id=11111111-0000-0000-0000-000000000479> > > . > > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) > > and you have an IBM software maintenance contract please contact > > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > > Service Center in other countries. > > > > The forum is informally monitored as time permits and should not be used > > for priority messages to the Spectrum Scale (GPFS) team. 
> > > > > > > > From: valleru at cbio.mskcc.org > > To: gpfsug main discussion list > > > Date: 02/22/2018 03:11 PM > > Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered > storage > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------ > > > > > > > > Hi All, > > > > I am trying to figure out a GPFS tiering architecture with flash storage > > in front end and near line storage as backend, for Supercomputing > > > > The Backend storage will be a GPFS storage on near line of about 8-10PB. > > The backend storage will/can be tuned to give out large streaming > bandwidth > > and enough metadata disks to make the stat of all these files fast > enough. > > > > I was thinking if it would be possible to use a GPFS flash cluster or > GPFS > > SSD cluster in front end that uses AFM and acts as a cache cluster with > the > > backend GPFS cluster. > > > > At the end of this .. the workflow that i am targeting is where: > > > > > > ? > > If the compute nodes read headers of thousands of large files ranging > from > > 100MB to 1GB, the AFM cluster should be able to bring up enough threads > to > > bring up all of the files from the backend to the faster SSD/Flash GPFS > > cluster. > > The working set might be about 100T, at a time which i want to be on a > > faster/low latency tier, and the rest of the files to be in slower tier > > until they are read by the compute nodes. > > ? > > > > > > I do not want to use GPFS policies to achieve the above, is because i am > > not sure - if policies could be written in a way, that files are moved > from > > the slower tier to faster tier depending on how the jobs interact with > the > > files. > > I know that the policies could be written depending on the heat, and > > size/format but i don?t think thes policies work in a similar way as > above. > > > > I did try the above architecture, where an SSD GPFS cluster acts as an > AFM > > cache cluster before the near line storage. However the AFM cluster was > > really really slow, It took it about few hours to copy the files from > near > > line storage to AFM cache cluster. > > I am not sure if AFM is not designed to work this way, or if AFM is not > > tuned to work as fast as it should. > > > > I have tried LROC too, but it does not behave the same way as i guess AFM > > works. > > > > Has anyone tried or know if GPFS supports an architecture - where the > fast > > tier can bring up thousands of threads and copy the files almost > > instantly/asynchronously from the slow tier, whenever the jobs from > compute > > nodes reads few blocks from these files? > > I understand that with respect to hardware - the AFM cluster should be > > really fast, as well as the network between the AFM cluster and the > backend > > cluster. > > > > Please do also let me know, if the above workflow can be done using GPFS > > policies and be as fast as it is needed to be. 
> > > > Regards, > > Lohit > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > > > *https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_ > listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r= > IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s= > AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e=* > > listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r= > IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s= > AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e=> > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______ > > ________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug. > > org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_ > iaSHvJObTbx-siA1ZOg&r= > > IbxtjdkPAM2Sbon4Lbbi4w&m=DuqESC-4ycoY5GoHpYeH1T8baq0JWY8QfkN8z > > 6b8jPw&s=zNUAH3mFyzxcvXtrep_OroKiwR88QouIrcdN8TLJK8M&e= > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: 20180227/be7c09c4/attachment.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 73, Issue 60 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuartb at 4gh.net Wed Feb 28 17:49:47 2018 From: stuartb at 4gh.net (Stuart Barkley) Date: Wed, 28 Feb 2018 12:49:47 -0500 (EST) Subject: [gpfsug-discuss] Problems with remote mount via routed IB In-Reply-To: References: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> Message-ID: The problem with CM is that it seems to require configuring IP over Infiniband. I'm rather strongly opposed to IP over IB. We did run IPoIB years ago, but pulled it out of our environment as adding unneeded complexity. It requires provisioning IP addresses across the Infiniband infrastructure and possibly adding routers to other portions of the IP infrastructure. It was also confusing some users due to multiple IPs on the compute infrastructure. We have recently been in discussions with a vendor about their support for GPFS over IB and they kept directing us to using CM (which still didn't work). CM wasn't necessary once we found out about the actual problem (we needed the undocumented verbsRdmaUseGidIndexZero configuration option among other things due to their use of SR-IOV based virtual IB interfaces). We don't use routed Infiniband and it might be that CM and IPoIB is required for IB routing, but I doubt it. It sounds like the OP is keeping IB and IP infrastructure separate. 
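For anyone who wants to experiment, the settings mentioned in this thread are all set with mmchconfig. Roughly (a sketch only - check the value syntax against your release, and as Aaron notes below the verbs settings likely need a daemon restart to take effect):

# See what is currently set
mmlsconfig | grep -i verbs

# Enable the RDMA connection manager (the verbsRdmaCm=enable Aaron mentions below)
mmchconfig verbsRdmaCm=enable -N <nodes_or_nodeclass>

# The undocumented option we needed for the SR-IOV virtual IB interfaces
# (the yes/no value syntax here is an assumption)
mmchconfig verbsRdmaUseGidIndexZero=yes -N <nodes_or_nodeclass>

# Restart the daemon on the affected nodes for the changes to take effect
mmshutdown -N <nodes_or_nodeclass> && mmstartup -N <nodes_or_nodeclass>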
Stuart Barkley On Mon, 26 Feb 2018 at 14:16 -0000, Aaron Knister wrote: > Date: Mon, 26 Feb 2018 14:16:34 > From: Aaron Knister > Reply-To: gpfsug main discussion list > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Problems with remote mount via routed IB > > Hi Jan Erik, > > It was my understanding that the IB hardware router required RDMA CM to work. > By default GPFS doesn't use the RDMA Connection Manager but it can be enabled > (e.g. verbsRdmaCm=enable). I think this requires a restart on clients/servers > (in both clusters) to take effect. Maybe someone else on the list can comment > in more detail-- I've been told folks have successfully deployed IB routers > with GPFS. > > -Aaron > > On 2/26/18 11:38 AM, Sundermann, Jan Erik (SCC) wrote: > > > > Dear all > > > > we are currently trying to remote mount a file system in a routed Infiniband > > test setup and face problems with dropped RDMA connections. The setup is the > > following: > > > > - Spectrum Scale Cluster 1 is setup on four servers which are connected to > > the same infiniband network. Additionally they are connected to a fast > > ethernet providing ip communication in the network 192.168.11.0/24. > > > > - Spectrum Scale Cluster 2 is setup on four additional servers which are > > connected to a second infiniband network. These servers have IPs on their IB > > interfaces in the network 192.168.12.0/24. > > > > - IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated > > machine. > > > > - We have a dedicated IB hardware router connected to both IB subnets. > > > > > > We tested that the routing, both IP and IB, is working between the two > > clusters without problems and that RDMA is working fine both for internal > > communication inside cluster 1 and cluster 2 > > > > When trying to remote mount a file system from cluster 1 in cluster 2, RDMA > > communication is not working as expected. 
Instead we see error messages on > > the remote host (cluster 2) > > > > > > 2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 > > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > > 2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 > > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 2 > > 2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to > > 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > > fabnum 0 error 733 index 3 > > 2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 > > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > > 2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to > > 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > > fabnum 0 error 733 index 1 > > 2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 > > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 3 > > 2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 > > (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1 > > 2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 > > (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 1 > > 2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to > > 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > > fabnum 0 error 733 index 0 > > 2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 > > (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0 > > 2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 > > (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 0 > > 2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to > > 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > > fabnum 0 error 733 index 2 > > 2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 > > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > > 2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 > > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 2 > > 2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to > > 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > > fabnum 0 error 733 index 3 > > 2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 > > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 > > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 3 > > > > > > and in the cluster with the file system (cluster 1) > > > > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error > > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > > 2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 sl 0 index 3 > > 
2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error > > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > > 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > > 2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 sl 0 index 3 > > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA rdma read error > > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA closed connection to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > > 2018-02-23_13:48:35.398+0100: [I] VERBS RDMA accepted and connected to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 sl 0 index 3 > > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA rdma read error > > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA closed connection to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > > 2018-02-23_13:48:55.600+0100: [I] VERBS RDMA accepted and connected to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 sl 0 index 3 > > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA rdma read error > > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA closed connection to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA accepted and connected to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 sl 0 index 3 > > > > > > > > Any advice on how to configure the setup in a way that would allow the > > remote mount via routed IB would be very appreciated. > > > > > > Thank you and best regards > > Jan Erik > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- I've never been lost; I was once bewildered for three days, but never lost! 
-- Daniel Boone
mmlsattr -L /fs/scratch/sysp/ed/120days.pol file name: /fs/scratch/sysp/ed/120days.pol metadata replication: 2 max 2 data replication: 1 max 2 flags: Encrypted: yes misc_attributes flags from a policy run showing no difference in status: FJAEu -- /fs/scratch/sysp/ed/180days.pol FJAEu -- /fs/scratch/sysp/ed/120days.pol File system has MD replication enabled, but not Data, so ALL files show "J" ilm flag mmlsfs scratch -m flag value description ------------------- ------------------------ ----------------------------------- -m 2 Default number of metadata replicas mmlsfs scratch -r flag value description ------------------- ------------------------ ----------------------------------- -r 1 Default number of data replicas I poked around a little trying to find out if perhaps using GetXattr would work and show me what I wanted, it does not. All I sem to be able to get is the File Encryption Key. I was hoping perhaps someone had found a cheaper way for this to work rather than hundreds of millions of 'mmlsattr' execs. :-( On the plus side, I've only run across a few of these and all appear to be from before we did the MD replication and re-striping. On the minus, I have NO idea where they are, and they appears to be on both of our filesystems. So several hundred million files to check. Ed On Mon, 22 Jan 2018 08:29:42 +0000 John Hearns wrote: > Ed, > This is not a perfect answer. You need to look at policies for this. I have > been doing something similar recently. > > Something like: > > RULE 'list_file' EXTERNAL LIST 'all-files' EXEC > '/var/mmfs/etc/mmpolicyExec-list' RULE 'listall' list 'all-files' > SHOW( varchar(kb_allocated) || ' ' || varchar(file_size) || ' ' || > varchar(misc_attributes) || ' ' || name || ' ' || fileset_name ) WHERE > REGEX(misc_attributes,'[J]') > > > So this policy shows the kbytes allocates, file size, the miscellaneous > attributes, name and fileset name For all files with miscellaneous > attributes of 'J' which means 'Some data blocks might be ill replicated' > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Edward Wahl > Sent: Friday, January 19, 2018 10:38 PM To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] policy ilm features? > > > This one has been on my list a long time so I figured I'd ask here first > before I open an apar or request an enhancement (most likely). > > Is there a way using the policy engine to determine the following? > > -metadata replication total/current > -unbalanced file > > Looking to catch things like this that stand out on my filesystem without > having to run several hundred million 'mmlsattr's. > > metadata replication: 1 max 2 > flags: unbalanced > > Ed > > > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C056e34c5a8df4d8f10fd08d55f91e73c%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=dnt7vV4TCd68l7fSJnY35eyNM%2B8pNrZElImSZeZbit8%3D&reserved=0 > -- The information contained in this communication and any attachments is > confidential and may be privileged, and is for the sole use of the intended > recipient(s). Any unauthorized review, use, disclosure or distribution is > prohibited. 
Unless explicitly stated otherwise in the body of this > communication or the attachment thereto (if any), the information is provided > on an AS-IS basis without any express or implied warranties or liabilities. > To the extent you are relying on this information, you are doing so at your > own risk. If you are not the intended recipient, please notify the sender > immediately by replying to this message and destroy all copies of this > message and any attachments. Neither the sender nor the company/group of > companies he or she represents shall be liable for the proper and complete > transmission of the information contained in this communication, or for any > delay in its receipt. _______________________________________________ > gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From S.J.Thompson at bham.ac.uk Fri Feb 2 20:41:42 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Fri, 2 Feb 2018 20:41:42 +0000 Subject: [gpfsug-discuss] In place upgrade of ESS? In-Reply-To: <1517601554597.83665@convergeone.com> References: <1517601554597.83665@convergeone.com> Message-ID: If you mean adding storage shelves to increase capacity to an ESS, then no I don't believe it is supported. I think it is supported on the Lenovo DSS-G models, though you have to have a separate DA for each shelf increment so the performance may different between an upgraded Vs complete solution. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of SAnderson at convergeone.com [SAnderson at convergeone.com] Sent: 02 February 2018 19:59 To: gpfsug main discussion list Subject: [gpfsug-discuss] In place upgrade of ESS? I haven't found a firm answer yet. Is it possible to in place upgrade say, a GL2 to a GL4 and subsequently a GL6? ? Do we know if this feature is coming? SHAUN ANDERSON STORAGE ARCHITECT O 208.577.2112 M 214.263.7014 NOTICE: This email message and any attachments hereto may contain confidential information. Any unauthorized review, use, disclosure, or distribution of such information is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy the original message and all copies of it. From aaron.s.knister at nasa.gov Fri Feb 2 20:46:27 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 2 Feb 2018 15:46:27 -0500 Subject: [gpfsug-discuss] FW: FLASH: IBM Spectrum Scale (GPFS): Undetected corruption of archived sparse files (Linux) (2018.02.02) In-Reply-To: References: <-2142026518.24060.1517589526829.JavaMail.webinst@w30112> <36B1FD9C-90CF-4C49-8C21-051F7A826E41@mdanderson.org> Message-ID: Has anyone asked for the efix and gotten it? I'm not having much luck so far. -Aaron On 2/2/18 12:45 PM, Sobey, Richard A wrote: > Good stuff. Thanks all. > > Get Outlook for Android > > ------------------------------------------------------------------------ > *From:* gpfsug-discuss-bounces at spectrumscale.org > on behalf of > Fosburgh,Jonathan > *Sent:* Friday, February 2, 2018 5:03:00 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] FW: FLASH: IBM Spectrum Scale (GPFS): > Undetected corruption of archived sparse files (Linux) (2018.02.02) > ? > > The document is now up. > > ? 
> > *From: * on behalf of Jonathan > Fosburgh > *Reply-To: *gpfsug main discussion list > *Date: *Friday, February 2, 2018 at 10:59 AM > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] FW: FLASH: IBM Spectrum Scale (GPFS): > Undetected corruption of archived sparse files (Linux) (2018.02.02) > > ? > > I?ve just reached out to our GPFS architect at IBM. > > ? > > *From: * on behalf of "Sobey, > Richard A" > *Reply-To: *gpfsug main discussion list > *Date: *Friday, February 2, 2018 at 10:44 AM > *To: *"'gpfsug-discuss at spectrumscale.org'" > > *Subject: *[gpfsug-discuss] FW: FLASH: IBM Spectrum Scale (GPFS): > Undetected corruption of archived sparse files (Linux) (2018.02.02) > > ? > > The link goes nowhere ? can anyone point us in the right direction? > > ? > > Thanks > > Richard > > ? > > *From:* IBM My Notifications [mailto:mynotify at stg.events.ihost.com] > *Sent:* 02 February 2018 16:39 > *To:* Sobey, Richard A > *Subject:* FLASH: IBM Spectrum Scale (GPFS): Undetected corruption of > archived sparse files (Linux) (2018.02.02) > > ? > > ? > > *Storage * > > IBM My Notifications > > Check out the *IBM Electronic > Support* > > > > ? > > > > ? > > IBM Spectrum Scale > > > > *: IBM Spectrum Scale (GPFS): Undetected corruption of archived sparse > files > (Linux)*** > > > > IBM has identified an issue with IBM GPFS and IBM Spectrum Scale for > Linux environments, in which a sparse file may be silently corrupted > during archival, resulting in the file being restored incorrectly. > > > > ? > > *Subscribe or Unsubscribe*| > *Feedback*| > *Follow us on Twitter*. > > Your support Notifications display in English by default. Machine > translation based on your IBM profile language setting is added if you > specify this option in My defaults within My Notifications. (Note: Not > all languages are available at this time, and the English version always > takes precedence over the machine translated version.) > > > > Get help with technical questions on the dW Answers > forum > > To ensure proper delivery please add > mynotify at stg.events.ihost.comto > your address book. > > You received this email because you are subscribed to IBM My > Notifications as: > r.sobey at imperial.ac.uk** > > Please do not reply to this message as it is generated by an automated > service machine. > > > > > ?International Business Machines Corporation 2018. All rights reserved. > > IBM United Kingdom Limited > Registered in England and Wales with number 741598 > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU > > ? > > The information contained in this e-mail message may be privileged, > confidential, and/or protected from disclosure. This e-mail message may > contain protected health information (PHI); dissemination of PHI should > comply with applicable federal and state laws. If you are not the > intended recipient, or an authorized representative of the intended > recipient, any further review, disclosure, use, dissemination, > distribution, or copying of this message or any attachment (or the > information contained therein) is strictly prohibited. If you think that > you have received this e-mail message in error, please notify the sender > by return e-mail and delete all references to it and its contents from > your systems. > > The information contained in this e-mail message may be privileged, > confidential, and/or protected from disclosure. 
This e-mail message may > contain protected health information (PHI); dissemination of PHI should > comply with applicable federal and state laws. If you are not the > intended recipient, or an authorized representative of the intended > recipient, any further review, disclosure, use, dissemination, > distribution, or copying of this message or any attachment (or the > information contained therein) is strictly prohibited. If you think that > you have received this e-mail message in error, please notify the sender > by return e-mail and delete all references to it and its contents from > your systems. > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From ewahl at osc.edu Fri Feb 2 22:17:47 2018 From: ewahl at osc.edu (Edward Wahl) Date: Fri, 2 Feb 2018 17:17:47 -0500 Subject: [gpfsug-discuss] FW: FLASH: IBM Spectrum Scale (GPFS): Undetected corruption of archived sparse files (Linux) (2018.02.02) In-Reply-To: <90EF00A9-E89D-48EA-A04B-B069BF81E188@nuance.com> References: <90EF00A9-E89D-48EA-A04B-B069BF81E188@nuance.com> Message-ID: <20180202171747.5e7adeb2@osc.edu> Should we even ask if Spectrum Protect (TSM) is affected? Ed On Fri, 2 Feb 2018 17:04:14 +0000 "Oesterlin, Robert" wrote: > Link takes a bit to be active ? it?s there now. > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > From: on behalf of "Sobey, Richard > A" Reply-To: gpfsug main discussion list > Date: Friday, February 2, 2018 at 10:44 AM > To: "'gpfsug-discuss at spectrumscale.org'" > Subject: [EXTERNAL] [gpfsug-discuss] FW: FLASH: IBM Spectrum Scale (GPFS): > Undetected corruption of archived sparse files (Linux) (2018.02.02) > > The link goes nowhere ? can anyone point us in the right direction? > > Thanks > Richard > > From: IBM My Notifications [mailto:mynotify at stg.events.ihost.com] > Sent: 02 February 2018 16:39 > To: Sobey, Richard A > Subject: FLASH: IBM Spectrum Scale (GPFS): Undetected corruption of archived > sparse files (Linux) (2018.02.02) > > > > > Storage > > IBM My Notifications > > Check out the IBM Electronic > Support > > > > > > > IBM Spectrum Scale > > : IBM Spectrum Scale (GPFS): Undetected corruption of archived sparse files > (Linux) > > IBM has identified an issue with IBM GPFS and IBM Spectrum Scale for Linux > environments, in which a sparse file may be silently corrupted during > archival, resulting in the file being restored incorrectly. > > > Subscribe or > Unsubscribe > | > Feedback > | Follow us on > Twitter. > > Your support Notifications display in English by default. Machine translation > based on your IBM profile language setting is added if you specify this > option in My defaults within My Notifications. (Note: Not all languages are > available at this time, and the English version always takes precedence over > the machine translated version.) > > > Get help with technical questions on the dW Answers > forum > > To ensure proper delivery please add > mynotify at stg.events.ihost.com to your > address book. > > You received this email because you are subscribed to IBM My Notifications as: > r.sobey at imperial.ac.uk > > Please do not reply to this message as it is generated by an automated > service machine. > > > > > ?International Business Machines Corporation 2018. All rights reserved. 
> IBM United Kingdom Limited > Registered in England and Wales with number 741598 > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU > > > > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From duersch at us.ibm.com Sat Feb 3 02:32:49 2018 From: duersch at us.ibm.com (Steve Duersch) Date: Fri, 2 Feb 2018 21:32:49 -0500 Subject: [gpfsug-discuss] In place upgrade of ESS? In-Reply-To: References: Message-ID: This has been on our to-do list for quite some time. We hope to have in place hardware upgrade in 2H2018. Steve Duersch Spectrum Scale IBM Poughkeepsie, New York gpfsug-discuss-bounces at spectrumscale.org wrote on 02/02/2018 03:15:33 PM: > > Message: 2 > Date: Fri, 2 Feb 2018 19:59:14 +0000 > From: Shaun Anderson > To: gpfsug main discussion list > Subject: [gpfsug-discuss] In place upgrade of ESS? > Message-ID: <1517601554597.83665 at convergeone.com> > Content-Type: text/plain; charset="iso-8859-1" > > I haven't found a firm answer yet. Is it possible to in place > upgrade say, a GL2 to a GL4 and subsequently a GL6? > > ? > > Do we know if this feature is coming? > > SHAUN ANDERSON > STORAGE ARCHITECT > O 208.577.2112 > M 214.263.7014 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Sun Feb 4 19:58:39 2018 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Sun, 04 Feb 2018 14:58:39 -0500 Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? Message-ID: <20180204145839.77101pngtlr3qacv@support.scinet.utoronto.ca> Here is what I found for versions 4 & 3.5: * Maximum Number of Dependent Filesets: 10,000 * Maximum Number of Independent Filesets: 1,000 https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets I'm having some difficulty finding published documentation on limitations for version 5: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm Any hints? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From truongv at us.ibm.com Mon Feb 5 13:20:16 2018 From: truongv at us.ibm.com (Truong Vu) Date: Mon, 5 Feb 2018 08:20:16 -0500 Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? In-Reply-To: References: Message-ID: Hi Jamie, The limits are the same in 5.0.0. We'll look into the FAQ. Thanks, Tru. 
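For readers new to the distinction behind those two numbers: the fileset type is fixed when it is created, and only an independent fileset owns its own inode space (which is what per-fileset snapshots and inode limits hang off). A minimal sketch of the two forms follows; the device and fileset names are invented for illustration, so check mmcrfileset in your release's documentation before reusing it.

# dependent fileset: shares the root fileset's inode space;
# counts against the 10,000 dependent-fileset limit
mmcrfileset gpfs0 projA
mmlinkfileset gpfs0 projA -J /gpfs/gpfs0/projA

# independent fileset: gets its own inode space, so it can have its own
# snapshots and inode limit; counts against the 1,000 independent-fileset limit
mmcrfileset gpfs0 projB --inode-space new --inode-limit 1000000
mmlinkfileset gpfs0 projB -J /gpfs/gpfs0/projB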
From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 02/05/2018 07:00 AM Subject: gpfsug-discuss Digest, Vol 73, Issue 9 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Maximum Number of filesets on GPFS v5? (Jaime Pinto) ---------------------------------------------------------------------- Message: 1 Date: Sun, 04 Feb 2018 14:58:39 -0500 From: "Jaime Pinto" To: "gpfsug main discussion list" Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? Message-ID: <20180204145839.77101pngtlr3qacv at support.scinet.utoronto.ca> Content-Type: text/plain; charset=ISO-8859-1; DelSp="Yes"; format="flowed" Here is what I found for versions 4 & 3.5: * Maximum Number of Dependent Filesets: 10,000 * Maximum Number of Independent Filesets: 1,000 https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets I'm having some difficulty finding published documentation on limitations for version 5: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm Any hints? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= End of gpfsug-discuss Digest, Vol 73, Issue 9 ********************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Mon Feb 5 13:50:51 2018 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 05 Feb 2018 08:50:51 -0500 Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? In-Reply-To: References: Message-ID: <20180205085051.15436lim3xaw49iz@support.scinet.utoronto.ca> Thanks Truong Jaime Quoting "Truong Vu" : > > Hi Jamie, > > The limits are the same in 5.0.0. We'll look into the FAQ. > > Thanks, > Tru. 
> > > > > From: gpfsug-discuss-request at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Date: 02/05/2018 07:00 AM > Subject: gpfsug-discuss Digest, Vol 73, Issue 9 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= > > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Maximum Number of filesets on GPFS v5? (Jaime Pinto) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sun, 04 Feb 2018 14:58:39 -0500 > From: "Jaime Pinto" > To: "gpfsug main discussion list" > Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? > Message-ID: > <20180204145839.77101pngtlr3qacv at support.scinet.utoronto.ca> > Content-Type: text/plain; charset=ISO-8859-1; > DelSp="Yes"; > format="flowed" > > Here is what I found for versions 4 & 3.5: > * Maximum Number of Dependent Filesets: 10,000 > * Maximum Number of Independent Filesets: 1,000 > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets > > > > I'm having some difficulty finding published documentation on > limitations for version 5: > > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm > > > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm > > > Any hints? > > Thanks > Jaime > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= > > > > End of gpfsug-discuss Digest, Vol 73, Issue 9 > ********************************************* > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
From daniel.kidger at uk.ibm.com Mon Feb 5 14:19:39 2018 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Mon, 5 Feb 2018 14:19:39 +0000 Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? In-Reply-To: <20180205085051.15436lim3xaw49iz@support.scinet.utoronto.ca> References: <20180205085051.15436lim3xaw49iz@support.scinet.utoronto.ca>, Message-ID: An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Mon Feb 5 15:02:17 2018 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 05 Feb 2018 10:02:17 -0500 Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? In-Reply-To: References: <20180205085051.15436lim3xaw49iz@support.scinet.utoronto.ca>, Message-ID: <20180205100217.46131a75yav2wi61@support.scinet.utoronto.ca> We are considering moving from user/group based quotas to path based quotas with nested filesets. We also facing challenges to traverse 'Dependent Filesets' for daily TSM backups of projects and for purging scratch area. We're about to deploy a new GPFS storage cluster, some 12-15PB, 13K+ users and 5K+ groups as the baseline, with expected substantial scaling up within the next 3-5 years in all dimmensions. Therefore, decisions we make now under GPFS v4.x trough v5.x will have consequences in the very near future, if they are not the proper ones. Thanks Jaime Quoting "Daniel Kidger" : > Jamie, I believe at least one of those limits is 'maximum supported' > rather than an architectural limit. Is your use case one which > would push these boundaries? If so care to describe what you would > wish to do? Daniel > > [1] > > DR DANIEL KIDGER > IBM Technical Sales Specialist > Software Defined Solution Sales > > +44-(0)7818 522 266 > daniel.kidger at uk.ibm.com > > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" , > "Truong Vu" > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? > Date: Mon, Feb 5, 2018 2:56 PM > Thanks Truong > Jaime > > Quoting "Truong Vu" : > >> >> Hi Jamie, >> >> The limits are the same in 5.0.0. We'll look into the FAQ. >> >> Thanks, >> Tru. >> >> >> >> >> From: gpfsug-discuss-request at spectrumscale.org >> To: gpfsug-discuss at spectrumscale.org >> Date: 02/05/2018 07:00 AM >> Subject: gpfsug-discuss Digest, Vol 73, Issue 9 >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at spectrumscale.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> >> > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e=[2] >> >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at spectrumscale.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at spectrumscale.org >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. Maximum Number of filesets on GPFS v5? 
(Jaime Pinto) >> >> >> > ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Sun, 04 Feb 2018 14:58:39 -0500 >> From: "Jaime Pinto" >> To: "gpfsug main discussion list" > >> Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? >> Message-ID: >> <20180204145839.77101pngtlr3qacv at support.scinet.utoronto.ca> >> Content-Type: text/plain; charset=ISO-8859-1; >> DelSp="Yes"; >> format="flowed" >> >> Here is what I found for versions 4 & 3.5: >> * Maximum Number of Dependent Filesets: 10,000 >> * Maximum Number of Independent Filesets: 1,000 >> >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets[3] >> >> >> >> I'm having some difficulty finding published documentation on >> limitations for version 5: >> >> > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm[4] >> >> >> > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm[5] >> >> >> Any hints? >> >> Thanks >> Jaime >> >> >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e=[6] >> >> >> >> End of gpfsug-discuss Digest, Vol 73, Issue 9 >> ********************************************* >> >> >> >> > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E&m=xnPNZO_v81jNbr_IcbbyLPUpPdAFjKIzptnqTnmqaFQ&s=Dln7axLq9ej2KttpKZJwLKuvxfSDkPErDQI5KCAQcg4&e=[7] > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E&m=xnPNZO_v81jNbr_IcbbyLPUpPdAFjKIzptnqTnmqaFQ&s=ZMGxi-PBv5-WEGj5RFm1QV0K8azswe9Z-C6rE1ey-UQ&e=[8] > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with > number 741598. 
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire > PO6 3AU > > > > Links: > ------ > [1] https://www.youracclaim.com/user/danel-kidger > [2] > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= > [3] > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets > [4] > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm > [5] > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm > [6] > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= > [7] > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E&m=xnPNZO_v81jNbr_IcbbyLPUpPdAFjKIzptnqTnmqaFQ&s=Dln7axLq9ej2KttpKZJwLKuvxfSDkPErDQI5KCAQcg4&e= > [8] > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E&m=xnPNZO_v81jNbr_IcbbyLPUpPdAFjKIzptnqTnmqaFQ&s=ZMGxi-PBv5-WEGj5RFm1QV0K8azswe9Z-C6rE1ey-UQ&e= > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From jtucker at pixitmedia.com Mon Feb 5 16:11:58 2018 From: jtucker at pixitmedia.com (Jez Tucker) Date: Mon, 5 Feb 2018 16:11:58 +0000 Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? In-Reply-To: References: <20180205085051.15436lim3xaw49iz@support.scinet.utoronto.ca> Message-ID: Hi ? IIRC these are hard limits - at least were a year or so ago. I have a customers with ~ 7500 dependent filesets and knocking on the door of the 1000 independent fileset limit. Before independent filesets were 'a thing', projects were created with dependent filesets.? However the arrival of independent filesets, per-fileset snapshotting etc. and improved workflow makes these a per-project primary choice - but with 10x less to operate with :-/ If someone @ IBM fancied upping the #defines x10 and confirming the testing limit, that would be appreciated :-) If you need testing kit, happy to facilitate. Best, Jez On 05/02/18 14:19, Daniel Kidger wrote: > Jamie, > I believe at least one of those limits is 'maximum supported' rather > than an architectural limit. > Is your use case one which would push these?boundaries? ?If so care to > describe what you would wish to do? 
> Daniel > > IBM Storage Professional Badge > > > *Dr Daniel Kidger* > IBM?Technical Sales Specialist > Software Defined Solution Sales > > +44-(0)7818 522 266 > daniel.kidger at uk.ibm.com > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" > , "Truong Vu" > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? > Date: Mon, Feb 5, 2018 2:56 PM > Thanks Truong > Jaime > > Quoting "Truong Vu" : > > > > > Hi Jamie, > > > > The limits are the same in 5.0.0. ?We'll look into the FAQ. > > > > Thanks, > > Tru. > > > > > > > > > > From: gpfsug-discuss-request at spectrumscale.org > > To: gpfsug-discuss at spectrumscale.org > > Date: 02/05/2018 07:00 AM > > Subject: gpfsug-discuss Digest, Vol 73, Issue 9 > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > Send gpfsug-discuss mailing list submissions to > > gpfsug-discuss at spectrumscale.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= > > > > or, via email, send a message with subject or body 'help' to > > gpfsug-discuss-request at spectrumscale.org > > > > You can reach the person managing the list at > > gpfsug-discuss-owner at spectrumscale.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of gpfsug-discuss digest..." > > > > > > Today's Topics: > > > > ? ?1. Maximum Number of filesets on GPFS v5? (Jaime Pinto) > > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Sun, 04 Feb 2018 14:58:39 -0500 > > From: "Jaime Pinto" > > To: "gpfsug main discussion list" > > Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? > > Message-ID: > > <20180204145839.77101pngtlr3qacv at support.scinet.utoronto.ca> > > Content-Type: text/plain; charset=ISO-8859-1; > > DelSp="Yes"; > > format="flowed" > > > > Here is what I found for versions 4 & 3.5: > > * Maximum Number of Dependent Filesets: 10,000 > > * Maximum Number of Independent Filesets: 1,000 > > > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets > > > > > > > > I'm having some difficulty finding published documentation on > > limitations for version 5: > > > > > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm > > > > > > > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm > > > > > > Any hints? > > > > Thanks > > Jaime > > > > > > --- > > Jaime Pinto > > SciNet HPC Consortium - Compute/Calcul Canada > > www.scinet.utoronto.ca - www.computecanada.ca > > University of Toronto > > > > > > ---------------------------------------------------------------- > > This message was sent using IMP at SciNet Consortium, University of > > Toronto. 
> > > > > > > > > > ------------------------------ > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= > > > > > > > > End of gpfsug-discuss Digest, Vol 73, Issue 9 > > ********************************************* > > > > > > > > > > > > > > > ?? ? ? ? ?************************************ > ?? ? ? ? ? TELL US ABOUT YOUR SUCCESS STORIES > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E&m=xnPNZO_v81jNbr_IcbbyLPUpPdAFjKIzptnqTnmqaFQ&s=Dln7axLq9ej2KttpKZJwLKuvxfSDkPErDQI5KCAQcg4&e= > ?? ? ? ? ?************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University > of Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E&m=xnPNZO_v81jNbr_IcbbyLPUpPdAFjKIzptnqTnmqaFQ&s=ZMGxi-PBv5-WEGj5RFm1QV0K8azswe9Z-C6rE1ey-UQ&e= > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with > number 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Feb 7 21:28:46 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 7 Feb 2018 16:28:46 -0500 Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? Message-ID: I noticed something curious after migrating some nodes from 4.1 to 4.2 which is that mounts now can take foorrreeevverrr. It seems to boil down to the point in the mount process where getEFOptions is called. 
To highlight the difference-- 4.1: # /usr/bin/time /usr/lpp/mmfs/bin/mmcommon getEFOptions dnb02 skipMountPointCheck >/dev/null 0.16user 0.04system 0:00.43elapsed 45%CPU (0avgtext+0avgdata 9108maxresident)k 0inputs+2768outputs (0major+15404minor)pagefaults 0swaps 4.2: /usr/bin/time /usr/lpp/mmfs/bin/mmcommon getEFOptions dnb02 skipMountPointCheck >/dev/null 9.75user 3.79system 0:23.35elapsed 58%CPU (0avgtext+0avgdata 10832maxresident)k 0inputs+38104outputs (0major+3135097minor)pagefaults 0swaps that's uh...a 543x increase. Which, if you have 25+ filesystems and 3500 nodes that time really starts to add up. It looks like under 4.2 this getEFOptions function triggers a bunch of mmsdrfs parsing happens and node lists get generated whereas on 4.1 that doesn't happen. Digging in a little deeper it looks to me like the big difference is in gpfsClusterInit after the node fetches the "shadow" mmsdrs file. Here's a 4.1 node: gpfsClusterInit:mmsdrfsdef.sh[2827]> loginPrefix='' gpfsClusterInit:mmsdrfsdef.sh[2828]> [[ -n '' ]] gpfsClusterInit:mmsdrfsdef.sh[2829]> /usr/bin/scp supersecrethost:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs.25326 gpfsClusterInit:mmsdrfsdef.sh[2830]> rc=0 gpfsClusterInit:mmsdrfsdef.sh[2831]> [[ 0 -ne 0 ]] gpfsClusterInit:mmsdrfsdef.sh[2863]> [[ -f /var/mmfs/gen/mmsdrfs.25326 ]] gpfsClusterInit:mmsdrfsdef.sh[2867]> /usr/bin/diff /var/mmfs/gen/mmsdrfs.25326 /var/mmfs/gen/mmsdrfs gpfsClusterInit:mmsdrfsdef.sh[2867]> 1> /dev/null 2> /dev/null gpfsClusterInit:mmsdrfsdef.sh[2868]> rc=0 gpfsClusterInit:mmsdrfsdef.sh[2869]> [[ 0 -ne 0 ]] gpfsClusterInit:mmsdrfsdef.sh[2874]> sdrfsFile=/var/mmfs/gen/mmsdrfs gpfsClusterInit:mmsdrfsdef.sh[2875]> /bin/rm -f /var/mmfs/gen/mmsdrfs.25326 Here's a 4.2 node: gpfsClusterInit:mmsdrfsdef.sh[2938]> loginPrefix='' gpfsClusterInit:mmsdrfsdef.sh[2939]> [[ -n '' ]] gpfsClusterInit:mmsdrfsdef.sh[2940]> /usr/bin/scp supersecrethost:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs.8534 gpfsClusterInit:mmsdrfsdef.sh[2941]> rc=0 gpfsClusterInit:mmsdrfsdef.sh[2942]> [[ 0 -ne 0 ]] gpfsClusterInit:mmsdrfsdef.sh[2974]> /bin/rm -f /var/mmfs/tmp/cmdTmpDir.mmcommon.8534/tmpsdrfs.gpfsClusterInit gpfsClusterInit:mmsdrfsdef.sh[2975]> [[ -f /var/mmfs/gen/mmsdrfs.8534 ]] gpfsClusterInit:mmsdrfsdef.sh[2979]> /usr/bin/diff /var/mmfs/gen/mmsdrfs.8534 /var/mmfs/gen/mmsdrfs gpfsClusterInit:mmsdrfsdef.sh[2979]> 1> /dev/null 2> /dev/null gpfsClusterInit:mmsdrfsdef.sh[2980]> rc=0 gpfsClusterInit:mmsdrfsdef.sh[2981]> [[ 0 -ne 0 ]] gpfsClusterInit:mmsdrfsdef.sh[2986]> sdrfsFile=/var/mmfs/gen/mmsdrfs it looks like the 4.1 code deletes the shadow mmsdrfs file is it's not different from what's locally on the node where as 4.2 does *not* do that. This seems to cause a problem when checkMmfsEnvironment is called because it will return 1 if the shadow file exists which according to the function comments indicates "something is not right", triggering the environment update where the slowdown is incurred. On 4.1 checkMmfsEnvironment returned 0 because the shadow mmsdrfs file had been removed, whereas on 4.2 it returned 1 because the shadow mmsdrfs file still existed despite it being identical to the mmsdrfs on the node. I've looked at 4.2.3.6 (efix12) and it doesn't look like 4.2.3.7 has dropped yet so it may be this has been fixed there. Maybe it's time for a PMR... 
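For anyone wanting to size the impact on their own nodes first, here is a rough sketch along the lines of the timings above; the filesystem names are placeholders and GNU time's -f option is assumed to be available:

# per-filesystem cost of getEFOptions on this node; multiply by the number of
# filesystems (and nodes) to estimate the added mount-time overhead
for fs in dnb02 dnb03 dnb04; do
  /usr/bin/time -f "%C -> %e s" \
    /usr/lpp/mmfs/bin/mmcommon getEFOptions $fs skipMountPointCheck > /dev/null
done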
-Aaron
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776

From tortay at cc.in2p3.fr Thu Feb 8 07:08:50 2018
From: tortay at cc.in2p3.fr (Loic Tortay)
Date: Thu, 8 Feb 2018 08:08:50 +0100
Subject: Re: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1?
In-Reply-To: References: Message-ID: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr>

On 07/02/2018 22:28, Aaron Knister wrote: > I noticed something curious after migrating some nodes from 4.1 to 4.2 > which is that mounts now can take foorrreeevverrr. It seems to boil down > to the point in the mount process where getEFOptions is called. > > To highlight the difference-- > [...] >
Hello,
I have had this (or a very similar) issue after migrating from 4.1.1.8 to 4.2.3. There are 37 filesystems in our main cluster, which made the problem really noticeable. A PMR has been opened. I have tested the fixes included in 4.2.3.7 (which, I'm told, should be released today) and they actually resolve my problems (APAR IJ03192 & IJ03235).
Loïc.
-- | Loïc Tortay - IN2P3 Computing Centre |

From Tomasz.Wolski at ts.fujitsu.com Thu Feb 8 10:35:54 2018
From: Tomasz.Wolski at ts.fujitsu.com (Tomasz.Wolski at ts.fujitsu.com)
Date: Thu, 8 Feb 2018 10:35:54 +0000
Subject: [gpfsug-discuss] Inode scan optimization
Message-ID: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local>

Hello All,
A full backup of a 2 billion inode Spectrum Scale file system on V4.1.1.16 takes 60 days. We are trying to optimize this, and using inode scans seems to help, even though we still use a directory scan and call the inode scan only to get the stat information faster (using gpfs_stat_inode_with_xattrs64). With 20 processes doing directory scans in parallel (plus inode scans for the stat info) we have decreased the time to 40 days. All NSDs are dataAndMetadata type.
I have the following questions:
* Is there a way to increase the inode scan cache (we may use 32 GByte)?
  - Can we use the "hidden" config parameters
      iscanPrefetchAggressiveness 2
      iscanPrefetchDepth 0
      iscanPrefetchThreadsPerNode 0
* Is there documentation on the cache behavior?
  - If not: is the inode scan cache per process or per node?
  - Is there a suggestion for optimizing the termIno parameter of gpfs_stat_inode_with_xattrs64() in such a use case?
Thanks!
Best regards,
Tomasz Wolski
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stockf at us.ibm.com Thu Feb 8 12:44:35 2018
From: stockf at us.ibm.com (Frederick Stock)
Date: Thu, 8 Feb 2018 07:44:35 -0500
Subject: [gpfsug-discuss] Inode scan optimization
In-Reply-To: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> References: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> Message-ID:

You mention that all the NSDs are metadata and data but you do not say how many NSDs are defined or the type of storage used, that is, are these on SAS or NL-SAS storage? I'm assuming they are not on SSDs/flash storage. Have you considered moving the metadata to separate NSDs, preferably SSD/flash storage? This is likely to give you a significant performance boost.
You state that using the inode scan API you reduced the time to 40 days. Did you analyze your backup application to determine where the time was being spent for the backup? If the inode scan is a small percentage of your backup time then optimizing it will not provide much benefit.
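To make the metadata-separation suggestion concrete, here is a rough sketch of the NSD stanzas involved; the device, server and NSD names are invented for illustration, and migrating existing metadata off the old disks would additionally require changing them to dataOnly and restriping, so treat this as an outline rather than a procedure:

# stanza file (e.g. flash_meta.stanza) describing flash NSDs that will carry
# metadata only; metadataOnly NSDs must belong to the system pool
%nsd:
  device=/dev/mapper/flash_lun0
  nsd=meta_ssd_01
  servers=nsdserver01,nsdserver02
  usage=metadataOnly
  failureGroup=10
  pool=system
%nsd:
  device=/dev/mapper/flash_lun1
  nsd=meta_ssd_02
  servers=nsdserver02,nsdserver01
  usage=metadataOnly
  failureGroup=11
  pool=system

# create the NSDs and add them to the filesystem
mmcrnsd -F flash_meta.stanza
mmadddisk <filesystem> -F flash_meta.stanza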
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Tomasz.Wolski at ts.fujitsu.com" To: "gpfsug-discuss at spectrumscale.org" Date: 02/08/2018 05:50 AM Subject: [gpfsug-discuss] Inode scan optimization Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, A full backup of an 2 billion inodes spectrum scale file system on V4.1.1.16 takes 60 days. We try to optimize and using inode scans seems to improve, even when we are using a directory scan and the inode scan just for having a better performance concerning stat (using gpfs_stat_inode_with_xattrs64). With 20 processes in parallel doing dir scans (+ inode scans for stat info) we have decreased the time to 40 days. All NSDs are dataAndMetadata type. I have the following questions: ? Is there a way to increase the inode scan cache (we may use 32 GByte)? o Can we us the ?hidden? config parameters ? iscanPrefetchAggressiveness 2 ? iscanPrefetchDepth 0 ? iscanPrefetchThreadsPerNode 0 ? Is there a documentation concerning cache behavior? o if no, is the inode scan cache process or node specific? o Is there a suggestion to optimize the termIno parameter in the gpfs_stat_inode_with_xattrs64() in such a use case? Thanks! Best regards, Tomasz Wolski_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p_1XEUyoJ7-VJxF_w8h9gJh8_Wj0Pey73LCLLoxodpw&m=y2y22xZuqjpkKfO2WSdcJsBXMaM8hOedaB_AlgFlIb0&s=DL0ZnBuH9KpvKN6XQNvoYmvwfZDbbwMlM-4rCbsAgWo&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Feb 8 13:56:42 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 8 Feb 2018 08:56:42 -0500 Subject: [gpfsug-discuss] Inode scan optimization In-Reply-To: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> References: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> Message-ID: Recall that many years ago we demonstrated a Billion files scanned with mmapplypolicy in under 20 minutes... And that was on ordinary at the time, spinning disks (not SSD!)... Granted we packed about 1000 files per directory and made some other choices that might not be typical usage.... OTOH storage and nodes have improved since then... SO when you say it takes 60 days to backup 2 billion files and that's a problem.... Like any large computing job, one has to do some analysis to find out what parts of the job are taking how much time... So... what commands are you using to do the backup...? What timing statistics or measurements have you collected? If you are using mmbackup and/or mmapplypolicy, those commands can show you how much time they spend scanning the file system looking for files to backup AND then how much time they spend copying the data to backup media. In fact they operate in distinct phases... directory scan, inode scan, THEN data copying ... so it's straightforward to see which phases are taking how much time. OH... I see you also say you are using gpfs_stat_inode_with_xattrs64 -- These APIs are tricky and not a panacea.... That's why we provide you with mmapplypolicy which in fact uses those APIs in clever, patented ways -- optimized and honed with years of work.... 
And more recently, we provided you with samples/ilm/mmfind -- which has the functionality of the classic unix find command -- but runs in parallel - using mmapplypolicy. TRY IT on you file system! From: "Tomasz.Wolski at ts.fujitsu.com" To: "gpfsug-discuss at spectrumscale.org" Date: 02/08/2018 05:50 AM Subject: [gpfsug-discuss] Inode scan optimization Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, A full backup of an 2 billion inodes spectrum scale file system on V4.1.1.16 takes 60 days. We try to optimize and using inode scans seems to improve, even when we are using a directory scan and the inode scan just for having a better performance concerning stat (using gpfs_stat_inode_with_xattrs64). With 20 processes in parallel doing dir scans (+ inode scans for stat info) we have decreased the time to 40 days. All NSDs are dataAndMetadata type. I have the following questions: ? Is there a way to increase the inode scan cache (we may use 32 GByte)? o Can we us the ?hidden? config parameters ? iscanPrefetchAggressiveness 2 ? iscanPrefetchDepth 0 ? iscanPrefetchThreadsPerNode 0 ? Is there a documentation concerning cache behavior? o if no, is the inode scan cache process or node specific? o Is there a suggestion to optimize the termIno parameter in the gpfs_stat_inode_with_xattrs64() in such a use case? Thanks! Best regards, Tomasz Wolski_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=mWxVB2lS_snDiYR4E348tnzbQTSuuWSrRiBDhJPjyh8&s=FG9fDxbmiCuSh0cvt4hsQS0bKdGHjI7loVGEKO0eTf0&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Feb 8 15:33:13 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 8 Feb 2018 10:33:13 -0500 Subject: [gpfsug-discuss] Inode scan optimization - (Tomasz.Wolski@ts.fujitsu.com ) In-Reply-To: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> References: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> Message-ID: Please clarify and elaborate .... When you write "a full backup ... takes 60 days" - that seems very poor indeed. BUT you haven't stated how much data is being copied to what kind of backup media nor how much equipment or what types you are using... Nor which backup software... We have Spectrum Scale installation doing nightly backups of huge file systems using the mmbackup command with TivoliStorageManager backup, using IBM branded or approved equipment and software. From: "Tomasz.Wolski at ts.fujitsu.com" To: "gpfsug-discuss at spectrumscale.org" Date: 02/08/2018 05:50 AM Subject: [gpfsug-discuss] Inode scan optimization Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, A full backup of an 2 billion inodes spectrum scale file system on V4.1.1.16 takes 60 days. We try to optimize and using inode scans seems to improve, even when we are using a directory scan and the inode scan just for having a better performance concerning stat (using gpfs_stat_inode_with_xattrs64). With 20 processes in parallel doing dir scans (+ inode scans for stat info) we have decreased the time to 40 days. All NSDs are dataAndMetadata type. I have the following questions: ? Is there a way to increase the inode scan cache (we may use 32 GByte)? o Can we us the ?hidden? config parameters ? iscanPrefetchAggressiveness 2 ? 
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From valdis.kletnieks at vt.edu Thu Feb 8 15:52:22 2018
From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu)
Date: Thu, 08 Feb 2018 10:52:22 -0500
Subject: [gpfsug-discuss] Inode scan optimization - (Tomasz.Wolski at ts.fujitsu.com )
In-Reply-To:
References: <90738848a99d4e67b8537305242aa988 at R01UKEXCASM223.r01.fujitsu.local>
Message-ID: <9124.1518105142 at turing-police.cc.vt.edu>

On Thu, 08 Feb 2018 10:33:13 -0500, "Marc A Kaplan" said:
> Please clarify and elaborate... When you write "a full backup ... takes
> 60 days", that seems very poor indeed.
> BUT you haven't stated how much data is being copied to what kind of
> backup media, how much equipment and of what types you are using, nor
> which backup software...
>
> We have Spectrum Scale installations doing nightly backups of huge file
> systems using the mmbackup command with Tivoli Storage Manager backup,
> using IBM branded or approved equipment and software.

How long did the *first* TSM backup take? Remember that TSM does the moral equivalent of a 'full' backup at first, and incrementals thereafter. So it's quite possible to have a very large filesystem with little data churn do incrementals in 5-6 hours, even though the first one took several weeks.

From Kevin.Buterbaugh at Vanderbilt.Edu Thu Feb 8 15:59:44 2018
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Thu, 8 Feb 2018 15:59:44 +0000
Subject: [gpfsug-discuss] mmchdisk suspend / stop
Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109 at vanderbilt.edu>

Hi All,

We are in a bit of a difficult situation right now with one of our non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware!) and are looking for some advice on how to deal with this unfortunate situation.

We have a non-IBM FC storage array with dual-"redundant" controllers. One of those controllers is dead and the vendor is sending us a replacement. However, the replacement controller will have mismatched firmware with the surviving controller and - long story short - the vendor says there is no way to resolve that without taking the storage array down for firmware upgrades. Needless to say there's more to that story than what I've included here, but I won't bore everyone with unnecessary details.

The storage array has 5 NSDs on it, but fortunately enough they are part of our "capacity" pool, i.e. the only way a file lands here is if an mmapplypolicy scan moved it there because the *access* time is greater than 90 days. Filesystem data replication is set to one.

So,
what I was wondering if I could do is to use mmchdisk to either suspend or (preferably) stop those NSDs, do the firmware upgrade, and resume the NSDs? The problem I see is that suspend doesn?t stop I/O, it only prevents the allocation of new blocks ? so, in theory, if a user suddenly decided to start using a file they hadn?t needed for 3 months then I?ve got a problem. Stopping all I/O to the disks is what I really want to do. However, according to the mmchdisk man page stop cannot be used on a filesystem with replication set to one. There?s over 250 TB of data on those 5 NSDs, so restriping off of them or setting replication to two are not options. It is very unlikely that anyone would try to access a file on those NSDs during the hour or so I?d need to do the firmware upgrades, but how would GPFS itself react to those (suspended) disks going away for a while? I?m thinking I could be OK if there was just a way to actually stop them rather than suspend them. Any undocumented options to mmchdisk that I?m not aware of??? Are there other options - besides buying IBM hardware - that I am overlooking? Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu Feb 8 16:23:33 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 8 Feb 2018 16:23:33 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> Message-ID: Sorry I can?t help? the only thing going round and round my head right now is why on earth the existing controller cannot push the required firmware to the new one when it comes online. Never heard of anything else! Feel free to name and shame so I can avoid ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 08 February 2018 16:00 To: gpfsug main discussion list Subject: [gpfsug-discuss] mmchdisk suspend / stop Hi All, We are in a bit of a difficult situation right now with one of our non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! ) and are looking for some advice on how to deal with this unfortunate situation. We have a non-IBM FC storage array with dual-?redundant? controllers. One of those controllers is dead and the vendor is sending us a replacement. However, the replacement controller will have mis-matched firmware with the surviving controller and - long story short - the vendor says there is no way to resolve that without taking the storage array down for firmware upgrades. Needless to say there?s more to that story than what I?ve included here, but I won?t bore everyone with unnecessary details. The storage array has 5 NSDs on it, but fortunately enough they are part of our ?capacity? pool ? i.e. the only way a file lands here is if an mmapplypolicy scan moved it there because the *access* time is greater than 90 days. Filesystem data replication is set to one. So ? what I was wondering if I could do is to use mmchdisk to either suspend or (preferably) stop those NSDs, do the firmware upgrade, and resume the NSDs? The problem I see is that suspend doesn?t stop I/O, it only prevents the allocation of new blocks ? 
so, in theory, if a user suddenly decided to start using a file they hadn?t needed for 3 months then I?ve got a problem. Stopping all I/O to the disks is what I really want to do. However, according to the mmchdisk man page stop cannot be used on a filesystem with replication set to one. There?s over 250 TB of data on those 5 NSDs, so restriping off of them or setting replication to two are not options. It is very unlikely that anyone would try to access a file on those NSDs during the hour or so I?d need to do the firmware upgrades, but how would GPFS itself react to those (suspended) disks going away for a while? I?m thinking I could be OK if there was just a way to actually stop them rather than suspend them. Any undocumented options to mmchdisk that I?m not aware of??? Are there other options - besides buying IBM hardware - that I am overlooking? Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Feb 8 16:25:33 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 8 Feb 2018 16:25:33 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop Message-ID: Check out ?unmountOnDiskFail? config parameter perhaps? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_tuningguide.htm unmountOnDiskFail The unmountOnDiskFail specifies how the GPFS daemon responds when a disk failure is detected. The valid values of this parameter are yes, no, and meta. The default value is no. I have it set to ?meta? which prevents the file system from unmounting if an NSD fails and the metadata is still available. I have 2 replicas of metadata and one data. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Thursday, February 8, 2018 at 10:15 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] mmchdisk suspend / stop So ? what I was wondering if I could do is to use mmchdisk to either suspend or (preferably) stop those NSDs, do the firmware upgrade, and resume the NSDs? The problem I see is that suspend doesn?t stop I/O, it only prevents the allocation of new blocks ? so, in theory, if a user suddenly decided to start using a file they hadn?t needed for 3 months then I?ve got a problem. Stopping all I/O to the disks is what I really want to do. However, according to the mmchdisk man page stop cannot be used on a filesystem with replication set to one. -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Thu Feb 8 16:31:25 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Thu, 08 Feb 2018 11:31:25 -0500 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: References: Message-ID: <14127.1518107485@turing-police.cc.vt.edu> On Thu, 08 Feb 2018 16:25:33 +0000, "Oesterlin, Robert" said: > unmountOnDiskFail > The unmountOnDiskFail specifies how the GPFS daemon responds when a disk > failure is detected. The valid values of this parameter are yes, no, and meta. > The default value is no. 
I suspect that the only relevant setting there is the default 'no' - it sounds like these 5 NSD's are just one storage pool in a much larger filesystem, and Kevin doesn't want the entire thing to unmount if GPFS notices that the NSDs went walkies. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From makaplan at us.ibm.com Thu Feb 8 17:10:39 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 8 Feb 2018 12:10:39 -0500 Subject: [gpfsug-discuss] Inode scan optimization - (Tomasz.Wolski@ts.fujitsu.com ) In-Reply-To: <9124.1518105142@turing-police.cc.vt.edu> References: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> <9124.1518105142@turing-police.cc.vt.edu> Message-ID: Let's give Fujitsu an opportunity to answer with some facts and re-pose their questions. When I first read the complaint, I kinda assumed they were using mmbackup and TSM -- but then I noticed words about some gpfs_XXX apis.... So it looks like this Fujitsu fellow is "rolling his own"... NOT using mmapplypolicy. And we don't know if he is backing up to an old paper tape punch device or what ! He's just saying that whatever it is that he did took 60 days... Can you get from here to there faster? Sure, take an airplane instead of walking! My other remark which had a typo was and is: There have many satisfied customers and installations of Spectrum Scale File System using mmbackup and/or Tivoli Storage Manager. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sxiao at us.ibm.com Thu Feb 8 17:17:45 2018 From: sxiao at us.ibm.com (Steve Xiao) Date: Thu, 8 Feb 2018 12:17:45 -0500 Subject: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) In-Reply-To: References: Message-ID: You can change the cluster configuration to online unmount the file system when there is error accessing metadata. This can be done run the following command: mmchconfig unmountOnDiskFail=meta -i After this configuration change, you should be able to stop all 5 NSDs with mmchdisk stop command. While these NSDs are in down state, any user IO to files resides on these disks will fail but your file system should state mounted and usable. Steve Y. Xiao > Date: Thu, 8 Feb 2018 15:59:44 +0000 > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Subject: [gpfsug-discuss] mmchdisk suspend / stop > Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109 at vanderbilt.edu> > Content-Type: text/plain; charset="utf-8" > > Hi All, > > We are in a bit of a difficult situation right now with one of our > non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! > ) and are looking for some advice on how to deal with this > unfortunate situation. > > We have a non-IBM FC storage array with dual-?redundant? > controllers. One of those controllers is dead and the vendor is > sending us a replacement. However, the replacement controller will > have mis-matched firmware with the surviving controller and - long > story short - the vendor says there is no way to resolve that > without taking the storage array down for firmware upgrades. > Needless to say there?s more to that story than what I?ve included > here, but I won?t bore everyone with unnecessary details. > > The storage array has 5 NSDs on it, but fortunately enough they are > part of our ?capacity? pool ? i.e. 
the only way a file lands here is > if an mmapplypolicy scan moved it there because the *access* time is > greater than 90 days. Filesystem data replication is set to one. > > So ? what I was wondering if I could do is to use mmchdisk to either > suspend or (preferably) stop those NSDs, do the firmware upgrade, > and resume the NSDs? The problem I see is that suspend doesn?t stop > I/O, it only prevents the allocation of new blocks ? so, in theory, > if a user suddenly decided to start using a file they hadn?t needed > for 3 months then I?ve got a problem. Stopping all I/O to the disks > is what I really want to do. However, according to the mmchdisk man > page stop cannot be used on a filesystem with replication set to one. > > There?s over 250 TB of data on those 5 NSDs, so restriping off of > them or setting replication to two are not options. > > It is very unlikely that anyone would try to access a file on those > NSDs during the hour or so I?d need to do the firmware upgrades, but > how would GPFS itself react to those (suspended) disks going away > for a while? I?m thinking I could be OK if there was just a way to > actually stop them rather than suspend them. Any undocumented > options to mmchdisk that I?m not aware of??? > > Are there other options - besides buying IBM hardware - that I am > overlooking? Thanks... > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Feb 8 19:38:33 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 8 Feb 2018 19:38:33 +0000 Subject: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) In-Reply-To: References: Message-ID: <550b2cc6552f4e669d2cfee72b1a244a@jumptrading.com> I don't know or care who the hardware vendor is, but they can DEFINITELY ship you a controller with the right firmware! Just demand it, which is what I do and they have basically always complied with the request. There is the risk associated with running even longer with a single point of failure, only using the surviving controller, but if this storage system has been in production a long time (e.g. a year or so) and is generally reliable, then they should be able to get you a new, factory tested controller with the right FW versions in a couple of days. The choice is yours of course, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Steve Xiao Sent: Thursday, February 08, 2018 11:18 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) Note: External Email ________________________________ You can change the cluster configuration to online unmount the file system when there is error accessing metadata. This can be done run the following command: mmchconfig unmountOnDiskFail=meta -i After this configuration change, you should be able to stop all 5 NSDs with mmchdisk stop command. While these NSDs are in down state, any user IO to files resides on these disks will fail but your file system should state mounted and usable. Steve Y. 
Xiao > Date: Thu, 8 Feb 2018 15:59:44 +0000 > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] mmchdisk suspend / stop > Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109 at vanderbilt.edu> > Content-Type: text/plain; charset="utf-8" > > Hi All, > > We are in a bit of a difficult situation right now with one of our > non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! > ) and are looking for some advice on how to deal with this > unfortunate situation. > > We have a non-IBM FC storage array with dual-?redundant? > controllers. One of those controllers is dead and the vendor is > sending us a replacement. However, the replacement controller will > have mis-matched firmware with the surviving controller and - long > story short - the vendor says there is no way to resolve that > without taking the storage array down for firmware upgrades. > Needless to say there?s more to that story than what I?ve included > here, but I won?t bore everyone with unnecessary details. > > The storage array has 5 NSDs on it, but fortunately enough they are > part of our ?capacity? pool ? i.e. the only way a file lands here is > if an mmapplypolicy scan moved it there because the *access* time is > greater than 90 days. Filesystem data replication is set to one. > > So ? what I was wondering if I could do is to use mmchdisk to either > suspend or (preferably) stop those NSDs, do the firmware upgrade, > and resume the NSDs? The problem I see is that suspend doesn?t stop > I/O, it only prevents the allocation of new blocks ? so, in theory, > if a user suddenly decided to start using a file they hadn?t needed > for 3 months then I?ve got a problem. Stopping all I/O to the disks > is what I really want to do. However, according to the mmchdisk man > page stop cannot be used on a filesystem with replication set to one. > > There?s over 250 TB of data on those 5 NSDs, so restriping off of > them or setting replication to two are not options. > > It is very unlikely that anyone would try to access a file on those > NSDs during the hour or so I?d need to do the firmware upgrades, but > how would GPFS itself react to those (suspended) disks going away > for a while? I?m thinking I could be OK if there was just a way to > actually stop them rather than suspend them. Any undocumented > options to mmchdisk that I?m not aware of??? > > Are there other options - besides buying IBM hardware - that I am > overlooking? Thanks... > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
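Pulling together the suggestions so far in this thread (Steve's unmountOnDiskFail=meta change plus mmchdisk stop/start), the maintenance window could look roughly like the sketch below. The file system name and NSD names are placeholders, not the poster's real ones, and reverting unmountOnDiskFail to "no" afterwards assumes the cluster normally runs with that default.

    #!/bin/bash
    # Sketch: take a set of data-only NSDs offline for controller firmware work
    # without unmounting the file system (data replication = 1).

    FS=gpfs1                                                          # placeholder
    DISKS="cap_nsd_01;cap_nsd_02;cap_nsd_03;cap_nsd_04;cap_nsd_05"    # placeholders

    # 1. Only force an unmount if *metadata* becomes unreachable; user I/O to the
    #    stopped data disks will simply fail with an I/O error.
    mmchconfig unmountOnDiskFail=meta -i

    # 2. Stop I/O to the affected NSDs (optionally run an mmapplypolicy migration
    #    first to pull recently re-accessed files off this pool).
    mmchdisk "$FS" stop -d "$DISKS"
    mmlsdisk "$FS"       # confirm the disks now show as "down"

    # 3. ... perform the controller replacement / firmware upgrade here ...

    # 4. Bring the disks back and restore the previous setting.
    mmchdisk "$FS" start -d "$DISKS"
    mmchconfig unmountOnDiskFail=no -i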
-------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Feb 8 19:48:54 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 8 Feb 2018 19:48:54 +0000 Subject: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) In-Reply-To: References: Message-ID: <769B6E06-BAB5-4EDB-A5A3-54E1063A8A6D@vanderbilt.edu> Hi again all, It sounds like doing the ?mmchconfig unmountOnDiskFail=meta -i? suggested by Steve and Bob followed by using mmchdisk to stop the disks temporarily is the way we need to go. We will, as an aside, also run a mmapplypolicy first to pull any files users have started accessing again back to the ?regular? pool before doing any of this. Given that this is our ?capacity? pool and files have to have an atime > 90 days to get migrated there in the 1st place I think this is reasonable. Especially since users will get an I/O error if they happen to try to access one of those NSDs during the brief maintenance window. As to naming and shaming the vendor ? I?m not going to do that at this point in time. We?ve been using their stuff for well over a decade at this point and have had a generally positive experience with them. In fact, I have spoken with them via phone since my original post today and they have clarified that the problem with the mismatched firmware is only an issue because we are a major version off of what is current due to us choosing to not have a downtime and therefore not having done any firmware upgrades in well over 18 months. Thanks, all... Kevin On Feb 8, 2018, at 11:17 AM, Steve Xiao > wrote: You can change the cluster configuration to online unmount the file system when there is error accessing metadata. This can be done run the following command: mmchconfig unmountOnDiskFail=meta -i After this configuration change, you should be able to stop all 5 NSDs with mmchdisk stop command. While these NSDs are in down state, any user IO to files resides on these disks will fail but your file system should state mounted and usable. Steve Y. Xiao > Date: Thu, 8 Feb 2018 15:59:44 +0000 > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] mmchdisk suspend / stop > Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109 at vanderbilt.edu> > Content-Type: text/plain; charset="utf-8" > > Hi All, > > We are in a bit of a difficult situation right now with one of our > non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! > ) and are looking for some advice on how to deal with this > unfortunate situation. > > We have a non-IBM FC storage array with dual-?redundant? > controllers. One of those controllers is dead and the vendor is > sending us a replacement. However, the replacement controller will > have mis-matched firmware with the surviving controller and - long > story short - the vendor says there is no way to resolve that > without taking the storage array down for firmware upgrades. > Needless to say there?s more to that story than what I?ve included > here, but I won?t bore everyone with unnecessary details. > > The storage array has 5 NSDs on it, but fortunately enough they are > part of our ?capacity? pool ? i.e. the only way a file lands here is > if an mmapplypolicy scan moved it there because the *access* time is > greater than 90 days. Filesystem data replication is set to one. > > So ? 
what I was wondering if I could do is to use mmchdisk to either > suspend or (preferably) stop those NSDs, do the firmware upgrade, > and resume the NSDs? The problem I see is that suspend doesn?t stop > I/O, it only prevents the allocation of new blocks ? so, in theory, > if a user suddenly decided to start using a file they hadn?t needed > for 3 months then I?ve got a problem. Stopping all I/O to the disks > is what I really want to do. However, according to the mmchdisk man > page stop cannot be used on a filesystem with replication set to one. > > There?s over 250 TB of data on those 5 NSDs, so restriping off of > them or setting replication to two are not options. > > It is very unlikely that anyone would try to access a file on those > NSDs during the hour or so I?d need to do the firmware upgrades, but > how would GPFS itself react to those (suspended) disks going away > for a while? I?m thinking I could be OK if there was just a way to > actually stop them rather than suspend them. Any undocumented > options to mmchdisk that I?m not aware of??? > > Are there other options - besides buying IBM hardware - that I am > overlooking? Thanks... > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C435bd89b3fcc4a94ee5008d56f17e49e%7C5f88b91902e3490fb772327aa8177b95%7C0%7C0%7C636537070783260582&sdata=AbY7rJQecb76rMC%2FlxrthyzHfueQDJTT%2FJuuRCac5g8%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Thu Feb 8 18:33:32 2018 From: ewahl at osc.edu (Edward Wahl) Date: Thu, 8 Feb 2018 13:33:32 -0500 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> Message-ID: <20180208133332.30440b89@osc.edu> I'm with Richard on this one. Sounds dubious to me. Even older style stuff could start a new controller in a 'failed' or 'service' state and push firmware back in the 20th century... ;) Ed On Thu, 8 Feb 2018 16:23:33 +0000 "Sobey, Richard A" wrote: > Sorry I can?t help? the only thing going round and round my head right now is > why on earth the existing controller cannot push the required firmware to the > new one when it comes online. Never heard of anything else! Feel free to name > and shame so I can avoid ? > > Richard > > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, > Kevin L Sent: 08 February 2018 16:00 To: gpfsug main discussion list > Subject: [gpfsug-discuss] mmchdisk > suspend / stop > > Hi All, > > We are in a bit of a difficult situation right now with one of our non-IBM > hardware vendors (I know, I know, I KNOW - buy IBM hardware! ) and are > looking for some advice on how to deal with this unfortunate situation. > > We have a non-IBM FC storage array with dual-?redundant? controllers. One of > those controllers is dead and the vendor is sending us a replacement. 
> However, the replacement controller will have mis-matched firmware with the > surviving controller and - long story short - the vendor says there is no way > to resolve that without taking the storage array down for firmware upgrades. > Needless to say there?s more to that story than what I?ve included here, but > I won?t bore everyone with unnecessary details. > > The storage array has 5 NSDs on it, but fortunately enough they are part of > our ?capacity? pool ? i.e. the only way a file lands here is if an > mmapplypolicy scan moved it there because the *access* time is greater than > 90 days. Filesystem data replication is set to one. > > So ? what I was wondering if I could do is to use mmchdisk to either suspend > or (preferably) stop those NSDs, do the firmware upgrade, and resume the > NSDs? The problem I see is that suspend doesn?t stop I/O, it only prevents > the allocation of new blocks ? so, in theory, if a user suddenly decided to > start using a file they hadn?t needed for 3 months then I?ve got a problem. > Stopping all I/O to the disks is what I really want to do. However, > according to the mmchdisk man page stop cannot be used on a filesystem with > replication set to one. > > There?s over 250 TB of data on those 5 NSDs, so restriping off of them or > setting replication to two are not options. > > It is very unlikely that anyone would try to access a file on those NSDs > during the hour or so I?d need to do the firmware upgrades, but how would > GPFS itself react to those (suspended) disks going away for a while? I?m > thinking I could be OK if there was just a way to actually stop them rather > than suspend them. Any undocumented options to mmchdisk that I?m not aware > of??? > > Are there other options - besides buying IBM hardware - that I am > overlooking? Thanks... ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From aaron.s.knister at nasa.gov Thu Feb 8 20:22:52 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 8 Feb 2018 15:22:52 -0500 (EST) Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? In-Reply-To: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr> References: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr> Message-ID: Hi Loic, Thank you for that information! I have two follow up questions-- 1. Are you using ccr? 2. Do you happen to have mmsdrserv disabled in your environment? (e.g. what's the output of "mmlsconfig mmsdrservPort" on your cluster?). -Aaron On Thu, 8 Feb 2018, Loic Tortay wrote: > On 07/02/2018 22:28, Aaron Knister wrote: >> I noticed something curious after migrating some nodes from 4.1 to 4.2 >> which is that mounts now can take foorrreeevverrr. It seems to boil down >> to the point in the mount process where getEFOptions is called. >> >> To highlight the difference-- >> > [...] >> > Hello, > I have had this (or a very similar) issue after migrating from 4.1.1.8 to > 4.2.3. There are 37 filesystems in our main cluster, which made the problem > really noticeable. > > A PMR has been opened. I have tested the fixes included in 4.2.3.7, (which, > I'm told, should be released today) actually resolve my problems (APAR > IJ03192 & IJ03235). > > > Lo?c. 
> -- > | Lo?c Tortay - IN2P3 Computing Centre | > From Robert.Oesterlin at nuance.com Thu Feb 8 20:34:35 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 8 Feb 2018 20:34:35 +0000 Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th Message-ID: We?re finalizing the details for the Spring 2018 User Group meeting, and we need your help! I?ve you?re interested in presenting at this meeting (it will be a full 2 days), then contact me and let me know what?s you?d like to talk about. We?re always looking for presentations on how you are using Scale (GPFS) in your business or project, tools that help you do your job, performance challenges/solutions ? or anything else. Also looking for ideas on breakout sessions. We?re probably looking at talks of about 30 mins each. Drop me a note if you?d like to present. Exact details on the event location will be available in a few weeks. We?re hoping to keep it as close to BioIT World in downtown Boston. Bob Oesterlin Sr Principal Storage Engineer, Nuance SSUG Co-principal -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Feb 8 21:11:34 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 8 Feb 2018 21:11:34 +0000 Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? In-Reply-To: References: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr> Message-ID: <2dbcc01f542d40698a7ad6cc10d2dbd1@jumptrading.com> It may be related to this issue of using root squashed file system option, here are some edited comments from my colleague who stumbled upon this while chatting with a friend at a CUG: " Something I learned last week: apparently the libmount code from util-linux (used by /bin/mount) will call utimensat() on new mountpoints if access() fails (for example, on root-squashed filesystems). This is done "just to be sure" that the filesystem is really read-only. This operation can be quite expensive and (anecdotally) may cause huge slowdowns when mounting root-squashed parallel filesystems on thousands of clients. Here is the relevant code: https://github.com/karelzak/util-linux/blame/1ea4e7bd8d9d0f0ef317558c627e6fa069950e8d/libmount/src/utils.c#L222 This code has been in util-linux for years. It's not clear exactly what the impact is in our environment, but this certainly can't be helping, especially since we've grown the size of the cluster considerably. Mounting GPFS has recently really become a slow and disruptive operation ? if you try to mount many clients at once, the FS will hang for a considerable period of time. The timing varies, but here is one example from an isolated mounting operation: 12:09:11.222513 mount("", "", "gpfs", MS_MGC_VAL, "dev="...) = 0 <1.590217> 12:09:12.812777 access("", W_OK) = -1 EACCES (Permission denied) <0.000022> 12:09:12.812841 utimensat(AT_FDCWD, "", \{UTIME_NOW, \{93824994378048, 1073741822}}, 0) = -1 EPERM (Operation not permitted) <2.993689> Here, the utimensat() took ~3 seconds, almost twice as long as the mount operation! I also suspect it will slow down other clients trying to mount the filesystem since the sgmgr has to process this write attempt to the mountpoint. (Hilariously, it still returns the "wrong" answer, because this filesystem is not read-only, just squashed.) As of today, the person who originally brought the issue to my attention at CUG has raised it for discussion on the util-linux mailing list. 
https://marc.info/?l=util-linux-ng&m=151075932824688&w=2
"

We ended up putting facls on our mountpoints like such, which hacked around this stupidity:

    for fs in gpfs_mnt_point ; do
        chmod 1755 $fs
        setfacl -m u:99:rwx $fs   # 99 is the "nobody" uid to which root is mapped -- see "mmauth" output
    done

Hope that helps,
-Bryan

-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Aaron Knister
Sent: Thursday, February 08, 2018 2:23 PM
To: Loic Tortay
Cc: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1?

Note: External Email
-------------------------------------------------

Hi Loic,

Thank you for that information!

I have two follow up questions--
1. Are you using ccr?
2. Do you happen to have mmsdrserv disabled in your environment? (e.g. what's the output of "mmlsconfig mmsdrservPort" on your cluster?).

-Aaron

On Thu, 8 Feb 2018, Loic Tortay wrote:
> On 07/02/2018 22:28, Aaron Knister wrote:
>> I noticed something curious after migrating some nodes from 4.1 to 4.2
>> which is that mounts now can take foorrreeevverrr. It seems to boil down
>> to the point in the mount process where getEFOptions is called.
>>
>> To highlight the difference--
>>
> [...]
>>
> Hello,
> I have had this (or a very similar) issue after migrating from 4.1.1.8 to
> 4.2.3. There are 37 filesystems in our main cluster, which made the problem
> really noticeable.
>
> A PMR has been opened. I have tested the fixes included in 4.2.3.7 (which,
> I'm told, should be released today); they actually resolve my problems
> (APAR IJ03192 & IJ03235).
>
> Loïc.
> --
> | Loïc Tortay - IN2P3 Computing Centre |

From tortay at cc.in2p3.fr Fri Feb 9 08:59:12 2018
From: tortay at cc.in2p3.fr (Loic Tortay)
Date: Fri, 9 Feb 2018 09:59:12 +0100
Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1?
In-Reply-To:
References: <9869457d-322e-fd27-1051-cb4875832215 at cc.in2p3.fr>
Message-ID: <969a0c4b-a3b0-2fdb-80f4-2913bc9b0a67 at cc.in2p3.fr>

On 02/08/2018 09:22 PM, Aaron Knister wrote:
> Hi Loic,
>
> Thank you for that information!
>
> I have two follow up questions--
> 1. Are you using ccr?
> 2. Do you happen to have mmsdrserv disabled in your environment? (e.g.
> what's the output of "mmlsconfig mmsdrservPort" on your cluster?).
>
Hello,
We do not use CCR on this cluster (yet). We use the default port for mmsdrserv:

# mmlsconfig mmsdrservPort
mmsdrservPort 1191

Loïc.
--
| Loïc Tortay - IN2P3 Computing Centre |
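For anyone else trying to narrow down the same slow-mount symptom, a rough sketch of checking the points raised in this thread is below. The file system name fs0 is a placeholder, and the exact mmlscluster output format varies by release, so treat this as an illustration rather than a recipe.

    # Which configuration repository and port is this cluster using?
    mmlsconfig mmsdrservPort   # 1191 is the default, as in Loic's reply
    mmlscluster                # the header indicates whether CCR or server-based config is in use

    # Time an isolated mount and look for slow access()/utimensat() calls on the
    # mountpoint (placeholder file system name "fs0"):
    strace -f -T -e trace=mount,access,utimensat /usr/lpp/mmfs/bin/mmmount fs0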
From Renar.Grunenberg at huk-coburg.de Fri Feb 9 09:06:32 2018
From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar)
Date: Fri, 9 Feb 2018 09:06:32 +0000
Subject: [gpfsug-discuss] V5 Experience
Message-ID: <4da0f104a1ef474493d44c1f645465e9 at SMXRF105.msg.hukrf.de>

Hallo All,

we updated our test cluster from 4.2.3.6 to V5.0.0.1. So far so good, but after the mmchconfig release=LATEST I see a new cluster-wide parameter, "maxblocksize 1M" (our file systems use this block size). However, if I try to change this parameter, a shutdown of the whole cluster is requested:

root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT
Verifying GPFS is stopped on all nodes ...
mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de
mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de
mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de
mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de
mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de
mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de
mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de
mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de
mmchconfig: Command failed. Examine previous error messages to determine cause.

Can someone explain the behavior here, and also clarify what we can put in an update plan to get back to the defaults without a cluster-wide shutdown? Is this a bug or a feature? ;-)

Regards, Renar

Renar Grunenberg
Abteilung Informatik - Betrieb
HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon: 09561 96-44110
Telefax: 09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de
Internet: www.huk.de

From frankli at us.ibm.com Fri Feb 9 11:29:17 2018
From: frankli at us.ibm.com (Frank N Lee)
Date: Fri, 9 Feb 2018 05:29:17 -0600
Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th
In-Reply-To:
References:
Message-ID:

Bob,

Can you provide your email or shall I just reply here?
Frank Frank Lee, PhD IBM Systems 314-482-5329 | @drfranknlee From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 02/08/2018 02:35 PM Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th Sent by: gpfsug-discuss-bounces at spectrumscale.org We?re finalizing the details for the Spring 2018 User Group meeting, and we need your help! I?ve you?re interested in presenting at this meeting (it will be a full 2 days), then contact me and let me know what?s you?d like to talk about. We?re always looking for presentations on how you are using Scale (GPFS) in your business or project, tools that help you do your job, performance challenges/solutions ? or anything else. Also looking for ideas on breakout sessions. We?re probably looking at talks of about 30 mins each. Drop me a note if you?d like to present. Exact details on the event location will be available in a few weeks. We?re hoping to keep it as close to BioIT World in downtown Boston. Bob Oesterlin Sr Principal Storage Engineer, Nuance SSUG Co-principal _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HIs14G9Qcs5MqpsAFL5E0TH5hqFD-KbquYdQ_mTmTnI&m=_7q7xOAgpDoLwznJe069elHn1thk8KmxGLgXM6zuST0&s=1aWP0EJWxIsAycMNiVX7v4FWC5BsSzyx566RyllXCCM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Fri Feb 9 11:53:30 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Fri, 9 Feb 2018 12:53:30 +0100 Subject: [gpfsug-discuss] V5 Experience In-Reply-To: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. I'd see no need to change anything. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 02/09/2018 10:16 AM Subject: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... 
mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Fri Feb 9 12:30:10 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 9 Feb 2018 12:30:10 +0000 Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th Message-ID: <1AC64CE4-BEE8-4C4B-BB7D-02A39C176621@nuance.com> Replied to Frank directly. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Frank N Lee Reply-To: gpfsug main discussion list Date: Friday, February 9, 2018 at 5:30 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th Bob, Can you provide your email or shall I just reply here? Frank Frank Lee, PhD IBM Systems 314-482-5329 | @drfranknlee From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 02/08/2018 02:35 PM Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th Sent by: gpfsug-discuss-bounces at spectrumscale.org We?re finalizing the details for the Spring 2018 User Group meeting, and we need your help! I?ve you?re interested in presenting at this meeting (it will be a full 2 days), then contact me and let me know what?s you?d like to talk about. 
We?re always looking for presentations on how you are using Scale (GPFS) in your business or project, tools that help you do your job, performance challenges/solutions ? or anything else. Also looking for ideas on breakout sessions. We?re probably looking at talks of about 30 mins each. Drop me a note if you?d like to present. Exact details on the event location will be available in a few weeks. We?re hoping to keep it as close to BioIT World in downtown Boston. Bob Oesterlin Sr Principal Storage Engineer, Nuance SSUG Co-principal _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HIs14G9Qcs5MqpsAFL5E0TH5hqFD-KbquYdQ_mTmTnI&m=_7q7xOAgpDoLwznJe069elHn1thk8KmxGLgXM6zuST0&s=1aWP0EJWxIsAycMNiVX7v4FWC5BsSzyx566RyllXCCM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 138 bytes Desc: image001.png URL: From YARD at il.ibm.com Fri Feb 9 13:28:49 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Fri, 9 Feb 2018 15:28:49 +0200 Subject: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) In-Reply-To: <769B6E06-BAB5-4EDB-A5A3-54E1063A8A6D@vanderbilt.edu> References: <769B6E06-BAB5-4EDB-A5A3-54E1063A8A6D@vanderbilt.edu> Message-ID: Hi Just make sure you have a backup, just in case ... Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage architect Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 02/08/2018 09:49 PM Subject: Re: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi again all, It sounds like doing the ?mmchconfig unmountOnDiskFail=meta -i? suggested by Steve and Bob followed by using mmchdisk to stop the disks temporarily is the way we need to go. We will, as an aside, also run a mmapplypolicy first to pull any files users have started accessing again back to the ?regular? pool before doing any of this. Given that this is our ?capacity? pool and files have to have an atime > 90 days to get migrated there in the 1st place I think this is reasonable. Especially since users will get an I/O error if they happen to try to access one of those NSDs during the brief maintenance window. As to naming and shaming the vendor ? I?m not going to do that at this point in time. We?ve been using their stuff for well over a decade at this point and have had a generally positive experience with them. In fact, I have spoken with them via phone since my original post today and they have clarified that the problem with the mismatched firmware is only an issue because we are a major version off of what is current due to us choosing to not have a downtime and therefore not having done any firmware upgrades in well over 18 months. Thanks, all... Kevin On Feb 8, 2018, at 11:17 AM, Steve Xiao wrote: You can change the cluster configuration to online unmount the file system when there is error accessing metadata. This can be done run the following command: mmchconfig unmountOnDiskFail=meta -i After this configuration change, you should be able to stop all 5 NSDs with mmchdisk stop command. 
While these NSDs are in down state, any user IO to files resides on these disks will fail but your file system should state mounted and usable. Steve Y. Xiao > Date: Thu, 8 Feb 2018 15:59:44 +0000 > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Subject: [gpfsug-discuss] mmchdisk suspend / stop > Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109 at vanderbilt.edu> > Content-Type: text/plain; charset="utf-8" > > Hi All, > > We are in a bit of a difficult situation right now with one of our > non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! > ) and are looking for some advice on how to deal with this > unfortunate situation. > > We have a non-IBM FC storage array with dual-?redundant? > controllers. One of those controllers is dead and the vendor is > sending us a replacement. However, the replacement controller will > have mis-matched firmware with the surviving controller and - long > story short - the vendor says there is no way to resolve that > without taking the storage array down for firmware upgrades. > Needless to say there?s more to that story than what I?ve included > here, but I won?t bore everyone with unnecessary details. > > The storage array has 5 NSDs on it, but fortunately enough they are > part of our ?capacity? pool ? i.e. the only way a file lands here is > if an mmapplypolicy scan moved it there because the *access* time is > greater than 90 days. Filesystem data replication is set to one. > > So ? what I was wondering if I could do is to use mmchdisk to either > suspend or (preferably) stop those NSDs, do the firmware upgrade, > and resume the NSDs? The problem I see is that suspend doesn?t stop > I/O, it only prevents the allocation of new blocks ? so, in theory, > if a user suddenly decided to start using a file they hadn?t needed > for 3 months then I?ve got a problem. Stopping all I/O to the disks > is what I really want to do. However, according to the mmchdisk man > page stop cannot be used on a filesystem with replication set to one. > > There?s over 250 TB of data on those 5 NSDs, so restriping off of > them or setting replication to two are not options. > > It is very unlikely that anyone would try to access a file on those > NSDs during the hour or so I?d need to do the firmware upgrades, but > how would GPFS itself react to those (suspended) disks going away > for a while? I?m thinking I could be OK if there was just a way to > actually stop them rather than suspend them. Any undocumented > options to mmchdisk that I?m not aware of??? > > Are there other options - besides buying IBM hardware - that I am > overlooking? Thanks... > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C435bd89b3fcc4a94ee5008d56f17e49e%7C5f88b91902e3490fb772327aa8177b95%7C0%7C0%7C636537070783260582&sdata=AbY7rJQecb76rMC%2FlxrthyzHfueQDJTT%2FJuuRCac5g8%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=3yfKUCiWGXtAEPiwlmQNFGTjLx5h3PlCYfUXDBMGJpQ&s=-pkjeFOUVSDUGgwtKkoYbmGLADk2UHfDbUPiuWSw4gQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From knop at us.ibm.com Fri Feb 9 13:32:30 2018 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 9 Feb 2018 08:32:30 -0500 Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? In-Reply-To: <2dbcc01f542d40698a7ad6cc10d2dbd1@jumptrading.com> References: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr> <2dbcc01f542d40698a7ad6cc10d2dbd1@jumptrading.com> Message-ID: All, For at least one of the instances reported by this group, a PMR has been opened, and a fix is being developed. For folks that are getting affected by the problem: Please contact the service team to confirm your problem is the same as the one previously reported, and for an outlook for the availability of the fix. Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Bryan Banister To: gpfsug main discussion list , "Loic Tortay" Date: 02/08/2018 04:11 PM Subject: Re: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? Sent by: gpfsug-discuss-bounces at spectrumscale.org It may be related to this issue of using root squashed file system option, here are some edited comments from my colleague who stumbled upon this while chatting with a friend at a CUG: " Something I learned last week: apparently the libmount code from util-linux (used by /bin/mount) will call utimensat() on new mountpoints if access() fails (for example, on root-squashed filesystems). 
This is done "just to be sure" that the filesystem is really read-only. This operation can be quite expensive and (anecdotally) may cause huge slowdowns when mounting root-squashed parallel filesystems on thousands of clients. Here is the relevant code: https://github.com/karelzak/util-linux/blame/1ea4e7bd8d9d0f0ef317558c627e6fa069950e8d/libmount/src/utils.c#L222 This code has been in util-linux for years. It's not clear exactly what the impact is in our environment, but this certainly can't be helping, especially since we've grown the size of the cluster considerably. Mounting GPFS has recently really become a slow and disruptive operation ? if you try to mount many clients at once, the FS will hang for a considerable period of time. The timing varies, but here is one example from an isolated mounting operation: 12:09:11.222513 mount("", "", "gpfs", MS_MGC_VAL, "dev="...) = 0 <1.590217> 12:09:12.812777 access("", W_OK) = -1 EACCES (Permission denied) <0.000022> 12:09:12.812841 utimensat(AT_FDCWD, "", \{UTIME_NOW, \{93824994378048, 1073741822}}, 0) = -1 EPERM (Operation not permitted) <2.993689> Here, the utimensat() took ~3 seconds, almost twice as long as the mount operation! I also suspect it will slow down other clients trying to mount the filesystem since the sgmgr has to process this write attempt to the mountpoint. (Hilariously, it still returns the "wrong" answer, because this filesystem is not read-only, just squashed.) As of today, the person who originally brought the issue to my attention at CUG has raised it for discussion on the util-linux mailing list. https://marc.info/?l=util-linux-ng&m=151075932824688&w=2 " We ended up putting facls on our mountpoints like such, which hacked around this stupidity: for fs in gpfs_mnt_point ; do chmod 1755 $fs setfacl -m u:99:rwx $fs # 99 is the "nobody" uid to which root is mapped--see "mmauth" output done Hope that helps, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Aaron Knister Sent: Thursday, February 08, 2018 2:23 PM To: Loic Tortay Cc: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? Note: External Email ------------------------------------------------- Hi Loic, Thank you for that information! I have two follow up questions-- 1. Are you using ccr? 2. Do you happen to have mmsdrserv disabled in your environment? (e.g. what's the output of "mmlsconfig mmsdrservPort" on your cluster?). -Aaron On Thu, 8 Feb 2018, Loic Tortay wrote: > On 07/02/2018 22:28, Aaron Knister wrote: >> I noticed something curious after migrating some nodes from 4.1 to 4.2 >> which is that mounts now can take foorrreeevverrr. It seems to boil down >> to the point in the mount process where getEFOptions is called. >> >> To highlight the difference-- >> > [...] >> > Hello, > I have had this (or a very similar) issue after migrating from 4.1.1.8 to > 4.2.3. There are 37 filesystems in our main cluster, which made the problem > really noticeable. > > A PMR has been opened. I have tested the fixes included in 4.2.3.7, (which, > I'm told, should be released today) actually resolve my problems (APAR > IJ03192 & IJ03235). > > > Lo?c. > -- > | Lo?c Tortay - IN2P3 Computing Centre | > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=C0S8WTufrOCvXbHUegB8zS9jk_1SLczALa-4aVEubu4&s=VTWKI-xcUiJ_LeMhJ-xOPmnz0Zm9IspKsU3bsxA4BNo&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From carlz at us.ibm.com Fri Feb 9 13:46:51 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Fri, 9 Feb 2018 13:46:51 +0000 Subject: [gpfsug-discuss] V5 Experience In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Feb 9 13:58:58 2018 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 9 Feb 2018 08:58:58 -0500 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: All, Correct. There is no need to change the value of 'maxblocksize' for existing clusters which are upgraded to the 5.0.0 level. If a new file system needs to be created with a block size which exceeds the value of maxblocksize then the mmchconfig needs to be issued to increase the value of maxblocksize (which requires the entire cluster to be stopped). For clusters newly created with 5.0.0, the value of maxblocksize is set to 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man pages in 5.0.0 . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Uwe Falke" To: gpfsug main discussion list Date: 02/09/2018 06:54 AM Subject: Re: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. I'd see no need to change anything. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? 
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 02/09/2018 10:16 AM Subject: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From p.ward at nhm.ac.uk Thu Feb 8 16:46:25 2018 From: p.ward at nhm.ac.uk (Paul Ward) Date: Thu, 8 Feb 2018 16:46:25 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> Message-ID: We tend to get the maintenance company to down-grade the firmware to match what we have for our aging hardware, before sending it to us. I assume this isn?t an option? Paul Ward Technical Solutions Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 08 February 2018 16:00 To: gpfsug main discussion list Subject: [gpfsug-discuss] mmchdisk suspend / stop Hi All, We are in a bit of a difficult situation right now with one of our non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! ) and are looking for some advice on how to deal with this unfortunate situation. We have a non-IBM FC storage array with dual-?redundant? controllers. One of those controllers is dead and the vendor is sending us a replacement. However, the replacement controller will have mis-matched firmware with the surviving controller and - long story short - the vendor says there is no way to resolve that without taking the storage array down for firmware upgrades. Needless to say there?s more to that story than what I?ve included here, but I won?t bore everyone with unnecessary details. The storage array has 5 NSDs on it, but fortunately enough they are part of our ?capacity? pool ? i.e. the only way a file lands here is if an mmapplypolicy scan moved it there because the *access* time is greater than 90 days. Filesystem data replication is set to one. So ? what I was wondering if I could do is to use mmchdisk to either suspend or (preferably) stop those NSDs, do the firmware upgrade, and resume the NSDs? The problem I see is that suspend doesn?t stop I/O, it only prevents the allocation of new blocks ? so, in theory, if a user suddenly decided to start using a file they hadn?t needed for 3 months then I?ve got a problem. Stopping all I/O to the disks is what I really want to do. However, according to the mmchdisk man page stop cannot be used on a filesystem with replication set to one. There?s over 250 TB of data on those 5 NSDs, so restriping off of them or setting replication to two are not options. It is very unlikely that anyone would try to access a file on those NSDs during the hour or so I?d need to do the firmware upgrades, but how would GPFS itself react to those (suspended) disks going away for a while? I?m thinking I could be OK if there was just a way to actually stop them rather than suspend them. Any undocumented options to mmchdisk that I?m not aware of??? Are there other options - besides buying IBM hardware - that I am overlooking? Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Renar.Grunenberg at huk-coburg.de Fri Feb 9 14:30:34 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 9 Feb 2018 14:30:34 +0000 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: Felipe, all, first thanks for clarification, but what was the reason for this logic? If i upgrade to Version 5 and want to create new filesystems, and the maxblocksize is on 1M, we must shutdown the hole cluster to change this to the defaults to use the new one default. I had no understanding for that decision. We are at 7 x 24h availability with our cluster today, we had no real maintenance window here! Any circumvention are welcome. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Freitag, 9. Februar 2018 14:59 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize All, Correct. There is no need to change the value of 'maxblocksize' for existing clusters which are upgraded to the 5.0.0 level. If a new file system needs to be created with a block size which exceeds the value of maxblocksize then the mmchconfig needs to be issued to increase the value of maxblocksize (which requires the entire cluster to be stopped). For clusters newly created with 5.0.0, the value of maxblocksize is set to 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man pages in 5.0.0 . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for "Uwe Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default is <>1MB, so your conf]"Uwe Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. 
From: "Uwe Falke" > To: gpfsug main discussion list > Date: 02/09/2018 06:54 AM Subject: Re: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. I'd see no need to change anything. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 02/09/2018 10:16 AM Subject: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. 
If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From oehmes at gmail.com Fri Feb 9 14:47:54 2018 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 09 Feb 2018 14:47:54 +0000 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: Renar, if you specify the filesystem blocksize of 1M during mmcr you don't have to restart anything. scale 5 didn't change anything on the behaviour of maxblocksize change while the cluster is online, it only changed the default passed to the blocksize parameter for create a new filesystem. one thing we might consider doing is changing the command to use the current active maxblocksize as input for mmcrfs if maxblocksize is below current default. Sven On Fri, Feb 9, 2018 at 6:30 AM Grunenberg, Renar < Renar.Grunenberg at huk-coburg.de> wrote: > Felipe, all, > > first thanks for clarification, but what was the reason for this logic? If > i upgrade to Version 5 and want to create new filesystems, and the > maxblocksize is on 1M, we must shutdown the hole cluster to change this to > the defaults to use the new one default. I had no understanding for that > decision. We are at 7 x 24h availability with our cluster today, we had no > real maintenance window here! Any circumvention are welcome. > > > > Regards Renar > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. 
> > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > *Von:* gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] *Im Auftrag von *Felipe Knop > *Gesendet:* Freitag, 9. Februar 2018 14:59 > *An:* gpfsug main discussion list > *Betreff:* Re: [gpfsug-discuss] V5 Experience -- maxblocksize > > > > All, > > Correct. There is no need to change the value of 'maxblocksize' for > existing clusters which are upgraded to the 5.0.0 level. If a new file > system needs to be created with a block size which exceeds the value of > maxblocksize then the mmchconfig needs to be issued to increase the value > of maxblocksize (which requires the entire cluster to be stopped). > > For clusters newly created with 5.0.0, the value of maxblocksize is set to > 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man > pages in 5.0.0 . > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > [image: Inactive hide details for "Uwe Falke" ---02/09/2018 06:54:10 > AM---I suppose the new maxBlockSize default is <>1MB, so your conf]"Uwe > Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default > is <>1MB, so your config parameter was properly translated. > > From: "Uwe Falke" > To: gpfsug main discussion list > Date: 02/09/2018 06:54 AM > Subject: Re: [gpfsug-discuss] V5 Experience > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > > I suppose the new maxBlockSize default is <>1MB, so your config parameter > was properly translated. I'd see no need to change anything. > > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 <+49%20371%2069782165> > Mobile: +49 175 575 2877 <+49%20175%205752877> > E-Mail: uwefalke at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Thomas Wolter, Sven Schoo? > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > > Date: 02/09/2018 10:16 AM > Subject: [gpfsug-discuss] V5 Experience > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hallo All, > we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but > I see after the mmchconfig release=LATEST a new common parameter > ?maxblocksize 1M? > (our fs are on these blocksizes) is happening. > Ok, but if I will change this parameter the hole cluster was requestet > that: > > root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT > Verifying GPFS is stopped on all nodes ... 
> mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de > mmchconfig: Command failed. Examine previous error messages to determine > cause. > Can someone explain the behavior here, and same clarification in an update > plan what can we do to go to the defaults without clusterdown. > Is this a bug or a feature;-) > > Regards Renar > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: > 09561 96-44110 > Telefax: > 09561 96-44104 > E-Mail: > Renar.Grunenberg at huk-coburg.de > Internet: > www.huk.de > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Fri Feb 9 14:59:31 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 9 Feb 2018 14:59:31 +0000 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: Hallo Sven, that stated a mmcrfs ?newfs? -B 4M is possible if the maxblocksize is 1M (from the upgrade) without the requirement to change this parameter before?? Correct or not? Regards Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sven Oehme Gesendet: Freitag, 9. Februar 2018 15:48 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize Renar, if you specify the filesystem blocksize of 1M during mmcr you don't have to restart anything. scale 5 didn't change anything on the behaviour of maxblocksize change while the cluster is online, it only changed the default passed to the blocksize parameter for create a new filesystem. one thing we might consider doing is changing the command to use the current active maxblocksize as input for mmcrfs if maxblocksize is below current default. Sven On Fri, Feb 9, 2018 at 6:30 AM Grunenberg, Renar > wrote: Felipe, all, first thanks for clarification, but what was the reason for this logic? If i upgrade to Version 5 and want to create new filesystems, and the maxblocksize is on 1M, we must shutdown the hole cluster to change this to the defaults to use the new one default. I had no understanding for that decision. We are at 7 x 24h availability with our cluster today, we had no real maintenance window here! Any circumvention are welcome. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 
9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Freitag, 9. Februar 2018 14:59 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize All, Correct. There is no need to change the value of 'maxblocksize' for existing clusters which are upgraded to the 5.0.0 level. If a new file system needs to be created with a block size which exceeds the value of maxblocksize then the mmchconfig needs to be issued to increase the value of maxblocksize (which requires the entire cluster to be stopped). For clusters newly created with 5.0.0, the value of maxblocksize is set to 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man pages in 5.0.0 . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for "Uwe Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default is <>1MB, so your conf]"Uwe Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. From: "Uwe Falke" > To: gpfsug main discussion list > Date: 02/09/2018 06:54 AM Subject: Re: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. I'd see no need to change anything. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? 
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 02/09/2018 10:16 AM Subject: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Fri Feb 9 15:08:38 2018 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 09 Feb 2018 15:08:38 +0000 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: you can only create a filesystem with a blocksize of what ever current maxblocksize is set. let me discuss with felipe what//if we can share here to solve this. sven On Fri, Feb 9, 2018 at 6:59 AM Grunenberg, Renar < Renar.Grunenberg at huk-coburg.de> wrote: > Hallo Sven, > > that stated a mmcrfs ?newfs? -B 4M is possible if the maxblocksize is 1M > (from the upgrade) without the requirement to change this parameter > before?? Correct or not? > > Regards > > > > > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > *Von:* gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] *Im Auftrag von *Sven Oehme > *Gesendet:* Freitag, 9. Februar 2018 15:48 > > > *An:* gpfsug main discussion list > *Betreff:* Re: [gpfsug-discuss] V5 Experience -- maxblocksize > > > > Renar, > > > > if you specify the filesystem blocksize of 1M during mmcr you don't have > to restart anything. 
scale 5 didn't change anything on the behaviour of > maxblocksize change while the cluster is online, it only changed the > default passed to the blocksize parameter for create a new filesystem. one > thing we might consider doing is changing the command to use the current > active maxblocksize as input for mmcrfs if maxblocksize is below current > default. > > > > Sven > > > > > > On Fri, Feb 9, 2018 at 6:30 AM Grunenberg, Renar < > Renar.Grunenberg at huk-coburg.de> wrote: > > Felipe, all, > > first thanks for clarification, but what was the reason for this logic? If > i upgrade to Version 5 and want to create new filesystems, and the > maxblocksize is on 1M, we must shutdown the hole cluster to change this to > the defaults to use the new one default. I had no understanding for that > decision. We are at 7 x 24h availability with our cluster today, we had no > real maintenance window here! Any circumvention are welcome. > > > > Regards Renar > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > > Telefon: > > 09561 96-44110 > > Telefax: > > 09561 96-44104 > > E-Mail: > > Renar.Grunenberg at huk-coburg.de > > Internet: > > www.huk.de > > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > *Von:* gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] *Im Auftrag von *Felipe Knop > *Gesendet:* Freitag, 9. Februar 2018 14:59 > *An:* gpfsug main discussion list > *Betreff:* Re: [gpfsug-discuss] V5 Experience -- maxblocksize > > > > All, > > Correct. There is no need to change the value of 'maxblocksize' for > existing clusters which are upgraded to the 5.0.0 level. If a new file > system needs to be created with a block size which exceeds the value of > maxblocksize then the mmchconfig needs to be issued to increase the value > of maxblocksize (which requires the entire cluster to be stopped). > > For clusters newly created with 5.0.0, the value of maxblocksize is set to > 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man > pages in 5.0.0 . 
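To make that concrete, the upgrade path described in the quoted text amounts to something like the following sketch. The file system name (newfs) and stanza file are placeholders, and the mmshutdown/mmstartup pair is exactly the cluster-wide outage being objected to in this thread:

  mmlsconfig maxblocksize        # an upgraded cluster may still report 1M here
  mmshutdown -a                  # raising maxblocksize requires GPFS to be down on all nodes
  mmchconfig maxblocksize=4M
  mmstartup -a
  mmcrfs newfs -F newfs_nsds.stanza -B 4M   # a 4M block size is now accepted

Creating the new file system with -B set to whatever mmlsconfig maxblocksize currently reports (1M in this case) avoids the outage entirely, which is the point Sven makes above.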
> > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > [image: Inactive hide details for "Uwe Falke" ---02/09/2018 06:54:10 > AM---I suppose the new maxBlockSize default is <>1MB, so your conf]"Uwe > Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default > is <>1MB, so your config parameter was properly translated. > > From: "Uwe Falke" > To: gpfsug main discussion list > Date: 02/09/2018 06:54 AM > Subject: Re: [gpfsug-discuss] V5 Experience > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > > I suppose the new maxBlockSize default is <>1MB, so your config parameter > was properly translated. I'd see no need to change anything. > > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 <+49%20371%2069782165> > Mobile: +49 175 575 2877 <+49%20175%205752877> > E-Mail: uwefalke at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Thomas Wolter, Sven Schoo? > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > > Date: 02/09/2018 10:16 AM > Subject: [gpfsug-discuss] V5 Experience > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hallo All, > we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but > I see after the mmchconfig release=LATEST a new common parameter > ?maxblocksize 1M? > (our fs are on these blocksizes) is happening. > Ok, but if I will change this parameter the hole cluster was requestet > that: > > root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT > Verifying GPFS is stopped on all nodes ... > mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de > mmchconfig: Command failed. Examine previous error messages to determine > cause. > Can someone explain the behavior here, and same clarification in an update > plan what can we do to go to the defaults without clusterdown. > Is this a bug or a feature;-) > > Regards Renar > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: > 09561 96-44110 > Telefax: > 09561 96-44104 > E-Mail: > Renar.Grunenberg at huk-coburg.de > Internet: > www.huk.de > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 
9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Feb 9 15:07:32 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 9 Feb 2018 15:07:32 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> Message-ID: <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> Hi All, Since several people have made this same suggestion, let me respond to that. We did ask the vendor - twice - to do that. Their response boils down to, ?No, the older version has bugs and we won?t send you a controller with firmware that we know has bugs in it.? We have not had a full cluster downtime since the summer of 2016 - and then it was only a one day downtime to allow the cleaning of our core network switches after an electrical fire in our data center! So the firmware on not only our storage arrays, but our SAN switches as well, it a bit out of date, shall we say? That is an issue we need to address internally ? our users love us not having regularly scheduled downtimes quarterly, yearly, or whatever, but there is a cost to doing business that way... Kevin On Feb 8, 2018, at 10:46 AM, Paul Ward > wrote: We tend to get the maintenance company to down-grade the firmware to match what we have for our aging hardware, before sending it to us. I assume this isn?t an option? 
Paul Ward Technical Solutions Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Fri Feb 9 15:12:13 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 9 Feb 2018 15:12:13 +0000 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: <8388dda58d064620908b9aa62ca86da5@SMXRF105.msg.hukrf.de> Hallo Sven, thanks, it?s clear now. You have work now ;-) Happy Weekend from Coburg. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sven Oehme Gesendet: Freitag, 9. Februar 2018 16:09 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize you can only create a filesystem with a blocksize of what ever current maxblocksize is set. let me discuss with felipe what//if we can share here to solve this. sven On Fri, Feb 9, 2018 at 6:59 AM Grunenberg, Renar > wrote: Hallo Sven, that stated a mmcrfs ?newfs? -B 4M is possible if the maxblocksize is 1M (from the upgrade) without the requirement to change this parameter before?? Correct or not? Regards Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. 
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sven Oehme Gesendet: Freitag, 9. Februar 2018 15:48 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize Renar, if you specify the filesystem blocksize of 1M during mmcr you don't have to restart anything. scale 5 didn't change anything on the behaviour of maxblocksize change while the cluster is online, it only changed the default passed to the blocksize parameter for create a new filesystem. one thing we might consider doing is changing the command to use the current active maxblocksize as input for mmcrfs if maxblocksize is below current default. Sven On Fri, Feb 9, 2018 at 6:30 AM Grunenberg, Renar > wrote: Felipe, all, first thanks for clarification, but what was the reason for this logic? If i upgrade to Version 5 and want to create new filesystems, and the maxblocksize is on 1M, we must shutdown the hole cluster to change this to the defaults to use the new one default. I had no understanding for that decision. We are at 7 x 24h availability with our cluster today, we had no real maintenance window here! Any circumvention are welcome. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Freitag, 9. Februar 2018 14:59 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize All, Correct. 
There is no need to change the value of 'maxblocksize' for existing clusters which are upgraded to the 5.0.0 level. If a new file system needs to be created with a block size which exceeds the value of maxblocksize then the mmchconfig needs to be issued to increase the value of maxblocksize (which requires the entire cluster to be stopped). For clusters newly created with 5.0.0, the value of maxblocksize is set to 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man pages in 5.0.0 . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 "Uwe Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. From: "Uwe Falke" > To: gpfsug main discussion list > Date: 02/09/2018 06:54 AM Subject: Re: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. I'd see no need to change anything. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 02/09/2018 10:16 AM Subject: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. 
in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.ward at nhm.ac.uk Fri Feb 9 15:25:25 2018 From: p.ward at nhm.ac.uk (Paul Ward) Date: Fri, 9 Feb 2018 15:25:25 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> Message-ID: Not sure why it took over a day for my message to be sent out by the list? If it?s the firmware you currently have, I would still prefer to have it sent to me then I am able to do a controller firmware update online during an at risk period rather than a downtime, all the time you are running on one controller is at risk! Seems you have an alternative. Paul Ward Technical Solutions Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 09 February 2018 15:08 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmchdisk suspend / stop Hi All, Since several people have made this same suggestion, let me respond to that. We did ask the vendor - twice - to do that. Their response boils down to, ?No, the older version has bugs and we won?t send you a controller with firmware that we know has bugs in it.? 
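(An aside for readers following the subject line of this thread: the disk-state operations under discussion map onto mmchdisk roughly as sketched below. The file system name gpfs0 and NSD name nsd12 are illustrative, not taken from any poster's cluster, and this is only a sketch of the usual sequence, not a procedure validated here.)

# Suspend the NSDs behind the affected controller: no new blocks are
# allocated to them, but existing data remains readable.
mmchdisk gpfs0 suspend -d "nsd12"
# ... perform the controller swap / firmware work ...
# Resume the disks and rebalance anything written elsewhere meanwhile.
mmchdisk gpfs0 resume -d "nsd12"
mmrestripefs gpfs0 -b
# Alternatively, 'stop' / 'start' take the disk fully offline and back
# online; 'start' drives the recovery of any missed updates.
mmchdisk gpfs0 stop -d "nsd12"
mmchdisk gpfs0 start -d "nsd12"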
We have not had a full cluster downtime since the summer of 2016 - and then it was only a one-day downtime to allow the cleaning of our core network switches after an electrical fire in our data center! So the firmware on not only our storage arrays, but our SAN switches as well, is a bit out of date, shall we say?

That is an issue we need to address internally - our users love us not having regularly scheduled downtimes quarterly, yearly, or whatever, but there is a cost to doing business that way...

Kevin

On Feb 8, 2018, at 10:46 AM, Paul Ward > wrote: We tend to get the maintenance company to downgrade the firmware to match what we have for our aging hardware, before sending it to us. I assume this isn't an option?

Paul Ward Technical Solutions Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From dzieko at wcss.pl Mon Feb 12 15:11:55 2018 From: dzieko at wcss.pl (Pawel Dziekonski) Date: Mon, 12 Feb 2018 16:11:55 +0100 Subject: [gpfsug-discuss] Configuration advice Message-ID: <20180212151155.GD23944@cefeid.wcss.wroc.pl>

Hi All, I inherited 2 separate GPFS machines from the previous admin. All the hardware and software is old, so I want to switch to new servers, new disk arrays, a new GPFS version and a new GPFS "design". Each machine has 4 GPFS filesystems and runs a TSM HSM client that migrates data to tape using separate TSM servers:
GPFS+HSM no 1 -> TSM server no 1 -> tapes
GPFS+HSM no 2 -> TSM server no 2 -> tapes
Migration is done by HSM (not GPFS policies). All filesystems are used for archiving results from the HPC system and other files (a kind of backup - don't ask...). Data is written by users via NFS shares. There are 8 NFS mount points corresponding to the 8 GPFS filesystems, but there is no real reason for that. 4 filesystems are large and heavily used; the remaining 4 are almost unused.

The question is how to configure the new GPFS infrastructure. My initial impression is that I should create a GPFS cluster of 2+ nodes and export NFS using CES. The most important question is how many filesystems I need. Maybe just 2, with 8 filesets? Or how do I do this in a flexible way without locking myself into a bad configuration? Any hints?

Thanks, Pawel
ps. I will recall all data and copy it to the new infrastructure. Yes, that's the way I want to do it. :)
-- Pawel Dziekonski , http://www.wcss.pl Wroclaw Centre for Networking & Supercomputing, HPC Department

From jonathan.buzzard at strath.ac.uk Tue Feb 13 13:43:01 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 13 Feb 2018 13:43:01 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> Message-ID: <1518529381.3326.93.camel@strath.ac.uk>

On Fri, 2018-02-09 at 15:07 +0000, Buterbaugh, Kevin L wrote: > Hi All, > > Since several people have made this same suggestion, let me respond > to that. We did ask the vendor - twice - to do that. Their response > boils down to, "No, the older version has bugs and we won't send you > a controller with firmware that we know has bugs in it." > > We have not had a full cluster downtime since the summer of 2016 - > and then it was only a one day downtime to allow the cleaning of our > core network switches after an electrical fire in our data center!
> ?So the firmware on not only our storage arrays, but our SAN switches > as well, it a bit out of date, shall we say? > > That is an issue we need to address internally ? our users love us > not having regularly scheduled downtimes quarterly, yearly, or > whatever, but there is a cost to doing business that way... > What sort of storage arrays are you using that don't allow you to do a live update of the controller firmware? Heck these days even cheapy Dell MD3 series storage arrays allow you to do live drive firmware updates. Similarly with SAN switches surely you have separate A/B fabrics and can upgrade them one at a time live. In a properly designed system one should not need to schedule downtime for firmware updates. He says as he plans a firmware update on his routers for next Tuesday morning, with no scheduled downtime and no interruption to service. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Kevin.Buterbaugh at Vanderbilt.Edu Tue Feb 13 15:56:00 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 13 Feb 2018 15:56:00 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: <1518529381.3326.93.camel@strath.ac.uk> References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> <1518529381.3326.93.camel@strath.ac.uk> Message-ID: Hi JAB, OK, let me try one more time to clarify. I?m not naming the vendor ? they?re a small maker of commodity storage and we?ve been using their stuff for years and, overall, it?s been very solid. The problem in this specific case is that a major version firmware upgrade is required ? if the controllers were only a minor version apart we could do it live. And yes, we can upgrade our QLogic SAN switches firmware live ? in fact, we?ve done that in the past. Should?ve been more clear there ? we just try to do that as infrequently as possible. So the bottom line here is that we were unaware that ?major version? firmware upgrades could not be done live on our storage, but we?ve got a plan to work around this this time. Kevin > On Feb 13, 2018, at 7:43 AM, Jonathan Buzzard wrote: > > On Fri, 2018-02-09 at 15:07 +0000, Buterbaugh, Kevin L wrote: >> Hi All, >> >> Since several people have made this same suggestion, let me respond >> to that. We did ask the vendor - twice - to do that. Their response >> boils down to, ?No, the older version has bugs and we won?t send you >> a controller with firmware that we know has bugs in it.? >> >> We have not had a full cluster downtime since the summer of 2016 - >> and then it was only a one day downtime to allow the cleaning of our >> core network switches after an electrical fire in our data center! >> So the firmware on not only our storage arrays, but our SAN switches >> as well, it a bit out of date, shall we say? >> >> That is an issue we need to address internally ? our users love us >> not having regularly scheduled downtimes quarterly, yearly, or >> whatever, but there is a cost to doing business that way... >> > > What sort of storage arrays are you using that don't allow you to do a > live update of the controller firmware? Heck these days even cheapy > Dell MD3 series storage arrays allow you to do live drive firmware > updates. > > Similarly with SAN switches surely you have separate A/B fabrics and > can upgrade them one at a time live. 
> > In a properly designed system one should not need to schedule downtime > for firmware updates. He says as he plans a firmware update on his > routers for next Tuesday morning, with no scheduled downtime and no > interruption to service. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C16b7c1eca3d846afc65208d572e7b6f1%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636541261898197334&sdata=fY66HEDEia55g2x18VETOmE755IH7lXAfoznAewCe5A%3D&reserved=0 From griznog at gmail.com Wed Feb 14 05:32:39 2018 From: griznog at gmail.com (John Hanks) Date: Tue, 13 Feb 2018 21:32:39 -0800 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. Message-ID: Hi, We have a GPFS filesystem mounted on CentOS 7.4 as type gpfs, pretty straightforward run of the mill stuff. But are seeing this odd behavior. If I do this in a shell script, given a file called "a" cat a a a a a a a a a a > /path/to/gpfs/mount/test grep ATAG /path/to/gpfs/mount/test | wc -l sleep 4 grep ATAG /path/to/gpfs/mount/test | wc -l The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/gpfs/mount/test matches" The second grep | wc -l returns the correct count of ATAG in the file. Why does it take 4 seconds (3 isn't enough) for that file to be properly recognized as a text file and/or why is it seen as a binary file in the first place since a is a plain text file? Note that I have the same filesystem mounted via NFS and over an NFS mount it works as expected. Any illumination is appreciated, jbh -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Feb 14 06:49:42 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 14 Feb 2018 08:49:42 +0200 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: Message-ID: Hi This seems to be setup specific Care to explain a bit more of the setup. Number of nodes GPFS versions, number of FS, Networking, running from admin node, server / client, number of NSD, separated meta and data, etc? I got interested and run a quick test on a gpfs far from powerful cluster of 3 nodes on KVM [root at specscale01 IBM_REPO]# echo "a a a a a a a a a a" > test && grep ATAG test | wc -l && sleep 4 && grep ATAG test | wc -l 0 0 [root at specscale01 IBM_REPO]# -- Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous From: John Hanks To: gpfsug-discuss Date: 14/02/2018 07:33 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We have a GPFS filesystem mounted on CentOS 7.4 as type gpfs, pretty straightforward run of the mill stuff. But are seeing this odd behavior. 
If I do this in a shell script, given a file called "a" cat a a a a a a a a a a > /path/to/gpfs/mount/test grep ATAG /path/to/gpfs/mount/test | wc -l sleep 4 grep ATAG /path/to/gpfs/mount/test | wc -l The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/gpfs/mount/test matches" The second grep | wc -l returns the correct count of ATAG in the file. Why does it take 4 seconds (3 isn't enough) for that file to be properly recognized as a text file and/or why is it seen as a binary file in the first place since a is a plain text file? Note that I have the same filesystem mounted via NFS and over an NFS mount it works as expected. Any illumination is appreciated, jbh_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=ut35qIIMxjZMX3obFJ2xtUMng4MtGtKz4YHxpkgQbak&s=cNt66GjRD6rVhq7nGcvT76l-0_u2C3UTz9SfwzHf1xw&e= Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Feb 14 06:53:20 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 14 Feb 2018 08:53:20 +0200 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: Message-ID: Sorry With cat [root at specscale01 IBM_REPO]# cp test a [root at specscale01 IBM_REPO]# cat a a a a > test && grep ATAG test | wc -l && sleep 4 && grep ATAG test | wc -l 0 0 -- Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous From: Luis Bolinches To: gpfsug main discussion list Date: 14/02/2018 08:49 Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by grep. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi This seems to be setup specific Care to explain a bit more of the setup. Number of nodes GPFS versions, number of FS, Networking, running from admin node, server / client, number of NSD, separated meta and data, etc? I got interested and run a quick test on a gpfs far from powerful cluster of 3 nodes on KVM [root at specscale01 IBM_REPO]# echo "a a a a a a a a a a" > test && grep ATAG test | wc -l && sleep 4 && grep ATAG test | wc -l 0 0 [root at specscale01 IBM_REPO]# -- Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous From: John Hanks To: gpfsug-discuss Date: 14/02/2018 07:33 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We have a GPFS filesystem mounted on CentOS 7.4 as type gpfs, pretty straightforward run of the mill stuff. But are seeing this odd behavior. 
If I do this in a shell script, given a file called "a" cat a a a a a a a a a a > /path/to/gpfs/mount/test grep ATAG /path/to/gpfs/mount/test | wc -l sleep 4 grep ATAG /path/to/gpfs/mount/test | wc -l The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/gpfs/mount/test matches" The second grep | wc -l returns the correct count of ATAG in the file. Why does it take 4 seconds (3 isn't enough) for that file to be properly recognized as a text file and/or why is it seen as a binary file in the first place since a is a plain text file? Note that I have the same filesystem mounted via NFS and over an NFS mount it works as expected. Any illumination is appreciated, jbh_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=ut35qIIMxjZMX3obFJ2xtUMng4MtGtKz4YHxpkgQbak&s=cNt66GjRD6rVhq7nGcvT76l-0_u2C3UTz9SfwzHf1xw&e= Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=HrR-mBJ82ubcbtBin7NGVl2VenLj726Fcah6-3XFvDs&s=d5YiAyXz4el9bF0zjGL9gVjnTfbX4z-qelZodxRqlz0&e= Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From griznog at gmail.com Wed Feb 14 14:20:32 2018 From: griznog at gmail.com (John Hanks) Date: Wed, 14 Feb 2018 06:20:32 -0800 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: Message-ID: Hi Luis, GPFS is 4.2.3 (gpfs.base-4.2.3-6.x86_64), All servers (8 in front of a DDN SFA12K) are RHEL 7.3 (stock DDN setup). All 47 clients are CentOS 7.4. GPFS mount: # mount | grep gpfs gsfs0 on /srv/gsfs0 type gpfs (rw,relatime) NFS mount: mount | grep $HOME 10.210.15.57:/srv/gsfs0/home/griznog on /home/griznog type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.210.15.57,mountvers=3,mountport=20048,mountproto=tcp,local_lock=none,addr=10.210.15.57) Example script: #!/bin/bash cat pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt > /srv/gsfs0/projects/pipetest.tmp.txt grep L1 /srv/gsfs0/projects/pipetest.tmp.txt | wc -l cat pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt > $HOME/pipetest.tmp.txt grep L1 $HOME/pipetest.tmp.txt | wc -l Example output: # ./pipetest.sh 1 1836 # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 /srv/gsfs0/projects/pipetest.tmp.txt We can "fix" the user case that exposed this by not using a temp file or inserting a sleep, but I'd still like to know why GPFS is behaving this way and make it stop. mmlsconfig below. 
Thanks, jbh mmlsconfig Configuration data for cluster SCG-GS.scg-gs0: ---------------------------------------------- clusterName SCG-GS.scg-gs0 clusterId 8456032987852400706 dmapiFileHandleSize 32 maxblocksize 4096K cnfsSharedRoot /srv/gsfs0/GS-NFS cnfsMountdPort 597 socketMaxListenConnections 1024 fileHeatPeriodMinutes 1440 fileHeatLossPercent 1 pingPeriod 5 minMissedPingTimeout 30 afmHashVersion 1 minReleaseLevel 4.2.0.1 [scg-gs0,scg-gs1,scg-gs2,scg-gs3,scg-gs4,scg-gs5,scg-gs6,scg-gs7] nsdbufspace 70 [common] healthCheckInterval 20 maxStatCache 512 maxFilesToCache 50000 nsdMinWorkerThreads 512 nsdMaxWorkerThreads 1024 deadlockDetectionThreshold 0 deadlockOverloadThreshold 0 prefetchThreads 288 worker1Threads 320 maxMBpS 2000 [scg-gs0,scg-gs1,scg-gs2,scg-gs3,scg-gs4,scg-gs5,scg-gs6,scg-gs7] maxMBpS 24000 [common] atimeDeferredSeconds 300 pitWorkerThreadsPerNode 2 cipherList AUTHONLY pagepool 1G [scg-gs0,scg-gs1,scg-gs2,scg-gs3,scg-gs4,scg-gs5,scg-gs6,scg-gs7] pagepool 8G [common] cnfsNFSDprocs 256 nfsPrefetchStrategy 1 autoload yes adminMode central File systems in cluster SCG-GS.scg-gs0: --------------------------------------- /dev/gsfs0 On Tue, Feb 13, 2018 at 10:53 PM, Luis Bolinches wrote: > Sorry > > With cat > > [root at specscale01 IBM_REPO]# cp test a > [root at specscale01 IBM_REPO]# cat a a a a > test && grep ATAG test | wc -l > && sleep 4 && grep ATAG test | wc -l > 0 > 0 > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > Luis Bolinches > Consultant IT Specialist > Mobile Phone: +358503112585 <+358%2050%203112585> > https://www.youracclaim.com/user/luis-bolinches > > "If you always give you will always have" -- Anonymous > > > > From: Luis Bolinches > To: gpfsug main discussion list > Date: 14/02/2018 08:49 > Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by > grep. > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi > > This seems to be setup specific > > Care to explain a bit more of the setup. Number of nodes GPFS versions, > number of FS, Networking, running from admin node, server / client, number > of NSD, separated meta and data, etc? > > I got interested and run a quick test on a gpfs far from powerful cluster > of 3 nodes on KVM > > [root at specscale01 IBM_REPO]# echo "a a a a a a a a a a" > test && grep > ATAG test | wc -l && sleep 4 && grep ATAG test | wc -l > 0 > 0 > [root at specscale01 IBM_REPO]# > > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > Luis Bolinches > Consultant IT Specialist > Mobile Phone: +358503112585 <+358%2050%203112585> > *https://www.youracclaim.com/user/luis-bolinches* > > > "If you always give you will always have" -- Anonymous > > > > From: John Hanks > To: gpfsug-discuss > Date: 14/02/2018 07:33 > Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi, > > We have a GPFS filesystem mounted on CentOS 7.4 as type gpfs, pretty > straightforward run of the mill stuff. But are seeing this odd behavior. If > I do this in a shell script, given a file called "a" > > cat a a a a a a a a a a > /path/to/gpfs/mount/test > grep ATAG /path/to/gpfs/mount/test | wc -l > sleep 4 > grep ATAG /path/to/gpfs/mount/test | wc -l > > The first grep | wc -l returns 1, because grep outputs "Binary file > /path/to/gpfs/mount/test matches" > > The second grep | wc -l returns the correct count of ATAG in the file. 
> > Why does it take 4 seconds (3 isn't enough) for that file to be properly > recognized as a text file and/or why is it seen as a binary file in the > first place since a is a plain text file? > > Note that I have the same filesystem mounted via NFS and over an NFS mount > it works as expected. > > Any illumination is appreciated, > > jbh_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > *https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=ut35qIIMxjZMX3obFJ2xtUMng4MtGtKz4YHxpkgQbak&s=cNt66GjRD6rVhq7nGcvT76l-0_u2C3UTz9SfwzHf1xw&e=* > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug. > org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r= > 1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=HrR- > mBJ82ubcbtBin7NGVl2VenLj726Fcah6-3XFvDs&s=d5YiAyXz4el9bF0zjGL9gVjnTfbX4z > -qelZodxRqlz0&e= > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Feb 14 15:08:10 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 14 Feb 2018 10:08:10 -0500 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: Message-ID: <11815.1518620890@turing-police.cc.vt.edu> On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > /srv/gsfs0/projects/pipetest.tmp.txt > > We can "fix" the user case that exposed this by not using a temp file or > inserting a sleep, but I'd still like to know why GPFS is behaving this way > and make it stop. May be related to replication, or other behind-the-scenes behavior. Consider this example - 4.2.3.6, data and metadata replication both set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with a full fiberchannel mesh to 3 Dell MD34something arrays. % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test 4096+0 records in 4096+0 records out 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test Notice that the first /bin/ls shouldn't be starting until after the dd has completed - at which point it's only allocated half the blocks needed to hold the 4M of data at one site. 5 seconds later, it's allocated the blocks at both sites and thus shows the full 8M needed for 2 copies. 
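(A small polling loop makes that allocation catch-up easier to watch than eyeballing repeated ls output. The sketch below is illustrative only - the path is made up - and simply reads stat()'s allocated-block count, which on a file system with data replication 2 can be seen growing towards roughly twice the logical size for a few seconds after the write returns:)

# Illustrative only; /gpfs/fs0/sync.test is a made-up path.
dd if=/dev/zero bs=1k count=4096 of=/gpfs/fs0/sync.test
for i in 1 2 3 4 5; do
    # %b = allocated 512-byte blocks, %s = logical size in bytes (GNU stat)
    stat -c 'allocated=%b blocks  size=%s bytes' /gpfs/fs0/sync.test
    sleep 2
done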
I've also seen (but haven't replicated it as I write this) a small file (4-8K or so) showing first one full-sized block, then a second full-sized block, and then dropping back to what's needed for 2 1/32nd fragments. That had me scratching my head.

Having said that, that's all metadata fun and games, while your case appears to have some problems with data integrity (which is a whole lot scarier). It would be *really* nice if we understood the problem here. The scariest part is:

> The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/ > gpfs/mount/test matches"

which seems to be implying that we're failing on semantic consistency. Basically, your 'cat' command is completing and closing the file, but then a temporally later open of the same file is reading something other than only the just-written data. My first guess is that it's a race condition similar to the following: the cat command is causing a write on one NSD server, and the first grep results in a read from a *different* NSD server, returning the data that *used* to be in the block because the read actually happens before the first NSD server actually completes the write.

It may be interesting to replace the greps with pairs of 'ls -ls / dd' commands to grab the raw data and its size, and check the following:
1) does the size (both blocks allocated and logical length) reported by ls match the amount of data actually read by the dd?
2) Is the file length as actually read equal to the written length, or does it overshoot and read all the way to the next block boundary?
3) If the length is correct, what's wrong with the data that's telling grep that it's a binary file? ( od -cx is your friend here).
4) If it overshoots, is the remainder all-zeros (good) or does it return semi-random "what used to be there" data (bad, due to data exposure issues)?

(It's certainly not the most perplexing data consistency issue I've hit in 4 decades - the winner *has* to be an intermittent data read corruption on a GPFS 3.5 cluster that had us, IBM, SGI, DDN, and at least one vendor of networking gear all chasing our tails for 18 months before we finally tracked it down. :)
-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL:

From griznog at gmail.com Wed Feb 14 15:21:52 2018 From: griznog at gmail.com (John Hanks) Date: Wed, 14 Feb 2018 07:21:52 -0800 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: <11815.1518620890@turing-police.cc.vt.edu> References: Message-ID:

Hi Valdis, I tried with the grep replaced with 'ls -ls' and 'md5sum'; I don't think this is a data integrity issue, thankfully:

$ ./pipetestls.sh
256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 /srv/gsfs0/projects/pipetest.tmp.txt
0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt

$ ./pipetestmd5.sh
15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt
15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt

And replacing grep with 'file' even properly sees the files as ASCII:
$ ./pipetestfile.sh
/srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines
/home/griznog/pipetest.tmp.txt: ASCII text, with very long lines

I'll poke a little harder at grep next and see what the difference in strace of each reveals.
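(For what it's worth, the strace posted later in this thread points at GNU grep's binary-file heuristic rather than at data corruption: newer GNU grep versions ask the kernel, via lseek() with SEEK_HOLE, whether the file contains a hole, and treat a file that reports a hole as if it contained NUL bytes, i.e. as binary. If GPFS reports the not-yet-flushed tail of a freshly written file as a hole for a few seconds, the first grep prints "Binary file ... matches" and the second, after the sleep, sees ordinary text. A rough way to test that theory - the paths are illustrative, and whether an explicit flush is sufficient on a given GPFS level is an assumption, not something verified in this thread:)

# Write the file, ask grep immediately, then again after forcing the data out.
cat pt.txt pt.txt pt.txt > /srv/gsfs0/projects/pipetest.tmp.txt
grep L1 /srv/gsfs0/projects/pipetest.tmp.txt | wc -l   # 1: grep prints "Binary file ... matches"
sync /srv/gsfs0/projects/pipetest.tmp.txt || sleep 4   # sync with a file operand needs newer coreutils
grep L1 /srv/gsfs0/projects/pipetest.tmp.txt | wc -l   # real line count once no hole is reported
grep -a L1 /srv/gsfs0/projects/pipetest.tmp.txt | wc -l  # -a (--text) bypasses the binary heuristic entirely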
Thanks, jbh On Wed, Feb 14, 2018 at 7:08 AM, wrote: > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 > /home/griznog/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > We can "fix" the user case that exposed this by not using a temp file or > > inserting a sleep, but I'd still like to know why GPFS is behaving this > way > > and make it stop. > > May be related to replication, or other behind-the-scenes behavior. > > Consider this example - 4.2.3.6, data and metadata replication both > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with a full > fiberchannel mesh to 3 Dell MD34something arrays. > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; sleep > 5; ls -ls sync.test; sleep 5; ls -ls sync.test > 4096+0 records in > 4096+0 records out > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > Notice that the first /bin/ls shouldn't be starting until after the dd has > completed - at which point it's only allocated half the blocks needed to > hold > the 4M of data at one site. 5 seconds later, it's allocated the blocks at > both > sites and thus shows the full 8M needed for 2 copies. > > I've also seen (but haven't replicated it as I write this) a small file > (4-8K > or so) showing first one full-sized block, then a second full-sized block, > and > then dropping back to what's needed for 2 1/32nd fragments. That had me > scratching my head > > Having said that, that's all metadata fun and games, while your case > appears to have some problems with data integrity (which is a whole lot > scarier). It would be *really* nice if we understood the problem here. > > The scariest part is: > > > The first grep | wc -l returns 1, because grep outputs "Binary file > /path/to/ > > gpfs/mount/test matches" > > which seems to be implying that we're failing on semantic consistency. > Basically, your 'cat' command is completing and closing the file, but then > a > temporally later open of the same find is reading something other that > only the > just-written data. My first guess is that it's a race condition similar > to the > following: The cat command is causing a write on one NSD server, and the > first > grep results in a read from a *different* NSD server, returning the data > that > *used* to be in the block because the read actually happens before the > first > NSD server actually completes the write. > > It may be interesting to replace the grep's with pairs of 'ls -ls / dd' > commands to grab the > raw data and its size, and check the following: > > 1) does the size (both blocks allocated and logical length) reported by > ls match the amount of data actually read by the dd? > > 2) Is the file length as actually read equal to the written length, or > does it > overshoot and read all the way to the next block boundary? > > 3) If the length is correct, what's wrong with the data that's telling > grep that > it's a binary file? ( od -cx is your friend here). > > 4) If it overshoots, is the remainder all-zeros (good) or does it return > semi-random > "what used to be there" data (bad, due to data exposure issues)? 
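(The checklist quoted above translates into a few lines of shell; this is only a sketch with made-up paths, not something that was run on the cluster in question:)

# Sketch of the ls/dd/od checks suggested above; path and pattern are made up.
f=/srv/gsfs0/projects/pipetest.tmp.txt
ls -ls "$f"                                # 1) blocks allocated vs logical length
dd if="$f" bs=1M 2>/dev/null | wc -c       # 2) bytes actually returned by a full read
od -c "$f" | grep -m1 '\\0'                # 3) any NUL bytes that would justify "binary"?
tail -c 4096 "$f" | od -c | tail           # 4) does the tail hold zeros or stale data?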
> > (It's certainly not the most perplexing data consistency issue I've hit in > 4 decades - the > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > cluster that > had us, IBM, SGI, DDN, and at least one vendor of networking gear all > chasing our > tails for 18 months before we finally tracked it down. :) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Feb 14 15:33:24 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 14 Feb 2018 17:33:24 +0200 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Hi not going to mention much on DDN setups but first thing that makes my eyes blurry a bit is minReleaseLevel 4.2.0.1 when you mention your whole cluster is already on 4.2.3 -- Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous From: John Hanks To: gpfsug main discussion list Date: 14/02/2018 17:22 Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by grep. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Valdis, I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think this is a data integrity issue, thankfully: $ ./pipetestls.sh 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 /srv/gsfs0/projects/pipetest.tmp.txt 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt $ ./pipetestmd5.sh 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt And replacing grep with 'file' even properly sees the files as ASCII: $ ./pipetestfile.sh /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines I'll poke a little harder at grep next and see what the difference in strace of each reveals. Thanks, jbh On Wed, Feb 14, 2018 at 7:08 AM, wrote: On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > /srv/gsfs0/projects/pipetest.tmp.txt > > We can "fix" the user case that exposed this by not using a temp file or > inserting a sleep, but I'd still like to know why GPFS is behaving this way > and make it stop. May be related to replication, or other behind-the-scenes behavior. Consider this example - 4.2.3.6, data and metadata replication both set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with a full fiberchannel mesh to 3 Dell MD34something arrays. 
% dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test 4096+0 records in 4096+0 records out 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test Notice that the first /bin/ls shouldn't be starting until after the dd has completed - at which point it's only allocated half the blocks needed to hold the 4M of data at one site. 5 seconds later, it's allocated the blocks at both sites and thus shows the full 8M needed for 2 copies. I've also seen (but haven't replicated it as I write this) a small file (4-8K or so) showing first one full-sized block, then a second full-sized block, and then dropping back to what's needed for 2 1/32nd fragments. That had me scratching my head Having said that, that's all metadata fun and games, while your case appears to have some problems with data integrity (which is a whole lot scarier). It would be *really* nice if we understood the problem here. The scariest part is: > The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/ > gpfs/mount/test matches" which seems to be implying that we're failing on semantic consistency. Basically, your 'cat' command is completing and closing the file, but then a temporally later open of the same find is reading something other that only the just-written data. My first guess is that it's a race condition similar to the following: The cat command is causing a write on one NSD server, and the first grep results in a read from a *different* NSD server, returning the data that *used* to be in the block because the read actually happens before the first NSD server actually completes the write. It may be interesting to replace the grep's with pairs of 'ls -ls / dd' commands to grab the raw data and its size, and check the following: 1) does the size (both blocks allocated and logical length) reported by ls match the amount of data actually read by the dd? 2) Is the file length as actually read equal to the written length, or does it overshoot and read all the way to the next block boundary? 3) If the length is correct, what's wrong with the data that's telling grep that it's a binary file? ( od -cx is your friend here). 4) If it overshoots, is the remainder all-zeros (good) or does it return semi-random "what used to be there" data (bad, due to data exposure issues)? (It's certainly not the most perplexing data consistency issue I've hit in 4 decades - the winner *has* to be a intermittent data read corruption on a GPFS 3.5 cluster that had us, IBM, SGI, DDN, and at least one vendor of networking gear all chasing our tails for 18 months before we finally tracked it down. :) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_UFKMxNklx_00YDdSlmEr9lCvnUC9AWFsTVbTn6yAr4&s=JUVyUiTIfln67di06lb-hvwpA8207JNkioGxY1ayAlE&e= Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Feb 14 17:51:04 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 14 Feb 2018 12:51:04 -0500 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Just speculating here (also known as making things up) but I wonder if grep is somehow using the file's size in its determination of binary status. I also see mmap in the strace so maybe there's some issue with mmap where some internal GPFS buffer is getting truncated inappropriately but leaving a bunch of null values which gets returned to grep. -Aaron On 2/14/18 10:21 AM, John Hanks wrote: > Hi Valdis, > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think > this is a data integrity issue, thankfully: > > $ ./pipetestls.sh? > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 > /srv/gsfs0/projects/pipetest.tmp.txt > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt > > $ ./pipetestmd5.sh? > 15cb81a85c9e450bdac8230309453a0a? /srv/gsfs0/projects/pipetest.tmp.txt > 15cb81a85c9e450bdac8230309453a0a? /home/griznog/pipetest.tmp.txt > > And replacing grep with 'file' even properly sees the files as ASCII: > $ ./pipetestfile.sh? > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines > > I'll poke a little harder at grep next and see what the difference in > strace of each reveals. > > Thanks, > > jbh > > > > > On Wed, Feb 14, 2018 at 7:08 AM, > wrote: > > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > #? ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > We can "fix" the user case that exposed this by not using a temp file or > > inserting a sleep, but I'd still like to know why GPFS is behaving this way > > and make it stop. > > May be related to replication, or other behind-the-scenes behavior. > > Consider this example - 4.2.3.6, data and metadata replication both > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with > a full > fiberchannel mesh to 3 Dell MD34something arrays. > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; > sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test > 4096+0 records in > 4096+0 records out > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > Notice that the first /bin/ls shouldn't be starting until after the > dd has > completed - at which point it's only allocated half the blocks > needed to hold > the 4M of data at one site.? 5 seconds later, it's allocated the > blocks at both > sites and thus shows the full 8M needed for 2 copies. > > I've also seen (but haven't replicated it as I write this) a small > file (4-8K > or so) showing first one full-sized block, then a second full-sized > block, and > then dropping back to what's needed for 2 1/32nd fragments.? 
That had me > scratching my head > > Having said that, that's all metadata fun and games, while your case > appears to have some problems with data integrity (which is a whole lot > scarier).? It would be *really* nice if we understood the problem here. > > The scariest part is: > > > The first grep | wc -l returns 1, because grep outputs ?"Binary file /path/to/ > > gpfs/mount/test matches" > > which seems to be implying that we're failing on semantic consistency. > Basically, your 'cat' command is completing and closing the file, > but then a > temporally later open of the same find is reading something other > that only the > just-written data.? My first guess is that it's a race condition > similar to the > following: The cat command is causing a write on one NSD server, and > the first > grep results in a read from a *different* NSD server, returning the > data that > *used* to be in the block because the read actually happens before > the first > NSD server actually completes the write. > > It may be interesting to replace the grep's with pairs of 'ls -ls / > dd' commands to grab the > raw data and its size, and check the following: > > 1) does the size (both blocks allocated and logical length) reported by > ls match the amount of data actually read by the dd? > > 2) Is the file length as actually read equal to the written length, > or does it > overshoot and read all the way to the next block boundary? > > 3) If the length is correct, what's wrong with the data that's > telling grep that > it's a binary file?? ( od -cx is your friend here). > > 4) If it overshoots, is the remainder all-zeros (good) or does it > return semi-random > "what used to be there" data (bad, due to data exposure issues)? > > (It's certainly not the most perplexing data consistency issue I've > hit in 4 decades - the > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > cluster that > had us, IBM, SGI, DDN, and at least one vendor of networking gear > all chasing our > tails for 18 months before we finally tracked it down. :) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From griznog at gmail.com Wed Feb 14 18:30:39 2018 From: griznog at gmail.com (John Hanks) Date: Wed, 14 Feb 2018 10:30:39 -0800 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. 
In-Reply-To: References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Straces are interesting, but don't immediately open my eyes: strace of grep on NFS (works as expected) openat(AT_FDCWD, "/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 ioctl(3, TCGETS, 0x7ffe2c26b0b0) = -1 ENOTTY (Inappropriate ioctl for device) read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 lseek(3, 32768, SEEK_HOLE) = 530721 lseek(3, 32768, SEEK_SET) = 32768 fstat(1, {st_mode=S_IFREG|0644, st_size=5977, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3bf6c43000 write(1, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 8192chr1 strace on GPFS (thinks file is binary) openat(AT_FDCWD, "/srv/gsfs0/projects/pipetest.tmp.txt", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 ioctl(3, TCGETS, 0x7ffc9b52caa0) = -1 ENOTTY (Inappropriate ioctl for device) read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 lseek(3, 32768, SEEK_HOLE) = 262144 lseek(3, 32768, SEEK_SET) = 32768 fstat(1, {st_mode=S_IFREG|0644, st_size=6011, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd45ee88000 close(3) = 0 write(1, "Binary file /srv/gsfs0/projects/"..., 72Binary file /srv/gsfs0/projects/levinson/xwzhu/pipetest.tmp.txt matches ) = 72 Do the lseek() results indicate that the grep on the GPFS mounted version thinks the file is a sparse file? For comparison I strace'd md5sum in place of the grep and it does not lseek() with SEEK_HOLE, it's access in both cases look identical, like: open("/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb7d2c2b000 read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 ...[reads clipped]... read(3, "", 24576) = 0 lseek(3, 0, SEEK_CUR) = 530721 close(3) = 0 jbh On Wed, Feb 14, 2018 at 9:51 AM, Aaron Knister wrote: > Just speculating here (also known as making things up) but I wonder if > grep is somehow using the file's size in its determination of binary > status. I also see mmap in the strace so maybe there's some issue with > mmap where some internal GPFS buffer is getting truncated > inappropriately but leaving a bunch of null values which gets returned > to grep. > > -Aaron > > On 2/14/18 10:21 AM, John Hanks wrote: > > Hi Valdis, > > > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think > > this is a data integrity issue, thankfully: > > > > $ ./pipetestls.sh > > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 > > /srv/gsfs0/projects/pipetest.tmp.txt > > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 > /home/griznog/pipetest.tmp.txt > > > > $ ./pipetestmd5.sh > > 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt > > 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt > > > > And replacing grep with 'file' even properly sees the files as ASCII: > > $ ./pipetestfile.sh > > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines > > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines > > > > I'll poke a little harder at grep next and see what the difference in > > strace of each reveals. 
> > > > Thanks, > > > > jbh > > > > > > > > > > On Wed, Feb 14, 2018 at 7:08 AM, > > wrote: > > > > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt > $HOME/pipetest.tmp.txt > > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 > /home/griznog/pipetest.tmp.txt > > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > > > We can "fix" the user case that exposed this by not using a temp > file or > > > inserting a sleep, but I'd still like to know why GPFS is behaving > this way > > > and make it stop. > > > > May be related to replication, or other behind-the-scenes behavior. > > > > Consider this example - 4.2.3.6, data and metadata replication both > > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with > > a full > > fiberchannel mesh to 3 Dell MD34something arrays. > > > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; > > sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test > > 4096+0 records in > > 4096+0 records out > > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > > > Notice that the first /bin/ls shouldn't be starting until after the > > dd has > > completed - at which point it's only allocated half the blocks > > needed to hold > > the 4M of data at one site. 5 seconds later, it's allocated the > > blocks at both > > sites and thus shows the full 8M needed for 2 copies. > > > > I've also seen (but haven't replicated it as I write this) a small > > file (4-8K > > or so) showing first one full-sized block, then a second full-sized > > block, and > > then dropping back to what's needed for 2 1/32nd fragments. That > had me > > scratching my head > > > > Having said that, that's all metadata fun and games, while your case > > appears to have some problems with data integrity (which is a whole > lot > > scarier). It would be *really* nice if we understood the problem > here. > > > > The scariest part is: > > > > > The first grep | wc -l returns 1, because grep outputs "Binary > file /path/to/ > > > gpfs/mount/test matches" > > > > which seems to be implying that we're failing on semantic > consistency. > > Basically, your 'cat' command is completing and closing the file, > > but then a > > temporally later open of the same find is reading something other > > that only the > > just-written data. My first guess is that it's a race condition > > similar to the > > following: The cat command is causing a write on one NSD server, and > > the first > > grep results in a read from a *different* NSD server, returning the > > data that > > *used* to be in the block because the read actually happens before > > the first > > NSD server actually completes the write. > > > > It may be interesting to replace the grep's with pairs of 'ls -ls / > > dd' commands to grab the > > raw data and its size, and check the following: > > > > 1) does the size (both blocks allocated and logical length) reported > by > > ls match the amount of data actually read by the dd? > > > > 2) Is the file length as actually read equal to the written length, > > or does it > > overshoot and read all the way to the next block boundary? > > > > 3) If the length is correct, what's wrong with the data that's > > telling grep that > > it's a binary file? ( od -cx is your friend here). 
> > > > 4) If it overshoots, is the remainder all-zeros (good) or does it > > return semi-random > > "what used to be there" data (bad, due to data exposure issues)? > > > > (It's certainly not the most perplexing data consistency issue I've > > hit in 4 decades - the > > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > > cluster that > > had us, IBM, SGI, DDN, and at least one vendor of networking gear > > all chasing our > > tails for 18 months before we finally tracked it down. :) > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Wed Feb 14 09:00:10 2018 From: john.hearns at asml.com (John Hearns) Date: Wed, 14 Feb 2018 09:00:10 +0000 Subject: [gpfsug-discuss] Odd d????????? permissions Message-ID: I am sure this is a known behavior and I am going to feel very foolish in a few minutes... We often see this behavior on a GPFS filesystem. I log into a client. [jhearns at pn715 test]$ ls -la ../ ls: cannot access ../..: Permission denied total 160 drwx------ 4 jhearns root 4096 Feb 14 09:46 . d????????? ? ? ? ? ? .. drwxr-xr-x 2 jhearns users 4096 Feb 9 11:13 gpfsperf -rw-r--r-- 1 jhearns users 27336 Feb 9 22:24 iozone.out -rw-r--r-- 1 jhearns users 6083 Feb 9 10:55 IozoneResults.py -rw-r--r-- 1 jhearns users 22959 Feb 9 11:17 iozone.txt -rw-r--r-- 1 jhearns users 2977 Feb 9 10:55 iozone.txtvi -rwxr-xr-x 1 jhearns users 102 Feb 9 10:55 run-iozone.sh drwxr-xr-x 2 jhearns users 4096 Feb 14 09:46 test -r-x------ 1 jhearns users 51504 Feb 9 11:02 tsqosperf This behavior changes after a certain number of minutes, and the .. directory looks normal. For information this filesystem has nfsv4 file locking semantics and ACL semantics set to all -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Wed Feb 14 18:38:41 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 14 Feb 2018 18:38:41 +0000 Subject: [gpfsug-discuss] Odd d????????? permissions In-Reply-To: References: Message-ID: Is it an AFM cache? We see this sort of behaviour occasionally where the cache has an "old" view of the directory. Doing an ls, it evidently goes back to home but by then you already have weird stuff. The next ls is usually fine. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of john.hearns at asml.com [john.hearns at asml.com] Sent: 14 February 2018 09:00 To: gpfsug main discussion list Subject: [gpfsug-discuss] Odd d????????? permissions I am sure this is a known behavior and I am going to feel very foolish in a few minutes? We often see this behavior on a GPFS filesystem. I log into a client. [jhearns at pn715 test]$ ls -la ../ ls: cannot access ../..: Permission denied total 160 drwx------ 4 jhearns root 4096 Feb 14 09:46 . d????????? ? ? ? ? ? .. drwxr-xr-x 2 jhearns users 4096 Feb 9 11:13 gpfsperf -rw-r--r-- 1 jhearns users 27336 Feb 9 22:24 iozone.out -rw-r--r-- 1 jhearns users 6083 Feb 9 10:55 IozoneResults.py -rw-r--r-- 1 jhearns users 22959 Feb 9 11:17 iozone.txt -rw-r--r-- 1 jhearns users 2977 Feb 9 10:55 iozone.txtvi -rwxr-xr-x 1 jhearns users 102 Feb 9 10:55 run-iozone.sh drwxr-xr-x 2 jhearns users 4096 Feb 14 09:46 test -r-x------ 1 jhearns users 51504 Feb 9 11:02 tsqosperf This behavior changes after a certain number of minutes, and the .. directory looks normal. For information this filesystem has nfsv4 file locking semantics and ACL semantics set to all -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From bbanister at jumptrading.com Wed Feb 14 18:48:32 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 14 Feb 2018 18:48:32 +0000 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Hi all, We found this a while back and IBM fixed it. Here?s your answer: http://www-01.ibm.com/support/docview.wss?uid=isg1IV87385 Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hanks Sent: Wednesday, February 14, 2018 12:31 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by grep. 
Note: External Email ________________________________ Straces are interesting, but don't immediately open my eyes: strace of grep on NFS (works as expected) openat(AT_FDCWD, "/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 ioctl(3, TCGETS, 0x7ffe2c26b0b0) = -1 ENOTTY (Inappropriate ioctl for device) read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 lseek(3, 32768, SEEK_HOLE) = 530721 lseek(3, 32768, SEEK_SET) = 32768 fstat(1, {st_mode=S_IFREG|0644, st_size=5977, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3bf6c43000 write(1, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 8192chr1 strace on GPFS (thinks file is binary) openat(AT_FDCWD, "/srv/gsfs0/projects/pipetest.tmp.txt", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 ioctl(3, TCGETS, 0x7ffc9b52caa0) = -1 ENOTTY (Inappropriate ioctl for device) read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 lseek(3, 32768, SEEK_HOLE) = 262144 lseek(3, 32768, SEEK_SET) = 32768 fstat(1, {st_mode=S_IFREG|0644, st_size=6011, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd45ee88000 close(3) = 0 write(1, "Binary file /srv/gsfs0/projects/"..., 72Binary file /srv/gsfs0/projects/levinson/xwzhu/pipetest.tmp.txt matches ) = 72 Do the lseek() results indicate that the grep on the GPFS mounted version thinks the file is a sparse file? For comparison I strace'd md5sum in place of the grep and it does not lseek() with SEEK_HOLE, it's access in both cases look identical, like: open("/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb7d2c2b000 read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 ...[reads clipped]... read(3, "", 24576) = 0 lseek(3, 0, SEEK_CUR) = 530721 close(3) = 0 jbh On Wed, Feb 14, 2018 at 9:51 AM, Aaron Knister > wrote: Just speculating here (also known as making things up) but I wonder if grep is somehow using the file's size in its determination of binary status. I also see mmap in the strace so maybe there's some issue with mmap where some internal GPFS buffer is getting truncated inappropriately but leaving a bunch of null values which gets returned to grep. -Aaron On 2/14/18 10:21 AM, John Hanks wrote: > Hi Valdis, > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think > this is a data integrity issue, thankfully: > > $ ./pipetestls.sh > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 > /srv/gsfs0/projects/pipetest.tmp.txt > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt > > $ ./pipetestmd5.sh > 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt > 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt > > And replacing grep with 'file' even properly sees the files as ASCII: > $ ./pipetestfile.sh > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines > > I'll poke a little harder at grep next and see what the difference in > strace of each reveals. 
> > Thanks, > > jbh > > > > > On Wed, Feb 14, 2018 at 7:08 AM, > >> wrote: > > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > We can "fix" the user case that exposed this by not using a temp file or > > inserting a sleep, but I'd still like to know why GPFS is behaving this way > > and make it stop. > > May be related to replication, or other behind-the-scenes behavior. > > Consider this example - 4.2.3.6, data and metadata replication both > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with > a full > fiberchannel mesh to 3 Dell MD34something arrays. > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; > sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test > 4096+0 records in > 4096+0 records out > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > Notice that the first /bin/ls shouldn't be starting until after the > dd has > completed - at which point it's only allocated half the blocks > needed to hold > the 4M of data at one site. 5 seconds later, it's allocated the > blocks at both > sites and thus shows the full 8M needed for 2 copies. > > I've also seen (but haven't replicated it as I write this) a small > file (4-8K > or so) showing first one full-sized block, then a second full-sized > block, and > then dropping back to what's needed for 2 1/32nd fragments. That had me > scratching my head > > Having said that, that's all metadata fun and games, while your case > appears to have some problems with data integrity (which is a whole lot > scarier). It would be *really* nice if we understood the problem here. > > The scariest part is: > > > The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/ > > gpfs/mount/test matches" > > which seems to be implying that we're failing on semantic consistency. > Basically, your 'cat' command is completing and closing the file, > but then a > temporally later open of the same find is reading something other > that only the > just-written data. My first guess is that it's a race condition > similar to the > following: The cat command is causing a write on one NSD server, and > the first > grep results in a read from a *different* NSD server, returning the > data that > *used* to be in the block because the read actually happens before > the first > NSD server actually completes the write. > > It may be interesting to replace the grep's with pairs of 'ls -ls / > dd' commands to grab the > raw data and its size, and check the following: > > 1) does the size (both blocks allocated and logical length) reported by > ls match the amount of data actually read by the dd? > > 2) Is the file length as actually read equal to the written length, > or does it > overshoot and read all the way to the next block boundary? > > 3) If the length is correct, what's wrong with the data that's > telling grep that > it's a binary file? ( od -cx is your friend here). > > 4) If it overshoots, is the remainder all-zeros (good) or does it > return semi-random > "what used to be there" data (bad, due to data exposure issues)? 
> > (It's certainly not the most perplexing data consistency issue I've > hit in 4 decades - the > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > cluster that > had us, IBM, SGI, DDN, and at least one vendor of networking gear > all chasing our > tails for 18 months before we finally tracked it down. :) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From griznog at gmail.com Wed Feb 14 19:17:19 2018 From: griznog at gmail.com (John Hanks) Date: Wed, 14 Feb 2018 11:17:19 -0800 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Thanks Bryan, mystery solved :) We also stumbled across these related items, in case anyone else wanders into this thread. http://bug-grep.gnu.narkive.com/Y8cfvWDt/bug-27666-grep-on-gpfs-filesystem-seek-hole-problem https://www.ibm.com/developerworks/community/forums/html/topic?id=c2a94433-9ec0-4a4b-abfe-d0a1e721d630 GPFS, the gift that keeps on giving ... me more things to do instead of doing the things I want to be doing. Thanks all, jbh On Wed, Feb 14, 2018 at 10:48 AM, Bryan Banister wrote: > Hi all, > > > > We found this a while back and IBM fixed it. Here?s your answer: > http://www-01.ibm.com/support/docview.wss?uid=isg1IV87385 > > > > Cheers, > > -Bryan > > > > *From:* gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss- > bounces at spectrumscale.org] *On Behalf Of *John Hanks > *Sent:* Wednesday, February 14, 2018 12:31 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Odd behavior with cat followed by grep. 
> > > > *Note: External Email* > ------------------------------ > > Straces are interesting, but don't immediately open my eyes: > > > > strace of grep on NFS (works as expected) > > > > openat(AT_FDCWD, "/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > ioctl(3, TCGETS, 0x7ffe2c26b0b0) = -1 ENOTTY (Inappropriate ioctl > for device) > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > lseek(3, 32768, SEEK_HOLE) = 530721 > > lseek(3, 32768, SEEK_SET) = 32768 > > fstat(1, {st_mode=S_IFREG|0644, st_size=5977, ...}) = 0 > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > 0x7f3bf6c43000 > > write(1, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 8192chr1 > > > > strace on GPFS (thinks file is binary) > > > > openat(AT_FDCWD, "/srv/gsfs0/projects/pipetest.tmp.txt", O_RDONLY) = 3 > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > ioctl(3, TCGETS, 0x7ffc9b52caa0) = -1 ENOTTY (Inappropriate ioctl > for device) > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > lseek(3, 32768, SEEK_HOLE) = 262144 > > lseek(3, 32768, SEEK_SET) = 32768 > > fstat(1, {st_mode=S_IFREG|0644, st_size=6011, ...}) = 0 > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > 0x7fd45ee88000 > > close(3) = 0 > > write(1, "Binary file /srv/gsfs0/projects/"..., 72Binary file > /srv/gsfs0/projects/levinson/xwzhu/pipetest.tmp.txt matches > > ) = 72 > > > > Do the lseek() results indicate that the grep on the GPFS mounted version > thinks the file is a sparse file? For comparison I strace'd md5sum in place > of the grep and it does not lseek() with SEEK_HOLE, it's access in both > cases look identical, like: > > > > open("/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 > > fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > 0x7fb7d2c2b000 > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > ...[reads clipped]... > > read(3, "", 24576) = 0 > > lseek(3, 0, SEEK_CUR) = 530721 > > close(3) = 0 > > > > > > jbh > > > > > > On Wed, Feb 14, 2018 at 9:51 AM, Aaron Knister > wrote: > > Just speculating here (also known as making things up) but I wonder if > grep is somehow using the file's size in its determination of binary > status. I also see mmap in the strace so maybe there's some issue with > mmap where some internal GPFS buffer is getting truncated > inappropriately but leaving a bunch of null values which gets returned > to grep. 
> > -Aaron > > On 2/14/18 10:21 AM, John Hanks wrote: > > Hi Valdis, > > > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think > > this is a data integrity issue, thankfully: > > > > $ ./pipetestls.sh > > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 > > /srv/gsfs0/projects/pipetest.tmp.txt > > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 > /home/griznog/pipetest.tmp.txt > > > > $ ./pipetestmd5.sh > > 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt > > 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt > > > > And replacing grep with 'file' even properly sees the files as ASCII: > > $ ./pipetestfile.sh > > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines > > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines > > > > I'll poke a little harder at grep next and see what the difference in > > strace of each reveals. > > > > Thanks, > > > > jbh > > > > > > > > > > On Wed, Feb 14, 2018 at 7:08 AM, > > > wrote: > > > > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt > $HOME/pipetest.tmp.txt > > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 > /home/griznog/pipetest.tmp.txt > > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > > > We can "fix" the user case that exposed this by not using a temp > file or > > > inserting a sleep, but I'd still like to know why GPFS is behaving > this way > > > and make it stop. > > > > May be related to replication, or other behind-the-scenes behavior. > > > > Consider this example - 4.2.3.6, data and metadata replication both > > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with > > a full > > fiberchannel mesh to 3 Dell MD34something arrays. > > > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; > > sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test > > 4096+0 records in > > 4096+0 records out > > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > > > Notice that the first /bin/ls shouldn't be starting until after the > > dd has > > completed - at which point it's only allocated half the blocks > > needed to hold > > the 4M of data at one site. 5 seconds later, it's allocated the > > blocks at both > > sites and thus shows the full 8M needed for 2 copies. > > > > I've also seen (but haven't replicated it as I write this) a small > > file (4-8K > > or so) showing first one full-sized block, then a second full-sized > > block, and > > then dropping back to what's needed for 2 1/32nd fragments. That > had me > > scratching my head > > > > Having said that, that's all metadata fun and games, while your case > > appears to have some problems with data integrity (which is a whole > lot > > scarier). It would be *really* nice if we understood the problem > here. > > > > The scariest part is: > > > > > The first grep | wc -l returns 1, because grep outputs "Binary > file /path/to/ > > > gpfs/mount/test matches" > > > > which seems to be implying that we're failing on semantic > consistency. > > Basically, your 'cat' command is completing and closing the file, > > but then a > > temporally later open of the same find is reading something other > > that only the > > just-written data. 
My first guess is that it's a race condition > > similar to the > > following: The cat command is causing a write on one NSD server, and > > the first > > grep results in a read from a *different* NSD server, returning the > > data that > > *used* to be in the block because the read actually happens before > > the first > > NSD server actually completes the write. > > > > It may be interesting to replace the grep's with pairs of 'ls -ls / > > dd' commands to grab the > > raw data and its size, and check the following: > > > > 1) does the size (both blocks allocated and logical length) reported > by > > ls match the amount of data actually read by the dd? > > > > 2) Is the file length as actually read equal to the written length, > > or does it > > overshoot and read all the way to the next block boundary? > > > > 3) If the length is correct, what's wrong with the data that's > > telling grep that > > it's a binary file? ( od -cx is your friend here). > > > > 4) If it overshoots, is the remainder all-zeros (good) or does it > > return semi-random > > "what used to be there" data (bad, due to data exposure issues)? > > > > (It's certainly not the most perplexing data consistency issue I've > > hit in 4 decades - the > > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > > cluster that > > had us, IBM, SGI, DDN, and at least one vendor of networking gear > > all chasing our > > tails for 18 months before we finally tracked it down. :) > > > > _______________________________________________ > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Feb 14 20:54:04 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 14 Feb 2018 20:54:04 +0000 Subject: [gpfsug-discuss] Odd d????????? 
permissions In-Reply-To: References: Message-ID: <9C726F78-D870-4E1E-92B6-96F495F53D54@vanderbilt.edu> Hi John, We had a similar incident happen just a week or so ago here, although in our case it was that certain files within a directory showed up with the question marks, while others didn't. The problem was simply that the node had been run out of RAM and the GPFS daemon couldn't allocate memory. Killing the offending process(es) and restarting GPFS fixed the issue. We saw hundreds of messages like: 2018-02-07_16:35:13.267-0600: [E] Failed to allocate 92274688 bytes in memory pool, err -1 In the GPFS log when this was happening. HTHAL... Kevin -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Feb 14, 2018, at 12:38 PM, Simon Thompson (IT Research Support) > wrote: Is it an AFM cache? We see this sort of behaviour occasionally where the cache has an "old" view of the directory. Doing an ls, it evidently goes back to home but by then you already have weird stuff. The next ls is usually fine. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of john.hearns at asml.com [john.hearns at asml.com] Sent: 14 February 2018 09:00 To: gpfsug main discussion list Subject: [gpfsug-discuss] Odd d????????? permissions I am sure this is a known behavior and I am going to feel very foolish in a few minutes... We often see this behavior on a GPFS filesystem. I log into a client. [jhearns at pn715 test]$ ls -la ../ ls: cannot access ../..: Permission denied total 160 drwx------ 4 jhearns root 4096 Feb 14 09:46 . d????????? ? ? ? ? ? .. drwxr-xr-x 2 jhearns users 4096 Feb 9 11:13 gpfsperf -rw-r--r-- 1 jhearns users 27336 Feb 9 22:24 iozone.out -rw-r--r-- 1 jhearns users 6083 Feb 9 10:55 IozoneResults.py -rw-r--r-- 1 jhearns users 22959 Feb 9 11:17 iozone.txt -rw-r--r-- 1 jhearns users 2977 Feb 9 10:55 iozone.txtvi -rwxr-xr-x 1 jhearns users 102 Feb 9 10:55 run-iozone.sh drwxr-xr-x 2 jhearns users 4096 Feb 14 09:46 test -r-x------ 1 jhearns users 51504 Feb 9 11:02 tsqosperf This behavior changes after a certain number of minutes, and the .. directory looks normal. For information this filesystem has nfsv4 file locking semantics and ACL semantics set to all -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt.
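
A quick way to check for the memory-pool condition Kevin describes above is sketched here; the log path and the mmdiag option are the documented defaults for recent Spectrum Scale releases, so adjust them if your installation differs:

# On the client that shows the d????????? entries:
grep -c "Failed to allocate" /var/adm/ras/mmfs.log.latest   # non-zero suggests mmfsd has been short of memory
mmdiag --memory                                             # current heap and shared segment usage for the local daemon
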
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9df4b4d88544447ac29608d573da2d51%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636542303262503651&sdata=v6pnBIEvu6lyP3mGkkRX7hSj58H8vvkUl6R%2FCsq6gmc%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From coetzee.ray at gmail.com Wed Feb 14 20:59:52 2018 From: coetzee.ray at gmail.com (Ray Coetzee) Date: Wed, 14 Feb 2018 20:59:52 +0000 Subject: [gpfsug-discuss] Scale 5, filesystem guidelines Message-ID: Since Scale 5.0 was released I've not seen much guidelines provided on how to make the best of the new filesystem layout. For example, is dedicated metadata SSD's still recommended or does the Scale 5 improvements mean we can just do metadata and data pools now? I'd be interested to hear of anyone's experience so far. Kind regards Ray Coetzee -------------- next part -------------- An HTML attachment was scrubbed... URL: From sxiao at us.ibm.com Wed Feb 14 21:53:17 2018 From: sxiao at us.ibm.com (Steve Xiao) Date: Wed, 14 Feb 2018 16:53:17 -0500 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. (John Hanks) In-Reply-To: References: Message-ID: This could be related to the following flash: http://www-01.ibm.com/support/docview.wss?uid=ssg1S1012054 You should contact IBM service to obtain the fix for your release. Steve Y. Xiao gpfsug-discuss-bounces at spectrumscale.org wrote on 02/14/2018 02:18:02 PM: > From: gpfsug-discuss-request at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Date: 02/14/2018 02:18 PM > Subject: gpfsug-discuss Digest, Vol 73, Issue 36 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Odd behavior with cat followed by grep. (John Hanks) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 14 Feb 2018 11:17:19 -0800 > From: John Hanks > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by grep. > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > Thanks Bryan, mystery solved :) > > We also stumbled across these related items, in case anyone else wanders > into this thread. > > https://urldefense.proofpoint.com/v2/url? > u=http-3A__bug-2Dgrep.gnu.narkive.com_Y8cfvWDt_bug-2D27666-2Dgrep-2Don-2Dgpfs-2Dfilesystem-2Dseek-2Dhole-2Dproblem&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=FgxYBxqHZ0bHdWirEs1U_B3oDpeHJe8iRd- > TYrXh6FI&e= > > https://www.ibm.com/developerworks/community/forums/html/topic? 
> id=c2a94433-9ec0-4a4b-abfe-d0a1e721d630 > > GPFS, the gift that keeps on giving ... me more things to do instead of > doing the things I want to be doing. > > Thanks all, > > jbh > > On Wed, Feb 14, 2018 at 10:48 AM, Bryan Banister > wrote: > > > Hi all, > > > > > > > > We found this a while back and IBM fixed it. Here?s your answer: > > http://www-01.ibm.com/support/docview.wss?uid=isg1IV87385 > > > > > > > > Cheers, > > > > -Bryan > > > > > > > > *From:* gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss- > > bounces at spectrumscale.org] *On Behalf Of *John Hanks > > *Sent:* Wednesday, February 14, 2018 12:31 PM > > *To:* gpfsug main discussion list > > *Subject:* Re: [gpfsug-discuss] Odd behavior with cat followed by grep. > > > > > > > > *Note: External Email* > > ------------------------------ > > > > Straces are interesting, but don't immediately open my eyes: > > > > > > > > strace of grep on NFS (works as expected) > > > > > > > > openat(AT_FDCWD, "/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > > > ioctl(3, TCGETS, 0x7ffe2c26b0b0) = -1 ENOTTY (Inappropriate ioctl > > for device) > > > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > > > lseek(3, 32768, SEEK_HOLE) = 530721 > > > > lseek(3, 32768, SEEK_SET) = 32768 > > > > fstat(1, {st_mode=S_IFREG|0644, st_size=5977, ...}) = 0 > > > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > > 0x7f3bf6c43000 > > > > write(1, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 8192chr1 > > > > > > > > strace on GPFS (thinks file is binary) > > > > > > > > openat(AT_FDCWD, "/srv/gsfs0/projects/pipetest.tmp.txt", O_RDONLY) = 3 > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > > > ioctl(3, TCGETS, 0x7ffc9b52caa0) = -1 ENOTTY (Inappropriate ioctl > > for device) > > > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > > > lseek(3, 32768, SEEK_HOLE) = 262144 > > > > lseek(3, 32768, SEEK_SET) = 32768 > > > > fstat(1, {st_mode=S_IFREG|0644, st_size=6011, ...}) = 0 > > > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > > 0x7fd45ee88000 > > > > close(3) = 0 > > > > write(1, "Binary file /srv/gsfs0/projects/"..., 72Binary file > > /srv/gsfs0/projects/levinson/xwzhu/pipetest.tmp.txt matches > > > > ) = 72 > > > > > > > > Do the lseek() results indicate that the grep on the GPFS mounted version > > thinks the file is a sparse file? For comparison I strace'd md5sum in place > > of the grep and it does not lseek() with SEEK_HOLE, it's access in both > > cases look identical, like: > > > > > > > > open("/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 > > > > fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > > 0x7fb7d2c2b000 > > > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > > > ...[reads clipped]... > > > > read(3, "", 24576) = 0 > > > > lseek(3, 0, SEEK_CUR) = 530721 > > > > close(3) = 0 > > > > > > > > > > > > jbh > > > > > > > > > > > > On Wed, Feb 14, 2018 at 9:51 AM, Aaron Knister > > wrote: > > > > Just speculating here (also known as making things up) but I wonder if > > grep is somehow using the file's size in its determination of binary > > status. 
I also see mmap in the strace so maybe there's some issue with > > mmap where some internal GPFS buffer is getting truncated > > inappropriately but leaving a bunch of null values which gets returned > > to grep. > > > > -Aaron > > > > On 2/14/18 10:21 AM, John Hanks wrote: > > > Hi Valdis, > > > > > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think > > > this is a data integrity issue, thankfully: > > > > > > $ ./pipetestls.sh > > > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 > > > /srv/gsfs0/projects/pipetest.tmp.txt > > > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 > > /home/griznog/pipetest.tmp.txt > > > > > > $ ./pipetestmd5.sh > > > 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt > > > 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt > > > > > > And replacing grep with 'file' even properly sees the files as ASCII: > > > $ ./pipetestfile.sh > > > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines > > > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines > > > > > > I'll poke a little harder at grep next and see what the difference in > > > strace of each reveals. > > > > > > Thanks, > > > > > > jbh > > > > > > > > > > > > > > > On Wed, Feb 14, 2018 at 7:08 AM, > > > > > wrote: > > > > > > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > > > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt > > $HOME/pipetest.tmp.txt > > > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 > > /home/griznog/pipetest.tmp.txt > > > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > > > > > We can "fix" the user case that exposed this by not using a temp > > file or > > > > inserting a sleep, but I'd still like to know why GPFS is behaving > > this way > > > > and make it stop. > > > > > > May be related to replication, or other behind-the-scenes behavior. > > > > > > Consider this example - 4.2.3.6, data and metadata replication both > > > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with > > > a full > > > fiberchannel mesh to 3 Dell MD34something arrays. > > > > > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; > > > sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test > > > 4096+0 records in > > > 4096+0 records out > > > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > > > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > > > > > Notice that the first /bin/ls shouldn't be starting until after the > > > dd has > > > completed - at which point it's only allocated half the blocks > > > needed to hold > > > the 4M of data at one site. 5 seconds later, it's allocated the > > > blocks at both > > > sites and thus shows the full 8M needed for 2 copies. > > > > > > I've also seen (but haven't replicated it as I write this) a small > > > file (4-8K > > > or so) showing first one full-sized block, then a second full-sized > > > block, and > > > then dropping back to what's needed for 2 1/32nd fragments. That > > had me > > > scratching my head > > > > > > Having said that, that's all metadata fun and games, while your case > > > appears to have some problems with data integrity (which is a whole > > lot > > > scarier). It would be *really* nice if we understood the problem > > here. 
> > > > > > The scariest part is: > > > > > > > The first grep | wc -l returns 1, because grep outputs "Binary > > file /path/to/ > > > > gpfs/mount/test matches" > > > > > > which seems to be implying that we're failing on semantic > > consistency. > > > Basically, your 'cat' command is completing and closing the file, > > > but then a > > > temporally later open of the same find is reading something other > > > that only the > > > just-written data. My first guess is that it's a race condition > > > similar to the > > > following: The cat command is causing a write on one NSD server, and > > > the first > > > grep results in a read from a *different* NSD server, returning the > > > data that > > > *used* to be in the block because the read actually happens before > > > the first > > > NSD server actually completes the write. > > > > > > It may be interesting to replace the grep's with pairs of 'ls -ls / > > > dd' commands to grab the > > > raw data and its size, and check the following: > > > > > > 1) does the size (both blocks allocated and logical length) reported > > by > > > ls match the amount of data actually read by the dd? > > > > > > 2) Is the file length as actually read equal to the written length, > > > or does it > > > overshoot and read all the way to the next block boundary? > > > > > > 3) If the length is correct, what's wrong with the data that's > > > telling grep that > > > it's a binary file? ( od -cx is your friend here). > > > > > > 4) If it overshoots, is the remainder all-zeros (good) or does it > > > return semi-random > > > "what used to be there" data (bad, due to data exposure issues)? > > > > > > (It's certainly not the most perplexing data consistency issue I've > > > hit in 4 decades - the > > > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > > > cluster that > > > had us, IBM, SGI, DDN, and at least one vendor of networking gear > > > all chasing our > > > tails for 18 months before we finally tracked it down. :) > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > > > gpfsug-discuss at spectrumscale.org urldefense.proofpoint.com/v2/url? > u=http-3A__spectrumscale.org&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=jUBFb8C9yai1TUTu1BVnNTNcOnJXGxupWiEKkEjT4pM&e= > > > > > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > > > > > > > > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > https://urldefense.proofpoint.com/v2/url? 
> u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > > > > > > ------------------------------ > > > > Note: This email is for the confidential use of the named addressee(s) > > only and may contain proprietary, confidential or privileged information. > > If you are not the intended recipient, you are hereby notified that any > > review, dissemination or copying of this email is strictly prohibited, and > > to please notify the sender immediately and destroy this email and any > > attachments. Email transmission cannot be guaranteed to be secure or > > error-free. The Company, therefore, does not make any guarantees as to the > > completeness or accuracy of this email or any attachments. This email is > > for informational purposes only and does not constitute a recommendation, > > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > > or perform any type of transaction of a financial product. > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20180214_d62fc203_attachment.html&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=nUcKIKr84CRhS0EbxV5vwjSlEr4p3Wf6Is3EDKvOjJg&e= > > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > End of gpfsug-discuss Digest, Vol 73, Issue 36 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Feb 14 21:54:36 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 14 Feb 2018 21:54:36 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> <1518529381.3326.93.camel@strath.ac.uk> Message-ID: <90827aa7-e03c-7f2c-229a-c9db4c7dc8be@strath.ac.uk> On 13/02/18 15:56, Buterbaugh, Kevin L wrote: > Hi JAB, > > OK, let me try one more time to clarify. I?m not naming the vendor ? 
they're a small maker of commodity storage and we've been using their > stuff for years and, overall, it's been very solid. The problem in > this specific case is that a major version firmware upgrade is > required - if the controllers were only a minor version apart we > could do it live. > That makes more sense, but still do tell which vendor so I can avoid them. It's 2018, I expect never to need to take my storage down for *ANY* firmware upgrade *EVER* - period. Any vendor that falls short of that needs to go on my naughty list, for specific checking that this is no longer the case before I ever purchase any of their kit. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan at buzzard.me.uk Wed Feb 14 21:47:38 2018 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 14 Feb 2018 21:47:38 +0000 Subject: [gpfsug-discuss] Scale 5, filesystem guidelines In-Reply-To: References: Message-ID: On 14/02/18 20:59, Ray Coetzee wrote: > Since Scale 5.0 was released I've not seen much guidelines provided on > how to make the best of the new filesystem layout. > > For example, is dedicated metadata SSD's still recommended or does the > Scale 5 improvements mean we can just do metadata and data pools now? > > I'd be interested to hear of anyone's experience so far. > Well given metadata performance is heavily related to random IO performance I would suspect that dedicated metadata SSD's are still recommended. That is unless you have an all SSD based file system :-) JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From kkr at lbl.gov Thu Feb 15 01:47:26 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 14 Feb 2018 17:47:26 -0800 Subject: [gpfsug-discuss] RDMA data from Zimon Message-ID: Hi, Can one of the IBMers tell me if port_xmit_data and port_rcv_data from Zimon can be interpreted as RDMA Bytes/sec? Ideally, also how this data is being collected? I'm looking here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1hlp_monnetworksmetrics.htm But then I also look here: https://community.mellanox.com/docs/DOC-2751 and see "Total number of data octets, divided by 4 (lanes), received on all VLs. This is 64 bit counter." So I wasn't sure if some multiplication by 4 was in order. Please advise. Cheers, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Thu Feb 15 09:28:42 2018 From: john.hearns at asml.com (John Hearns) Date: Thu, 15 Feb 2018 09:28:42 +0000 Subject: [gpfsug-discuss] Odd d????????? permissions In-Reply-To: <9C726F78-D870-4E1E-92B6-96F495F53D54@vanderbilt.edu> References: <9C726F78-D870-4E1E-92B6-96F495F53D54@vanderbilt.edu> Message-ID: Simon, Kevin Thankyou for your responses. Simon, indeed we do see this behavior on AFM filesets which have an 'old' view - and we can watch the AFM fileset change as the information is updated. In this case, this filesystem is not involved with AFM. I changed the locking semantics from NFSv4 to Posix and the report is that this has solved the problem. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: Wednesday, February 14, 2018 9:54 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Odd d?????????
permissions Hi John, We had a similar incident happen just a week or so ago here, although in our case it was that certain files within a directory showed up with the question marks, while others didn?t. The problem was simply that the node had been run out of RAM and the GPFS daemon couldn?t allocate memory. Killing the offending process(es) and restarting GPFS fixed the issue. We saw hundreds of messages like: 2018-02-07_16:35:13.267-0600: [E] Failed to allocate 92274688 bytes in memory pool, err -1 In the GPFS log when this was happening. HTHAL? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Feb 14, 2018, at 12:38 PM, Simon Thompson (IT Research Support) > wrote: Is it an AFM cache? We see this sort of behaviour occasionally where the cache has an "old" view of the directory. Doing an ls, it evidently goes back to home but by then you already have weird stuff. The next ls is usually fine. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of john.hearns at asml.com [john.hearns at asml.com] Sent: 14 February 2018 09:00 To: gpfsug main discussion list Subject: [gpfsug-discuss] Odd d????????? permissions I am sure this is a known behavior and I am going to feel very foolish in a few minutes? We often see this behavior on a GPFS filesystem. I log into a client. [jhearns at pn715 test]$ ls -la ../ ls: cannot access ../..: Permission denied total 160 drwx------ 4 jhearns root 4096 Feb 14 09:46 . d????????? ? ? ? ? ? .. drwxr-xr-x 2 jhearns users 4096 Feb 9 11:13 gpfsperf -rw-r--r-- 1 jhearns users 27336 Feb 9 22:24 iozone.out -rw-r--r-- 1 jhearns users 6083 Feb 9 10:55 IozoneResults.py -rw-r--r-- 1 jhearns users 22959 Feb 9 11:17 iozone.txt -rw-r--r-- 1 jhearns users 2977 Feb 9 10:55 iozone.txtvi -rwxr-xr-x 1 jhearns users 102 Feb 9 10:55 run-iozone.sh drwxr-xr-x 2 jhearns users 4096 Feb 14 09:46 test -r-x------ 1 jhearns users 51504 Feb 9 11:02 tsqosperf This behavior changes after a certain number of minutes, and the .. directory looks normal. For information this filesystem has nfsv4 file locking semantics and ACL semantics set to all -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
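
For reference, the locking-semantics change John describes maps to the -D filesystem attribute (with -k covering the ACL semantics he mentions). A rough sketch, assuming a filesystem device named gpfs0, would be:

mmlsfs gpfs0 -D -k      # show the file locking (-D) and ACL (-k) semantics in effect
mmchfs gpfs0 -D posix   # switch locking semantics from nfs4 to posix
# Check the mmchfs documentation for your release first -- some attribute changes
# only take effect after the filesystem is remounted.
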
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9df4b4d88544447ac29608d573da2d51%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636542303262503651&sdata=v6pnBIEvu6lyP3mGkkRX7hSj58H8vvkUl6R%2FCsq6gmc%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Thu Feb 15 09:31:34 2018 From: john.hearns at asml.com (John Hearns) Date: Thu, 15 Feb 2018 09:31:34 +0000 Subject: [gpfsug-discuss] Thankyou - d?????? issue Message-ID: Simon, Kevin Thankyou for your responses. Simon, indeed we do see this behavior on AFM filesets which have an 'old' view - and we can watch the AFM fileset change as the information is updated. In this case, this filesystem is not involved with AFM. I changed the locking semantics from NFSv4 to Posix and the report is that this has solved the problem. Sorry for not replying on the thread. The mailing list software reckons I am not who I say I am. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From secretary at gpfsug.org Thu Feb 15 11:58:05 2018 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Thu, 15 Feb 2018 11:58:05 +0000 Subject: [gpfsug-discuss] Registration open for UK SSUG Message-ID: <8f1e98c75e688acf894fc8bb11fe0335@webmail.gpfsug.org> Dear members, The registration page for the next UK Spectrum Scale user group meeting is now live. 
We're looking forward to seeing you in London on 18th and 19th April where you will have the opportunity to hear the latest Spectrum Scale updates from filesystem experts as well as hear from other users on their experiences. Similar to previous years, we're also holding smaller interactive workshops to allow for more detailed discussion. Thank you for the kind sponsorship from all our sponsors IBM, DDN, E8, Ellexus, Lenovo, NEC, and OCF, without which the event would not be possible. To register, please visit the Eventbrite registration page: https://www.eventbrite.com/e/spectrum-scale-gpfs-user-group-2018-registration-41489952565?aff=MailingList
We look forward to seeing you in London!
-- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org

From agar at us.ibm.com Thu Feb 15 17:08:08 2018
From: agar at us.ibm.com (Eric Agar)
Date: Thu, 15 Feb 2018 12:08:08 -0500
Subject: [gpfsug-discuss] RDMA data from Zimon
In-Reply-To: References: Message-ID:
Kristy, I experimented a bit with this some months ago and looked at the ZIMon source code. I came to the conclusion that ZIMon is reporting values obtained from the IB counters (actually, delta values adjusted for time) and that, yes, for port_xmit_data and port_rcv_data one would need to multiply the values by 4 to make sense of them. To obtain a port_xmit_data value, the ZIMon sensor first looks for /sys/class/infiniband/<device>/ports/<port>/counters_ext/port_xmit_data_64, and if that is not found then looks for /sys/class/infiniband/<device>/ports/<port>/counters/port_xmit_data. Similarly for other counters/metrics. Full disclosure: I am not an IB expert nor a ZIMon developer. I hope this helps.
Eric M. Agar agar at us.ibm.com

From: Kristy Kallback-Rose To: gpfsug main discussion list Date: 02/14/2018 08:47 PM Subject: [gpfsug-discuss] RDMA data from Zimon Sent by: gpfsug-discuss-bounces at spectrumscale.org
Hi, Can one of the IBMers tell me if port_xmit_data and port_rcv_data from Zimon can be interpreted as RDMA Bytes/sec? Ideally, also how this data is being collected? I'm looking here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1hlp_monnetworksmetrics.htm But then I also look here: https://community.mellanox.com/docs/DOC-2751 and see "Total number of data octets, divided by 4 (lanes), received on all VLs. This is 64 bit counter." So I wasn't sure if some multiplication by 4 was in order. Please advise. Cheers, Kristy
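As a quick way to cross-check what ZIMon reports against the raw counters Eric describes, here is a minimal sketch of the arithmetic only, not anything the sensor itself ships: it samples the transmit counter twice and converts the delta to bytes per second. The device name mlx5_0, port 1 and the 10-second interval are placeholders; the 64-bit counters_ext file is preferred with a fallback to the 32-bit counter, mirroring the lookup order described above.

#!/bin/bash
# Estimate transmitted bytes/sec from the IB port counters.
# port_xmit_data counts octets divided by 4 (lanes), hence the *4.
dev=${1:-mlx5_0}; port=${2:-1}; interval=${3:-10}
base=/sys/class/infiniband/$dev/ports/$port
ctr=$base/counters_ext/port_xmit_data_64
[ -r "$ctr" ] || ctr=$base/counters/port_xmit_data
v1=$(cat "$ctr")
sleep "$interval"
v2=$(cat "$ctr")
echo "$(( (v2 - v1) * 4 / interval )) bytes/sec transmitted on $dev port $port"

The same calculation applies to port_rcv_data for the receive direction, and should line up with the ZIMon value once the factor of 4 is applied.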
From G.Horton at bham.ac.uk Fri Feb 16 10:28:48 2018
From: G.Horton at bham.ac.uk (Gareth Horton)
Date: Fri, 16 Feb 2018 10:28:48 +0000
Subject: [gpfsug-discuss] Hello
Message-ID: <85BF558D-7F13-4059-834E-7D655BD17107@bham.ac.uk>
Hi All, A short note to introduce myself to all members. My name is Gareth Horton and I work at Birmingham University within the Research Computing 'Architecture, Infrastructure and Systems' team. I am new to GPFS and HPC, coming from a general Windows / Unix / Linux sys admin background, before moving into VMware server virtualisation and SAN & NAS storage admin. We use GPFS to provide storage and archiving services to researchers for both traditional HPC and cloud (Openstack) environments. I'm currently a GPFS novice and I'm hoping to learn a lot from the experience and knowledge of the group and its members.
Regards
Gareth Horton
Architecture, Infrastructure and Systems Research Computing - IT Services Computer Centre G5, Elms Road, University of Birmingham B15 2TT g.horton at bham.ac.uk | www.bear.bham.ac.uk |

From kkr at lbl.gov Fri Feb 16 18:17:18 2018
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Fri, 16 Feb 2018 10:17:18 -0800
Subject: [gpfsug-discuss] Hello
In-Reply-To: <85BF558D-7F13-4059-834E-7D655BD17107@bham.ac.uk>
References: <85BF558D-7F13-4059-834E-7D655BD17107@bham.ac.uk>
Message-ID: <256528BD-CAAC-4D8B-9DD4-B90992D7EFBC@lbl.gov>
Welcome Gareth. As a person coming in with fresh eyes, it would be helpful if you let us know if you run into anything that makes you think "it would be great if there were ..." - particular documentation, information about UG events, etc.
Thanks, Kristy

From luke.raimbach at googlemail.com Mon Feb 19 12:16:43 2018
From: luke.raimbach at googlemail.com (Luke Raimbach)
Date: Mon, 19 Feb 2018 12:16:43 +0000
Subject: [gpfsug-discuss] GUI reports erroneous NIC errors
Message-ID:
Hi GUI whizzes, I have a couple of AFM nodes in my cluster with dual-port MLX cards for RDMA.
Only the first port on the card is connected to the fabric and the cluster configuration seems correct to me: # mmlsconfig ---8<--- [nsdNodes] verbsPorts mlx5_1/1 [afm] verbsPorts mlx4_1/1 [afm,nsdNodes] verbsRdma enable --->8--- The cluster is working fine, and the mmlfs.log shows me what I expect, i.e. RDMA connections being made over the correct interfaces. Nevertheless the GUI tells me such lies as "Node Degraded" and "ib_rdma_nic_unrecognised" for the second port on the card (which is not explicitly used). Event details are: Event name: ib_rdma_nic_unrecognized Component: Network Entity type: Node Entity name: afm01 Event time: 19/02/18 12:53:39 Message: IB RDMA NIC mlx4_1/2 was not recognized Description: The specified IB RDMA NIC was not correctly recognized for usage by Spectrum Scale Cause: The specified IB RDMA NIC is not reported in 'mmfsadm dump verbs' User action: N/A Reporting node: afm01 Event type: Active health state of an entity which is monitored by the system. Naturally the GUI is for those who like to see reports and this incorrect entry would likely generate a high volume of unwanted questions from such report viewers. How can I bring the GUI reporting back in line with reality? Thanks, Luke. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Feb 19 14:00:49 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 19 Feb 2018 09:00:49 -0500 Subject: [gpfsug-discuss] Configuration advice In-Reply-To: <20180212151155.GD23944@cefeid.wcss.wroc.pl> References: <20180212151155.GD23944@cefeid.wcss.wroc.pl> Message-ID: As I think you understand we can only provide general guidance as regards your questions. If you want a detailed examination of your requirements and a proposal for a solution you will need to engage the appropriate IBM services team. My personal recommendation is to use as few file systems as possible, preferably just one. The reason is that makes general administration, and storage management, easier. If you do use filesets I suggest you use independent filesets because they offer more administrative control than dependent filesets. As for the number of nodes in the cluster that depends on your requirements for performance and availability. If you do have only 2 then you will need a tiebreaker disk to resolve quorum issues should the network between the nodes have problems. If you intend to continue to use HSM I would suggest you use the GPFS policy engine to drive the migrations because it should be more efficient than using HSM directly. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
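To make the last two suggestions above a bit more concrete, here is a rough sketch of policy-driven migration to an HSM-managed external pool. Every name in it (device gpfs01, the pool names, the interface script path, the thresholds) is a placeholder to be replaced with site-specific values, and the interface script would be whichever HSM/TSM integration script is in use at the site:

/* hsm_migrate.pol - hand cold files to the HSM external pool via the policy engine */
RULE EXTERNAL POOL 'hsm' EXEC '/path/to/hsm/interface/script'
RULE 'migrate_cold' MIGRATE FROM POOL 'system'
     THRESHOLD(85,70)
     WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
     TO POOL 'hsm'

mmapplypolicy gpfs01 -P hsm_migrate.pol

For the two-node question, the tiebreaker disk mentioned above is configured with something like mmchconfig tiebreakerDisks=nsd1, where nsd1 is again a placeholder NSD name.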
From: Pawel Dziekonski To: gpfsug-discuss at spectrumscale.org Date: 02/12/2018 10:18 AM Subject: [gpfsug-discuss] Configuration advice Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I inherited from previous admin 2 separate gpfs machines. All hardware+software is old so I want to switch to new servers, new disk arrays, new gpfs version and new gpfs "design". Each machine has 4 gpfs filesystems and runs a TSM HSM client that migrates data to tapes using separate TSM servers: GPFS+HSM no 1 -> TSM server no 1 -> tapes GPFS+HSM no 2 -> TSM server no 2 -> tapes Migration is done by HSM (not GPFS policies). All filesystems are used for archiving results from HPC system and other files (a kind of backup - don't ask...). Data is written by users via nfs shares. There are 8 nfs mount points corresponding to 8 gpfs filesystems, but there is no real reason for that. 4 filesystems are large and heavily used, 4 remaining are almost not used. The question is how to configure new gpfs infrastructure? My initial impression is that I should create a GPFS cluster of 2+ nodes and export NFS using CES. The most important question is how many filesystem do I need? Maybe just 2 and 8 filesets? Or how to do that in a flexible way and not to lock myself in stupid configuration? any hints? thanks, Pawel ps. I will recall all data and copy it to new infrastructure. Yes, that's the way I want to do that. :) -- Pawel Dziekonski , https://urldefense.proofpoint.com/v2/url?u=http-3A__www.wcss.pl&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=-wyO42O-5SDJQfYoGpqeObZNSlFzduC9mlXhsZb65HI&s=__3QSrBGRtG4Rja-QzbpqALX2o8l-67gtrqePi0NrfE&e= Wroclaw Centre for Networking & Supercomputing, HPC Department _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=-wyO42O-5SDJQfYoGpqeObZNSlFzduC9mlXhsZb65HI&s=32gAuk8HDIPkjMjY4L7DB1tFqmJxeaP4ZWIYA_Ya3ts&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Wed Feb 21 09:01:39 2018 From: john.hearns at asml.com (John Hearns) Date: Wed, 21 Feb 2018 09:01:39 +0000 Subject: [gpfsug-discuss] GPFS Downloads Message-ID: Would someone else kindly go to this webpage: https://www.ibm.com/support/home/product/10000060/IBM%20Spectrum%20Scale Click on Downloads then confirm you get a choice of two identical Spectrum Scale products. Neither of which has a version fix level you can select on the check box below. I have tried this in Internet Explorer and Chrome. My apology if this is stupidity on my part, but I really would like to download the latest 4.2.3 version with the APAR we need. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. 
Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt.

From r.sobey at imperial.ac.uk Wed Feb 21 09:23:10 2018
From: r.sobey at imperial.ac.uk (Sobey, Richard A)
Date: Wed, 21 Feb 2018 09:23:10 +0000
Subject: [gpfsug-discuss] GPFS Downloads
In-Reply-To: References: Message-ID:
Same for me. What I normally do is just go straight to Fix Central and navigate from there.

From john.hearns at asml.com Wed Feb 21 08:54:41 2018
From: john.hearns at asml.com (John Hearns)
Date: Wed, 21 Feb 2018 08:54:41 +0000
Subject: [gpfsug-discuss] Finding all bulletins and APARs
Message-ID:
Firstly, let me apologise for not thanking people who have replied to me on this list with help. I have indeed replied and thanked you - however the list software has taken a dislike to my email address. I am currently on the myibm support site. I am looking for a specific APAR on Spectrum Scale. However I want to be able to get a list of all APARs and bulletins for Spectrum Scale, right up to date. I do get email alerts but somehow I suspect I am not getting them all, and it is a pain to search back in your email.
Thanks
John H
-- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities.
To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Wed Feb 21 09:31:25 2018 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 21 Feb 2018 09:31:25 +0000 Subject: [gpfsug-discuss] GPFS Downloads In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From anencizo at us.ibm.com Wed Feb 21 17:19:09 2018 From: anencizo at us.ibm.com (Angela Encizo) Date: Wed, 21 Feb 2018 17:19:09 +0000 Subject: [gpfsug-discuss] GPFS Downloads In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14571047317701.png Type: image/png Size: 6645 bytes Desc: not available URL: From carlz at us.ibm.com Wed Feb 21 19:54:31 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 21 Feb 2018 19:54:31 +0000 Subject: [gpfsug-discuss] GPFS Downloads In-Reply-To: References: Message-ID: It does look like that link is broken, thanks for letting us know. If you click on the Menu dropdown at the top of the page that says "Downloads" you'll see a link to Fix Central that takes you to the right place. Carl Zetie Offering Manager for Spectrum Scale, IBM (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From valdis.kletnieks at vt.edu Wed Feb 21 20:20:16 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 21 Feb 2018 15:20:16 -0500 Subject: [gpfsug-discuss] GPFS and Wireshark.. Message-ID: <51481.1519244416@turing-police.cc.vt.edu> Has anybody out there done a Wireshark protocol filter for GPFS? Or know where to find enough documentation of the on-the-wire data formats to write even a basic one? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From juantellez at mx1.ibm.com Wed Feb 21 21:20:44 2018 From: juantellez at mx1.ibm.com (Juan Ignacio Tellez Vilchis) Date: Wed, 21 Feb 2018 21:20:44 +0000 Subject: [gpfsug-discuss] SOBAR restore Message-ID: An HTML attachment was scrubbed... URL: From lgayne at us.ibm.com Wed Feb 21 21:23:50 2018 From: lgayne at us.ibm.com (Lyle Gayne) Date: Wed, 21 Feb 2018 16:23:50 -0500 Subject: [gpfsug-discuss] SOBAR restore In-Reply-To: References: Message-ID: April Brown should be able to assist. Lyle From: "Juan Ignacio Tellez Vilchis" To: gpfsug-discuss at spectrumscale.org Date: 02/21/2018 04:21 PM Subject: [gpfsug-discuss] SOBAR restore Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, Is there anybody that has some experience with GPFS filesystem restore using SOBAR? I already back filesystem out using SOBAR, but having some troubles with dsmc restore command. Any help would be appreciated! Juan Ignacio Tellez Vilchis Storage Consultant Lab. 
Services IBM Systems Hardware
Phone: 52-55-5270-3218 | Mobile: 52-55-10160692
E-mail: juantellez at mx1.ibm.com
Find me on: LinkedIn: http://mx.linkedin.com/in/Ignaciotellez1 and within IBM on: IBM Connections: https://w3-connections.ibm.com/profiles/html/profileView.do?key=2ce9da3f-33ae-4262-9e22-50433170ea46
Alfonso Napoles Gandara 3111, Mexico City, DIF 01210, Mexico

From john.hearns at asml.com Wed Feb 21 16:11:54 2018
From: john.hearns at asml.com (John Hearns)
Date: Wed, 21 Feb 2018 16:11:54 +0000
Subject: [gpfsug-discuss] mmfind will not exec
Message-ID:
I would dearly like to use mmfind in a project I am working on (version 4.2.3.4 at the moment)
mmfind /hpc/bscratch -type f
works fine
mmfind /hpc/bscratch -type f -exec /bin/ls {}\ ;
crashes and burns
I know there are supposed to be problems with exec and mmfind, and this is sample software shipped without warranty etc. But why let me waste hours on this when it won't work? There is even an example in the README for mmfind
./mmfind /encFS -type f -exec /bin/readMyFile {} \;
But in the help for mmfind:
-exec COMMANDs are terminated by a standalone ';' or by the string '{} +'
So which is it? The normal find version {} \; or {} +
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From scale at us.ibm.com Thu Feb 22 01:26:22 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 21 Feb 2018 20:26:22 -0500 Subject: [gpfsug-discuss] mmfind will not exec In-Reply-To: References: Message-ID: Looking at the mmfind.README it indicates that it only supports the format you used with the semi-colon. Did you capture any output of the problem? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: John Hearns To: gpfsug main discussion list Date: 02/21/2018 06:45 PM Subject: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org I would dearly like to use mmfind in a project I am working on (version 4.2.3.4 at the moment) mmfind /hpc/bscratch -type f work fine mmfind /hpc/bscratch -type f -exec /bin/ls {}\ ; crashes and burns I know there are supposed to be problems with exec and mmfind, and this is sample software shipped without warranty etc. But why let me waste hours on this when it won?t work? There is even an example in the README for mmfind ./mmfind /encFS -type f -exec /bin/readMyFile {} \; But in the help for mmfind: -exec COMMANDs are terminated by a standalone ';' or by the string '{} +? So which is it? The normal find version {} \; or {} + -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OC7XNZeulP0vmS8Fq-RJuun5wOqFPootm0QHxBXUfKg&s=LUvpk53AaNcHSGQgDgH8FAiOOsH1H0OPOV9MFGMIi9E&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Wed Feb 21 16:22:07 2018 From: john.hearns at asml.com (John Hearns) Date: Wed, 21 Feb 2018 16:22:07 +0000 Subject: [gpfsug-discuss] mmfind - a ps. Message-ID: Ps. 
Here is how to get mmfind to run some operation on the files it finds. (I installed mmfind in /usr/local/bin) I find this very hacky, though I suppose it is idiomatic bash
#!/bin/bash
while read filename
do
  echo -n $filename " "
done <<< "`/usr/local/bin/mmfind /hpc/bscratch -type f`"

From Ola.Pontusson at kb.se Thu Feb 22 06:23:37 2018
From: Ola.Pontusson at kb.se (Ola Pontusson)
Date: Thu, 22 Feb 2018 06:23:37 +0000
Subject: [gpfsug-discuss] SOBAR restore
In-Reply-To: References: Message-ID:
Hi
SOBAR is documented with Spectrum Scale on IBM's website and if you follow those instructions there should be no problem (unless you bump into one of the known SOBAR errors). Have you done your mmimgbackup with TSM and sent the image to TSM, and is that why you try the dsmc restore? The only time I use dsmc restore is if I send the image to TSM. If you don't send it to TSM the image is wherever you put it and can be moved where you want it. The whole point of SOBAR is to use dsmmigrate so that all files are HSM-migrated out to TSM rather than backed up. Just one question: if you run mmlsfs filesystem -V, which version was your filesystem created with, and what level of Spectrum Scale is running where you try to perform the restore?
Sincerely,
Ola Pontusson
IT-Specialist
National Library of Sweden

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Juan Ignacio Tellez Vilchis
Sent: 21 February 2018 22:21
To: gpfsug-discuss at spectrumscale.org
Subject: [gpfsug-discuss] SOBAR restore
Hello, Is there anybody that has some experience with GPFS filesystem restore using SOBAR? I already back filesystem out using SOBAR, but having some troubles with dsmc restore command. Any help would be appreciated!
Juan Ignacio Tellez Vilchis

From john.hearns at asml.com Thu Feb 22 09:01:43 2018
From: john.hearns at asml.com (John Hearns)
Date: Thu, 22 Feb 2018 09:01:43 +0000
Subject: [gpfsug-discuss] mmfind will not exec
In-Reply-To: References: Message-ID:
Stupid me.
The space between the {} and \; is significant. /usr/local/bin/mmfind /hpc/bscratch -type f -exec /bin/ls {} \; Still would be nice to have the documentation clarified please. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Thursday, February 22, 2018 2:26 AM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Subject: Re: [gpfsug-discuss] mmfind will not exec Looking at the mmfind.README it indicates that it only supports the format you used with the semi-colon. Did you capture any output of the problem? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: John Hearns > To: gpfsug main discussion list > Date: 02/21/2018 06:45 PM Subject: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I would dearly like to use mmfind in a project I am working on (version 4.2.3.4 at the moment) mmfind /hpc/bscratch -type f work fine mmfind /hpc/bscratch -type f -exec /bin/ls {}\ ; crashes and burns I know there are supposed to be problems with exec and mmfind, and this is sample software shipped without warranty etc. But why let me waste hours on this when it won?t work? There is even an example in the README for mmfind ./mmfind /encFS -type f -exec /bin/readMyFile {} \; But in the help for mmfind: -exec COMMANDs are terminated by a standalone ';' or by the string '{} +? So which is it? The normal find version {} \; or {} + -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OC7XNZeulP0vmS8Fq-RJuun5wOqFPootm0QHxBXUfKg&s=LUvpk53AaNcHSGQgDgH8FAiOOsH1H0OPOV9MFGMIi9E&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Feb 22 14:20:32 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 22 Feb 2018 09:20:32 -0500 Subject: [gpfsug-discuss] mmfind -ls In-Reply-To: References: Message-ID: Leaving aside the -exec option, and whether you choose classic find or mmfind, why not just use the -ls option - same output, less overhead... mmfind pathname -type f -ls From: John Hearns To: gpfsug main discussion list Cc: "gpfsug-discuss-bounces at spectrumscale.org" Date: 02/22/2018 04:03 AM Subject: Re: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org Stupid me. The space between the {} and \; is significant. /usr/local/bin/mmfind /hpc/bscratch -type f -exec /bin/ls {} \; Still would be nice to have the documentation clarified please. From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Thursday, February 22, 2018 2:26 AM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Subject: Re: [gpfsug-discuss] mmfind will not exec Looking at the mmfind.README it indicates that it only supports the format you used with the semi-colon. Did you capture any output of the problem? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
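To put the two styles side by side (paths as in John's example; with the ';' terminator each matching file is handed to its own /bin/ls invocation, as with classic find, while -ls prints the listing itself with no per-file process):

# one /bin/ls process per matching file
mmfind /hpc/bscratch -type f -exec /bin/ls {} \;

# built-in listing of the same files, no per-file process
mmfind /hpc/bscratch -type f -ls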
From: John Hearns To: gpfsug main discussion list Date: 02/21/2018 06:45 PM Subject: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org I would dearly like to use mmfind in a project I am working on (version 4.2.3.4 at the moment) mmfind /hpc/bscratch -type f work fine mmfind /hpc/bscratch -type f -exec /bin/ls {}\ ; crashes and burns I know there are supposed to be problems with exec and mmfind, and this is sample software shipped without warranty etc. But why let me waste hours on this when it won?t work? There is even an example in the README for mmfind ./mmfind /encFS -type f -exec /bin/readMyFile {} \; But in the help for mmfind: -exec COMMANDs are terminated by a standalone ';' or by the string '{} +? So which is it? The normal find version {} \; or {} + -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OC7XNZeulP0vmS8Fq-RJuun5wOqFPootm0QHxBXUfKg&s=LUvpk53AaNcHSGQgDgH8FAiOOsH1H0OPOV9MFGMIi9E&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=77Whh54a5VWNFaaczlMhEzn7B802MGX9m-C2xj4sP1k&s=L4bZlOcrZLwkyth7maRTEmms7Ftarchh_DkBvdTEF7w&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From makaplan at us.ibm.com Thu Feb 22 14:27:28 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 22 Feb 2018 09:27:28 -0500 Subject: [gpfsug-discuss] mmfind - Use mmfind ... -xargs In-Reply-To: References: Message-ID: More recent versions of mmfind support an -xargs option... Run mmfind --help and see: -xargs [-L maxlines] [-I rplstr] COMMAND Similar to find ... | xargs [-L x] [-I r] COMMAND but COMMAND executions may run in parallel. This is preferred to -exec. With -xargs mmfind will run the COMMANDs in phase subject to mmapplypolicy options -m, -B, -N. Must be the last option to mmfind This gives you the fully parallelized power of mmapplypolicy without having to write SQL rules nor scripts. From: John Hearns To: gpfsug main discussion list Date: 02/21/2018 11:00 PM Subject: [gpfsug-discuss] mmfind - a ps. Sent by: gpfsug-discuss-bounces at spectrumscale.org Ps. Her is how to get mmfind to run some operation on the files it finds. (I installed mmfind in /usr/local/bin) I find this very hacky, though I suppose it is idiomatic bash #!/bin/bash while read filename do echo -n $filename " " done <<< "`/usr/local/bin/mmfind /hpc/bscratch -type f`" -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=vbcae5NoH6gMQCovOqRVJVgj9jJ2USmq47GHxVn6En8&s=F_GqjJRzSzubUSXpcjysWCwCjhVKO9YrbUdzjusY0SY&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Thu Feb 22 19:58:48 2018 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Thu, 22 Feb 2018 14:58:48 -0500 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage Message-ID: Hi All, I am trying to figure out a GPFS tiering architecture with flash storage in front end and near line storage as backend, for Supercomputing The Backend storage will be a GPFS storage on near line of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough. I was thinking if it would be possible to use a GPFS flash cluster or GPFS SSD cluster in front end that uses AFM and acts as a cache cluster with the backend GPFS cluster. At the end of this .. the workflow that i am targeting is where: ? 
If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring up all of the files from the backend to the faster SSD/Flash GPFS cluster. The working set might be about 100T, at a time which i want to be on a faster/low latency tier, and the rest of the files to be in slower tier until they are read by the compute nodes. ? I do not want to use GPFS policies to achieve the above, is because i am not sure - if policies could be written in a way, that files are moved from the slower tier to faster tier depending on how the jobs interact with the files. I know that the policies could be written depending on the heat, and size/format but i don?t think thes policies work in a similar way as above. I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster before the near line storage. However the AFM cluster was really really slow, It took it about few hours to copy the files from near line storage to AFM cache cluster. I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should. I have tried LROC too, but it does not behave the same way as i guess AFM works. Has anyone tried or know if GPFS supports an architecture - where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes reads few blocks from these files? I understand that with respect to hardware - the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster. Please do also let me know, if the above workflow can be done using GPFS policies and be as fast as it is needed to be. Regards, Lohit -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Thu Feb 22 20:26:58 2018 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Thu, 22 Feb 2018 15:26:58 -0500 Subject: [gpfsug-discuss] GPFS, MMAP and Pagepool Message-ID: Hi all, I wanted to know, how does mmap interact with GPFS pagepool with respect to filesystem block-size? Does the efficiency depend on the mmap read size and the block-size of the filesystem even if all the data is cached in pagepool? GPFS 4.2.3.2 and CentOS7. Here is what i observed: I was testing a user script that uses mmap to read from 100M to 500MB files. The above files are stored on 3 different filesystems. Compute nodes - 10G pagepool and 5G seqdiscardthreshold. 1. 4M block size GPFS filesystem, with separate metadata and data. Data on Near line and metadata on SSDs 2. 1M block size GPFS filesystem as a AFM cache cluster, "with all the required files fully cached" from the above GPFS cluster as home. Data and Metadata together on SSDs 3. 16M block size GPFS filesystem, with separate metadata and data. Data on Near line and metadata on SSDs When i run the script first time for ?each" filesystem: I see that GPFS reads from the files, and caches into the pagepool as it reads, from mmdiag -- iohist When i run the second time, i see that there are no IO requests from the compute node to GPFS NSD servers, which is expected since all the data from the 3 filesystems is cached. However - the time taken for the script to run for the files in the 3 different filesystems is different - although i know that they are just "mmapping"/reading from pagepool/cache and not from disk. 
Here is the difference in time, for IO just from pagepool: 20s 4M block size 15s 1M block size 40S 16M block size. Why do i see a difference when trying to mmap reads from different block-size filesystems, although i see that the IO requests are not hitting disks and just the pagepool? I am willing to share the strace output and mmdiag outputs if needed. Thanks, Lohit -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Thu Feb 22 20:59:27 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 22 Feb 2018 20:59:27 +0000 Subject: [gpfsug-discuss] GPFS, MMAP and Pagepool In-Reply-To: References: Message-ID: Hi Lohit, i am working with ray on a mmap performance improvement right now, which most likely has the same root cause as yours , see --> http://gpfsug.org/pipermail/gpfsug-discuss/2018-January/004411.html the thread above is silent after a couple of back and rorth, but ray and i have active communication in the background and will repost as soon as there is something new to share. i am happy to look at this issue after we finish with ray's workload if there is something missing, but first let's finish his, get you try the same fix and see if there is something missing. btw. if people would share their use of MMAP , what applications they use (home grown, just use lmdb which uses mmap under the cover, etc) please let me know so i get a better picture on how wide the usage is with GPFS. i know a lot of the ML/DL workloads are using it, but i would like to know what else is out there i might not think about. feel free to drop me a personal note, i might not reply to it right away, but eventually. thx. sven On Thu, Feb 22, 2018 at 12:33 PM wrote: > Hi all, > > I wanted to know, how does mmap interact with GPFS pagepool with respect > to filesystem block-size? > Does the efficiency depend on the mmap read size and the block-size of the > filesystem even if all the data is cached in pagepool? > > GPFS 4.2.3.2 and CentOS7. > > Here is what i observed: > > I was testing a user script that uses mmap to read from 100M to 500MB > files. > > The above files are stored on 3 different filesystems. > > Compute nodes - 10G pagepool and 5G seqdiscardthreshold. > > 1. 4M block size GPFS filesystem, with separate metadata and data. Data on > Near line and metadata on SSDs > 2. 1M block size GPFS filesystem as a AFM cache cluster, "with all the > required files fully cached" from the above GPFS cluster as home. Data and > Metadata together on SSDs > 3. 16M block size GPFS filesystem, with separate metadata and data. Data > on Near line and metadata on SSDs > > When i run the script first time for ?each" filesystem: > I see that GPFS reads from the files, and caches into the pagepool as it > reads, from mmdiag -- iohist > > When i run the second time, i see that there are no IO requests from the > compute node to GPFS NSD servers, which is expected since all the data from > the 3 filesystems is cached. > > However - the time taken for the script to run for the files in the 3 > different filesystems is different - although i know that they are just > "mmapping"/reading from pagepool/cache and not from disk. > > Here is the difference in time, for IO just from pagepool: > > 20s 4M block size > 15s 1M block size > 40S 16M block size. > > Why do i see a difference when trying to mmap reads from different > block-size filesystems, although i see that the IO requests are not hitting > disks and just the pagepool? 
> > I am willing to share the strace output and mmdiag outputs if needed. > > Thanks, > Lohit > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Feb 22 21:08:06 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 22 Feb 2018 16:08:06 -0500 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: References: Message-ID: I do not think AFM is intended to solve the problem you are trying to solve. If I understand your scenario correctly you state that you are placing metadata on NL-SAS storage. If that is true that would not be wise especially if you are going to do many metadata operations. I suspect your performance issues are partially due to the fact that metadata is being stored on NL-SAS storage. You stated that you did not think the file heat feature would do what you intended but have you tried to use it to see if it could solve your problem? I would think having metadata on SSD/flash storage combined with a all flash storage pool for your heavily used files would perform well. If you expect IO usage will be such that there will be far more reads than writes then LROC should be beneficial to your overall performance. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: valleru at cbio.mskcc.org To: gpfsug main discussion list Date: 02/22/2018 03:11 PM Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I am trying to figure out a GPFS tiering architecture with flash storage in front end and near line storage as backend, for Supercomputing The Backend storage will be a GPFS storage on near line of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough. I was thinking if it would be possible to use a GPFS flash cluster or GPFS SSD cluster in front end that uses AFM and acts as a cache cluster with the backend GPFS cluster. At the end of this .. the workflow that i am targeting is where: ? If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring up all of the files from the backend to the faster SSD/Flash GPFS cluster. The working set might be about 100T, at a time which i want to be on a faster/low latency tier, and the rest of the files to be in slower tier until they are read by the compute nodes. ? 
I do not want to use GPFS policies to achieve the above, is because i am not sure - if policies could be written in a way, that files are moved from the slower tier to faster tier depending on how the jobs interact with the files. I know that the policies could be written depending on the heat, and size/format but i don?t think thes policies work in a similar way as above. I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster before the near line storage. However the AFM cluster was really really slow, It took it about few hours to copy the files from near line storage to AFM cache cluster. I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should. I have tried LROC too, but it does not behave the same way as i guess AFM works. Has anyone tried or know if GPFS supports an architecture - where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes reads few blocks from these files? I understand that with respect to hardware - the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster. Please do also let me know, if the above workflow can be done using GPFS policies and be as fast as it is needed to be. Regards, Lohit _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s=AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Thu Feb 22 21:19:08 2018 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Thu, 22 Feb 2018 16:19:08 -0500 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: References: Message-ID: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> Thank you. I am sorry if i was not clear, but the metadata pool is all on SSDs in the GPFS clusters that we use. Its just the data pool that is on Near-Line Rotating disks. I understand that AFM might not be able to solve the issue, and I will try and see if file heat works for migrating the files to flash tier. You mentioned an all flash storage pool for heavily used files - so you mean a different GPFS cluster just with flash storage, and to manually copy the files to flash storage whenever needed? The IO performance that i am talking is prominently for reads, so you mention that LROC can work in the way i want it to? that is prefetch all the files into LROC cache, after only few headers/stubs of data are read from those files? I thought LROC only keeps that block of data that is prefetched from the disk, and will not prefetch the whole file if a stub of data is read. Please do let me know, if i understood it wrong. On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale , wrote: > I do not think AFM is intended to solve the problem you are trying to solve. ?If I understand your scenario correctly you state that you are placing metadata on NL-SAS storage. ?If that is true that would not be wise especially if you are going to do many metadata operations. ?I suspect your performance issues are partially due to the fact that metadata is being stored on NL-SAS storage. 
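On the question of whether a policy can react to what the jobs actually touch: the file heat feature is the closest match, since heat decays over time and rises with access, and a migration rule can use it as a weight. A minimal sketch of what that could look like; the pool names 'flash' and 'nearline', the occupancy limit, the heat settings and the filesystem name 'gpfs0' are all assumptions:

# switch on file heat accounting (values are only illustrative)
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

# heat-weighted repacking between the two data pools
cat > /tmp/heat.policy <<'EOF'
RULE 'defineTiers' GROUP POOL 'tiers' IS 'flash' LIMIT(75) THEN 'nearline'
RULE 'repack' MIGRATE FROM POOL 'tiers' TO POOL 'tiers' WEIGHT(FILE_HEAT)
EOF

# -I test reports what would move without moving anything; switch to -I yes for a real run
mmapplypolicy gpfs0 -P /tmp/heat.policy -I test

Whether this helps here depends on the heat period catching the working set before the jobs need it, which is exactly the doubt raised in this thread; it will not pull a cold file into the fast pool the moment a job opens it.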
?You stated that you did not think the file heat feature would do what you intended but have you tried to use it to see if it could solve your problem? ?I would think having metadata on SSD/flash storage combined with a all flash storage pool for your heavily used files would perform well. ?If you expect IO usage will be such that there will be far more reads than writes then LROC should be beneficial to your overall performance. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of ?Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact ?1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: ? ? ? ?valleru at cbio.mskcc.org > To: ? ? ? ?gpfsug main discussion list > Date: ? ? ? ?02/22/2018 03:11 PM > Subject: ? ? ? ?[gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage > Sent by: ? ? ? ?gpfsug-discuss-bounces at spectrumscale.org > > > > Hi All, > > I am trying to figure out a GPFS tiering architecture with flash storage in front end and near line storage as backend, for Supercomputing > > The Backend storage will be a GPFS storage on near line of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough. > > I was thinking if it would be possible to use a GPFS flash cluster or GPFS SSD cluster in front end that uses AFM and acts as a cache cluster with the backend GPFS cluster. > > At the end of this .. the workflow that i am targeting is where: > > > ? > If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring up all of the files from the backend to the faster SSD/Flash GPFS cluster. > The working set might be about 100T, at a time which i want to be on a faster/low latency tier, and the rest of the files to be in slower tier until they are read by the compute nodes. > ? > > > I do not want to use GPFS policies to achieve the above, is because i am not sure - if policies could be written in a way, that files are moved from the slower tier to faster tier depending on how the jobs interact with the files. > I know that the policies could be written depending on the heat, and size/format but i don?t think thes policies work in a similar way as above. > > I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster before the near line storage. However the AFM cluster was really really slow, It took it about few hours to copy the files from near line storage to AFM cache cluster. > I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should. > > I have tried LROC too, but it does not behave the same way as i guess AFM works. 
> > Has anyone tried or know if GPFS supports an architecture - where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes reads few blocks from these files? > I understand that with respect to hardware - the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster. > > Please do also let me know, if the above workflow can be done using GPFS policies and be as fast as it is needed to be. > > Regards, > Lohit > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s=AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Feb 22 21:52:01 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 22 Feb 2018 16:52:01 -0500 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> References: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> Message-ID: My apologies for not being more clear on the flash storage pool. I meant that this would be just another GPFS storage pool in the same cluster, so no separate AFM cache cluster. You would then use the file heat feature to ensure more frequently accessed files are migrated to that all flash storage pool. As for LROC could you please clarify what you mean by a few headers/stubs of the file? In reading the LROC documentation and the LROC variables available in the mmchconfig command I think you might want to take a look a the lrocDataStubFileSize variable since it seems to apply to your situation. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: valleru at cbio.mskcc.org To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Date: 02/22/2018 04:21 PM Subject: Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage Sent by: gpfsug-discuss-bounces at spectrumscale.org Thank you. I am sorry if i was not clear, but the metadata pool is all on SSDs in the GPFS clusters that we use. Its just the data pool that is on Near-Line Rotating disks. I understand that AFM might not be able to solve the issue, and I will try and see if file heat works for migrating the files to flash tier. 
You mentioned an all flash storage pool for heavily used files - so you mean a different GPFS cluster just with flash storage, and to manually copy the files to flash storage whenever needed? The IO performance that i am talking is prominently for reads, so you mention that LROC can work in the way i want it to? that is prefetch all the files into LROC cache, after only few headers/stubs of data are read from those files? I thought LROC only keeps that block of data that is prefetched from the disk, and will not prefetch the whole file if a stub of data is read. Please do let me know, if i understood it wrong. On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale , wrote: I do not think AFM is intended to solve the problem you are trying to solve. If I understand your scenario correctly you state that you are placing metadata on NL-SAS storage. If that is true that would not be wise especially if you are going to do many metadata operations. I suspect your performance issues are partially due to the fact that metadata is being stored on NL-SAS storage. You stated that you did not think the file heat feature would do what you intended but have you tried to use it to see if it could solve your problem? I would think having metadata on SSD/flash storage combined with a all flash storage pool for your heavily used files would perform well. If you expect IO usage will be such that there will be far more reads than writes then LROC should be beneficial to your overall performance. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: valleru at cbio.mskcc.org To: gpfsug main discussion list Date: 02/22/2018 03:11 PM Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I am trying to figure out a GPFS tiering architecture with flash storage in front end and near line storage as backend, for Supercomputing The Backend storage will be a GPFS storage on near line of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough. I was thinking if it would be possible to use a GPFS flash cluster or GPFS SSD cluster in front end that uses AFM and acts as a cache cluster with the backend GPFS cluster. At the end of this .. the workflow that i am targeting is where: ? If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring up all of the files from the backend to the faster SSD/Flash GPFS cluster. The working set might be about 100T, at a time which i want to be on a faster/low latency tier, and the rest of the files to be in slower tier until they are read by the compute nodes. ? 
I do not want to use GPFS policies to achieve the above, is because i am not sure - if policies could be written in a way, that files are moved from the slower tier to faster tier depending on how the jobs interact with the files. I know that the policies could be written depending on the heat, and size/format but i don?t think thes policies work in a similar way as above. I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster before the near line storage. However the AFM cluster was really really slow, It took it about few hours to copy the files from near line storage to AFM cache cluster. I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should. I have tried LROC too, but it does not behave the same way as i guess AFM works. Has anyone tried or know if GPFS supports an architecture - where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes reads few blocks from these files? I understand that with respect to hardware - the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster. Please do also let me know, if the above workflow can be done using GPFS policies and be as fast as it is needed to be. Regards, Lohit _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s=AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=DuqESC-4ycoY5GoHpYeH1T8baq0JWY8QfkN8z6b8jPw&s=zNUAH3mFyzxcvXtrep_OroKiwR88QouIrcdN8TLJK8M&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Fri Feb 23 00:48:12 2018 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Thu, 22 Feb 2018 19:48:12 -0500 Subject: [gpfsug-discuss] GPFS, MMAP and Pagepool In-Reply-To: References: Message-ID: Thanks a lot Sven. I was trying out all the scenarios that Ray mentioned, with respect to lroc and all flash GPFS cluster and nothing seemed to be effective. As of now, we are deploying a new test cluster on GPFS 5.0 and it would be good to know the respective features that could be enabled and see if it improves anything. On the other side, i have seen various cases in my past 6 years with GPFS, where different tools do frequently use mmap. This dates back to 2013..?http://www.spectrumscale.org/pipermail/gpfsug-discuss/2013-May/000253.html?when one of my colleagues asked the same question. At that time, it was a homegrown application that was using mmap, along with few other genomic pipelines. An year ago, we had issue with mmap and lot of threads where GPFS would just hang without any traces or logs, which was fixed recently. 
That was related to relion : https://sbgrid.org/software/titles/relion The issue that we are seeing now is ML/DL workloads, and is related to implementing external tools such as openslide (http://openslide.org/), pytorch (http://pytorch.org/) with field of application being deep learning for thousands of image patches. The IO is really slow when accessed from hard disk, and thus i was trying out other options such as LROC and flash cluster/afm cluster. But everything has a limitation as Ray mentioned. Thanks, Lohit On Feb 22, 2018, 3:59 PM -0500, Sven Oehme , wrote: > Hi Lohit, > > i am working with ray on a mmap performance improvement right now, which most likely has the same root cause as yours , see -->??http://gpfsug.org/pipermail/gpfsug-discuss/2018-January/004411.html > the thread above is silent after a couple of back and rorth, but ray and i have active communication in the background and will repost as soon as there is something new to share. > i am happy to look at this issue after we finish with ray's workload if there is something missing, but first let's finish his, get you try the same fix and see if there is something missing. > > btw. if people would share their use of MMAP , what applications they use (home grown, just use lmdb which uses mmap under the cover, etc) please let me know so i get a better picture on how wide the usage is with GPFS. i know a lot of the ML/DL workloads are using it, but i would like to know what else is out there i might not think about. feel free to drop me a personal note, i might not reply to it right away, but eventually. > > thx. sven > > > > On Thu, Feb 22, 2018 at 12:33 PM wrote: > > > Hi all, > > > > > > I wanted to know, how does mmap interact with GPFS pagepool with respect to filesystem block-size? > > > Does the efficiency depend on the mmap read size and the block-size of the filesystem even if all the data is cached in pagepool? > > > > > > GPFS 4.2.3.2 and CentOS7. > > > > > > Here is what i observed: > > > > > > I was testing a user script that uses mmap to read from 100M to 500MB files. > > > > > > The above files are stored on 3 different filesystems. > > > > > > Compute nodes - 10G pagepool and 5G seqdiscardthreshold. > > > > > > 1. 4M block size GPFS filesystem, with separate metadata and data. Data on Near line and metadata on SSDs > > > 2. 1M block size GPFS filesystem as a AFM cache cluster, "with all the required files fully cached" from the above GPFS cluster as home. Data and Metadata together on SSDs > > > 3. 16M block size GPFS filesystem, with separate metadata and data. Data on Near line and metadata on SSDs > > > > > > When i run the script first time for ?each" filesystem: > > > I see that GPFS reads from the files, and caches into the pagepool as it reads, from mmdiag -- iohist > > > > > > When i run the second time, i see that there are no IO requests from the compute node to GPFS NSD servers, which is expected since all the data from the 3 filesystems is cached. > > > > > > However - the time taken for the script to run for the files in the 3 different filesystems is different - although i know that they are just "mmapping"/reading from pagepool/cache and not from disk. > > > > > > Here is the difference in time, for IO just from pagepool: > > > > > > 20s 4M block size > > > 15s 1M block size > > > 40S 16M block size. > > > > > > Why do i see a difference when trying to mmap reads from different block-size filesystems, although i see that the IO requests are not hitting disks and just the pagepool? 
> > > > > > I am willing to share the strace output and mmdiag outputs if needed. > > > > > > Thanks, > > > Lohit > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Fri Feb 23 01:27:58 2018 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Thu, 22 Feb 2018 20:27:58 -0500 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: References: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> Message-ID: <0c21b5b2-95ff-4cbf-9b07-e23594f58c87@Spark> Thanks, I will try the file heat feature but i am really not sure, if it would work - since the code can access cold files too, and not necessarily files recently accessed/hot files. With respect to LROC. Let me explain as below: The use case is that - The code initially reads headers (small region of data) from thousands of files as the first step. For example about 30,000 of them with each about 300MB to 500MB in size. After the first step, with the help of those headers - it mmaps/seeks across various regions of a set of files in parallel. Since its all small IOs and it was really slow at reading from GPFS over the network directly from disks - Our idea was to use AFM which i believe fetches all file data into flash/ssds, once the initial few blocks of the files are read. But again - AFM seems to not solve the problem, so i want to know if LROC behaves in the same way as AFM, where all of the file data is prefetched in full block size utilizing all the worker threads ?- if few blocks of the file is read initially. Thanks, Lohit On Feb 22, 2018, 4:52 PM -0500, IBM Spectrum Scale , wrote: > My apologies for not being more clear on the flash storage pool. ?I meant that this would be just another GPFS storage pool in the same cluster, so no separate AFM cache cluster. ?You would then use the file heat feature to ensure more frequently accessed files are migrated to that all flash storage pool. > > As for LROC could you please clarify what you mean by a few headers/stubs of the file? ?In reading the LROC documentation and the LROC variables available in the mmchconfig command I think you might want to take a look a the lrocDataStubFileSize variable since it seems to apply to your situation. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of ?Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact ?1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: ? ? ? ?valleru at cbio.mskcc.org > To: ? ? ? ?gpfsug main discussion list > Cc: ? ? ? 
?gpfsug-discuss-bounces at spectrumscale.org > Date: ? ? ? ?02/22/2018 04:21 PM > Subject: ? ? ? ?Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage > Sent by: ? ? ? ?gpfsug-discuss-bounces at spectrumscale.org > > > > Thank you. > > I am sorry if i was not clear, but the metadata pool is all on SSDs in the GPFS clusters that we use. Its just the data pool that is on Near-Line Rotating disks. > I understand that AFM might not be able to solve the issue, and I will try and see if file heat works for migrating the files to flash tier. > You mentioned an all flash storage pool for heavily used files - so you mean a different GPFS cluster just with flash storage, and to manually copy the files to flash storage whenever needed? > The IO performance that i am talking is prominently for reads, so you mention that LROC can work in the way i want it to? that is prefetch all the files into LROC cache, after only few headers/stubs of data are read from those files? > I thought LROC only keeps that block of data that is prefetched from the disk, and will not prefetch the whole file if a stub of data is read. > Please do let me know, if i understood it wrong. > > On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale , wrote: > I do not think AFM is intended to solve the problem you are trying to solve. ?If I understand your scenario correctly you state that you are placing metadata on NL-SAS storage. ?If that is true that would not be wise especially if you are going to do many metadata operations. ?I suspect your performance issues are partially due to the fact that metadata is being stored on NL-SAS storage. ?You stated that you did not think the file heat feature would do what you intended but have you tried to use it to see if it could solve your problem? ?I would think having metadata on SSD/flash storage combined with a all flash storage pool for your heavily used files would perform well. ?If you expect IO usage will be such that there will be far more reads than writes then LROC should be beneficial to your overall performance. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of ?Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact ?1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: ? ? ? ?valleru at cbio.mskcc.org > To: ? ? ? ?gpfsug main discussion list > Date: ? ? ? ?02/22/2018 03:11 PM > Subject: ? ? ? ?[gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage > Sent by: ? ? ? ?gpfsug-discuss-bounces at spectrumscale.org > > > > Hi All, > > I am trying to figure out a GPFS tiering architecture with flash storage in front end and near line storage as backend, for Supercomputing > > The Backend storage will be a GPFS storage on near line of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough. 
> > I was thinking if it would be possible to use a GPFS flash cluster or GPFS SSD cluster in front end that uses AFM and acts as a cache cluster with the backend GPFS cluster. > > At the end of this .. the workflow that i am targeting is where: > > > ? > If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring up all of the files from the backend to the faster SSD/Flash GPFS cluster. > The working set might be about 100T, at a time which i want to be on a faster/low latency tier, and the rest of the files to be in slower tier until they are read by the compute nodes. > ? > > > I do not want to use GPFS policies to achieve the above, is because i am not sure - if policies could be written in a way, that files are moved from the slower tier to faster tier depending on how the jobs interact with the files. > I know that the policies could be written depending on the heat, and size/format but i don?t think thes policies work in a similar way as above. > > I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster before the near line storage. However the AFM cluster was really really slow, It took it about few hours to copy the files from near line storage to AFM cache cluster. > I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should. > > I have tried LROC too, but it does not behave the same way as i guess AFM works. > > Has anyone tried or know if GPFS supports an architecture - where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes reads few blocks from these files? > I understand that with respect to hardware - the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster. > > Please do also let me know, if the above workflow can be done using GPFS policies and be as fast as it is needed to be. > > Regards, > Lohit > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s=AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=DuqESC-4ycoY5GoHpYeH1T8baq0JWY8QfkN8z6b8jPw&s=zNUAH3mFyzxcvXtrep_OroKiwR88QouIrcdN8TLJK8M&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aaron.s.knister at nasa.gov Fri Feb 23 03:17:26 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 22 Feb 2018 22:17:26 -0500 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory Message-ID: I've been exploring the idea for a while of writing a SLURM SPANK plugin to allow users to dynamically change the pagepool size on a node. Every now and then we have some users who would benefit significantly from a much larger pagepool on compute nodes but by default keep it on the smaller side to make as much physmem available as possible to batch work. In testing, though, it seems as though reducing the pagepool doesn't quite release all of the memory. I don't really understand it because I've never before seen memory that was previously resident become un-resident but still maintain the virtual memory allocation. Here's what I mean. Let's take a node with 128G and a 1G pagepool. If I do the following to simulate what might happen as various jobs tweak the pagepool: - tschpool 64G - tschpool 1G - tschpool 32G - tschpool 1G - tschpool 32G I end up with this: mmfsd thinks there's 32G resident but 64G virt # ps -o vsz,rss,comm -p 24397 VSZ RSS COMMAND 67589400 33723236 mmfsd however, linux thinks there's ~100G used # free -g total used free shared buffers cached Mem: 125 100 25 0 0 0 -/+ buffers/cache: 98 26 Swap: 7 0 7 I can jump back and forth between 1G and 32G *after* allocating 64G pagepool and the overall amount of memory in use doesn't balloon but I can't seem to shed that original 64G. I don't understand what's going on... :) Any ideas? This is with Scale 4.2.3.6. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Fri Feb 23 03:24:00 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 22 Feb 2018 22:24:00 -0500 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: References: Message-ID: This is also interesting (although I don't know what it really means). Looking at pmap run against mmfsd I can see what happens after each step: # baseline 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020000000000 1048576K 1048576K 1048576K 1048576K 0K rwxp [anon] Total: 1613580K 1191020K 1189650K 1171836K 0K # tschpool 64G 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020000000000 67108864K 67108864K 67108864K 67108864K 0K rwxp [anon] Total: 67706636K 67284108K 67282625K 67264920K 0K # tschpool 1G 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020001400000 139264K 139264K 139264K 139264K 0K rwxp [anon] 0000020fc9400000 897024K 897024K 897024K 897024K 0K rwxp [anon] 0000020009c00000 66052096K 0K 0K 0K 0K rwxp [anon] Total: 67706636K 1223820K 1222451K 1204632K 0K Even though mmfsd has that 64G chunk allocated there's none of it *used*. I wonder why Linux seems to be accounting it as allocated. -Aaron On 2/22/18 10:17 PM, Aaron Knister wrote: > I've been exploring the idea for a while of writing a SLURM SPANK plugin > to allow users to dynamically change the pagepool size on a node. Every > now and then we have some users who would benefit significantly from a > much larger pagepool on compute nodes but by default keep it on the > smaller side to make as much physmem available as possible to batch work. 
> > In testing, though, it seems as though reducing the pagepool doesn't > quite release all of the memory. I don't really understand it because > I've never before seen memory that was previously resident become > un-resident but still maintain the virtual memory allocation. > > Here's what I mean. Let's take a node with 128G and a 1G pagepool. > > If I do the following to simulate what might happen as various jobs > tweak the pagepool: > > - tschpool 64G > - tschpool 1G > - tschpool 32G > - tschpool 1G > - tschpool 32G > > I end up with this: > > mmfsd thinks there's 32G resident but 64G virt > # ps -o vsz,rss,comm -p 24397 > ?? VSZ?? RSS COMMAND > 67589400 33723236 mmfsd > > however, linux thinks there's ~100G used > > # free -g > ???????????? total?????? used?????? free???? shared??? buffers???? cached > Mem:?????????? 125??????? 100???????? 25????????? 0????????? 0????????? 0 > -/+ buffers/cache:???????? 98???????? 26 > Swap:??????????? 7????????? 0????????? 7 > > I can jump back and forth between 1G and 32G *after* allocating 64G > pagepool and the overall amount of memory in use doesn't balloon but I > can't seem to shed that original 64G. > > I don't understand what's going on... :) Any ideas? This is with Scale > 4.2.3.6. > > -Aaron > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From john.hearns at asml.com Fri Feb 23 09:37:08 2018 From: john.hearns at asml.com (John Hearns) Date: Fri, 23 Feb 2018 09:37:08 +0000 Subject: [gpfsug-discuss] mmfind -ls In-Reply-To: References: Message-ID: Hi. I hope this reply comes through. I often get bounced when replying here. In fact the reason is because I am not running ls. This was just an example. I am running mmgetlocation to get the chunks allocation on each NSD of a file. Secondly my problem is that a space is needed: mmfind /mountpoint -type f -exec mmgetlocation -D myproblemnsd -f {} \; Note the space before the \ TO my shame this is the same as in the normal find command From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, February 22, 2018 3:21 PM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Subject: Re: [gpfsug-discuss] mmfind -ls Leaving aside the -exec option, and whether you choose classic find or mmfind, why not just use the -ls option - same output, less overhead... mmfind pathname -type f -ls From: John Hearns > To: gpfsug main discussion list > Cc: "gpfsug-discuss-bounces at spectrumscale.org" > Date: 02/22/2018 04:03 AM Subject: Re: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Stupid me. The space between the {} and \; is significant. /usr/local/bin/mmfind /hpc/bscratch -type f -exec /bin/ls {} \; Still would be nice to have the documentation clarified please. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Thursday, February 22, 2018 2:26 AM To: gpfsug main discussion list > Cc: gpfsug-discuss-bounces at spectrumscale.org Subject: Re: [gpfsug-discuss] mmfind will not exec Looking at the mmfind.README it indicates that it only supports the format you used with the semi-colon. Did you capture any output of the problem? 
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: John Hearns > To: gpfsug main discussion list > Date: 02/21/2018 06:45 PM Subject: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I would dearly like to use mmfind in a project I am working on (version 4.2.3.4 at the moment) mmfind /hpc/bscratch -type f work fine mmfind /hpc/bscratch -type f -exec /bin/ls {}\ ; crashes and burns I know there are supposed to be problems with exec and mmfind, and this is sample software shipped without warranty etc. But why let me waste hours on this when it won?t work? There is even an example in the README for mmfind ./mmfind /encFS -type f -exec /bin/readMyFile {} \; But in the help for mmfind: -exec COMMANDs are terminated by a standalone ';' or by the string '{} +? So which is it? The normal find version {} \; or {} + -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OC7XNZeulP0vmS8Fq-RJuun5wOqFPootm0QHxBXUfKg&s=LUvpk53AaNcHSGQgDgH8FAiOOsH1H0OPOV9MFGMIi9E&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. 
If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=77Whh54a5VWNFaaczlMhEzn7B802MGX9m-C2xj4sP1k&s=L4bZlOcrZLwkyth7maRTEmms7Ftarchh_DkBvdTEF7w&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Fri Feb 23 14:35:41 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 23 Feb 2018 09:35:41 -0500 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: References: Message-ID: AFAIK you can increase the pagepool size dynamically but you cannot shrink it dynamically. To shrink it you must restart the GPFS daemon. Also, could you please provide the actual pmap commands you executed? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Aaron Knister To: Date: 02/22/2018 10:30 PM Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all memory Sent by: gpfsug-discuss-bounces at spectrumscale.org This is also interesting (although I don't know what it really means). 
Looking at pmap run against mmfsd I can see what happens after each step: # baseline 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020000000000 1048576K 1048576K 1048576K 1048576K 0K rwxp [anon] Total: 1613580K 1191020K 1189650K 1171836K 0K # tschpool 64G 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020000000000 67108864K 67108864K 67108864K 67108864K 0K rwxp [anon] Total: 67706636K 67284108K 67282625K 67264920K 0K # tschpool 1G 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020001400000 139264K 139264K 139264K 139264K 0K rwxp [anon] 0000020fc9400000 897024K 897024K 897024K 897024K 0K rwxp [anon] 0000020009c00000 66052096K 0K 0K 0K 0K rwxp [anon] Total: 67706636K 1223820K 1222451K 1204632K 0K Even though mmfsd has that 64G chunk allocated there's none of it *used*. I wonder why Linux seems to be accounting it as allocated. -Aaron On 2/22/18 10:17 PM, Aaron Knister wrote: > I've been exploring the idea for a while of writing a SLURM SPANK plugin > to allow users to dynamically change the pagepool size on a node. Every > now and then we have some users who would benefit significantly from a > much larger pagepool on compute nodes but by default keep it on the > smaller side to make as much physmem available as possible to batch work. > > In testing, though, it seems as though reducing the pagepool doesn't > quite release all of the memory. I don't really understand it because > I've never before seen memory that was previously resident become > un-resident but still maintain the virtual memory allocation. > > Here's what I mean. Let's take a node with 128G and a 1G pagepool. > > If I do the following to simulate what might happen as various jobs > tweak the pagepool: > > - tschpool 64G > - tschpool 1G > - tschpool 32G > - tschpool 1G > - tschpool 32G > > I end up with this: > > mmfsd thinks there's 32G resident but 64G virt > # ps -o vsz,rss,comm -p 24397 > VSZ RSS COMMAND > 67589400 33723236 mmfsd > > however, linux thinks there's ~100G used > > # free -g > total used free shared buffers cached > Mem: 125 100 25 0 0 0 > -/+ buffers/cache: 98 26 > Swap: 7 0 7 > > I can jump back and forth between 1G and 32G *after* allocating 64G > pagepool and the overall amount of memory in use doesn't balloon but I > can't seem to shed that original 64G. > > I don't understand what's going on... :) Any ideas? This is with Scale > 4.2.3.6. > > -Aaron > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OrZQeEmI6chBdguG-h4YPHsxXZ4gTU3CtIuN4e3ijdY&s=hvVIRG5kB1zom2Iql2_TOagchsgl99juKiZfJt5S1tM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Fri Feb 23 14:44:21 2018 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Fri, 23 Feb 2018 15:44:21 +0100 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: References: Message-ID: <6c9df2df-dbdd-e3c1-07c7-f9906b0d666d@ugent.be> hi all, we had the same idea long ago, afaik the issue we had was due to the pinned memory the pagepool uses when RDMA is enabled. 
at some point we restarted gpfs on the compute nodes for each job, similar to the way we do swapoff/swapon; but in certain scenarios gpfs really did not like it; so we gave up on it. the other issue that needs to be resolved is that the pagepool needs to be numa aware, so the pagepool is nicely allocated across all numa domains, instead of using the first ones available. otherwise compute jobs might start that only do non-local doamin memeory access. stijn On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote: > AFAIK you can increase the pagepool size dynamically but you cannot shrink > it dynamically. To shrink it you must restart the GPFS daemon. Also, > could you please provide the actual pmap commands you executed? > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 > . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Aaron Knister > To: > Date: 02/22/2018 10:30 PM > Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all > memory > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > This is also interesting (although I don't know what it really means). > Looking at pmap run against mmfsd I can see what happens after each step: > > # baseline > 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > 0000020000000000 1048576K 1048576K 1048576K 1048576K 0K rwxp [anon] > Total: 1613580K 1191020K 1189650K 1171836K 0K > > # tschpool 64G > 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > 0000020000000000 67108864K 67108864K 67108864K 67108864K 0K rwxp > [anon] > Total: 67706636K 67284108K 67282625K 67264920K 0K > > # tschpool 1G > 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > 0000020001400000 139264K 139264K 139264K 139264K 0K rwxp [anon] > 0000020fc9400000 897024K 897024K 897024K 897024K 0K rwxp [anon] > 0000020009c00000 66052096K 0K 0K 0K 0K rwxp [anon] > Total: 67706636K 1223820K 1222451K 1204632K 0K > > Even though mmfsd has that 64G chunk allocated there's none of it > *used*. I wonder why Linux seems to be accounting it as allocated. > > -Aaron > > On 2/22/18 10:17 PM, Aaron Knister wrote: >> I've been exploring the idea for a while of writing a SLURM SPANK plugin > >> to allow users to dynamically change the pagepool size on a node. Every >> now and then we have some users who would benefit significantly from a >> much larger pagepool on compute nodes but by default keep it on the >> smaller side to make as much physmem available as possible to batch > work. >> >> In testing, though, it seems as though reducing the pagepool doesn't >> quite release all of the memory. I don't really understand it because >> I've never before seen memory that was previously resident become >> un-resident but still maintain the virtual memory allocation. 
>> >> Here's what I mean. Let's take a node with 128G and a 1G pagepool. >> >> If I do the following to simulate what might happen as various jobs >> tweak the pagepool: >> >> - tschpool 64G >> - tschpool 1G >> - tschpool 32G >> - tschpool 1G >> - tschpool 32G >> >> I end up with this: >> >> mmfsd thinks there's 32G resident but 64G virt >> # ps -o vsz,rss,comm -p 24397 >> VSZ RSS COMMAND >> 67589400 33723236 mmfsd >> >> however, linux thinks there's ~100G used >> >> # free -g >> total used free shared buffers > cached >> Mem: 125 100 25 0 0 > 0 >> -/+ buffers/cache: 98 26 >> Swap: 7 0 7 >> >> I can jump back and forth between 1G and 32G *after* allocating 64G >> pagepool and the overall amount of memory in use doesn't balloon but I >> can't seem to shed that original 64G. >> >> I don't understand what's going on... :) Any ideas? This is with Scale >> 4.2.3.6. >> >> -Aaron >> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From makaplan at us.ibm.com Fri Feb 23 16:53:26 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 23 Feb 2018 11:53:26 -0500 Subject: [gpfsug-discuss] mmfind -ls, -exec but use -xargs wherever you can. In-Reply-To: References: Message-ID: So much the more reasons to use mmfind ... -xargs ... Which, for large number of files, gives you a very much more performant and parallelized execution of the classic find ... | xargs ... The difference is exec is run in line with the evaluation of the other find conditionals (like -type f) but spawns a new command shell for each evaluation of exec... Whereas -xargs is run after the pathnames of all of the (matching) files are discovered ... Like classic xargs, if your command can take a list of files, you save overhead there BUT -xargs also runs multiple instances of your command in multiple parallel processes on multiple nodes. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Feb 23 23:41:52 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 23 Feb 2018 15:41:52 -0800 Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th In-Reply-To: References: Message-ID: Agenda work for the US Spring meeting is still underway and in addition to Bob?s request below, let me ask you to comment on what you?d like to hear about from IBM developers, and/or other topics of interest. Even if you can?t attend the event, feel free to contribute ideas as the talks will be posted online after the event. Just reply to the list to generate any follow-on discussion or brainstorming about topics. Best, Kristy Kristy Kallback-Rose Sr HPC Storage Systems Analyst NERSC/LBL > On Feb 8, 2018, at 12:34 PM, Oesterlin, Robert wrote: > > We?re finalizing the details for the Spring 2018 User Group meeting, and we need your help! > > I?ve you?re interested in presenting at this meeting (it will be a full 2 days), then contact me and let me know what?s you?d like to talk about. We?re always looking for presentations on how you are using Scale (GPFS) in your business or project, tools that help you do your job, performance challenges/solutions ? or anything else. Also looking for ideas on breakout sessions. We?re probably looking at talks of about 30 mins each. > > Drop me a note if you?d like to present. Exact details on the event location will be available in a few weeks. 
We?re hoping to keep it as close to BioIT World in downtown Boston. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > SSUG Co-principal > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sat Feb 24 12:01:08 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 24 Feb 2018 12:01:08 +0000 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: <0c21b5b2-95ff-4cbf-9b07-e23594f58c87@Spark> References: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> <0c21b5b2-95ff-4cbf-9b07-e23594f58c87@Spark> Message-ID: On 23/02/18 01:27, valleru at cbio.mskcc.org wrote: > Thanks, I will try the file heat feature but i am really not sure, if it > would work - since the code can access cold files too, and not > necessarily files recently accessed/hot files. > > With respect to LROC. Let me explain as below: > > The use case is that - > The code initially reads headers (small region of data) from thousands > of files as the first step. For example about 30,000 of them with each > about 300MB to 500MB in size. > After the first step, with the help of those headers - it mmaps/seeks > across various regions of a set of files in parallel. > Since its all small IOs and it was really slow at reading from GPFS over > the network directly from disks - Our idea was to use AFM which i > believe fetches all file data into flash/ssds, once the initial few > blocks of the files are read. > But again - AFM seems to not solve the problem, so i want to know if > LROC behaves in the same way as AFM, where all of the file data is > prefetched in full block size utilizing all the worker threads ?- if few > blocks of the file is read initially. > Imagine a single GPFS file system, metadata in SSD, a fast data pool and a slow data pool (fast and slow being really good names to avoid the 8 character nonsense). Now if your fast data pool is appropriately sized then your slow data pool will normally be doing diddly squat. We are talking under 10 I/O's per second. Frankly under 5 I/O's per second is more like it from my experience. If your slow pool is 8-10PB in size, then it has thousands of spindles in it, and should be able to absorb the start of the job without breaking sweat. For numbers a 7.2K RPM disk can do around 120 random I/O's per second, so using RAID6 and 8TB disks that's 130 LUN's so around 15,000 random I/O's per second spare overhead, more if it's not random. It should take all of around 1-2s to read in those headers. Therefore unless these jobs only run for a few seconds or you have dozens of them starting every minute it should not be an issue. Finally if GPFS is taking ages to read the files over the network, then it sounds like your network needs an upgrade or GPFS needs tuning which may or may not require a larger fast storage pool. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From aaron.s.knister at nasa.gov Sun Feb 25 16:45:10 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 25 Feb 2018 11:45:10 -0500 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: References: Message-ID: <65453649-77df-2efa-8776-eb2775ca9efa@nasa.gov> Hmm...interesting. 
It sure seems to try :) The pmap command was this: pmap $(pidof mmfsd) | sort -n -k3 | tail -Aaron On 2/23/18 9:35 AM, IBM Spectrum Scale wrote: > AFAIK you can increase the pagepool size dynamically but you cannot > shrink it dynamically. ?To shrink it you must restart the GPFS daemon. > Also, could you please provide the actual pmap commands you executed? > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of ?Spectrum > Scale (GPFS), then please post it to the public IBM developerWroks Forum > at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) and you have an IBM software maintenance contract please contact > ?1-800-237-5511 in the United States or your local IBM Service Center > in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Aaron Knister > To: > Date: 02/22/2018 10:30 PM > Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all memory > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > This is also interesting (although I don't know what it really means). > Looking at pmap run against mmfsd I can see what happens after each step: > > # baseline > 00007fffe4639000 ?59164K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 00007fffd837e000 ?61960K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 0000020000000000 1048576K 1048576K 1048576K 1048576K ? ? ?0K rwxp [anon] > Total: ? ? ? ? ? 1613580K 1191020K 1189650K 1171836K ? ? ?0K > > # tschpool 64G > 00007fffe4639000 ?59164K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 00007fffd837e000 ?61960K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 0000020000000000 67108864K 67108864K 67108864K 67108864K ?0K rwxp [anon] > Total: ? ? ? ? ? 67706636K 67284108K 67282625K 67264920K ? ? ?0K > > # tschpool 1G > 00007fffe4639000 ?59164K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 00007fffd837e000 ?61960K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 0000020001400000 139264K 139264K 139264K 139264K ? ? ?0K rwxp [anon] > 0000020fc9400000 897024K 897024K 897024K 897024K ? ? ?0K rwxp [anon] > 0000020009c00000 66052096K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K rwxp [anon] > Total: ? ? ? ? ? 67706636K 1223820K 1222451K 1204632K ? ? ?0K > > Even though mmfsd has that 64G chunk allocated there's none of it > *used*. I wonder why Linux seems to be accounting it as allocated. > > -Aaron > > On 2/22/18 10:17 PM, Aaron Knister wrote: > > I've been exploring the idea for a while of writing a SLURM SPANK plugin > > to allow users to dynamically change the pagepool size on a node. Every > > now and then we have some users who would benefit significantly from a > > much larger pagepool on compute nodes but by default keep it on the > > smaller side to make as much physmem available as possible to batch work. > > > > In testing, though, it seems as though reducing the pagepool doesn't > > quite release all of the memory. I don't really understand it because > > I've never before seen memory that was previously resident become > > un-resident but still maintain the virtual memory allocation. > > > > Here's what I mean. Let's take a node with 128G and a 1G pagepool. 
> > > > If I do the following to simulate what might happen as various jobs > > tweak the pagepool: > > > > - tschpool 64G > > - tschpool 1G > > - tschpool 32G > > - tschpool 1G > > - tschpool 32G > > > > I end up with this: > > > > mmfsd thinks there's 32G resident but 64G virt > > # ps -o vsz,rss,comm -p 24397 > > ??? VSZ?? RSS COMMAND > > 67589400 33723236 mmfsd > > > > however, linux thinks there's ~100G used > > > > # free -g > > total?????? used free???? shared??? buffers cached > > Mem:?????????? 125 100???????? 25 0????????? 0 0 > > -/+ buffers/cache: 98???????? 26 > > Swap: 7????????? 0 7 > > > > I can jump back and forth between 1G and 32G *after* allocating 64G > > pagepool and the overall amount of memory in use doesn't balloon but I > > can't seem to shed that original 64G. > > > > I don't understand what's going on... :) Any ideas? This is with Scale > > 4.2.3.6. > > > > -Aaron > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OrZQeEmI6chBdguG-h4YPHsxXZ4gTU3CtIuN4e3ijdY&s=hvVIRG5kB1zom2Iql2_TOagchsgl99juKiZfJt5S1tM&e= > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Sun Feb 25 16:54:06 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 25 Feb 2018 11:54:06 -0500 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: <6c9df2df-dbdd-e3c1-07c7-f9906b0d666d@ugent.be> References: <6c9df2df-dbdd-e3c1-07c7-f9906b0d666d@ugent.be> Message-ID: Hi Stijn, Thanks for sharing your experiences-- I'm glad I'm not the only one whose had the idea (and come up empty handed). About the pagpool and numa awareness, I'd remembered seeing something about that somewhere and I did some googling and found there's a parameter called numaMemoryInterleave that "starts mmfsd with numactl --interleave=all". Do you think that provides the kind of numa awareness you're looking for? -Aaron On 2/23/18 9:44 AM, Stijn De Weirdt wrote: > hi all, > > we had the same idea long ago, afaik the issue we had was due to the > pinned memory the pagepool uses when RDMA is enabled. > > at some point we restarted gpfs on the compute nodes for each job, > similar to the way we do swapoff/swapon; but in certain scenarios gpfs > really did not like it; so we gave up on it. > > the other issue that needs to be resolved is that the pagepool needs to > be numa aware, so the pagepool is nicely allocated across all numa > domains, instead of using the first ones available. otherwise compute > jobs might start that only do non-local doamin memeory access. > > stijn > > On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote: >> AFAIK you can increase the pagepool size dynamically but you cannot shrink >> it dynamically. To shrink it you must restart the GPFS daemon. Also, >> could you please provide the actual pmap commands you executed? 
>> >> Regards, The Spectrum Scale (GPFS) team >> >> ------------------------------------------------------------------------------------------------------------------ >> If you feel that your question can benefit other users of Spectrum Scale >> (GPFS), then please post it to the public IBM developerWroks Forum at >> https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 >> . >> >> If your query concerns a potential software error in Spectrum Scale (GPFS) >> and you have an IBM software maintenance contract please contact >> 1-800-237-5511 in the United States or your local IBM Service Center in >> other countries. >> >> The forum is informally monitored as time permits and should not be used >> for priority messages to the Spectrum Scale (GPFS) team. >> >> >> >> From: Aaron Knister >> To: >> Date: 02/22/2018 10:30 PM >> Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all >> memory >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> This is also interesting (although I don't know what it really means). >> Looking at pmap run against mmfsd I can see what happens after each step: >> >> # baseline >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] >> 0000020000000000 1048576K 1048576K 1048576K 1048576K 0K rwxp [anon] >> Total: 1613580K 1191020K 1189650K 1171836K 0K >> >> # tschpool 64G >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] >> 0000020000000000 67108864K 67108864K 67108864K 67108864K 0K rwxp >> [anon] >> Total: 67706636K 67284108K 67282625K 67264920K 0K >> >> # tschpool 1G >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] >> 0000020001400000 139264K 139264K 139264K 139264K 0K rwxp [anon] >> 0000020fc9400000 897024K 897024K 897024K 897024K 0K rwxp [anon] >> 0000020009c00000 66052096K 0K 0K 0K 0K rwxp [anon] >> Total: 67706636K 1223820K 1222451K 1204632K 0K >> >> Even though mmfsd has that 64G chunk allocated there's none of it >> *used*. I wonder why Linux seems to be accounting it as allocated. >> >> -Aaron >> >> On 2/22/18 10:17 PM, Aaron Knister wrote: >>> I've been exploring the idea for a while of writing a SLURM SPANK plugin >> >>> to allow users to dynamically change the pagepool size on a node. Every >>> now and then we have some users who would benefit significantly from a >>> much larger pagepool on compute nodes but by default keep it on the >>> smaller side to make as much physmem available as possible to batch >> work. >>> >>> In testing, though, it seems as though reducing the pagepool doesn't >>> quite release all of the memory. I don't really understand it because >>> I've never before seen memory that was previously resident become >>> un-resident but still maintain the virtual memory allocation. >>> >>> Here's what I mean. Let's take a node with 128G and a 1G pagepool. 
>>> >>> If I do the following to simulate what might happen as various jobs >>> tweak the pagepool: >>> >>> - tschpool 64G >>> - tschpool 1G >>> - tschpool 32G >>> - tschpool 1G >>> - tschpool 32G >>> >>> I end up with this: >>> >>> mmfsd thinks there's 32G resident but 64G virt >>> # ps -o vsz,rss,comm -p 24397 >>> VSZ RSS COMMAND >>> 67589400 33723236 mmfsd >>> >>> however, linux thinks there's ~100G used >>> >>> # free -g >>> total used free shared buffers >> cached >>> Mem: 125 100 25 0 0 >> 0 >>> -/+ buffers/cache: 98 26 >>> Swap: 7 0 7 >>> >>> I can jump back and forth between 1G and 32G *after* allocating 64G >>> pagepool and the overall amount of memory in use doesn't balloon but I >>> can't seem to shed that original 64G. >>> >>> I don't understand what's going on... :) Any ideas? This is with Scale >>> 4.2.3.6. >>> >>> -Aaron >>> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Sun Feb 25 16:59:45 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 25 Feb 2018 11:59:45 -0500 Subject: [gpfsug-discuss] [non-nasa source] Re: pagepool shrink doesn't release all memory In-Reply-To: References: <6c9df2df-dbdd-e3c1-07c7-f9906b0d666d@ugent.be> Message-ID: <79885b2d-947d-4098-89bd-09b764635847@nasa.gov> Oh, and I think you're absolutely right about the rdma interaction. If I stop the infiniband service on a node and try the same exercise again, I can jump between 100G and 1G several times and the free'd memory is actually released. -Aaron On 2/25/18 11:54 AM, Aaron Knister wrote: > Hi Stijn, > > Thanks for sharing your experiences-- I'm glad I'm not the only one > whose had the idea (and come up empty handed). > > About the pagpool and numa awareness, I'd remembered seeing something > about that somewhere and I did some googling and found there's a > parameter called numaMemoryInterleave that "starts mmfsd with numactl > --interleave=all". Do you think that provides the kind of numa awareness > you're looking for? > > -Aaron > > On 2/23/18 9:44 AM, Stijn De Weirdt wrote: >> hi all, >> >> we had the same idea long ago, afaik the issue we had was due to the >> pinned memory the pagepool uses when RDMA is enabled. >> >> at some point we restarted gpfs on the compute nodes for each job, >> similar to the way we do swapoff/swapon; but in certain scenarios gpfs >> really did not like it; so we gave up on it. >> >> the other issue that needs to be resolved is that the pagepool needs to >> be numa aware, so the pagepool is nicely allocated across all numa >> domains, instead of using the first ones available. otherwise compute >> jobs might start that only do non-local doamin memeory access. >> >> stijn >> >> On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote: >>> AFAIK you can increase the pagepool size dynamically but you cannot >>> shrink >>> it dynamically.? To shrink it you must restart the GPFS daemon.?? Also, >>> could you please provide the actual pmap commands you executed? 
>>> >>> Regards, The Spectrum Scale (GPFS) team >>> >>> ------------------------------------------------------------------------------------------------------------------ >>> >>> If you feel that your question can benefit other users of? Spectrum >>> Scale >>> (GPFS), then please post it to the public IBM developerWroks Forum at >>> https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 >>> >>> . >>> >>> If your query concerns a potential software error in Spectrum Scale >>> (GPFS) >>> and you have an IBM software maintenance contract please contact >>> 1-800-237-5511 in the United States or your local IBM Service Center in >>> other countries. >>> >>> The forum is informally monitored as time permits and should not be used >>> for priority messages to the Spectrum Scale (GPFS) team. >>> >>> >>> >>> From:?? Aaron Knister >>> To:???? >>> Date:?? 02/22/2018 10:30 PM >>> Subject:??????? Re: [gpfsug-discuss] pagepool shrink doesn't release all >>> memory >>> Sent by:??????? gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> This is also interesting (although I don't know what it really means). >>> Looking at pmap run against mmfsd I can see what happens after each >>> step: >>> >>> # baseline >>> 00007fffe4639000? 59164K????? 0K????? 0K????? 0K????? 0K ---p [anon] >>> 00007fffd837e000? 61960K????? 0K????? 0K????? 0K????? 0K ---p [anon] >>> 0000020000000000 1048576K 1048576K 1048576K 1048576K????? 0K rwxp [anon] >>> Total:?????????? 1613580K 1191020K 1189650K 1171836K????? 0K >>> >>> # tschpool 64G >>> 00007fffe4639000? 59164K????? 0K????? 0K????? 0K????? 0K ---p [anon] >>> 00007fffd837e000? 61960K????? 0K????? 0K????? 0K????? 0K ---p [anon] >>> 0000020000000000 67108864K 67108864K 67108864K 67108864K????? 0K rwxp >>> [anon] >>> Total:?????????? 67706636K 67284108K 67282625K 67264920K????? 0K >>> >>> # tschpool 1G >>> 00007fffe4639000? 59164K????? 0K????? 0K????? 0K????? 0K ---p [anon] >>> 00007fffd837e000? 61960K????? 0K????? 0K????? 0K????? 0K ---p [anon] >>> 0000020001400000 139264K 139264K 139264K 139264K????? 0K rwxp [anon] >>> 0000020fc9400000 897024K 897024K 897024K 897024K????? 0K rwxp [anon] >>> 0000020009c00000 66052096K????? 0K????? 0K????? 0K????? 0K rwxp [anon] >>> Total:?????????? 67706636K 1223820K 1222451K 1204632K????? 0K >>> >>> Even though mmfsd has that 64G chunk allocated there's none of it >>> *used*. I wonder why Linux seems to be accounting it as allocated. >>> >>> -Aaron >>> >>> On 2/22/18 10:17 PM, Aaron Knister wrote: >>>> I've been exploring the idea for a while of writing a SLURM SPANK >>>> plugin >>> >>>> to allow users to dynamically change the pagepool size on a node. Every >>>> now and then we have some users who would benefit significantly from a >>>> much larger pagepool on compute nodes but by default keep it on the >>>> smaller side to make as much physmem available as possible to batch >>> work. >>>> >>>> In testing, though, it seems as though reducing the pagepool doesn't >>>> quite release all of the memory. I don't really understand it because >>>> I've never before seen memory that was previously resident become >>>> un-resident but still maintain the virtual memory allocation. >>>> >>>> Here's what I mean. Let's take a node with 128G and a 1G pagepool. 
>>>> >>>> If I do the following to simulate what might happen as various jobs >>>> tweak the pagepool: >>>> >>>> - tschpool 64G >>>> - tschpool 1G >>>> - tschpool 32G >>>> - tschpool 1G >>>> - tschpool 32G >>>> >>>> I end up with this: >>>> >>>> mmfsd thinks there's 32G resident but 64G virt >>>> # ps -o vsz,rss,comm -p 24397 >>>> ???? VSZ?? RSS COMMAND >>>> 67589400 33723236 mmfsd >>>> >>>> however, linux thinks there's ~100G used >>>> >>>> # free -g >>>> ?????????????? total?????? used?????? free???? shared??? buffers >>> cached >>>> Mem:?????????? 125??????? 100???????? 25????????? 0????????? 0 >>> 0 >>>> -/+ buffers/cache:???????? 98???????? 26 >>>> Swap:??????????? 7????????? 0????????? 7 >>>> >>>> I can jump back and forth between 1G and 32G *after* allocating 64G >>>> pagepool and the overall amount of memory in use doesn't balloon but I >>>> can't seem to shed that original 64G. >>>> >>>> I don't understand what's going on... :) Any ideas? This is with Scale >>>> 4.2.3.6. >>>> >>>> -Aaron >>>> >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Sun Feb 25 17:49:38 2018 From: oehmes at gmail.com (Sven Oehme) Date: Sun, 25 Feb 2018 17:49:38 +0000 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: References: <6c9df2df-dbdd-e3c1-07c7-f9906b0d666d@ugent.be> Message-ID: Hi, i guess you saw that in some of my presentations about communication code overhaul. we started in 4.2.X and since then added more and more numa awareness to GPFS. Version 5.0 also has enhancements in this space. sven On Sun, Feb 25, 2018 at 8:54 AM Aaron Knister wrote: > Hi Stijn, > > Thanks for sharing your experiences-- I'm glad I'm not the only one > whose had the idea (and come up empty handed). > > About the pagpool and numa awareness, I'd remembered seeing something > about that somewhere and I did some googling and found there's a > parameter called numaMemoryInterleave that "starts mmfsd with numactl > --interleave=all". Do you think that provides the kind of numa awareness > you're looking for? > > -Aaron > > On 2/23/18 9:44 AM, Stijn De Weirdt wrote: > > hi all, > > > > we had the same idea long ago, afaik the issue we had was due to the > > pinned memory the pagepool uses when RDMA is enabled. > > > > at some point we restarted gpfs on the compute nodes for each job, > > similar to the way we do swapoff/swapon; but in certain scenarios gpfs > > really did not like it; so we gave up on it. > > > > the other issue that needs to be resolved is that the pagepool needs to > > be numa aware, so the pagepool is nicely allocated across all numa > > domains, instead of using the first ones available. otherwise compute > > jobs might start that only do non-local doamin memeory access. > > > > stijn > > > > On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote: > >> AFAIK you can increase the pagepool size dynamically but you cannot > shrink > >> it dynamically. To shrink it you must restart the GPFS daemon. Also, > >> could you please provide the actual pmap commands you executed? 
> >> > >> Regards, The Spectrum Scale (GPFS) team > >> > >> > ------------------------------------------------------------------------------------------------------------------ > >> If you feel that your question can benefit other users of Spectrum > Scale > >> (GPFS), then please post it to the public IBM developerWroks Forum at > >> > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 > >> . > >> > >> If your query concerns a potential software error in Spectrum Scale > (GPFS) > >> and you have an IBM software maintenance contract please contact > >> 1-800-237-5511 <(800)%20237-5511> in the United States or your local > IBM Service Center in > >> other countries. > >> > >> The forum is informally monitored as time permits and should not be used > >> for priority messages to the Spectrum Scale (GPFS) team. > >> > >> > >> > >> From: Aaron Knister > >> To: > >> Date: 02/22/2018 10:30 PM > >> Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all > >> memory > >> Sent by: gpfsug-discuss-bounces at spectrumscale.org > >> > >> > >> > >> This is also interesting (although I don't know what it really means). > >> Looking at pmap run against mmfsd I can see what happens after each > step: > >> > >> # baseline > >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > >> 0000020000000000 1048576K 1048576K 1048576K 1048576K 0K rwxp [anon] > >> Total: 1613580K 1191020K 1189650K 1171836K 0K > >> > >> # tschpool 64G > >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > >> 0000020000000000 67108864K 67108864K 67108864K 67108864K 0K rwxp > >> [anon] > >> Total: 67706636K 67284108K 67282625K 67264920K 0K > >> > >> # tschpool 1G > >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > >> 0000020001400000 139264K 139264K 139264K 139264K 0K rwxp [anon] > >> 0000020fc9400000 897024K 897024K 897024K 897024K 0K rwxp [anon] > >> 0000020009c00000 66052096K 0K 0K 0K 0K rwxp [anon] > >> Total: 67706636K 1223820K 1222451K 1204632K 0K > >> > >> Even though mmfsd has that 64G chunk allocated there's none of it > >> *used*. I wonder why Linux seems to be accounting it as allocated. > >> > >> -Aaron > >> > >> On 2/22/18 10:17 PM, Aaron Knister wrote: > >>> I've been exploring the idea for a while of writing a SLURM SPANK > plugin > >> > >>> to allow users to dynamically change the pagepool size on a node. Every > >>> now and then we have some users who would benefit significantly from a > >>> much larger pagepool on compute nodes but by default keep it on the > >>> smaller side to make as much physmem available as possible to batch > >> work. > >>> > >>> In testing, though, it seems as though reducing the pagepool doesn't > >>> quite release all of the memory. I don't really understand it because > >>> I've never before seen memory that was previously resident become > >>> un-resident but still maintain the virtual memory allocation. > >>> > >>> Here's what I mean. Let's take a node with 128G and a 1G pagepool. 
> >>> > >>> If I do the following to simulate what might happen as various jobs > >>> tweak the pagepool: > >>> > >>> - tschpool 64G > >>> - tschpool 1G > >>> - tschpool 32G > >>> - tschpool 1G > >>> - tschpool 32G > >>> > >>> I end up with this: > >>> > >>> mmfsd thinks there's 32G resident but 64G virt > >>> # ps -o vsz,rss,comm -p 24397 > >>> VSZ RSS COMMAND > >>> 67589400 33723236 mmfsd > >>> > >>> however, linux thinks there's ~100G used > >>> > >>> # free -g > >>> total used free shared buffers > >> cached > >>> Mem: 125 100 25 0 0 > >> 0 > >>> -/+ buffers/cache: 98 26 > >>> Swap: 7 0 7 > >>> > >>> I can jump back and forth between 1G and 32G *after* allocating 64G > >>> pagepool and the overall amount of memory in use doesn't balloon but I > >>> can't seem to shed that original 64G. > >>> > >>> I don't understand what's going on... :) Any ideas? This is with Scale > >>> 4.2.3.6. > >>> > >>> -Aaron > >>> > >> > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Feb 26 12:20:52 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 26 Feb 2018 20:20:52 +0800 Subject: [gpfsug-discuss] Finding all bulletins and APARs In-Reply-To: References: Message-ID: Hi John, For all Flashes, alerts and bulletins for IBM Spectrum Scale, please check this link: https://www.ibm.com/support/home/search-results/10000060/system_storage/storage_software/software_defined_storage/ibm_spectrum_scale?filter=DC.Type_avl:CT792,CT555,CT755&sortby=-dcdate_sortrange&ct=fab For any other content which you got in the notification, please check this link: https://www.ibm.com/support/home/search-results/10000060/IBM_Spectrum_Scale?docOnly=true&sortby=-dcdate_sortrange&ct=rc Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: John Hearns To: gpfsug main discussion list Date: 02/21/2018 05:28 PM Subject: [gpfsug-discuss] Finding all bulletins and APARs Sent by: gpfsug-discuss-bounces at spectrumscale.org Firstly, let me apologise for not thanking people who hav ereplied to me on this list with help. I have indeed replied and thanked you ? 
however the list software has taken a dislike to my email address. I am currently on the myibm support site. I am looking for a specific APAR on Spectrum Scale. However I want to be able to get a list of all APARs and bulletins for Spectrum Scale, right up to date. I do get email alerts but somehow I suspect I am not getting them all, and it is a pain to search back in your email. Thanks John H -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=v0fVzSMP-N6VctcEcAQKTLJlrvu0WUry8rSo41ia-mY&s=_zoOdAst7NdP-PByM7WrniXyNLofARAf9hayK0BF5rU&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jan.sundermann at kit.edu Mon Feb 26 16:38:46 2018 From: jan.sundermann at kit.edu (Sundermann, Jan Erik (SCC)) Date: Mon, 26 Feb 2018 16:38:46 +0000 Subject: [gpfsug-discuss] Problems with remote mount via routed IB Message-ID: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> Dear all we are currently trying to remote mount a file system in a routed Infiniband test setup and face problems with dropped RDMA connections. The setup is the following: - Spectrum Scale Cluster 1 is setup on four servers which are connected to the same infiniband network. Additionally they are connected to a fast ethernet providing ip communication in the network 192.168.11.0/24. - Spectrum Scale Cluster 2 is setup on four additional servers which are connected to a second infiniband network. These servers have IPs on their IB interfaces in the network 192.168.12.0/24. - IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated machine. - We have a dedicated IB hardware router connected to both IB subnets. We tested that the routing, both IP and IB, is working between the two clusters without problems and that RDMA is working fine both for internal communication inside cluster 1 and cluster 2 When trying to remote mount a file system from cluster 1 in cluster 2, RDMA communication is not working as expected. 
Instead we see error messages on the remote host (cluster 2) 2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 1 2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1 2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 1 2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 0 2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0 2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 0 2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 2 2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 and in the cluster with the file system (cluster 1) 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to 
192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:35.398+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:55.600+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 Any advice on how to configure the setup in a way that would allow the remote mount via routed IB would be very appreciated. Thank you and best regards Jan Erik -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5252 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Mon Feb 26 19:16:34 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 26 Feb 2018 14:16:34 -0500 Subject: [gpfsug-discuss] Problems with remote mount via routed IB In-Reply-To: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> References: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> Message-ID: Hi Jan Erik, It was my understanding that the IB hardware router required RDMA CM to work. By default GPFS doesn't use the RDMA Connection Manager but it can be enabled (e.g. verbsRdmaCm=enable). I think this requires a restart on clients/servers (in both clusters) to take effect. Maybe someone else on the list can comment in more detail-- I've been told folks have successfully deployed IB routers with GPFS. -Aaron On 2/26/18 11:38 AM, Sundermann, Jan Erik (SCC) wrote: > > Dear all > > we are currently trying to remote mount a file system in a routed Infiniband test setup and face problems with dropped RDMA connections. The setup is the following: > > - Spectrum Scale Cluster 1 is setup on four servers which are connected to the same infiniband network. Additionally they are connected to a fast ethernet providing ip communication in the network 192.168.11.0/24. 
> > - Spectrum Scale Cluster 2 is setup on four additional servers which are connected to a second infiniband network. These servers have IPs on their IB interfaces in the network 192.168.12.0/24. > > - IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated machine. > > - We have a dedicated IB hardware router connected to both IB subnets. > > > We tested that the routing, both IP and IB, is working between the two clusters without problems and that RDMA is working fine both for internal communication inside cluster 1 and cluster 2 > > When trying to remote mount a file system from cluster 1 in cluster 2, RDMA communication is not working as expected. Instead we see error messages on the remote host (cluster 2) > > > 2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > 2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 > 2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 > 2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > 2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 1 > 2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > 2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1 > 2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 1 > 2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 0 > 2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0 > 2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 0 > 2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 2 > 2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > 2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 > 2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 > 2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > > > and in the cluster with the file system (cluster 1) > > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 
192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:35.398+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:55.600+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > > > > Any advice on how to configure the setup in a way that would allow the remote mount via routed IB would be very appreciated. 
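A rough sketch of the RDMA CM change suggested above (illustrative only; it assumes GPFS can be restarted on every node of both clusters so the new setting takes effect):

    mmchconfig verbsRdmaCm=enable
    mmshutdown -a && mmstartup -a

This is just an illustration of the suggestion, not a confirmed fix for the IBV_WC_RETRY_EXC_ERR errors shown in the logs above.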
> > > Thank you and best regards > Jan Erik > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From john.hearns at asml.com Tue Feb 27 09:17:36 2018 From: john.hearns at asml.com (John Hearns) Date: Tue, 27 Feb 2018 09:17:36 +0000 Subject: [gpfsug-discuss] Problems with remote mount via routed IB In-Reply-To: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> References: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> Message-ID: Jan Erik, Can you clarify if you are routing IP traffic between the two Infiniband networks. Or are you routing Infiniband traffic? If I can be of help I manage an Infiniband network which connects to other IP networks using Mellanox VPI gateways, which proxy arp between IB and Ethernet. But I am not running GPFS traffic over these. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sundermann, Jan Erik (SCC) Sent: Monday, February 26, 2018 5:39 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Problems with remote mount via routed IB Dear all we are currently trying to remote mount a file system in a routed Infiniband test setup and face problems with dropped RDMA connections. The setup is the following: - Spectrum Scale Cluster 1 is setup on four servers which are connected to the same infiniband network. Additionally they are connected to a fast ethernet providing ip communication in the network 192.168.11.0/24. - Spectrum Scale Cluster 2 is setup on four additional servers which are connected to a second infiniband network. These servers have IPs on their IB interfaces in the network 192.168.12.0/24. - IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated machine. - We have a dedicated IB hardware router connected to both IB subnets. We tested that the routing, both IP and IB, is working between the two clusters without problems and that RDMA is working fine both for internal communication inside cluster 1 and cluster 2 When trying to remote mount a file system from cluster 1 in cluster 2, RDMA communication is not working as expected. 
Instead we see error messages on the remote host (cluster 2) 2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 1 2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1 2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 1 2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 0 2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0 2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 0 2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 2 2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 and in the cluster with the file system (cluster 1) 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to 
192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:35.398+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:55.600+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 Any advice on how to configure the setup in a way that would allow the remote mount via routed IB would be very appreciated. Thank you and best regards Jan Erik -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From alex at calicolabs.com Tue Feb 27 22:25:30 2018 From: alex at calicolabs.com (Alex Chekholko) Date: Tue, 27 Feb 2018 14:25:30 -0800 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: <0c21b5b2-95ff-4cbf-9b07-e23594f58c87@Spark> References: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> <0c21b5b2-95ff-4cbf-9b07-e23594f58c87@Spark> Message-ID: Hi, My experience has been that you could spend the same money to just make your main pool more performant. 
Instead of doing two data transfers (one from cold pool to AFM or hot pools, one from AFM/hot to client), you can just make the direct access of the data faster by adding more resources to your main pool. Regards, Alex On Thu, Feb 22, 2018 at 5:27 PM, wrote: > Thanks, I will try the file heat feature but i am really not sure, if it > would work - since the code can access cold files too, and not necessarily > files recently accessed/hot files. > > With respect to LROC. Let me explain as below: > > The use case is that - > The code initially reads headers (small region of data) from thousands of > files as the first step. For example about 30,000 of them with each about > 300MB to 500MB in size. > After the first step, with the help of those headers - it mmaps/seeks > across various regions of a set of files in parallel. > Since its all small IOs and it was really slow at reading from GPFS over > the network directly from disks - Our idea was to use AFM which i believe > fetches all file data into flash/ssds, once the initial few blocks of the > files are read. > But again - AFM seems to not solve the problem, so i want to know if LROC > behaves in the same way as AFM, where all of the file data is prefetched in > full block size utilizing all the worker threads - if few blocks of the > file is read initially. > > Thanks, > Lohit > > On Feb 22, 2018, 4:52 PM -0500, IBM Spectrum Scale , > wrote: > > My apologies for not being more clear on the flash storage pool. I meant > that this would be just another GPFS storage pool in the same cluster, so > no separate AFM cache cluster. You would then use the file heat feature to > ensure more frequently accessed files are migrated to that all flash > storage pool. > > As for LROC could you please clarify what you mean by a few headers/stubs > of the file? In reading the LROC documentation and the LROC variables > available in the mmchconfig command I think you might want to take a look a > the lrocDataStubFileSize variable since it seems to apply to your situation. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: valleru at cbio.mskcc.org > To: gpfsug main discussion list > Cc: gpfsug-discuss-bounces at spectrumscale.org > Date: 02/22/2018 04:21 PM > Subject: Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered > storage > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Thank you. > > I am sorry if i was not clear, but the metadata pool is all on SSDs in the > GPFS clusters that we use. Its just the data pool that is on Near-Line > Rotating disks. > I understand that AFM might not be able to solve the issue, and I will try > and see if file heat works for migrating the files to flash tier. 
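For what it is worth, a minimal sketch of the file heat approach (the pool names 'flash' and 'nlsas' and the tuning values are made-up placeholders, and heat tracking has to be enabled before a policy can use it):

    mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

    /* policy fragment, run via mmapplypolicy: keep the hottest files in the
       flash tier up to 90% full, let the rest spill to the NL-SAS tier */
    RULE 'tiers'  GROUP POOL 'gpool' IS 'flash' LIMIT(90) THEN 'nlsas'
    RULE 'repack' MIGRATE FROM POOL 'gpool' TO POOL 'gpool' WEIGHT(FILE_HEAT)

As noted above, the code also touches cold files, so treat this as an experiment rather than a guaranteed win.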
> You mentioned an all flash storage pool for heavily used files - so you > mean a different GPFS cluster just with flash storage, and to manually copy > the files to flash storage whenever needed? > The IO performance that i am talking is prominently for reads, so you > mention that LROC can work in the way i want it to? that is prefetch all > the files into LROC cache, after only few headers/stubs of data are read > from those files? > I thought LROC only keeps that block of data that is prefetched from the > disk, and will not prefetch the whole file if a stub of data is read. > Please do let me know, if i understood it wrong. > > On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale , > wrote: > I do not think AFM is intended to solve the problem you are trying to > solve. If I understand your scenario correctly you state that you are > placing metadata on NL-SAS storage. If that is true that would not be wise > especially if you are going to do many metadata operations. I suspect your > performance issues are partially due to the fact that metadata is being > stored on NL-SAS storage. You stated that you did not think the file heat > feature would do what you intended but have you tried to use it to see if > it could solve your problem? I would think having metadata on SSD/flash > storage combined with a all flash storage pool for your heavily used files > would perform well. If you expect IO usage will be such that there will be > far more reads than writes then LROC should be beneficial to your overall > performance. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > *https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479* > > . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: valleru at cbio.mskcc.org > To: gpfsug main discussion list > Date: 02/22/2018 03:11 PM > Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi All, > > I am trying to figure out a GPFS tiering architecture with flash storage > in front end and near line storage as backend, for Supercomputing > > The Backend storage will be a GPFS storage on near line of about 8-10PB. > The backend storage will/can be tuned to give out large streaming bandwidth > and enough metadata disks to make the stat of all these files fast enough. > > I was thinking if it would be possible to use a GPFS flash cluster or GPFS > SSD cluster in front end that uses AFM and acts as a cache cluster with the > backend GPFS cluster. > > At the end of this .. the workflow that i am targeting is where: > > > ? > If the compute nodes read headers of thousands of large files ranging from > 100MB to 1GB, the AFM cluster should be able to bring up enough threads to > bring up all of the files from the backend to the faster SSD/Flash GPFS > cluster. 
> The working set might be about 100T, at a time which i want to be on a > faster/low latency tier, and the rest of the files to be in slower tier > until they are read by the compute nodes. > ? > > > I do not want to use GPFS policies to achieve the above, is because i am > not sure - if policies could be written in a way, that files are moved from > the slower tier to faster tier depending on how the jobs interact with the > files. > I know that the policies could be written depending on the heat, and > size/format but i don?t think thes policies work in a similar way as above. > > I did try the above architecture, where an SSD GPFS cluster acts as an AFM > cache cluster before the near line storage. However the AFM cluster was > really really slow, It took it about few hours to copy the files from near > line storage to AFM cache cluster. > I am not sure if AFM is not designed to work this way, or if AFM is not > tuned to work as fast as it should. > > I have tried LROC too, but it does not behave the same way as i guess AFM > works. > > Has anyone tried or know if GPFS supports an architecture - where the fast > tier can bring up thousands of threads and copy the files almost > instantly/asynchronously from the slow tier, whenever the jobs from compute > nodes reads few blocks from these files? > I understand that with respect to hardware - the AFM cluster should be > really fast, as well as the network between the AFM cluster and the backend > cluster. > > Please do also let me know, if the above workflow can be done using GPFS > policies and be as fast as it is needed to be. > > Regards, > Lohit > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > *https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s=AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e=* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______ > ________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug. > org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r= > IbxtjdkPAM2Sbon4Lbbi4w&m=DuqESC-4ycoY5GoHpYeH1T8baq0JWY8QfkN8z > 6b8jPw&s=zNUAH3mFyzxcvXtrep_OroKiwR88QouIrcdN8TLJK8M&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From coetzee.ray at gmail.com Tue Feb 27 23:54:17 2018 From: coetzee.ray at gmail.com (Ray Coetzee) Date: Tue, 27 Feb 2018 23:54:17 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 73, Issue 60 In-Reply-To: References: Message-ID: Hi Lohit Using mmap based applications against GPFS has a number of challenges. For me the main challenge is that mmap threads can fragment the IO into multiple strided reads at random offsets which defeats GPFS's attempts in prefetching the file contents. 
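A quick way to see the fragmentation Ray describes is to sample the client's recent I/O history while the mmap phase is running; a long tail of small reads, instead of full-block sequential reads, is the symptom (sketch only, run on the client node):

  mmdiag --iohist   # lists the most recent I/Os with their sizes and service times

Repeating the command a few times during the run gives a rough picture of the access pattern GPFS is actually seeing.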
LROC, as the name implies, is only a Local Read Only Cache and functions as an extension of your local page pool on the client. You would only see a performance improvement if the file(s) have been read into the local pagepool on a previous occasion. Depending on the dataset size & the NVMe/SSDs you have for LROC, you could look at using a pre-job to read the file(s) in their entirety on the compute node before the mmap process starts, as this would ensure the relevant data blocks are in the local pagepool or LROC. Another solution I've seen is to stage the dataset into tmpfs. Sven is working on improvements for mmap on GPFS that may make it into a production release so keep an eye out for an update. Kind regards Ray Coetzee On Tue, Feb 27, 2018 at 10:25 PM, wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Problems with remote mount via routed IB (John Hearns) > 2. Re: GPFS and Flash/SSD Storage tiered storage (Alex Chekholko) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 27 Feb 2018 09:17:36 +0000 > From: John Hearns > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Problems with remote mount via routed IB > Message-ID: > eurprd02.prod.outlook.com> > > Content-Type: text/plain; charset="us-ascii" > > Jan Erik, > Can you clarify if you are routing IP traffic between the two > Infiniband networks. > Or are you routing Infiniband traffic? > > > If I can be of help I manage an Infiniband network which connects to other > IP networks using Mellanox VPI gateways, which proxy arp between IB and > Ethernet. But I am not running GPFS traffic over these. > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss- > bounces at spectrumscale.org] On Behalf Of Sundermann, Jan Erik (SCC) > Sent: Monday, February 26, 2018 5:39 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] Problems with remote mount via routed IB > > > Dear all > > we are currently trying to remote mount a file system in a routed > Infiniband test setup and face problems with dropped RDMA connections. The > setup is the following: > > - Spectrum Scale Cluster 1 is setup on four servers which are connected to > the same infiniband network. Additionally they are connected to a fast > ethernet providing ip communication in the network 192.168.11.0/24. > > - Spectrum Scale Cluster 2 is setup on four additional servers which are > connected to a second infiniband network. These servers have IPs on their > IB interfaces in the network 192.168.12.0/24. > > - IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated > machine. > > - We have a dedicated IB hardware router connected to both IB subnets. 
> > > We tested that the routing, both IP and IB, is working between the two > clusters without problems and that RDMA is working fine both for internal > communication inside cluster 1 and cluster 2 > > When trying to remote mount a file system from cluster 1 in cluster 2, > RDMA communication is not working as expected. Instead we see error > messages on the remote host (cluster 2) > > > 2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > 2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 2 > 2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to > 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > fabnum 0 error 733 index 3 > 2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > 2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to > 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > fabnum 0 error 733 index 1 > 2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 3 > 2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 > (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1 > 2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 > (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 1 > 2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to > 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > fabnum 0 error 733 index 0 > 2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 > (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0 > 2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 > (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 0 > 2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to > 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > fabnum 0 error 733 index 2 > 2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > 2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 2 > 2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to > 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > fabnum 0 error 733 index 3 > 2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 3 > > > and in the cluster with the file system (cluster 1) > > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 due to RDMA read error 
IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 sl 0 index 3 > 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 sl 0 index 3 > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA rdma read error > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA closed connection to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:35.398+0100: [I] VERBS RDMA accepted and connected to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 sl 0 index 3 > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA rdma read error > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA closed connection to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:55.600+0100: [I] VERBS RDMA accepted and connected to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 sl 0 index 3 > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA rdma read error > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA closed connection to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA accepted and connected to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 sl 0 index 3 > > > > Any advice on how to configure the setup in a way that would allow the > remote mount via routed IB would be very appreciated. > > > Thank you and best regards > Jan Erik > > > -- The information contained in this communication and any attachments is > confidential and may be privileged, and is for the sole use of the intended > recipient(s). Any unauthorized review, use, disclosure or distribution is > prohibited. Unless explicitly stated otherwise in the body of this > communication or the attachment thereto (if any), the information is > provided on an AS-IS basis without any express or implied warranties or > liabilities. To the extent you are relying on this information, you are > doing so at your own risk. If you are not the intended recipient, please > notify the sender immediately by replying to this message and destroy all > copies of this message and any attachments. 
Neither the sender nor the > company/group of companies he or she represents shall be liable for the > proper and complete transmission of the information contained in this > communication, or for any delay in its receipt. > > > ------------------------------ > > Message: 2 > Date: Tue, 27 Feb 2018 14:25:30 -0800 > From: Alex Chekholko > To: gpfsug main discussion list > Cc: gpfsug-discuss-bounces at spectrumscale.org > Subject: Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered > storage > Message-ID: > mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > Hi, > > My experience has been that you could spend the same money to just make > your main pool more performant. Instead of doing two data transfers (one > from cold pool to AFM or hot pools, one from AFM/hot to client), you can > just make the direct access of the data faster by adding more resources to > your main pool. > > Regards, > Alex > > On Thu, Feb 22, 2018 at 5:27 PM, wrote: > > > Thanks, I will try the file heat feature but i am really not sure, if it > > would work - since the code can access cold files too, and not > necessarily > > files recently accessed/hot files. > > > > With respect to LROC. Let me explain as below: > > > > The use case is that - > > The code initially reads headers (small region of data) from thousands of > > files as the first step. For example about 30,000 of them with each about > > 300MB to 500MB in size. > > After the first step, with the help of those headers - it mmaps/seeks > > across various regions of a set of files in parallel. > > Since its all small IOs and it was really slow at reading from GPFS over > > the network directly from disks - Our idea was to use AFM which i believe > > fetches all file data into flash/ssds, once the initial few blocks of the > > files are read. > > But again - AFM seems to not solve the problem, so i want to know if LROC > > behaves in the same way as AFM, where all of the file data is prefetched > in > > full block size utilizing all the worker threads - if few blocks of the > > file is read initially. > > > > Thanks, > > Lohit > > > > On Feb 22, 2018, 4:52 PM -0500, IBM Spectrum Scale , > > wrote: > > > > My apologies for not being more clear on the flash storage pool. I meant > > that this would be just another GPFS storage pool in the same cluster, so > > no separate AFM cache cluster. You would then use the file heat feature > to > > ensure more frequently accessed files are migrated to that all flash > > storage pool. > > > > As for LROC could you please clarify what you mean by a few headers/stubs > > of the file? In reading the LROC documentation and the LROC variables > > available in the mmchconfig command I think you might want to take a > look a > > the lrocDataStubFileSize variable since it seems to apply to your > situation. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------ > > ------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale > > (GPFS), then please post it to the public IBM developerWroks Forum at > > https://www.ibm.com/developerworks/community/ > > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) > > and you have an IBM software maintenance contract please contact > > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > > Service Center in other countries. 
> > > > The forum is informally monitored as time permits and should not be used > > for priority messages to the Spectrum Scale (GPFS) team. > > > > > > > > From: valleru at cbio.mskcc.org > > To: gpfsug main discussion list > > > Cc: gpfsug-discuss-bounces at spectrumscale.org > > Date: 02/22/2018 04:21 PM > > Subject: Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered > > storage > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------ > > > > > > > > Thank you. > > > > I am sorry if i was not clear, but the metadata pool is all on SSDs in > the > > GPFS clusters that we use. Its just the data pool that is on Near-Line > > Rotating disks. > > I understand that AFM might not be able to solve the issue, and I will > try > > and see if file heat works for migrating the files to flash tier. > > You mentioned an all flash storage pool for heavily used files - so you > > mean a different GPFS cluster just with flash storage, and to manually > copy > > the files to flash storage whenever needed? > > The IO performance that i am talking is prominently for reads, so you > > mention that LROC can work in the way i want it to? that is prefetch all > > the files into LROC cache, after only few headers/stubs of data are read > > from those files? > > I thought LROC only keeps that block of data that is prefetched from the > > disk, and will not prefetch the whole file if a stub of data is read. > > Please do let me know, if i understood it wrong. > > > > On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale , > > wrote: > > I do not think AFM is intended to solve the problem you are trying to > > solve. If I understand your scenario correctly you state that you are > > placing metadata on NL-SAS storage. If that is true that would not be > wise > > especially if you are going to do many metadata operations. I suspect > your > > performance issues are partially due to the fact that metadata is being > > stored on NL-SAS storage. You stated that you did not think the file > heat > > feature would do what you intended but have you tried to use it to see if > > it could solve your problem? I would think having metadata on SSD/flash > > storage combined with a all flash storage pool for your heavily used > files > > would perform well. If you expect IO usage will be such that there will > be > > far more reads than writes then LROC should be beneficial to your overall > > performance. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------ > > ------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale > > (GPFS), then please post it to the public IBM developerWroks Forum at > > *https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479* > > forums/html/forum?id=11111111-0000-0000-0000-000000000479> > > . > > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) > > and you have an IBM software maintenance contract please contact > > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > > Service Center in other countries. > > > > The forum is informally monitored as time permits and should not be used > > for priority messages to the Spectrum Scale (GPFS) team. 
> > > > > > > > From: valleru at cbio.mskcc.org > > To: gpfsug main discussion list > > > Date: 02/22/2018 03:11 PM > > Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered > storage > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------ > > > > > > > > Hi All, > > > > I am trying to figure out a GPFS tiering architecture with flash storage > > in front end and near line storage as backend, for Supercomputing > > > > The Backend storage will be a GPFS storage on near line of about 8-10PB. > > The backend storage will/can be tuned to give out large streaming > bandwidth > > and enough metadata disks to make the stat of all these files fast > enough. > > > > I was thinking if it would be possible to use a GPFS flash cluster or > GPFS > > SSD cluster in front end that uses AFM and acts as a cache cluster with > the > > backend GPFS cluster. > > > > At the end of this .. the workflow that i am targeting is where: > > > > > > ? > > If the compute nodes read headers of thousands of large files ranging > from > > 100MB to 1GB, the AFM cluster should be able to bring up enough threads > to > > bring up all of the files from the backend to the faster SSD/Flash GPFS > > cluster. > > The working set might be about 100T, at a time which i want to be on a > > faster/low latency tier, and the rest of the files to be in slower tier > > until they are read by the compute nodes. > > ? > > > > > > I do not want to use GPFS policies to achieve the above, is because i am > > not sure - if policies could be written in a way, that files are moved > from > > the slower tier to faster tier depending on how the jobs interact with > the > > files. > > I know that the policies could be written depending on the heat, and > > size/format but i don?t think thes policies work in a similar way as > above. > > > > I did try the above architecture, where an SSD GPFS cluster acts as an > AFM > > cache cluster before the near line storage. However the AFM cluster was > > really really slow, It took it about few hours to copy the files from > near > > line storage to AFM cache cluster. > > I am not sure if AFM is not designed to work this way, or if AFM is not > > tuned to work as fast as it should. > > > > I have tried LROC too, but it does not behave the same way as i guess AFM > > works. > > > > Has anyone tried or know if GPFS supports an architecture - where the > fast > > tier can bring up thousands of threads and copy the files almost > > instantly/asynchronously from the slow tier, whenever the jobs from > compute > > nodes reads few blocks from these files? > > I understand that with respect to hardware - the AFM cluster should be > > really fast, as well as the network between the AFM cluster and the > backend > > cluster. > > > > Please do also let me know, if the above workflow can be done using GPFS > > policies and be as fast as it is needed to be. 
> >
> > Regards,
> > Lohit
> >
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> >
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: 20180227/be7c09c4/attachment.html>
>
> ------------------------------
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
> End of gpfsug-discuss Digest, Vol 73, Issue 60
> **********************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stuartb at 4gh.net Wed Feb 28 17:49:47 2018
From: stuartb at 4gh.net (Stuart Barkley)
Date: Wed, 28 Feb 2018 12:49:47 -0500 (EST)
Subject: [gpfsug-discuss] Problems with remote mount via routed IB
In-Reply-To:
References: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu>
Message-ID:

The problem with CM is that it seems to require configuring IP over Infiniband.

I'm rather strongly opposed to IP over IB. We did run IPoIB years ago, but pulled it out of our environment as adding unneeded complexity. It requires provisioning IP addresses across the Infiniband infrastructure and possibly adding routers to other portions of the IP infrastructure. It was also confusing some users due to multiple IPs on the compute infrastructure.

We have recently been in discussions with a vendor about their support for GPFS over IB and they kept directing us to using CM (which still didn't work). CM wasn't necessary once we found out about the actual problem (we needed the undocumented verbsRdmaUseGidIndexZero configuration option among other things due to their use of SR-IOV based virtual IB interfaces).

We don't use routed Infiniband and it might be that CM and IPoIB is required for IB routing, but I doubt it. It sounds like the OP is keeping IB and IP infrastructure separate.
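For anyone retracing these experiments, the two settings discussed in this exchange are changed with mmchconfig, roughly as below. The value syntax for the undocumented option is an assumption here, and both are cluster configuration changes that need GPFS restarted on the affected nodes before they take effect:

  mmchconfig verbsRdmaCm=enable             # switch to the RDMA Connection Manager
  mmchconfig verbsRdmaUseGidIndexZero=yes   # undocumented option referred to above; value syntax assumed
  mmshutdown -a && mmstartup -a             # restart the daemons so the change is picked up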
Stuart Barkley On Mon, 26 Feb 2018 at 14:16 -0000, Aaron Knister wrote: > Date: Mon, 26 Feb 2018 14:16:34 > From: Aaron Knister > Reply-To: gpfsug main discussion list > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Problems with remote mount via routed IB > > Hi Jan Erik, > > It was my understanding that the IB hardware router required RDMA CM to work. > By default GPFS doesn't use the RDMA Connection Manager but it can be enabled > (e.g. verbsRdmaCm=enable). I think this requires a restart on clients/servers > (in both clusters) to take effect. Maybe someone else on the list can comment > in more detail-- I've been told folks have successfully deployed IB routers > with GPFS. > > -Aaron > > On 2/26/18 11:38 AM, Sundermann, Jan Erik (SCC) wrote: > > > > Dear all > > > > we are currently trying to remote mount a file system in a routed Infiniband > > test setup and face problems with dropped RDMA connections. The setup is the > > following: > > > > - Spectrum Scale Cluster 1 is setup on four servers which are connected to > > the same infiniband network. Additionally they are connected to a fast > > ethernet providing ip communication in the network 192.168.11.0/24. > > > > - Spectrum Scale Cluster 2 is setup on four additional servers which are > > connected to a second infiniband network. These servers have IPs on their IB > > interfaces in the network 192.168.12.0/24. > > > > - IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated > > machine. > > > > - We have a dedicated IB hardware router connected to both IB subnets. > > > > > > We tested that the routing, both IP and IB, is working between the two > > clusters without problems and that RDMA is working fine both for internal > > communication inside cluster 1 and cluster 2 > > > > When trying to remote mount a file system from cluster 1 in cluster 2, RDMA > > communication is not working as expected. 
Instead we see error messages on > > the remote host (cluster 2) > > > > > > 2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 > > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > > 2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 > > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 2 > > 2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to > > 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > > fabnum 0 error 733 index 3 > > 2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 > > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > > 2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to > > 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > > fabnum 0 error 733 index 1 > > 2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 > > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 3 > > 2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 > > (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1 > > 2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 > > (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 1 > > 2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to > > 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > > fabnum 0 error 733 index 0 > > 2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 > > (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0 > > 2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 > > (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 0 > > 2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to > > 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > > fabnum 0 error 733 index 2 > > 2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 > > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > > 2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 > > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 2 > > 2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to > > 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > > fabnum 0 error 733 index 3 > > 2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 > > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 > > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 3 > > > > > > and in the cluster with the file system (cluster 1) > > > > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error > > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > > 2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 sl 0 index 3 > > 
2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error > > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > > 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > > 2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 sl 0 index 3 > > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA rdma read error > > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA closed connection to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > > 2018-02-23_13:48:35.398+0100: [I] VERBS RDMA accepted and connected to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 sl 0 index 3 > > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA rdma read error > > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA closed connection to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > > 2018-02-23_13:48:55.600+0100: [I] VERBS RDMA accepted and connected to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 sl 0 index 3 > > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA rdma read error > > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA closed connection to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA accepted and connected to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 sl 0 index 3 > > > > > > > > Any advice on how to configure the setup in a way that would allow the > > remote mount via routed IB would be very appreciated. > > > > > > Thank you and best regards > > Jan Erik > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- I've never been lost; I was once bewildered for three days, but never lost! 
-- Daniel Boone
This e-mail message may > contain protected health information (PHI); dissemination of PHI should > comply with applicable federal and state laws. If you are not the > intended recipient, or an authorized representative of the intended > recipient, any further review, disclosure, use, dissemination, > distribution, or copying of this message or any attachment (or the > information contained therein) is strictly prohibited. If you think that > you have received this e-mail message in error, please notify the sender > by return e-mail and delete all references to it and its contents from > your systems. > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From ewahl at osc.edu Fri Feb 2 22:17:47 2018 From: ewahl at osc.edu (Edward Wahl) Date: Fri, 2 Feb 2018 17:17:47 -0500 Subject: [gpfsug-discuss] FW: FLASH: IBM Spectrum Scale (GPFS): Undetected corruption of archived sparse files (Linux) (2018.02.02) In-Reply-To: <90EF00A9-E89D-48EA-A04B-B069BF81E188@nuance.com> References: <90EF00A9-E89D-48EA-A04B-B069BF81E188@nuance.com> Message-ID: <20180202171747.5e7adeb2@osc.edu> Should we even ask if Spectrum Protect (TSM) is affected? Ed On Fri, 2 Feb 2018 17:04:14 +0000 "Oesterlin, Robert" wrote: > Link takes a bit to be active ? it?s there now. > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > From: on behalf of "Sobey, Richard > A" Reply-To: gpfsug main discussion list > Date: Friday, February 2, 2018 at 10:44 AM > To: "'gpfsug-discuss at spectrumscale.org'" > Subject: [EXTERNAL] [gpfsug-discuss] FW: FLASH: IBM Spectrum Scale (GPFS): > Undetected corruption of archived sparse files (Linux) (2018.02.02) > > The link goes nowhere ? can anyone point us in the right direction? > > Thanks > Richard > > From: IBM My Notifications [mailto:mynotify at stg.events.ihost.com] > Sent: 02 February 2018 16:39 > To: Sobey, Richard A > Subject: FLASH: IBM Spectrum Scale (GPFS): Undetected corruption of archived > sparse files (Linux) (2018.02.02) > > > > > Storage > > IBM My Notifications > > Check out the IBM Electronic > Support > > > > > > > IBM Spectrum Scale > > : IBM Spectrum Scale (GPFS): Undetected corruption of archived sparse files > (Linux) > > IBM has identified an issue with IBM GPFS and IBM Spectrum Scale for Linux > environments, in which a sparse file may be silently corrupted during > archival, resulting in the file being restored incorrectly. > > > Subscribe or > Unsubscribe > | > Feedback > | Follow us on > Twitter. > > Your support Notifications display in English by default. Machine translation > based on your IBM profile language setting is added if you specify this > option in My defaults within My Notifications. (Note: Not all languages are > available at this time, and the English version always takes precedence over > the machine translated version.) > > > Get help with technical questions on the dW Answers > forum > > To ensure proper delivery please add > mynotify at stg.events.ihost.com to your > address book. > > You received this email because you are subscribed to IBM My Notifications as: > r.sobey at imperial.ac.uk > > Please do not reply to this message as it is generated by an automated > service machine. > > > > > ?International Business Machines Corporation 2018. All rights reserved. 
> IBM United Kingdom Limited > Registered in England and Wales with number 741598 > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU > > > > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From duersch at us.ibm.com Sat Feb 3 02:32:49 2018 From: duersch at us.ibm.com (Steve Duersch) Date: Fri, 2 Feb 2018 21:32:49 -0500 Subject: [gpfsug-discuss] In place upgrade of ESS? In-Reply-To: References: Message-ID: This has been on our to-do list for quite some time. We hope to have in place hardware upgrade in 2H2018. Steve Duersch Spectrum Scale IBM Poughkeepsie, New York gpfsug-discuss-bounces at spectrumscale.org wrote on 02/02/2018 03:15:33 PM: > > Message: 2 > Date: Fri, 2 Feb 2018 19:59:14 +0000 > From: Shaun Anderson > To: gpfsug main discussion list > Subject: [gpfsug-discuss] In place upgrade of ESS? > Message-ID: <1517601554597.83665 at convergeone.com> > Content-Type: text/plain; charset="iso-8859-1" > > I haven't found a firm answer yet. Is it possible to in place > upgrade say, a GL2 to a GL4 and subsequently a GL6? > > ? > > Do we know if this feature is coming? > > SHAUN ANDERSON > STORAGE ARCHITECT > O 208.577.2112 > M 214.263.7014 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Sun Feb 4 19:58:39 2018 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Sun, 04 Feb 2018 14:58:39 -0500 Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? Message-ID: <20180204145839.77101pngtlr3qacv@support.scinet.utoronto.ca> Here is what I found for versions 4 & 3.5: * Maximum Number of Dependent Filesets: 10,000 * Maximum Number of Independent Filesets: 1,000 https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets I'm having some difficulty finding published documentation on limitations for version 5: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm Any hints? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From truongv at us.ibm.com Mon Feb 5 13:20:16 2018 From: truongv at us.ibm.com (Truong Vu) Date: Mon, 5 Feb 2018 08:20:16 -0500 Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? In-Reply-To: References: Message-ID: Hi Jamie, The limits are the same in 5.0.0. We'll look into the FAQ. Thanks, Tru. 
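For anyone wondering how close an existing file system already is to those numbers (the figures quoted in this thread are 10,000 dependent and 1,000 independent filesets), a rough check along these lines works -- "gpfs0" is only a placeholder device name:

   # total number of filesets (dependent + independent); the first couple of
   # lines of mmlsfileset output are headers, so the tail offset may need
   # adjusting on your release
   /usr/lpp/mmfs/bin/mmlsfileset gpfs0 | tail -n +3 | wc -l

   # the -L form also shows the inode space, which is what distinguishes an
   # independent fileset (it owns its own inode space) from a dependent one
   /usr/lpp/mmfs/bin/mmlsfileset gpfs0 -L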
From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 02/05/2018 07:00 AM Subject: gpfsug-discuss Digest, Vol 73, Issue 9 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Maximum Number of filesets on GPFS v5? (Jaime Pinto) ---------------------------------------------------------------------- Message: 1 Date: Sun, 04 Feb 2018 14:58:39 -0500 From: "Jaime Pinto" To: "gpfsug main discussion list" Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? Message-ID: <20180204145839.77101pngtlr3qacv at support.scinet.utoronto.ca> Content-Type: text/plain; charset=ISO-8859-1; DelSp="Yes"; format="flowed" Here is what I found for versions 4 & 3.5: * Maximum Number of Dependent Filesets: 10,000 * Maximum Number of Independent Filesets: 1,000 https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets I'm having some difficulty finding published documentation on limitations for version 5: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm Any hints? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= End of gpfsug-discuss Digest, Vol 73, Issue 9 ********************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Mon Feb 5 13:50:51 2018 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 05 Feb 2018 08:50:51 -0500 Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? In-Reply-To: References: Message-ID: <20180205085051.15436lim3xaw49iz@support.scinet.utoronto.ca> Thanks Truong Jaime Quoting "Truong Vu" : > > Hi Jamie, > > The limits are the same in 5.0.0. We'll look into the FAQ. > > Thanks, > Tru. 
> > > > > From: gpfsug-discuss-request at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Date: 02/05/2018 07:00 AM > Subject: gpfsug-discuss Digest, Vol 73, Issue 9 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= > > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Maximum Number of filesets on GPFS v5? (Jaime Pinto) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sun, 04 Feb 2018 14:58:39 -0500 > From: "Jaime Pinto" > To: "gpfsug main discussion list" > Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? > Message-ID: > <20180204145839.77101pngtlr3qacv at support.scinet.utoronto.ca> > Content-Type: text/plain; charset=ISO-8859-1; > DelSp="Yes"; > format="flowed" > > Here is what I found for versions 4 & 3.5: > * Maximum Number of Dependent Filesets: 10,000 > * Maximum Number of Independent Filesets: 1,000 > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets > > > > I'm having some difficulty finding published documentation on > limitations for version 5: > > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm > > > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm > > > Any hints? > > Thanks > Jaime > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= > > > > End of gpfsug-discuss Digest, Vol 73, Issue 9 > ********************************************* > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
From daniel.kidger at uk.ibm.com Mon Feb 5 14:19:39 2018 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Mon, 5 Feb 2018 14:19:39 +0000 Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? In-Reply-To: <20180205085051.15436lim3xaw49iz@support.scinet.utoronto.ca> References: <20180205085051.15436lim3xaw49iz@support.scinet.utoronto.ca>, Message-ID: An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Mon Feb 5 15:02:17 2018 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 05 Feb 2018 10:02:17 -0500 Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? In-Reply-To: References: <20180205085051.15436lim3xaw49iz@support.scinet.utoronto.ca>, Message-ID: <20180205100217.46131a75yav2wi61@support.scinet.utoronto.ca> We are considering moving from user/group based quotas to path based quotas with nested filesets. We also facing challenges to traverse 'Dependent Filesets' for daily TSM backups of projects and for purging scratch area. We're about to deploy a new GPFS storage cluster, some 12-15PB, 13K+ users and 5K+ groups as the baseline, with expected substantial scaling up within the next 3-5 years in all dimmensions. Therefore, decisions we make now under GPFS v4.x trough v5.x will have consequences in the very near future, if they are not the proper ones. Thanks Jaime Quoting "Daniel Kidger" : > Jamie, I believe at least one of those limits is 'maximum supported' > rather than an architectural limit. Is your use case one which > would push these boundaries? If so care to describe what you would > wish to do? Daniel > > [1] > > DR DANIEL KIDGER > IBM Technical Sales Specialist > Software Defined Solution Sales > > +44-(0)7818 522 266 > daniel.kidger at uk.ibm.com > > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" , > "Truong Vu" > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? > Date: Mon, Feb 5, 2018 2:56 PM > Thanks Truong > Jaime > > Quoting "Truong Vu" : > >> >> Hi Jamie, >> >> The limits are the same in 5.0.0. We'll look into the FAQ. >> >> Thanks, >> Tru. >> >> >> >> >> From: gpfsug-discuss-request at spectrumscale.org >> To: gpfsug-discuss at spectrumscale.org >> Date: 02/05/2018 07:00 AM >> Subject: gpfsug-discuss Digest, Vol 73, Issue 9 >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at spectrumscale.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> >> > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e=[2] >> >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at spectrumscale.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at spectrumscale.org >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. Maximum Number of filesets on GPFS v5? 
(Jaime Pinto) >> >> >> > ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Sun, 04 Feb 2018 14:58:39 -0500 >> From: "Jaime Pinto" >> To: "gpfsug main discussion list" > >> Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? >> Message-ID: >> <20180204145839.77101pngtlr3qacv at support.scinet.utoronto.ca> >> Content-Type: text/plain; charset=ISO-8859-1; >> DelSp="Yes"; >> format="flowed" >> >> Here is what I found for versions 4 & 3.5: >> * Maximum Number of Dependent Filesets: 10,000 >> * Maximum Number of Independent Filesets: 1,000 >> >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets[3] >> >> >> >> I'm having some difficulty finding published documentation on >> limitations for version 5: >> >> > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm[4] >> >> >> > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm[5] >> >> >> Any hints? >> >> Thanks >> Jaime >> >> >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e=[6] >> >> >> >> End of gpfsug-discuss Digest, Vol 73, Issue 9 >> ********************************************* >> >> >> >> > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E&m=xnPNZO_v81jNbr_IcbbyLPUpPdAFjKIzptnqTnmqaFQ&s=Dln7axLq9ej2KttpKZJwLKuvxfSDkPErDQI5KCAQcg4&e=[7] > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E&m=xnPNZO_v81jNbr_IcbbyLPUpPdAFjKIzptnqTnmqaFQ&s=ZMGxi-PBv5-WEGj5RFm1QV0K8azswe9Z-C6rE1ey-UQ&e=[8] > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with > number 741598. 
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire > PO6 3AU > > > > Links: > ------ > [1] https://www.youracclaim.com/user/danel-kidger > [2] > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= > [3] > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets > [4] > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm > [5] > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm > [6] > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= > [7] > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E&m=xnPNZO_v81jNbr_IcbbyLPUpPdAFjKIzptnqTnmqaFQ&s=Dln7axLq9ej2KttpKZJwLKuvxfSDkPErDQI5KCAQcg4&e= > [8] > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E&m=xnPNZO_v81jNbr_IcbbyLPUpPdAFjKIzptnqTnmqaFQ&s=ZMGxi-PBv5-WEGj5RFm1QV0K8azswe9Z-C6rE1ey-UQ&e= > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From jtucker at pixitmedia.com Mon Feb 5 16:11:58 2018 From: jtucker at pixitmedia.com (Jez Tucker) Date: Mon, 5 Feb 2018 16:11:58 +0000 Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? In-Reply-To: References: <20180205085051.15436lim3xaw49iz@support.scinet.utoronto.ca> Message-ID: Hi ? IIRC these are hard limits - at least were a year or so ago. I have a customers with ~ 7500 dependent filesets and knocking on the door of the 1000 independent fileset limit. Before independent filesets were 'a thing', projects were created with dependent filesets.? However the arrival of independent filesets, per-fileset snapshotting etc. and improved workflow makes these a per-project primary choice - but with 10x less to operate with :-/ If someone @ IBM fancied upping the #defines x10 and confirming the testing limit, that would be appreciated :-) If you need testing kit, happy to facilitate. Best, Jez On 05/02/18 14:19, Daniel Kidger wrote: > Jamie, > I believe at least one of those limits is 'maximum supported' rather > than an architectural limit. > Is your use case one which would push these?boundaries? ?If so care to > describe what you would wish to do? 
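For readers less familiar with the "path based" model Jaime describes, it usually maps onto fileset quotas. A minimal sketch of that pattern -- the device name, fileset name, junction path and limits below are made-up placeholders, and it assumes quotas are already enabled on the file system:

   # create an independent fileset for a project and link it at a path
   /usr/lpp/mmfs/bin/mmcrfileset gpfs0 projectA --inode-space new
   /usr/lpp/mmfs/bin/mmlinkfileset gpfs0 projectA -J /gpfs/gpfs0/projects/projectA

   # then put block and inode quotas on the fileset itself rather than on
   # users or groups
   /usr/lpp/mmfs/bin/mmsetquota gpfs0:projectA --block 10T:12T --files 5000000:6000000

Quotas of this kind can also be set on dependent filesets; the independent variety additionally gives per-fileset snapshots and a private inode space, which is exactly where the 1,000-independent-fileset limit starts to bite.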
> Daniel > > IBM Storage Professional Badge > > > *Dr Daniel Kidger* > IBM?Technical Sales Specialist > Software Defined Solution Sales > > +44-(0)7818 522 266 > daniel.kidger at uk.ibm.com > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" > , "Truong Vu" > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? > Date: Mon, Feb 5, 2018 2:56 PM > Thanks Truong > Jaime > > Quoting "Truong Vu" : > > > > > Hi Jamie, > > > > The limits are the same in 5.0.0. ?We'll look into the FAQ. > > > > Thanks, > > Tru. > > > > > > > > > > From: gpfsug-discuss-request at spectrumscale.org > > To: gpfsug-discuss at spectrumscale.org > > Date: 02/05/2018 07:00 AM > > Subject: gpfsug-discuss Digest, Vol 73, Issue 9 > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > Send gpfsug-discuss mailing list submissions to > > gpfsug-discuss at spectrumscale.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= > > > > or, via email, send a message with subject or body 'help' to > > gpfsug-discuss-request at spectrumscale.org > > > > You can reach the person managing the list at > > gpfsug-discuss-owner at spectrumscale.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of gpfsug-discuss digest..." > > > > > > Today's Topics: > > > > ? ?1. Maximum Number of filesets on GPFS v5? (Jaime Pinto) > > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Sun, 04 Feb 2018 14:58:39 -0500 > > From: "Jaime Pinto" > > To: "gpfsug main discussion list" > > Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? > > Message-ID: > > <20180204145839.77101pngtlr3qacv at support.scinet.utoronto.ca> > > Content-Type: text/plain; charset=ISO-8859-1; > > DelSp="Yes"; > > format="flowed" > > > > Here is what I found for versions 4 & 3.5: > > * Maximum Number of Dependent Filesets: 10,000 > > * Maximum Number of Independent Filesets: 1,000 > > > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets > > > > > > > > I'm having some difficulty finding published documentation on > > limitations for version 5: > > > > > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm > > > > > > > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm > > > > > > Any hints? > > > > Thanks > > Jaime > > > > > > --- > > Jaime Pinto > > SciNet HPC Consortium - Compute/Calcul Canada > > www.scinet.utoronto.ca - www.computecanada.ca > > University of Toronto > > > > > > ---------------------------------------------------------------- > > This message was sent using IMP at SciNet Consortium, University of > > Toronto. 
> > > > > > > > > > ------------------------------ > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4&s=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s&e= > > > > > > > > End of gpfsug-discuss Digest, Vol 73, Issue 9 > > ********************************************* > > > > > > > > > > > > > > > ?? ? ? ? ?************************************ > ?? ? ? ? ? TELL US ABOUT YOUR SUCCESS STORIES > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E&m=xnPNZO_v81jNbr_IcbbyLPUpPdAFjKIzptnqTnmqaFQ&s=Dln7axLq9ej2KttpKZJwLKuvxfSDkPErDQI5KCAQcg4&e= > ?? ? ? ? ?************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University > of Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E&m=xnPNZO_v81jNbr_IcbbyLPUpPdAFjKIzptnqTnmqaFQ&s=ZMGxi-PBv5-WEGj5RFm1QV0K8azswe9Z-C6rE1ey-UQ&e= > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with > number 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Feb 7 21:28:46 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 7 Feb 2018 16:28:46 -0500 Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? Message-ID: I noticed something curious after migrating some nodes from 4.1 to 4.2 which is that mounts now can take foorrreeevverrr. It seems to boil down to the point in the mount process where getEFOptions is called. 
To highlight the difference--

4.1:

# /usr/bin/time /usr/lpp/mmfs/bin/mmcommon getEFOptions dnb02 skipMountPointCheck >/dev/null
0.16user 0.04system 0:00.43elapsed 45%CPU (0avgtext+0avgdata 9108maxresident)k
0inputs+2768outputs (0major+15404minor)pagefaults 0swaps

4.2:

/usr/bin/time /usr/lpp/mmfs/bin/mmcommon getEFOptions dnb02 skipMountPointCheck >/dev/null
9.75user 3.79system 0:23.35elapsed 58%CPU (0avgtext+0avgdata 10832maxresident)k
0inputs+38104outputs (0major+3135097minor)pagefaults 0swaps

that's uh...a ~54x increase in elapsed time. Which, if you have 25+ filesystems and 3500 nodes, really starts to add up.

It looks like under 4.2 this getEFOptions function triggers a bunch of mmsdrfs parsing and node list generation, whereas on 4.1 that doesn't happen. Digging in a little deeper, it looks to me like the big difference is in gpfsClusterInit after the node fetches the "shadow" mmsdrfs file.

Here's a 4.1 node:

gpfsClusterInit:mmsdrfsdef.sh[2827]> loginPrefix=''
gpfsClusterInit:mmsdrfsdef.sh[2828]> [[ -n '' ]]
gpfsClusterInit:mmsdrfsdef.sh[2829]> /usr/bin/scp supersecrethost:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs.25326
gpfsClusterInit:mmsdrfsdef.sh[2830]> rc=0
gpfsClusterInit:mmsdrfsdef.sh[2831]> [[ 0 -ne 0 ]]
gpfsClusterInit:mmsdrfsdef.sh[2863]> [[ -f /var/mmfs/gen/mmsdrfs.25326 ]]
gpfsClusterInit:mmsdrfsdef.sh[2867]> /usr/bin/diff /var/mmfs/gen/mmsdrfs.25326 /var/mmfs/gen/mmsdrfs
gpfsClusterInit:mmsdrfsdef.sh[2867]> 1> /dev/null 2> /dev/null
gpfsClusterInit:mmsdrfsdef.sh[2868]> rc=0
gpfsClusterInit:mmsdrfsdef.sh[2869]> [[ 0 -ne 0 ]]
gpfsClusterInit:mmsdrfsdef.sh[2874]> sdrfsFile=/var/mmfs/gen/mmsdrfs
gpfsClusterInit:mmsdrfsdef.sh[2875]> /bin/rm -f /var/mmfs/gen/mmsdrfs.25326

Here's a 4.2 node:

gpfsClusterInit:mmsdrfsdef.sh[2938]> loginPrefix=''
gpfsClusterInit:mmsdrfsdef.sh[2939]> [[ -n '' ]]
gpfsClusterInit:mmsdrfsdef.sh[2940]> /usr/bin/scp supersecrethost:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs.8534
gpfsClusterInit:mmsdrfsdef.sh[2941]> rc=0
gpfsClusterInit:mmsdrfsdef.sh[2942]> [[ 0 -ne 0 ]]
gpfsClusterInit:mmsdrfsdef.sh[2974]> /bin/rm -f /var/mmfs/tmp/cmdTmpDir.mmcommon.8534/tmpsdrfs.gpfsClusterInit
gpfsClusterInit:mmsdrfsdef.sh[2975]> [[ -f /var/mmfs/gen/mmsdrfs.8534 ]]
gpfsClusterInit:mmsdrfsdef.sh[2979]> /usr/bin/diff /var/mmfs/gen/mmsdrfs.8534 /var/mmfs/gen/mmsdrfs
gpfsClusterInit:mmsdrfsdef.sh[2979]> 1> /dev/null 2> /dev/null
gpfsClusterInit:mmsdrfsdef.sh[2980]> rc=0
gpfsClusterInit:mmsdrfsdef.sh[2981]> [[ 0 -ne 0 ]]
gpfsClusterInit:mmsdrfsdef.sh[2986]> sdrfsFile=/var/mmfs/gen/mmsdrfs

It looks like the 4.1 code deletes the shadow mmsdrfs file if it's not different from what's locally on the node, whereas 4.2 does *not* do that. This seems to cause a problem when checkMmfsEnvironment is called, because it will return 1 if the shadow file exists, which according to the function comments indicates "something is not right", triggering the environment update where the slowdown is incurred. On 4.1 checkMmfsEnvironment returned 0 because the shadow mmsdrfs file had been removed, whereas on 4.2 it returned 1 because the shadow mmsdrfs file still existed despite being identical to the mmsdrfs on the node.

I've looked at 4.2.3.6 (efix12), and it doesn't look like 4.2.3.7 has dropped yet, so it may be that this has been fixed there. Maybe it's time for a PMR...
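(For anyone who wants to compare a node of their own, the check above boils down to two commands -- the device name dnb02 is just the example used here, so substitute your own:

   /usr/bin/time /usr/lpp/mmfs/bin/mmcommon getEFOptions dnb02 skipMountPointCheck >/dev/null
   /usr/lpp/mmfs/bin/mmdiag --version

the first times the helper the mount path goes through, the second shows which GPFS build the node is running, which is handy when comparing 4.1 and 4.2 nodes or checking whether a later PTF has been applied.)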
-Aaron

--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776

From tortay at cc.in2p3.fr Thu Feb 8 07:08:50 2018 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Thu, 8 Feb 2018 08:08:50 +0100 Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? In-Reply-To: References: Message-ID: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr>

On 07/02/2018 22:28, Aaron Knister wrote:
> I noticed something curious after migrating some nodes from 4.1 to 4.2
> which is that mounts now can take foorrreeevverrr. It seems to boil down
> to the point in the mount process where getEFOptions is called.
>
> To highlight the difference--
> [...]
>
Hello,
I have had this (or a very similar) issue after migrating from 4.1.1.8 to 4.2.3. There are 37 filesystems in our main cluster, which made the problem really noticeable. A PMR has been opened.
I have tested the fixes included in 4.2.3.7 (which, I'm told, should be released today) and they actually resolve my problems (APAR IJ03192 & IJ03235).

Loïc.
--
| Loïc Tortay - IN2P3 Computing Centre |

From Tomasz.Wolski at ts.fujitsu.com Thu Feb 8 10:35:54 2018 From: Tomasz.Wolski at ts.fujitsu.com (Tomasz.Wolski at ts.fujitsu.com) Date: Thu, 8 Feb 2018 10:35:54 +0000 Subject: [gpfsug-discuss] Inode scan optimization Message-ID: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local>

Hello All,

A full backup of a 2 billion inode Spectrum Scale file system on V4.1.1.16 takes 60 days.

We are trying to optimize this, and using inode scans seems to help, even when we use a directory scan and the inode scan only to get better stat performance (using gpfs_stat_inode_with_xattrs64). With 20 processes in parallel doing dir scans (+ inode scans for stat info) we have decreased the time to 40 days. All NSDs are dataAndMetadata type.

I have the following questions:
* Is there a way to increase the inode scan cache (we may use 32 GByte)?
  - Can we use the "hidden" config parameters?
      iscanPrefetchAggressiveness 2
      iscanPrefetchDepth 0
      iscanPrefetchThreadsPerNode 0
* Is there documentation concerning the cache behavior?
  - If not: is the inode scan cache per process or per node?
  - Is there a suggestion for optimizing the termIno parameter of gpfs_stat_inode_with_xattrs64() in such a use case?

Thanks!

Best regards,
Tomasz Wolski
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From stockf at us.ibm.com Thu Feb 8 12:44:35 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 8 Feb 2018 07:44:35 -0500 Subject: [gpfsug-discuss] Inode scan optimization In-Reply-To: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> References: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> Message-ID:

You mention that all the NSDs are metadata and data, but you do not say how many NSDs are defined or the type of storage used, that is, are these on SAS or NL-SAS storage? I'm assuming they are not on SSDs/flash storage. Have you considered moving the metadata to separate NSDs, preferably SSD/flash storage? This is likely to give you a significant performance boost.

You state that using the inode scan API you reduced the time to 40 days. Did you analyze your backup application to determine where the time was being spent for the backup? If the inode scan is a small percentage of your backup time then optimizing it will not provide much benefit.
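If it does turn out that metadata I/O dominates, a first step is simply to look at how the NSDs are laid out today, for example (the device name is a placeholder):

   # one line per NSD, showing among other things whether it holds metadata,
   # data or both, and its current status/availability
   /usr/lpp/mmfs/bin/mmlsdisk gpfs0

The "holds metadata" / "holds data" columns are what would change if metadata were moved onto a few dedicated SSD/flash NSDs as suggested above.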
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Tomasz.Wolski at ts.fujitsu.com" To: "gpfsug-discuss at spectrumscale.org" Date: 02/08/2018 05:50 AM Subject: [gpfsug-discuss] Inode scan optimization Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, A full backup of an 2 billion inodes spectrum scale file system on V4.1.1.16 takes 60 days. We try to optimize and using inode scans seems to improve, even when we are using a directory scan and the inode scan just for having a better performance concerning stat (using gpfs_stat_inode_with_xattrs64). With 20 processes in parallel doing dir scans (+ inode scans for stat info) we have decreased the time to 40 days. All NSDs are dataAndMetadata type. I have the following questions: ? Is there a way to increase the inode scan cache (we may use 32 GByte)? o Can we us the ?hidden? config parameters ? iscanPrefetchAggressiveness 2 ? iscanPrefetchDepth 0 ? iscanPrefetchThreadsPerNode 0 ? Is there a documentation concerning cache behavior? o if no, is the inode scan cache process or node specific? o Is there a suggestion to optimize the termIno parameter in the gpfs_stat_inode_with_xattrs64() in such a use case? Thanks! Best regards, Tomasz Wolski_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p_1XEUyoJ7-VJxF_w8h9gJh8_Wj0Pey73LCLLoxodpw&m=y2y22xZuqjpkKfO2WSdcJsBXMaM8hOedaB_AlgFlIb0&s=DL0ZnBuH9KpvKN6XQNvoYmvwfZDbbwMlM-4rCbsAgWo&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Feb 8 13:56:42 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 8 Feb 2018 08:56:42 -0500 Subject: [gpfsug-discuss] Inode scan optimization In-Reply-To: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> References: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> Message-ID: Recall that many years ago we demonstrated a Billion files scanned with mmapplypolicy in under 20 minutes... And that was on ordinary at the time, spinning disks (not SSD!)... Granted we packed about 1000 files per directory and made some other choices that might not be typical usage.... OTOH storage and nodes have improved since then... SO when you say it takes 60 days to backup 2 billion files and that's a problem.... Like any large computing job, one has to do some analysis to find out what parts of the job are taking how much time... So... what commands are you using to do the backup...? What timing statistics or measurements have you collected? If you are using mmbackup and/or mmapplypolicy, those commands can show you how much time they spend scanning the file system looking for files to backup AND then how much time they spend copying the data to backup media. In fact they operate in distinct phases... directory scan, inode scan, THEN data copying ... so it's straightforward to see which phases are taking how much time. OH... I see you also say you are using gpfs_stat_inode_with_xattrs64 -- These APIs are tricky and not a panacea.... That's why we provide you with mmapplypolicy which in fact uses those APIs in clever, patented ways -- optimized and honed with years of work.... 
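To make that concrete, a small sketch of the pattern being recommended -- the rule follows the same shape as the LIST rule posted earlier in this thread, and the policy file name, output prefix, node list and work directory are placeholders to adapt:

   /* scan.pol: emit one record per file with the attributes a backup tool
      typically needs; with EXEC '' nothing is executed per file */
   RULE EXTERNAL LIST 'candidates' EXEC ''
   RULE 'allfiles' LIST 'candidates'
        SHOW( varchar(file_size) || ' ' || varchar(kb_allocated) || ' ' ||
              varchar(modification_time) || ' ' || varchar(misc_attributes) )

   # run the scan in parallel across helper nodes; -I defer just writes the
   # candidate lists to files starting with the -f prefix, and -g should
   # point at a directory all the helper nodes can write to (typically in GPFS)
   mmapplypolicy /path/to/filesystem -P scan.pol -I defer \
       -f /some/fast/dir/candidates -N node1,node2 -g /path/to/filesystem/tmp

The resulting list files can then be fed to whatever does the actual copying, which keeps the scan phase down to the sort of times quoted above rather than days.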
And more recently, we provided you with samples/ilm/mmfind -- which has the functionality of the classic unix find command -- but runs in parallel - using mmapplypolicy. TRY IT on you file system! From: "Tomasz.Wolski at ts.fujitsu.com" To: "gpfsug-discuss at spectrumscale.org" Date: 02/08/2018 05:50 AM Subject: [gpfsug-discuss] Inode scan optimization Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, A full backup of an 2 billion inodes spectrum scale file system on V4.1.1.16 takes 60 days. We try to optimize and using inode scans seems to improve, even when we are using a directory scan and the inode scan just for having a better performance concerning stat (using gpfs_stat_inode_with_xattrs64). With 20 processes in parallel doing dir scans (+ inode scans for stat info) we have decreased the time to 40 days. All NSDs are dataAndMetadata type. I have the following questions: ? Is there a way to increase the inode scan cache (we may use 32 GByte)? o Can we us the ?hidden? config parameters ? iscanPrefetchAggressiveness 2 ? iscanPrefetchDepth 0 ? iscanPrefetchThreadsPerNode 0 ? Is there a documentation concerning cache behavior? o if no, is the inode scan cache process or node specific? o Is there a suggestion to optimize the termIno parameter in the gpfs_stat_inode_with_xattrs64() in such a use case? Thanks! Best regards, Tomasz Wolski_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=mWxVB2lS_snDiYR4E348tnzbQTSuuWSrRiBDhJPjyh8&s=FG9fDxbmiCuSh0cvt4hsQS0bKdGHjI7loVGEKO0eTf0&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Feb 8 15:33:13 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 8 Feb 2018 10:33:13 -0500 Subject: [gpfsug-discuss] Inode scan optimization - (Tomasz.Wolski@ts.fujitsu.com ) In-Reply-To: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> References: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> Message-ID: Please clarify and elaborate .... When you write "a full backup ... takes 60 days" - that seems very poor indeed. BUT you haven't stated how much data is being copied to what kind of backup media nor how much equipment or what types you are using... Nor which backup software... We have Spectrum Scale installation doing nightly backups of huge file systems using the mmbackup command with TivoliStorageManager backup, using IBM branded or approved equipment and software. From: "Tomasz.Wolski at ts.fujitsu.com" To: "gpfsug-discuss at spectrumscale.org" Date: 02/08/2018 05:50 AM Subject: [gpfsug-discuss] Inode scan optimization Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, A full backup of an 2 billion inodes spectrum scale file system on V4.1.1.16 takes 60 days. We try to optimize and using inode scans seems to improve, even when we are using a directory scan and the inode scan just for having a better performance concerning stat (using gpfs_stat_inode_with_xattrs64). With 20 processes in parallel doing dir scans (+ inode scans for stat info) we have decreased the time to 40 days. All NSDs are dataAndMetadata type. I have the following questions: ? Is there a way to increase the inode scan cache (we may use 32 GByte)? o Can we us the ?hidden? config parameters ? iscanPrefetchAggressiveness 2 ? 
iscanPrefetchDepth 0 ? iscanPrefetchThreadsPerNode 0 ? Is there a documentation concerning cache behavior? o if no, is the inode scan cache process or node specific? o Is there a suggestion to optimize the termIno parameter in the gpfs_stat_inode_with_xattrs64() in such a use case? Thanks! Best regards, Tomasz Wolski_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=mWxVB2lS_snDiYR4E348tnzbQTSuuWSrRiBDhJPjyh8&s=FG9fDxbmiCuSh0cvt4hsQS0bKdGHjI7loVGEKO0eTf0&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Thu Feb 8 15:52:22 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Thu, 08 Feb 2018 10:52:22 -0500 Subject: [gpfsug-discuss] Inode scan optimization - (Tomasz.Wolski@ts.fujitsu.com ) In-Reply-To: References: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> Message-ID: <9124.1518105142@turing-police.cc.vt.edu> On Thu, 08 Feb 2018 10:33:13 -0500, "Marc A Kaplan" said: > Please clarify and elaborate .... When you write "a full backup ... takes > 60 days" - that seems very poor indeed. > BUT you haven't stated how much data is being copied to what kind of > backup media nor how much equipment or what types you are using... Nor > which backup software... > > We have Spectrum Scale installation doing nightly backups of huge file > systems using the mmbackup command with TivoliStorageManager backup, using > IBM branded or approved equipment and software. How long did the *first* TSM backup take? Remember that TSM does the moral equivalent of a 'full' backup at first, and incrementals thereafter. So it's quite possible to have a very large filesystem with little data churn to do incrementals in 5-6 hours, even though the first one took several weeks. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Feb 8 15:59:44 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 8 Feb 2018 15:59:44 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> Hi All, We are in a bit of a difficult situation right now with one of our non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! ) and are looking for some advice on how to deal with this unfortunate situation. We have a non-IBM FC storage array with dual-?redundant? controllers. One of those controllers is dead and the vendor is sending us a replacement. However, the replacement controller will have mis-matched firmware with the surviving controller and - long story short - the vendor says there is no way to resolve that without taking the storage array down for firmware upgrades. Needless to say there?s more to that story than what I?ve included here, but I won?t bore everyone with unnecessary details. The storage array has 5 NSDs on it, but fortunately enough they are part of our ?capacity? pool ? i.e. the only way a file lands here is if an mmapplypolicy scan moved it there because the *access* time is greater than 90 days. Filesystem data replication is set to one. So ? 
what I was wondering if I could do is to use mmchdisk to either suspend or (preferably) stop those NSDs, do the firmware upgrade, and resume the NSDs? The problem I see is that suspend doesn?t stop I/O, it only prevents the allocation of new blocks ? so, in theory, if a user suddenly decided to start using a file they hadn?t needed for 3 months then I?ve got a problem. Stopping all I/O to the disks is what I really want to do. However, according to the mmchdisk man page stop cannot be used on a filesystem with replication set to one. There?s over 250 TB of data on those 5 NSDs, so restriping off of them or setting replication to two are not options. It is very unlikely that anyone would try to access a file on those NSDs during the hour or so I?d need to do the firmware upgrades, but how would GPFS itself react to those (suspended) disks going away for a while? I?m thinking I could be OK if there was just a way to actually stop them rather than suspend them. Any undocumented options to mmchdisk that I?m not aware of??? Are there other options - besides buying IBM hardware - that I am overlooking? Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu Feb 8 16:23:33 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 8 Feb 2018 16:23:33 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> Message-ID: Sorry I can?t help? the only thing going round and round my head right now is why on earth the existing controller cannot push the required firmware to the new one when it comes online. Never heard of anything else! Feel free to name and shame so I can avoid ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 08 February 2018 16:00 To: gpfsug main discussion list Subject: [gpfsug-discuss] mmchdisk suspend / stop Hi All, We are in a bit of a difficult situation right now with one of our non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! ) and are looking for some advice on how to deal with this unfortunate situation. We have a non-IBM FC storage array with dual-?redundant? controllers. One of those controllers is dead and the vendor is sending us a replacement. However, the replacement controller will have mis-matched firmware with the surviving controller and - long story short - the vendor says there is no way to resolve that without taking the storage array down for firmware upgrades. Needless to say there?s more to that story than what I?ve included here, but I won?t bore everyone with unnecessary details. The storage array has 5 NSDs on it, but fortunately enough they are part of our ?capacity? pool ? i.e. the only way a file lands here is if an mmapplypolicy scan moved it there because the *access* time is greater than 90 days. Filesystem data replication is set to one. So ? what I was wondering if I could do is to use mmchdisk to either suspend or (preferably) stop those NSDs, do the firmware upgrade, and resume the NSDs? The problem I see is that suspend doesn?t stop I/O, it only prevents the allocation of new blocks ? 
so, in theory, if a user suddenly decided to start using a file they hadn?t needed for 3 months then I?ve got a problem. Stopping all I/O to the disks is what I really want to do. However, according to the mmchdisk man page stop cannot be used on a filesystem with replication set to one. There?s over 250 TB of data on those 5 NSDs, so restriping off of them or setting replication to two are not options. It is very unlikely that anyone would try to access a file on those NSDs during the hour or so I?d need to do the firmware upgrades, but how would GPFS itself react to those (suspended) disks going away for a while? I?m thinking I could be OK if there was just a way to actually stop them rather than suspend them. Any undocumented options to mmchdisk that I?m not aware of??? Are there other options - besides buying IBM hardware - that I am overlooking? Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Feb 8 16:25:33 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 8 Feb 2018 16:25:33 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop Message-ID: Check out ?unmountOnDiskFail? config parameter perhaps? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_tuningguide.htm unmountOnDiskFail The unmountOnDiskFail specifies how the GPFS daemon responds when a disk failure is detected. The valid values of this parameter are yes, no, and meta. The default value is no. I have it set to ?meta? which prevents the file system from unmounting if an NSD fails and the metadata is still available. I have 2 replicas of metadata and one data. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Thursday, February 8, 2018 at 10:15 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] mmchdisk suspend / stop So ? what I was wondering if I could do is to use mmchdisk to either suspend or (preferably) stop those NSDs, do the firmware upgrade, and resume the NSDs? The problem I see is that suspend doesn?t stop I/O, it only prevents the allocation of new blocks ? so, in theory, if a user suddenly decided to start using a file they hadn?t needed for 3 months then I?ve got a problem. Stopping all I/O to the disks is what I really want to do. However, according to the mmchdisk man page stop cannot be used on a filesystem with replication set to one. -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Thu Feb 8 16:31:25 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Thu, 08 Feb 2018 11:31:25 -0500 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: References: Message-ID: <14127.1518107485@turing-police.cc.vt.edu> On Thu, 08 Feb 2018 16:25:33 +0000, "Oesterlin, Robert" said: > unmountOnDiskFail > The unmountOnDiskFail specifies how the GPFS daemon responds when a disk > failure is detected. The valid values of this parameter are yes, no, and meta. > The default value is no. 
I suspect that the only relevant setting there is the default 'no' - it sounds like these 5 NSD's are just one storage pool in a much larger filesystem, and Kevin doesn't want the entire thing to unmount if GPFS notices that the NSDs went walkies. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From makaplan at us.ibm.com Thu Feb 8 17:10:39 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 8 Feb 2018 12:10:39 -0500 Subject: [gpfsug-discuss] Inode scan optimization - (Tomasz.Wolski@ts.fujitsu.com ) In-Reply-To: <9124.1518105142@turing-police.cc.vt.edu> References: <90738848a99d4e67b8537305242aa988@R01UKEXCASM223.r01.fujitsu.local> <9124.1518105142@turing-police.cc.vt.edu> Message-ID: Let's give Fujitsu an opportunity to answer with some facts and re-pose their questions. When I first read the complaint, I kinda assumed they were using mmbackup and TSM -- but then I noticed words about some gpfs_XXX apis.... So it looks like this Fujitsu fellow is "rolling his own"... NOT using mmapplypolicy. And we don't know if he is backing up to an old paper tape punch device or what ! He's just saying that whatever it is that he did took 60 days... Can you get from here to there faster? Sure, take an airplane instead of walking! My other remark which had a typo was and is: There have many satisfied customers and installations of Spectrum Scale File System using mmbackup and/or Tivoli Storage Manager. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sxiao at us.ibm.com Thu Feb 8 17:17:45 2018 From: sxiao at us.ibm.com (Steve Xiao) Date: Thu, 8 Feb 2018 12:17:45 -0500 Subject: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) In-Reply-To: References: Message-ID: You can change the cluster configuration to online unmount the file system when there is error accessing metadata. This can be done run the following command: mmchconfig unmountOnDiskFail=meta -i After this configuration change, you should be able to stop all 5 NSDs with mmchdisk stop command. While these NSDs are in down state, any user IO to files resides on these disks will fail but your file system should state mounted and usable. Steve Y. Xiao > Date: Thu, 8 Feb 2018 15:59:44 +0000 > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Subject: [gpfsug-discuss] mmchdisk suspend / stop > Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109 at vanderbilt.edu> > Content-Type: text/plain; charset="utf-8" > > Hi All, > > We are in a bit of a difficult situation right now with one of our > non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! > ) and are looking for some advice on how to deal with this > unfortunate situation. > > We have a non-IBM FC storage array with dual-?redundant? > controllers. One of those controllers is dead and the vendor is > sending us a replacement. However, the replacement controller will > have mis-matched firmware with the surviving controller and - long > story short - the vendor says there is no way to resolve that > without taking the storage array down for firmware upgrades. > Needless to say there?s more to that story than what I?ve included > here, but I won?t bore everyone with unnecessary details. > > The storage array has 5 NSDs on it, but fortunately enough they are > part of our ?capacity? pool ? i.e. 
the only way a file lands here is > if an mmapplypolicy scan moved it there because the *access* time is > greater than 90 days. Filesystem data replication is set to one. > > So ? what I was wondering if I could do is to use mmchdisk to either > suspend or (preferably) stop those NSDs, do the firmware upgrade, > and resume the NSDs? The problem I see is that suspend doesn?t stop > I/O, it only prevents the allocation of new blocks ? so, in theory, > if a user suddenly decided to start using a file they hadn?t needed > for 3 months then I?ve got a problem. Stopping all I/O to the disks > is what I really want to do. However, according to the mmchdisk man > page stop cannot be used on a filesystem with replication set to one. > > There?s over 250 TB of data on those 5 NSDs, so restriping off of > them or setting replication to two are not options. > > It is very unlikely that anyone would try to access a file on those > NSDs during the hour or so I?d need to do the firmware upgrades, but > how would GPFS itself react to those (suspended) disks going away > for a while? I?m thinking I could be OK if there was just a way to > actually stop them rather than suspend them. Any undocumented > options to mmchdisk that I?m not aware of??? > > Are there other options - besides buying IBM hardware - that I am > overlooking? Thanks... > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Feb 8 19:38:33 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 8 Feb 2018 19:38:33 +0000 Subject: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) In-Reply-To: References: Message-ID: <550b2cc6552f4e669d2cfee72b1a244a@jumptrading.com> I don't know or care who the hardware vendor is, but they can DEFINITELY ship you a controller with the right firmware! Just demand it, which is what I do and they have basically always complied with the request. There is the risk associated with running even longer with a single point of failure, only using the surviving controller, but if this storage system has been in production a long time (e.g. a year or so) and is generally reliable, then they should be able to get you a new, factory tested controller with the right FW versions in a couple of days. The choice is yours of course, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Steve Xiao Sent: Thursday, February 08, 2018 11:18 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) Note: External Email ________________________________ You can change the cluster configuration to online unmount the file system when there is error accessing metadata. This can be done run the following command: mmchconfig unmountOnDiskFail=meta -i After this configuration change, you should be able to stop all 5 NSDs with mmchdisk stop command. While these NSDs are in down state, any user IO to files resides on these disks will fail but your file system should state mounted and usable. Steve Y. 
Xiao > Date: Thu, 8 Feb 2018 15:59:44 +0000 > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] mmchdisk suspend / stop > Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109 at vanderbilt.edu> > Content-Type: text/plain; charset="utf-8" > > Hi All, > > We are in a bit of a difficult situation right now with one of our > non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! > ) and are looking for some advice on how to deal with this > unfortunate situation. > > We have a non-IBM FC storage array with dual-?redundant? > controllers. One of those controllers is dead and the vendor is > sending us a replacement. However, the replacement controller will > have mis-matched firmware with the surviving controller and - long > story short - the vendor says there is no way to resolve that > without taking the storage array down for firmware upgrades. > Needless to say there?s more to that story than what I?ve included > here, but I won?t bore everyone with unnecessary details. > > The storage array has 5 NSDs on it, but fortunately enough they are > part of our ?capacity? pool ? i.e. the only way a file lands here is > if an mmapplypolicy scan moved it there because the *access* time is > greater than 90 days. Filesystem data replication is set to one. > > So ? what I was wondering if I could do is to use mmchdisk to either > suspend or (preferably) stop those NSDs, do the firmware upgrade, > and resume the NSDs? The problem I see is that suspend doesn?t stop > I/O, it only prevents the allocation of new blocks ? so, in theory, > if a user suddenly decided to start using a file they hadn?t needed > for 3 months then I?ve got a problem. Stopping all I/O to the disks > is what I really want to do. However, according to the mmchdisk man > page stop cannot be used on a filesystem with replication set to one. > > There?s over 250 TB of data on those 5 NSDs, so restriping off of > them or setting replication to two are not options. > > It is very unlikely that anyone would try to access a file on those > NSDs during the hour or so I?d need to do the firmware upgrades, but > how would GPFS itself react to those (suspended) disks going away > for a while? I?m thinking I could be OK if there was just a way to > actually stop them rather than suspend them. Any undocumented > options to mmchdisk that I?m not aware of??? > > Are there other options - besides buying IBM hardware - that I am > overlooking? Thanks... > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
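For reference, the sequence Steve and Bob describe might look roughly like this end to end. It is only a sketch: the file system name (gpfs1) and the NSD names (nsd_cap_01 through nsd_cap_05) are placeholders rather than names from the thread, and the real disk names should be confirmed with mmlsdisk before touching anything.

    # Keep the file system mounted unless a metadata disk fails; per Steve's
    # note this is what allows "mmchdisk stop" despite data replication = 1.
    mmchconfig unmountOnDiskFail=meta -i

    # Confirm the NSD names and their current availability.
    mmlsdisk gpfs1

    # Stop I/O to the five NSDs on the affected array.
    mmchdisk gpfs1 stop -d "nsd_cap_01;nsd_cap_02;nsd_cap_03;nsd_cap_04;nsd_cap_05"

    # ... swap the controller and do the firmware upgrade ...

    # Bring the disks back and check that they return to "up" status.
    mmchdisk gpfs1 start -d "nsd_cap_01;nsd_cap_02;nsd_cap_03;nsd_cap_04;nsd_cap_05"
    mmlsdisk gpfs1

    # Afterwards, set unmountOnDiskFail back to whatever it was before
    # the change ("no" is the documented default).
    mmchconfig unmountOnDiskFail=no -i

While the disks are stopped, I/O to file data sitting on them fails with an error while the file system itself stays mounted, as Steve notes, so the point of the mmapplypolicy pass Kevin mentions later in the thread is simply to shrink the set of files that could hit that error during the maintenance window.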
-------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Feb 8 19:48:54 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 8 Feb 2018 19:48:54 +0000 Subject: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) In-Reply-To: References: Message-ID: <769B6E06-BAB5-4EDB-A5A3-54E1063A8A6D@vanderbilt.edu> Hi again all, It sounds like doing the ?mmchconfig unmountOnDiskFail=meta -i? suggested by Steve and Bob followed by using mmchdisk to stop the disks temporarily is the way we need to go. We will, as an aside, also run a mmapplypolicy first to pull any files users have started accessing again back to the ?regular? pool before doing any of this. Given that this is our ?capacity? pool and files have to have an atime > 90 days to get migrated there in the 1st place I think this is reasonable. Especially since users will get an I/O error if they happen to try to access one of those NSDs during the brief maintenance window. As to naming and shaming the vendor ? I?m not going to do that at this point in time. We?ve been using their stuff for well over a decade at this point and have had a generally positive experience with them. In fact, I have spoken with them via phone since my original post today and they have clarified that the problem with the mismatched firmware is only an issue because we are a major version off of what is current due to us choosing to not have a downtime and therefore not having done any firmware upgrades in well over 18 months. Thanks, all... Kevin On Feb 8, 2018, at 11:17 AM, Steve Xiao > wrote: You can change the cluster configuration to online unmount the file system when there is error accessing metadata. This can be done run the following command: mmchconfig unmountOnDiskFail=meta -i After this configuration change, you should be able to stop all 5 NSDs with mmchdisk stop command. While these NSDs are in down state, any user IO to files resides on these disks will fail but your file system should state mounted and usable. Steve Y. Xiao > Date: Thu, 8 Feb 2018 15:59:44 +0000 > From: "Buterbaugh, Kevin L" > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] mmchdisk suspend / stop > Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109 at vanderbilt.edu> > Content-Type: text/plain; charset="utf-8" > > Hi All, > > We are in a bit of a difficult situation right now with one of our > non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! > ) and are looking for some advice on how to deal with this > unfortunate situation. > > We have a non-IBM FC storage array with dual-?redundant? > controllers. One of those controllers is dead and the vendor is > sending us a replacement. However, the replacement controller will > have mis-matched firmware with the surviving controller and - long > story short - the vendor says there is no way to resolve that > without taking the storage array down for firmware upgrades. > Needless to say there?s more to that story than what I?ve included > here, but I won?t bore everyone with unnecessary details. > > The storage array has 5 NSDs on it, but fortunately enough they are > part of our ?capacity? pool ? i.e. the only way a file lands here is > if an mmapplypolicy scan moved it there because the *access* time is > greater than 90 days. Filesystem data replication is set to one. > > So ? 
what I was wondering if I could do is to use mmchdisk to either > suspend or (preferably) stop those NSDs, do the firmware upgrade, > and resume the NSDs? The problem I see is that suspend doesn?t stop > I/O, it only prevents the allocation of new blocks ? so, in theory, > if a user suddenly decided to start using a file they hadn?t needed > for 3 months then I?ve got a problem. Stopping all I/O to the disks > is what I really want to do. However, according to the mmchdisk man > page stop cannot be used on a filesystem with replication set to one. > > There?s over 250 TB of data on those 5 NSDs, so restriping off of > them or setting replication to two are not options. > > It is very unlikely that anyone would try to access a file on those > NSDs during the hour or so I?d need to do the firmware upgrades, but > how would GPFS itself react to those (suspended) disks going away > for a while? I?m thinking I could be OK if there was just a way to > actually stop them rather than suspend them. Any undocumented > options to mmchdisk that I?m not aware of??? > > Are there other options - besides buying IBM hardware - that I am > overlooking? Thanks... > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C435bd89b3fcc4a94ee5008d56f17e49e%7C5f88b91902e3490fb772327aa8177b95%7C0%7C0%7C636537070783260582&sdata=AbY7rJQecb76rMC%2FlxrthyzHfueQDJTT%2FJuuRCac5g8%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Thu Feb 8 18:33:32 2018 From: ewahl at osc.edu (Edward Wahl) Date: Thu, 8 Feb 2018 13:33:32 -0500 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> Message-ID: <20180208133332.30440b89@osc.edu> I'm with Richard on this one. Sounds dubious to me. Even older style stuff could start a new controller in a 'failed' or 'service' state and push firmware back in the 20th century... ;) Ed On Thu, 8 Feb 2018 16:23:33 +0000 "Sobey, Richard A" wrote: > Sorry I can?t help? the only thing going round and round my head right now is > why on earth the existing controller cannot push the required firmware to the > new one when it comes online. Never heard of anything else! Feel free to name > and shame so I can avoid ? > > Richard > > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, > Kevin L Sent: 08 February 2018 16:00 To: gpfsug main discussion list > Subject: [gpfsug-discuss] mmchdisk > suspend / stop > > Hi All, > > We are in a bit of a difficult situation right now with one of our non-IBM > hardware vendors (I know, I know, I KNOW - buy IBM hardware! ) and are > looking for some advice on how to deal with this unfortunate situation. > > We have a non-IBM FC storage array with dual-?redundant? controllers. One of > those controllers is dead and the vendor is sending us a replacement. 
> However, the replacement controller will have mis-matched firmware with the > surviving controller and - long story short - the vendor says there is no way > to resolve that without taking the storage array down for firmware upgrades. > Needless to say there?s more to that story than what I?ve included here, but > I won?t bore everyone with unnecessary details. > > The storage array has 5 NSDs on it, but fortunately enough they are part of > our ?capacity? pool ? i.e. the only way a file lands here is if an > mmapplypolicy scan moved it there because the *access* time is greater than > 90 days. Filesystem data replication is set to one. > > So ? what I was wondering if I could do is to use mmchdisk to either suspend > or (preferably) stop those NSDs, do the firmware upgrade, and resume the > NSDs? The problem I see is that suspend doesn?t stop I/O, it only prevents > the allocation of new blocks ? so, in theory, if a user suddenly decided to > start using a file they hadn?t needed for 3 months then I?ve got a problem. > Stopping all I/O to the disks is what I really want to do. However, > according to the mmchdisk man page stop cannot be used on a filesystem with > replication set to one. > > There?s over 250 TB of data on those 5 NSDs, so restriping off of them or > setting replication to two are not options. > > It is very unlikely that anyone would try to access a file on those NSDs > during the hour or so I?d need to do the firmware upgrades, but how would > GPFS itself react to those (suspended) disks going away for a while? I?m > thinking I could be OK if there was just a way to actually stop them rather > than suspend them. Any undocumented options to mmchdisk that I?m not aware > of??? > > Are there other options - besides buying IBM hardware - that I am > overlooking? Thanks... ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From aaron.s.knister at nasa.gov Thu Feb 8 20:22:52 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 8 Feb 2018 15:22:52 -0500 (EST) Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? In-Reply-To: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr> References: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr> Message-ID: Hi Loic, Thank you for that information! I have two follow up questions-- 1. Are you using ccr? 2. Do you happen to have mmsdrserv disabled in your environment? (e.g. what's the output of "mmlsconfig mmsdrservPort" on your cluster?). -Aaron On Thu, 8 Feb 2018, Loic Tortay wrote: > On 07/02/2018 22:28, Aaron Knister wrote: >> I noticed something curious after migrating some nodes from 4.1 to 4.2 >> which is that mounts now can take foorrreeevverrr. It seems to boil down >> to the point in the mount process where getEFOptions is called. >> >> To highlight the difference-- >> > [...] >> > Hello, > I have had this (or a very similar) issue after migrating from 4.1.1.8 to > 4.2.3. There are 37 filesystems in our main cluster, which made the problem > really noticeable. > > A PMR has been opened. I have tested the fixes included in 4.2.3.7, (which, > I'm told, should be released today) actually resolve my problems (APAR > IJ03192 & IJ03235). > > > Lo?c. 
> -- > | Lo?c Tortay - IN2P3 Computing Centre | > From Robert.Oesterlin at nuance.com Thu Feb 8 20:34:35 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 8 Feb 2018 20:34:35 +0000 Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th Message-ID: We?re finalizing the details for the Spring 2018 User Group meeting, and we need your help! I?ve you?re interested in presenting at this meeting (it will be a full 2 days), then contact me and let me know what?s you?d like to talk about. We?re always looking for presentations on how you are using Scale (GPFS) in your business or project, tools that help you do your job, performance challenges/solutions ? or anything else. Also looking for ideas on breakout sessions. We?re probably looking at talks of about 30 mins each. Drop me a note if you?d like to present. Exact details on the event location will be available in a few weeks. We?re hoping to keep it as close to BioIT World in downtown Boston. Bob Oesterlin Sr Principal Storage Engineer, Nuance SSUG Co-principal -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Feb 8 21:11:34 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 8 Feb 2018 21:11:34 +0000 Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? In-Reply-To: References: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr> Message-ID: <2dbcc01f542d40698a7ad6cc10d2dbd1@jumptrading.com> It may be related to this issue of using root squashed file system option, here are some edited comments from my colleague who stumbled upon this while chatting with a friend at a CUG: " Something I learned last week: apparently the libmount code from util-linux (used by /bin/mount) will call utimensat() on new mountpoints if access() fails (for example, on root-squashed filesystems). This is done "just to be sure" that the filesystem is really read-only. This operation can be quite expensive and (anecdotally) may cause huge slowdowns when mounting root-squashed parallel filesystems on thousands of clients. Here is the relevant code: https://github.com/karelzak/util-linux/blame/1ea4e7bd8d9d0f0ef317558c627e6fa069950e8d/libmount/src/utils.c#L222 This code has been in util-linux for years. It's not clear exactly what the impact is in our environment, but this certainly can't be helping, especially since we've grown the size of the cluster considerably. Mounting GPFS has recently really become a slow and disruptive operation ? if you try to mount many clients at once, the FS will hang for a considerable period of time. The timing varies, but here is one example from an isolated mounting operation: 12:09:11.222513 mount("", "", "gpfs", MS_MGC_VAL, "dev="...) = 0 <1.590217> 12:09:12.812777 access("", W_OK) = -1 EACCES (Permission denied) <0.000022> 12:09:12.812841 utimensat(AT_FDCWD, "", \{UTIME_NOW, \{93824994378048, 1073741822}}, 0) = -1 EPERM (Operation not permitted) <2.993689> Here, the utimensat() took ~3 seconds, almost twice as long as the mount operation! I also suspect it will slow down other clients trying to mount the filesystem since the sgmgr has to process this write attempt to the mountpoint. (Hilariously, it still returns the "wrong" answer, because this filesystem is not read-only, just squashed.) As of today, the person who originally brought the issue to my attention at CUG has raised it for discussion on the util-linux mailing list. 
https://marc.info/?l=util-linux-ng&m=151075932824688&w=2 " We ended up putting facls on our mountpoints like such, which hacked around this stupidity: for fs in gpfs_mnt_point ; do chmod 1755 $fs setfacl -m u:99:rwx $fs # 99 is the "nobody" uid to which root is mapped--see "mmauth" output done Hope that helps, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Aaron Knister Sent: Thursday, February 08, 2018 2:23 PM To: Loic Tortay Cc: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? Note: External Email ------------------------------------------------- Hi Loic, Thank you for that information! I have two follow up questions-- 1. Are you using ccr? 2. Do you happen to have mmsdrserv disabled in your environment? (e.g. what's the output of "mmlsconfig mmsdrservPort" on your cluster?). -Aaron On Thu, 8 Feb 2018, Loic Tortay wrote: > On 07/02/2018 22:28, Aaron Knister wrote: >> I noticed something curious after migrating some nodes from 4.1 to 4.2 >> which is that mounts now can take foorrreeevverrr. It seems to boil down >> to the point in the mount process where getEFOptions is called. >> >> To highlight the difference-- >> > [...] >> > Hello, > I have had this (or a very similar) issue after migrating from 4.1.1.8 to > 4.2.3. There are 37 filesystems in our main cluster, which made the problem > really noticeable. > > A PMR has been opened. I have tested the fixes included in 4.2.3.7, (which, > I'm told, should be released today) actually resolve my problems (APAR > IJ03192 & IJ03235). > > > Lo?c. > -- > | Lo?c Tortay > - IN2P3 Computing Centre | > ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tortay at cc.in2p3.fr Fri Feb 9 08:59:12 2018 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Fri, 9 Feb 2018 09:59:12 +0100 Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? In-Reply-To: References: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr> Message-ID: <969a0c4b-a3b0-2fdb-80f4-2913bc9b0a67@cc.in2p3.fr> On 02/08/2018 09:22 PM, Aaron Knister wrote: > Hi Loic, > > Thank you for that information! > > I have two follow up questions-- > 1. Are you using ccr? > 2. Do you happen to have mmsdrserv disabled in your environment? (e.g. > what's the output of "mmlsconfig mmsdrservPort" on your cluster?). > Hello, We do not use CCR on this cluster (yet). We use the default port for mmsdrserv: # mmlsconfig mmsdrservPort mmsdrservPort 1191 Lo?c. 
-- | Lo?c Tortay - IN2P3 Computing Centre | From Renar.Grunenberg at huk-coburg.de Fri Feb 9 09:06:32 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 9 Feb 2018 09:06:32 +0000 Subject: [gpfsug-discuss] V5 Experience Message-ID: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From frankli at us.ibm.com Fri Feb 9 11:29:17 2018 From: frankli at us.ibm.com (Frank N Lee) Date: Fri, 9 Feb 2018 05:29:17 -0600 Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th In-Reply-To: References: Message-ID: Bob, Can you provide your email or shall I just reply here? 
Frank Frank Lee, PhD IBM Systems 314-482-5329 | @drfranknlee From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 02/08/2018 02:35 PM Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th Sent by: gpfsug-discuss-bounces at spectrumscale.org We?re finalizing the details for the Spring 2018 User Group meeting, and we need your help! I?ve you?re interested in presenting at this meeting (it will be a full 2 days), then contact me and let me know what?s you?d like to talk about. We?re always looking for presentations on how you are using Scale (GPFS) in your business or project, tools that help you do your job, performance challenges/solutions ? or anything else. Also looking for ideas on breakout sessions. We?re probably looking at talks of about 30 mins each. Drop me a note if you?d like to present. Exact details on the event location will be available in a few weeks. We?re hoping to keep it as close to BioIT World in downtown Boston. Bob Oesterlin Sr Principal Storage Engineer, Nuance SSUG Co-principal _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HIs14G9Qcs5MqpsAFL5E0TH5hqFD-KbquYdQ_mTmTnI&m=_7q7xOAgpDoLwznJe069elHn1thk8KmxGLgXM6zuST0&s=1aWP0EJWxIsAycMNiVX7v4FWC5BsSzyx566RyllXCCM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Fri Feb 9 11:53:30 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Fri, 9 Feb 2018 12:53:30 +0100 Subject: [gpfsug-discuss] V5 Experience In-Reply-To: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. I'd see no need to change anything. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 02/09/2018 10:16 AM Subject: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... 
mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Fri Feb 9 12:30:10 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 9 Feb 2018 12:30:10 +0000 Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th Message-ID: <1AC64CE4-BEE8-4C4B-BB7D-02A39C176621@nuance.com> Replied to Frank directly. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Frank N Lee Reply-To: gpfsug main discussion list Date: Friday, February 9, 2018 at 5:30 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th Bob, Can you provide your email or shall I just reply here? Frank Frank Lee, PhD IBM Systems 314-482-5329 | @drfranknlee From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 02/08/2018 02:35 PM Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th Sent by: gpfsug-discuss-bounces at spectrumscale.org We?re finalizing the details for the Spring 2018 User Group meeting, and we need your help! I?ve you?re interested in presenting at this meeting (it will be a full 2 days), then contact me and let me know what?s you?d like to talk about. 
We?re always looking for presentations on how you are using Scale (GPFS) in your business or project, tools that help you do your job, performance challenges/solutions ? or anything else. Also looking for ideas on breakout sessions. We?re probably looking at talks of about 30 mins each. Drop me a note if you?d like to present. Exact details on the event location will be available in a few weeks. We?re hoping to keep it as close to BioIT World in downtown Boston. Bob Oesterlin Sr Principal Storage Engineer, Nuance SSUG Co-principal _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HIs14G9Qcs5MqpsAFL5E0TH5hqFD-KbquYdQ_mTmTnI&m=_7q7xOAgpDoLwznJe069elHn1thk8KmxGLgXM6zuST0&s=1aWP0EJWxIsAycMNiVX7v4FWC5BsSzyx566RyllXCCM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 138 bytes Desc: image001.png URL: From YARD at il.ibm.com Fri Feb 9 13:28:49 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Fri, 9 Feb 2018 15:28:49 +0200 Subject: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) In-Reply-To: <769B6E06-BAB5-4EDB-A5A3-54E1063A8A6D@vanderbilt.edu> References: <769B6E06-BAB5-4EDB-A5A3-54E1063A8A6D@vanderbilt.edu> Message-ID: Hi Just make sure you have a backup, just in case ... Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage architect Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 02/08/2018 09:49 PM Subject: Re: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi again all, It sounds like doing the ?mmchconfig unmountOnDiskFail=meta -i? suggested by Steve and Bob followed by using mmchdisk to stop the disks temporarily is the way we need to go. We will, as an aside, also run a mmapplypolicy first to pull any files users have started accessing again back to the ?regular? pool before doing any of this. Given that this is our ?capacity? pool and files have to have an atime > 90 days to get migrated there in the 1st place I think this is reasonable. Especially since users will get an I/O error if they happen to try to access one of those NSDs during the brief maintenance window. As to naming and shaming the vendor ? I?m not going to do that at this point in time. We?ve been using their stuff for well over a decade at this point and have had a generally positive experience with them. In fact, I have spoken with them via phone since my original post today and they have clarified that the problem with the mismatched firmware is only an issue because we are a major version off of what is current due to us choosing to not have a downtime and therefore not having done any firmware upgrades in well over 18 months. Thanks, all... Kevin On Feb 8, 2018, at 11:17 AM, Steve Xiao wrote: You can change the cluster configuration to online unmount the file system when there is error accessing metadata. This can be done run the following command: mmchconfig unmountOnDiskFail=meta -i After this configuration change, you should be able to stop all 5 NSDs with mmchdisk stop command. 
While these NSDs are in down state, any user IO to files resides on these disks will fail but your file system should state mounted and usable. Steve Y. Xiao > Date: Thu, 8 Feb 2018 15:59:44 +0000 > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Subject: [gpfsug-discuss] mmchdisk suspend / stop > Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109 at vanderbilt.edu> > Content-Type: text/plain; charset="utf-8" > > Hi All, > > We are in a bit of a difficult situation right now with one of our > non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! > ) and are looking for some advice on how to deal with this > unfortunate situation. > > We have a non-IBM FC storage array with dual-?redundant? > controllers. One of those controllers is dead and the vendor is > sending us a replacement. However, the replacement controller will > have mis-matched firmware with the surviving controller and - long > story short - the vendor says there is no way to resolve that > without taking the storage array down for firmware upgrades. > Needless to say there?s more to that story than what I?ve included > here, but I won?t bore everyone with unnecessary details. > > The storage array has 5 NSDs on it, but fortunately enough they are > part of our ?capacity? pool ? i.e. the only way a file lands here is > if an mmapplypolicy scan moved it there because the *access* time is > greater than 90 days. Filesystem data replication is set to one. > > So ? what I was wondering if I could do is to use mmchdisk to either > suspend or (preferably) stop those NSDs, do the firmware upgrade, > and resume the NSDs? The problem I see is that suspend doesn?t stop > I/O, it only prevents the allocation of new blocks ? so, in theory, > if a user suddenly decided to start using a file they hadn?t needed > for 3 months then I?ve got a problem. Stopping all I/O to the disks > is what I really want to do. However, according to the mmchdisk man > page stop cannot be used on a filesystem with replication set to one. > > There?s over 250 TB of data on those 5 NSDs, so restriping off of > them or setting replication to two are not options. > > It is very unlikely that anyone would try to access a file on those > NSDs during the hour or so I?d need to do the firmware upgrades, but > how would GPFS itself react to those (suspended) disks going away > for a while? I?m thinking I could be OK if there was just a way to > actually stop them rather than suspend them. Any undocumented > options to mmchdisk that I?m not aware of??? > > Are there other options - besides buying IBM hardware - that I am > overlooking? Thanks... > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C435bd89b3fcc4a94ee5008d56f17e49e%7C5f88b91902e3490fb772327aa8177b95%7C0%7C0%7C636537070783260582&sdata=AbY7rJQecb76rMC%2FlxrthyzHfueQDJTT%2FJuuRCac5g8%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=3yfKUCiWGXtAEPiwlmQNFGTjLx5h3PlCYfUXDBMGJpQ&s=-pkjeFOUVSDUGgwtKkoYbmGLADk2UHfDbUPiuWSw4gQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From knop at us.ibm.com Fri Feb 9 13:32:30 2018 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 9 Feb 2018 08:32:30 -0500 Subject: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? In-Reply-To: <2dbcc01f542d40698a7ad6cc10d2dbd1@jumptrading.com> References: <9869457d-322e-fd27-1051-cb4875832215@cc.in2p3.fr> <2dbcc01f542d40698a7ad6cc10d2dbd1@jumptrading.com> Message-ID: All, For at least one of the instances reported by this group, a PMR has been opened, and a fix is being developed. For folks that are getting affected by the problem: Please contact the service team to confirm your problem is the same as the one previously reported, and for an outlook for the availability of the fix. Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Bryan Banister To: gpfsug main discussion list , "Loic Tortay" Date: 02/08/2018 04:11 PM Subject: Re: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? Sent by: gpfsug-discuss-bounces at spectrumscale.org It may be related to this issue of using root squashed file system option, here are some edited comments from my colleague who stumbled upon this while chatting with a friend at a CUG: " Something I learned last week: apparently the libmount code from util-linux (used by /bin/mount) will call utimensat() on new mountpoints if access() fails (for example, on root-squashed filesystems). 
This is done "just to be sure" that the filesystem is really read-only. This operation can be quite expensive and (anecdotally) may cause huge slowdowns when mounting root-squashed parallel filesystems on thousands of clients. Here is the relevant code: https://github.com/karelzak/util-linux/blame/1ea4e7bd8d9d0f0ef317558c627e6fa069950e8d/libmount/src/utils.c#L222 This code has been in util-linux for years. It's not clear exactly what the impact is in our environment, but this certainly can't be helping, especially since we've grown the size of the cluster considerably. Mounting GPFS has recently really become a slow and disruptive operation ? if you try to mount many clients at once, the FS will hang for a considerable period of time. The timing varies, but here is one example from an isolated mounting operation: 12:09:11.222513 mount("", "", "gpfs", MS_MGC_VAL, "dev="...) = 0 <1.590217> 12:09:12.812777 access("", W_OK) = -1 EACCES (Permission denied) <0.000022> 12:09:12.812841 utimensat(AT_FDCWD, "", \{UTIME_NOW, \{93824994378048, 1073741822}}, 0) = -1 EPERM (Operation not permitted) <2.993689> Here, the utimensat() took ~3 seconds, almost twice as long as the mount operation! I also suspect it will slow down other clients trying to mount the filesystem since the sgmgr has to process this write attempt to the mountpoint. (Hilariously, it still returns the "wrong" answer, because this filesystem is not read-only, just squashed.) As of today, the person who originally brought the issue to my attention at CUG has raised it for discussion on the util-linux mailing list. https://marc.info/?l=util-linux-ng&m=151075932824688&w=2 " We ended up putting facls on our mountpoints like such, which hacked around this stupidity: for fs in gpfs_mnt_point ; do chmod 1755 $fs setfacl -m u:99:rwx $fs # 99 is the "nobody" uid to which root is mapped--see "mmauth" output done Hope that helps, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Aaron Knister Sent: Thursday, February 08, 2018 2:23 PM To: Loic Tortay Cc: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1? Note: External Email ------------------------------------------------- Hi Loic, Thank you for that information! I have two follow up questions-- 1. Are you using ccr? 2. Do you happen to have mmsdrserv disabled in your environment? (e.g. what's the output of "mmlsconfig mmsdrservPort" on your cluster?). -Aaron On Thu, 8 Feb 2018, Loic Tortay wrote: > On 07/02/2018 22:28, Aaron Knister wrote: >> I noticed something curious after migrating some nodes from 4.1 to 4.2 >> which is that mounts now can take foorrreeevverrr. It seems to boil down >> to the point in the mount process where getEFOptions is called. >> >> To highlight the difference-- >> > [...] >> > Hello, > I have had this (or a very similar) issue after migrating from 4.1.1.8 to > 4.2.3. There are 37 filesystems in our main cluster, which made the problem > really noticeable. > > A PMR has been opened. I have tested the fixes included in 4.2.3.7, (which, > I'm told, should be released today) actually resolve my problems (APAR > IJ03192 & IJ03235). > > > Lo?c. > -- > | Lo?c Tortay - IN2P3 Computing Centre | > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=C0S8WTufrOCvXbHUegB8zS9jk_1SLczALa-4aVEubu4&s=VTWKI-xcUiJ_LeMhJ-xOPmnz0Zm9IspKsU3bsxA4BNo&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From carlz at us.ibm.com Fri Feb 9 13:46:51 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Fri, 9 Feb 2018 13:46:51 +0000 Subject: [gpfsug-discuss] V5 Experience In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Feb 9 13:58:58 2018 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 9 Feb 2018 08:58:58 -0500 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: All, Correct. There is no need to change the value of 'maxblocksize' for existing clusters which are upgraded to the 5.0.0 level. If a new file system needs to be created with a block size which exceeds the value of maxblocksize then the mmchconfig needs to be issued to increase the value of maxblocksize (which requires the entire cluster to be stopped). For clusters newly created with 5.0.0, the value of maxblocksize is set to 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man pages in 5.0.0 . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Uwe Falke" To: gpfsug main discussion list Date: 02/09/2018 06:54 AM Subject: Re: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. I'd see no need to change anything. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? 
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 02/09/2018 10:16 AM Subject: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= -------------- next part -------------- An HTML attachment was scrubbed... 
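As a concrete illustration of the procedure Felipe describes above, the sequence for an upgraded cluster might look roughly like this. The device name, stanza file and mountpoint are made-up placeholders, and the cluster-wide shutdown in the middle is exactly the disruptive step being discussed in this thread.

    # Check what the upgraded cluster currently allows and uses (hypothetical names):
    mmlsconfig maxblocksize        # value carried over from the 4.x cluster, e.g. 1M
    mmlsfs gpfs0 -B                # block sizes of existing file systems are unaffected by the upgrade

    # Creating a file system with a block size larger than maxblocksize first
    # requires raising maxblocksize, and that change needs GPFS down everywhere:
    mmshutdown -a
    mmchconfig maxblocksize=4M
    mmstartup -a
    mmcrfs newfs -F newfs_nsd.stanza -B 4M -T /gpfs/newfs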
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From p.ward at nhm.ac.uk Thu Feb 8 16:46:25 2018 From: p.ward at nhm.ac.uk (Paul Ward) Date: Thu, 8 Feb 2018 16:46:25 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> Message-ID: We tend to get the maintenance company to down-grade the firmware to match what we have for our aging hardware, before sending it to us. I assume this isn?t an option? Paul Ward Technical Solutions Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 08 February 2018 16:00 To: gpfsug main discussion list Subject: [gpfsug-discuss] mmchdisk suspend / stop Hi All, We are in a bit of a difficult situation right now with one of our non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware! ) and are looking for some advice on how to deal with this unfortunate situation. We have a non-IBM FC storage array with dual-?redundant? controllers. One of those controllers is dead and the vendor is sending us a replacement. However, the replacement controller will have mis-matched firmware with the surviving controller and - long story short - the vendor says there is no way to resolve that without taking the storage array down for firmware upgrades. Needless to say there?s more to that story than what I?ve included here, but I won?t bore everyone with unnecessary details. The storage array has 5 NSDs on it, but fortunately enough they are part of our ?capacity? pool ? i.e. the only way a file lands here is if an mmapplypolicy scan moved it there because the *access* time is greater than 90 days. Filesystem data replication is set to one. So ? what I was wondering if I could do is to use mmchdisk to either suspend or (preferably) stop those NSDs, do the firmware upgrade, and resume the NSDs? The problem I see is that suspend doesn?t stop I/O, it only prevents the allocation of new blocks ? so, in theory, if a user suddenly decided to start using a file they hadn?t needed for 3 months then I?ve got a problem. Stopping all I/O to the disks is what I really want to do. However, according to the mmchdisk man page stop cannot be used on a filesystem with replication set to one. There?s over 250 TB of data on those 5 NSDs, so restriping off of them or setting replication to two are not options. It is very unlikely that anyone would try to access a file on those NSDs during the hour or so I?d need to do the firmware upgrades, but how would GPFS itself react to those (suspended) disks going away for a while? I?m thinking I could be OK if there was just a way to actually stop them rather than suspend them. Any undocumented options to mmchdisk that I?m not aware of??? Are there other options - besides buying IBM hardware - that I am overlooking? Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
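For the suspend/resume part of what Kevin describes, a minimal sketch might be the following; the file system name and NSD names are hypothetical placeholders. As Kevin points out, suspend only prevents allocation of new blocks, so this does not by itself stop reads or in-place writes to the affected NSDs - it sketches the mechanics, not a claim that suspend alone is sufficient.

    # Identify the NSDs that live on the affected array (hypothetical names):
    mmlsdisk gpfs0 -L

    # Suspend them so no new blocks are allocated there during the firmware work:
    mmchdisk gpfs0 suspend -d "nsd21;nsd22;nsd23;nsd24;nsd25"

    # ... controller firmware upgrade happens here ...

    # Put the disks back into service and check that nothing is left in a bad state:
    mmchdisk gpfs0 resume -d "nsd21;nsd22;nsd23;nsd24;nsd25"
    mmlsdisk gpfs0 -e        # lists any disks that are not up/ready afterwards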
URL: From Renar.Grunenberg at huk-coburg.de Fri Feb 9 14:30:34 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 9 Feb 2018 14:30:34 +0000 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: Felipe, all, first thanks for clarification, but what was the reason for this logic? If i upgrade to Version 5 and want to create new filesystems, and the maxblocksize is on 1M, we must shutdown the hole cluster to change this to the defaults to use the new one default. I had no understanding for that decision. We are at 7 x 24h availability with our cluster today, we had no real maintenance window here! Any circumvention are welcome. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Freitag, 9. Februar 2018 14:59 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize All, Correct. There is no need to change the value of 'maxblocksize' for existing clusters which are upgraded to the 5.0.0 level. If a new file system needs to be created with a block size which exceeds the value of maxblocksize then the mmchconfig needs to be issued to increase the value of maxblocksize (which requires the entire cluster to be stopped). For clusters newly created with 5.0.0, the value of maxblocksize is set to 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man pages in 5.0.0 . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for "Uwe Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default is <>1MB, so your conf]"Uwe Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. 
From: "Uwe Falke" > To: gpfsug main discussion list > Date: 02/09/2018 06:54 AM Subject: Re: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. I'd see no need to change anything. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 02/09/2018 10:16 AM Subject: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. 
If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From oehmes at gmail.com Fri Feb 9 14:47:54 2018 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 09 Feb 2018 14:47:54 +0000 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: Renar, if you specify the filesystem blocksize of 1M during mmcr you don't have to restart anything. scale 5 didn't change anything on the behaviour of maxblocksize change while the cluster is online, it only changed the default passed to the blocksize parameter for create a new filesystem. one thing we might consider doing is changing the command to use the current active maxblocksize as input for mmcrfs if maxblocksize is below current default. Sven On Fri, Feb 9, 2018 at 6:30 AM Grunenberg, Renar < Renar.Grunenberg at huk-coburg.de> wrote: > Felipe, all, > > first thanks for clarification, but what was the reason for this logic? If > i upgrade to Version 5 and want to create new filesystems, and the > maxblocksize is on 1M, we must shutdown the hole cluster to change this to > the defaults to use the new one default. I had no understanding for that > decision. We are at 7 x 24h availability with our cluster today, we had no > real maintenance window here! Any circumvention are welcome. > > > > Regards Renar > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. 
> > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > *Von:* gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] *Im Auftrag von *Felipe Knop > *Gesendet:* Freitag, 9. Februar 2018 14:59 > *An:* gpfsug main discussion list > *Betreff:* Re: [gpfsug-discuss] V5 Experience -- maxblocksize > > > > All, > > Correct. There is no need to change the value of 'maxblocksize' for > existing clusters which are upgraded to the 5.0.0 level. If a new file > system needs to be created with a block size which exceeds the value of > maxblocksize then the mmchconfig needs to be issued to increase the value > of maxblocksize (which requires the entire cluster to be stopped). > > For clusters newly created with 5.0.0, the value of maxblocksize is set to > 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man > pages in 5.0.0 . > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > [image: Inactive hide details for "Uwe Falke" ---02/09/2018 06:54:10 > AM---I suppose the new maxBlockSize default is <>1MB, so your conf]"Uwe > Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default > is <>1MB, so your config parameter was properly translated. > > From: "Uwe Falke" > To: gpfsug main discussion list > Date: 02/09/2018 06:54 AM > Subject: Re: [gpfsug-discuss] V5 Experience > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > > I suppose the new maxBlockSize default is <>1MB, so your config parameter > was properly translated. I'd see no need to change anything. > > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 <+49%20371%2069782165> > Mobile: +49 175 575 2877 <+49%20175%205752877> > E-Mail: uwefalke at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Thomas Wolter, Sven Schoo? > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > > Date: 02/09/2018 10:16 AM > Subject: [gpfsug-discuss] V5 Experience > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hallo All, > we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but > I see after the mmchconfig release=LATEST a new common parameter > ?maxblocksize 1M? > (our fs are on these blocksizes) is happening. > Ok, but if I will change this parameter the hole cluster was requestet > that: > > root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT > Verifying GPFS is stopped on all nodes ... 
> mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de > mmchconfig: Command failed. Examine previous error messages to determine > cause. > Can someone explain the behavior here, and same clarification in an update > plan what can we do to go to the defaults without clusterdown. > Is this a bug or a feature;-) > > Regards Renar > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: > 09561 96-44110 > Telefax: > 09561 96-44104 > E-Mail: > Renar.Grunenberg at huk-coburg.de > Internet: > www.huk.de > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
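The flip side of Sven's point above is that no cluster-wide stop is needed at all if the new file system stays within the current maxblocksize; a small sketch, again with made-up names:

    mmlsconfig maxblocksize                                    # e.g. 1M, carried over from the 4.x cluster
    mmcrfs testfs -F testfs_nsd.stanza -B 1M -T /gpfs/testfs   # fine while GPFS is up everywhere,
                                                               # since 1M does not exceed maxblocksize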
Name: image001.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Fri Feb 9 14:59:31 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 9 Feb 2018 14:59:31 +0000 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: Hallo Sven, that stated a mmcrfs ?newfs? -B 4M is possible if the maxblocksize is 1M (from the upgrade) without the requirement to change this parameter before?? Correct or not? Regards Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sven Oehme Gesendet: Freitag, 9. Februar 2018 15:48 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize Renar, if you specify the filesystem blocksize of 1M during mmcr you don't have to restart anything. scale 5 didn't change anything on the behaviour of maxblocksize change while the cluster is online, it only changed the default passed to the blocksize parameter for create a new filesystem. one thing we might consider doing is changing the command to use the current active maxblocksize as input for mmcrfs if maxblocksize is below current default. Sven On Fri, Feb 9, 2018 at 6:30 AM Grunenberg, Renar > wrote: Felipe, all, first thanks for clarification, but what was the reason for this logic? If i upgrade to Version 5 and want to create new filesystems, and the maxblocksize is on 1M, we must shutdown the hole cluster to change this to the defaults to use the new one default. I had no understanding for that decision. We are at 7 x 24h availability with our cluster today, we had no real maintenance window here! Any circumvention are welcome. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 
9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Freitag, 9. Februar 2018 14:59 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize All, Correct. There is no need to change the value of 'maxblocksize' for existing clusters which are upgraded to the 5.0.0 level. If a new file system needs to be created with a block size which exceeds the value of maxblocksize then the mmchconfig needs to be issued to increase the value of maxblocksize (which requires the entire cluster to be stopped). For clusters newly created with 5.0.0, the value of maxblocksize is set to 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man pages in 5.0.0 . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for "Uwe Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default is <>1MB, so your conf]"Uwe Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. From: "Uwe Falke" > To: gpfsug main discussion list > Date: 02/09/2018 06:54 AM Subject: Re: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. I'd see no need to change anything. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? 
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 02/09/2018 10:16 AM Subject: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Fri Feb 9 15:08:38 2018 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 09 Feb 2018 15:08:38 +0000 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: you can only create a filesystem with a blocksize of what ever current maxblocksize is set. let me discuss with felipe what//if we can share here to solve this. sven On Fri, Feb 9, 2018 at 6:59 AM Grunenberg, Renar < Renar.Grunenberg at huk-coburg.de> wrote: > Hallo Sven, > > that stated a mmcrfs ?newfs? -B 4M is possible if the maxblocksize is 1M > (from the upgrade) without the requirement to change this parameter > before?? Correct or not? > > Regards > > > > > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > *Von:* gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] *Im Auftrag von *Sven Oehme > *Gesendet:* Freitag, 9. Februar 2018 15:48 > > > *An:* gpfsug main discussion list > *Betreff:* Re: [gpfsug-discuss] V5 Experience -- maxblocksize > > > > Renar, > > > > if you specify the filesystem blocksize of 1M during mmcr you don't have > to restart anything. 
scale 5 didn't change anything on the behaviour of > maxblocksize change while the cluster is online, it only changed the > default passed to the blocksize parameter for create a new filesystem. one > thing we might consider doing is changing the command to use the current > active maxblocksize as input for mmcrfs if maxblocksize is below current > default. > > > > Sven > > > > > > On Fri, Feb 9, 2018 at 6:30 AM Grunenberg, Renar < > Renar.Grunenberg at huk-coburg.de> wrote: > > Felipe, all, > > first thanks for clarification, but what was the reason for this logic? If > i upgrade to Version 5 and want to create new filesystems, and the > maxblocksize is on 1M, we must shutdown the hole cluster to change this to > the defaults to use the new one default. I had no understanding for that > decision. We are at 7 x 24h availability with our cluster today, we had no > real maintenance window here! Any circumvention are welcome. > > > > Regards Renar > > > > Renar Grunenberg > Abteilung Informatik ? Betrieb > > > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > > Telefon: > > 09561 96-44110 > > Telefax: > > 09561 96-44104 > > E-Mail: > > Renar.Grunenberg at huk-coburg.de > > Internet: > > www.huk.de > > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > *Von:* gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] *Im Auftrag von *Felipe Knop > *Gesendet:* Freitag, 9. Februar 2018 14:59 > *An:* gpfsug main discussion list > *Betreff:* Re: [gpfsug-discuss] V5 Experience -- maxblocksize > > > > All, > > Correct. There is no need to change the value of 'maxblocksize' for > existing clusters which are upgraded to the 5.0.0 level. If a new file > system needs to be created with a block size which exceeds the value of > maxblocksize then the mmchconfig needs to be issued to increase the value > of maxblocksize (which requires the entire cluster to be stopped). > > For clusters newly created with 5.0.0, the value of maxblocksize is set to > 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man > pages in 5.0.0 . 
> > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > [image: Inactive hide details for "Uwe Falke" ---02/09/2018 06:54:10 > AM---I suppose the new maxBlockSize default is <>1MB, so your conf]"Uwe > Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default > is <>1MB, so your config parameter was properly translated. > > From: "Uwe Falke" > To: gpfsug main discussion list > Date: 02/09/2018 06:54 AM > Subject: Re: [gpfsug-discuss] V5 Experience > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > > I suppose the new maxBlockSize default is <>1MB, so your config parameter > was properly translated. I'd see no need to change anything. > > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 <+49%20371%2069782165> > Mobile: +49 175 575 2877 <+49%20175%205752877> > E-Mail: uwefalke at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Thomas Wolter, Sven Schoo? > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > > Date: 02/09/2018 10:16 AM > Subject: [gpfsug-discuss] V5 Experience > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hallo All, > we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but > I see after the mmchconfig release=LATEST a new common parameter > ?maxblocksize 1M? > (our fs are on these blocksizes) is happening. > Ok, but if I will change this parameter the hole cluster was requestet > that: > > root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT > Verifying GPFS is stopped on all nodes ... > mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de > mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de > mmchconfig: Command failed. Examine previous error messages to determine > cause. > Can someone explain the behavior here, and same clarification in an update > plan what can we do to go to the defaults without clusterdown. > Is this a bug or a feature;-) > > Regards Renar > Renar Grunenberg > Abteilung Informatik ? Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: > 09561 96-44110 > Telefax: > 09561 96-44104 > E-Mail: > Renar.Grunenberg at huk-coburg.de > Internet: > www.huk.de > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 
9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=6lyCPEFGZrRBZrhH_iGlkum-CJi5MkJpfNnkOgs3mO0&s=VLofD771s6d1PyNl8EDOhntcFwAcZTrFbwdsWN9mcas&e= > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Feb 9 15:07:32 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 9 Feb 2018 15:07:32 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> Message-ID: <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> Hi All, Since several people have made this same suggestion, let me respond to that. We did ask the vendor - twice - to do that. Their response boils down to, ?No, the older version has bugs and we won?t send you a controller with firmware that we know has bugs in it.? We have not had a full cluster downtime since the summer of 2016 - and then it was only a one day downtime to allow the cleaning of our core network switches after an electrical fire in our data center! So the firmware on not only our storage arrays, but our SAN switches as well, it a bit out of date, shall we say? That is an issue we need to address internally ? our users love us not having regularly scheduled downtimes quarterly, yearly, or whatever, but there is a cost to doing business that way... Kevin On Feb 8, 2018, at 10:46 AM, Paul Ward > wrote: We tend to get the maintenance company to down-grade the firmware to match what we have for our aging hardware, before sending it to us. I assume this isn?t an option? 
Paul Ward Technical Solutions Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Fri Feb 9 15:12:13 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 9 Feb 2018 15:12:13 +0000 Subject: [gpfsug-discuss] V5 Experience -- maxblocksize In-Reply-To: References: <4da0f104a1ef474493d44c1f645465e9@SMXRF105.msg.hukrf.de> Message-ID: <8388dda58d064620908b9aa62ca86da5@SMXRF105.msg.hukrf.de> Hallo Sven, thanks, it?s clear now. You have work now ;-) Happy Weekend from Coburg. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sven Oehme Gesendet: Freitag, 9. Februar 2018 16:09 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize you can only create a filesystem with a blocksize of what ever current maxblocksize is set. let me discuss with felipe what//if we can share here to solve this. sven On Fri, Feb 9, 2018 at 6:59 AM Grunenberg, Renar > wrote: Hallo Sven, that stated a mmcrfs ?newfs? -B 4M is possible if the maxblocksize is 1M (from the upgrade) without the requirement to change this parameter before?? Correct or not? Regards Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. 
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sven Oehme Gesendet: Freitag, 9. Februar 2018 15:48 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize Renar, if you specify the filesystem blocksize of 1M during mmcr you don't have to restart anything. scale 5 didn't change anything on the behaviour of maxblocksize change while the cluster is online, it only changed the default passed to the blocksize parameter for create a new filesystem. one thing we might consider doing is changing the command to use the current active maxblocksize as input for mmcrfs if maxblocksize is below current default. Sven On Fri, Feb 9, 2018 at 6:30 AM Grunenberg, Renar > wrote: Felipe, all, first thanks for clarification, but what was the reason for this logic? If i upgrade to Version 5 and want to create new filesystems, and the maxblocksize is on 1M, we must shutdown the hole cluster to change this to the defaults to use the new one default. I had no understanding for that decision. We are at 7 x 24h availability with our cluster today, we had no real maintenance window here! Any circumvention are welcome. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas (stv.). Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Freitag, 9. Februar 2018 14:59 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] V5 Experience -- maxblocksize All, Correct. 
There is no need to change the value of 'maxblocksize' for existing clusters which are upgraded to the 5.0.0 level. If a new file system needs to be created with a block size which exceeds the value of maxblocksize then the mmchconfig needs to be issued to increase the value of maxblocksize (which requires the entire cluster to be stopped). For clusters newly created with 5.0.0, the value of maxblocksize is set to 4MB. See the references to maxblocksize in the mmchconfig and mmcrfs man pages in 5.0.0 . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 "Uwe Falke" ---02/09/2018 06:54:10 AM---I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. From: "Uwe Falke" > To: gpfsug main discussion list > Date: 02/09/2018 06:54 AM Subject: Re: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I suppose the new maxBlockSize default is <>1MB, so your config parameter was properly translated. I'd see no need to change anything. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 02/09/2018 10:16 AM Subject: [gpfsug-discuss] V5 Experience Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we updated our Test-Cluster from 4.2.3.6 to V5.0.0.1. So good so fine, but I see after the mmchconfig release=LATEST a new common parameter ?maxblocksize 1M? (our fs are on these blocksizes) is happening. Ok, but if I will change this parameter the hole cluster was requestet that: root @sbdl7003(rhel7.4)> mmchconfig maxblocksize=DEFAULT Verifying GPFS is stopped on all nodes ... mmchconfig: GPFS is still active on SAPL7012x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7001x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7013x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7009x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7008x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7003x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SBDL7004x1.t7.lan.tuhuk.de mmchconfig: GPFS is still active on SAPL7001x1.t7.lan.tuhuk.de mmchconfig: Command failed. Examine previous error messages to determine cause. Can someone explain the behavior here, and same clarification in an update plan what can we do to go to the defaults without clusterdown. Is this a bug or a feature;-) Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. 
in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas (stv.).
Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From p.ward at nhm.ac.uk Fri Feb 9 15:25:25 2018
From: p.ward at nhm.ac.uk (Paul Ward)
Date: Fri, 9 Feb 2018 15:25:25 +0000
Subject: [gpfsug-discuss] mmchdisk suspend / stop
In-Reply-To: <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu>
References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu>
Message-ID:
Not sure why it took over a day for my message to be sent out by the list? If it's the firmware you currently have, I would still prefer to have it sent to me, so that I can do a controller firmware update online during an at-risk period rather than in a downtime; all the time you are running on one controller you are at risk! Seems you have an alternative.
Paul Ward
Technical Solutions Infrastructure Architect
Natural History Museum
T: 02079426450
E: p.ward at nhm.ac.uk
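A sketch of the kind of online sequence Paul describes, using the standard mmchdisk command (the file system and NSD names below are invented for illustration): take the NSDs served by the controller being flashed out of service for the duration of the work, then bring them back.

# Before the controller firmware work: suspend (no new allocations) or
# stop (mark down) the NSDs behind the affected controller.
mmchdisk gpfs0 suspend -d "nsd_ctrlA_1;nsd_ctrlA_2"

# ... perform the firmware update on that controller ...

# Afterwards: resume undoes suspend; start is only needed if the disks were stopped.
mmchdisk gpfs0 resume -d "nsd_ctrlA_1;nsd_ctrlA_2"
mmchdisk gpfs0 start -d "nsd_ctrlA_1;nsd_ctrlA_2"

Whether suspend or stop is appropriate depends on replication and on how long the controller will be away; this is only the shape of the procedure, not a recommendation for any particular array.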
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L
Sent: 09 February 2018 15:08
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] mmchdisk suspend / stop

Hi All, Since several people have made this same suggestion, let me respond to that. We did ask the vendor - twice - to do that. Their response boils down to, "No, the older version has bugs and we won't send you a controller with firmware that we know has bugs in it."

We have not had a full cluster downtime since the summer of 2016 - and then it was only a one day downtime to allow the cleaning of our core network switches after an electrical fire in our data center! So the firmware on not only our storage arrays, but our SAN switches as well, is a bit out of date, shall we say?

That is an issue we need to address internally - our users love us not having regularly scheduled downtimes quarterly, yearly, or whatever, but there is a cost to doing business that way...

Kevin

On Feb 8, 2018, at 10:46 AM, Paul Ward > wrote:
We tend to get the maintenance company to downgrade the firmware to match what we have for our aging hardware before sending it to us. I assume this isn't an option?
Paul Ward
Technical Solutions Infrastructure Architect
Natural History Museum
T: 02079426450
E: p.ward at nhm.ac.uk
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dzieko at wcss.pl Mon Feb 12 15:11:55 2018
From: dzieko at wcss.pl (Pawel Dziekonski)
Date: Mon, 12 Feb 2018 16:11:55 +0100
Subject: [gpfsug-discuss] Configuration advice
Message-ID: <20180212151155.GD23944@cefeid.wcss.wroc.pl>
Hi All, I inherited two separate GPFS machines from the previous admin. All the hardware and software is old, so I want to switch to new servers, new disk arrays, a new GPFS version and a new GPFS "design". Each machine has 4 GPFS filesystems and runs a TSM HSM client that migrates data to tapes using separate TSM servers:
GPFS+HSM no 1 -> TSM server no 1 -> tapes
GPFS+HSM no 2 -> TSM server no 2 -> tapes
Migration is done by HSM (not GPFS policies). All filesystems are used for archiving results from the HPC system and other files (a kind of backup - don't ask...). Data is written by users via NFS shares. There are 8 NFS mount points corresponding to 8 GPFS filesystems, but there is no real reason for that. 4 filesystems are large and heavily used; the 4 remaining are almost unused.
The question is how to configure the new GPFS infrastructure. My initial impression is that I should create a GPFS cluster of 2+ nodes and export NFS using CES. The most important question is how many filesystems do I need? Maybe just 2, with 8 filesets? Or how do I set this up in a flexible way so that I don't lock myself into a bad configuration? Any hints?
thanks, Pawel
ps. I will recall all data and copy it to the new infrastructure. Yes, that's the way I want to do it. :)
-- Pawel Dziekonski , http://www.wcss.pl Wroclaw Centre for Networking & Supercomputing, HPC Department
From jonathan.buzzard at strath.ac.uk Tue Feb 13 13:43:01 2018
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Tue, 13 Feb 2018 13:43:01 +0000
Subject: [gpfsug-discuss] mmchdisk suspend / stop
In-Reply-To: <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu>
References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu>
Message-ID: <1518529381.3326.93.camel@strath.ac.uk>
On Fri, 2018-02-09 at 15:07 +0000, Buterbaugh, Kevin L wrote:
> Hi All,
>
> Since several people have made this same suggestion, let me respond
> to that. We did ask the vendor - twice - to do that. Their response
> boils down to, "No, the older version has bugs and we won't send you
> a controller with firmware that we know has bugs in it."
>
> We have not had a full cluster downtime since the summer of 2016 -
> and then it was only a one day downtime to allow the cleaning of our
> core network switches after an electrical fire in our data center!
> So the firmware on not only our storage arrays, but our SAN switches
> as well, is a bit out of date, shall we say?
>
> That is an issue we need to address internally - our users love us
> not having regularly scheduled downtimes quarterly, yearly, or
> whatever, but there is a cost to doing business that way...
>
What sort of storage arrays are you using that don't allow you to do a live update of the controller firmware? Heck, these days even cheapy Dell MD3 series storage arrays allow you to do live drive firmware updates. Similarly with SAN switches, surely you have separate A/B fabrics and can upgrade them one at a time live. In a properly designed system one should not need to schedule downtime for firmware updates. He says as he plans a firmware update on his routers for next Tuesday morning, with no scheduled downtime and no interruption to service.
JAB.
-- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
From Kevin.Buterbaugh at Vanderbilt.Edu Tue Feb 13 15:56:00 2018
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Tue, 13 Feb 2018 15:56:00 +0000
Subject: [gpfsug-discuss] mmchdisk suspend / stop
In-Reply-To: <1518529381.3326.93.camel@strath.ac.uk>
References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> <1518529381.3326.93.camel@strath.ac.uk>
Message-ID:
Hi JAB, OK, let me try one more time to clarify. I'm not naming the vendor - they're a small maker of commodity storage and we've been using their stuff for years and, overall, it's been very solid. The problem in this specific case is that a major version firmware upgrade is required - if the controllers were only a minor version apart we could do it live.
And yes, we can upgrade our QLogic SAN switches firmware live - in fact, we've done that in the past. Should've been more clear there - we just try to do that as infrequently as possible.
So the bottom line here is that we were unaware that "major version" firmware upgrades could not be done live on our storage, but we've got a plan to work around this, this time.
Kevin
> On Feb 13, 2018, at 7:43 AM, Jonathan Buzzard wrote:
>
> On Fri, 2018-02-09 at 15:07 +0000, Buterbaugh, Kevin L wrote:
>> Hi All,
>>
>> Since several people have made this same suggestion, let me respond
>> to that. We did ask the vendor - twice - to do that. Their response
>> boils down to, "No, the older version has bugs and we won't send you
>> a controller with firmware that we know has bugs in it."
>>
>> We have not had a full cluster downtime since the summer of 2016 -
>> and then it was only a one day downtime to allow the cleaning of our
>> core network switches after an electrical fire in our data center!
>> So the firmware on not only our storage arrays, but our SAN switches
>> as well, is a bit out of date, shall we say?
>>
>> That is an issue we need to address internally - our users love us
>> not having regularly scheduled downtimes quarterly, yearly, or
>> whatever, but there is a cost to doing business that way...
>>
>
> What sort of storage arrays are you using that don't allow you to do a
> live update of the controller firmware? Heck, these days even cheapy
> Dell MD3 series storage arrays allow you to do live drive firmware
> updates.
>
> Similarly with SAN switches, surely you have separate A/B fabrics and
> can upgrade them one at a time live.
>
> In a properly designed system one should not need to schedule downtime
> for firmware updates. He says as he plans a firmware update on his
> routers for next Tuesday morning, with no scheduled downtime and no
> interruption to service.
>
> JAB.
>
> -- Jonathan A. Buzzard Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
From griznog at gmail.com Wed Feb 14 05:32:39 2018
From: griznog at gmail.com (John Hanks)
Date: Tue, 13 Feb 2018 21:32:39 -0800
Subject: [gpfsug-discuss] Odd behavior with cat followed by grep.
Message-ID:
Hi, We have a GPFS filesystem mounted on CentOS 7.4 as type gpfs, pretty straightforward run of the mill stuff, but we are seeing this odd behavior.
If I do this in a shell script, given a file called "a" cat a a a a a a a a a a > /path/to/gpfs/mount/test grep ATAG /path/to/gpfs/mount/test | wc -l sleep 4 grep ATAG /path/to/gpfs/mount/test | wc -l The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/gpfs/mount/test matches" The second grep | wc -l returns the correct count of ATAG in the file. Why does it take 4 seconds (3 isn't enough) for that file to be properly recognized as a text file and/or why is it seen as a binary file in the first place since a is a plain text file? Note that I have the same filesystem mounted via NFS and over an NFS mount it works as expected. Any illumination is appreciated, jbh_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=ut35qIIMxjZMX3obFJ2xtUMng4MtGtKz4YHxpkgQbak&s=cNt66GjRD6rVhq7nGcvT76l-0_u2C3UTz9SfwzHf1xw&e= Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Feb 14 06:53:20 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 14 Feb 2018 08:53:20 +0200 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: Message-ID: Sorry With cat [root at specscale01 IBM_REPO]# cp test a [root at specscale01 IBM_REPO]# cat a a a a > test && grep ATAG test | wc -l && sleep 4 && grep ATAG test | wc -l 0 0 -- Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous From: Luis Bolinches To: gpfsug main discussion list Date: 14/02/2018 08:49 Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by grep. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi This seems to be setup specific Care to explain a bit more of the setup. Number of nodes GPFS versions, number of FS, Networking, running from admin node, server / client, number of NSD, separated meta and data, etc? I got interested and run a quick test on a gpfs far from powerful cluster of 3 nodes on KVM [root at specscale01 IBM_REPO]# echo "a a a a a a a a a a" > test && grep ATAG test | wc -l && sleep 4 && grep ATAG test | wc -l 0 0 [root at specscale01 IBM_REPO]# -- Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous From: John Hanks To: gpfsug-discuss Date: 14/02/2018 07:33 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We have a GPFS filesystem mounted on CentOS 7.4 as type gpfs, pretty straightforward run of the mill stuff. But are seeing this odd behavior. 
If I do this in a shell script, given a file called "a" cat a a a a a a a a a a > /path/to/gpfs/mount/test grep ATAG /path/to/gpfs/mount/test | wc -l sleep 4 grep ATAG /path/to/gpfs/mount/test | wc -l The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/gpfs/mount/test matches" The second grep | wc -l returns the correct count of ATAG in the file. Why does it take 4 seconds (3 isn't enough) for that file to be properly recognized as a text file and/or why is it seen as a binary file in the first place since a is a plain text file? Note that I have the same filesystem mounted via NFS and over an NFS mount it works as expected. Any illumination is appreciated, jbh_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=ut35qIIMxjZMX3obFJ2xtUMng4MtGtKz4YHxpkgQbak&s=cNt66GjRD6rVhq7nGcvT76l-0_u2C3UTz9SfwzHf1xw&e= Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=HrR-mBJ82ubcbtBin7NGVl2VenLj726Fcah6-3XFvDs&s=d5YiAyXz4el9bF0zjGL9gVjnTfbX4z-qelZodxRqlz0&e= Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From griznog at gmail.com Wed Feb 14 14:20:32 2018 From: griznog at gmail.com (John Hanks) Date: Wed, 14 Feb 2018 06:20:32 -0800 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: Message-ID: Hi Luis, GPFS is 4.2.3 (gpfs.base-4.2.3-6.x86_64), All servers (8 in front of a DDN SFA12K) are RHEL 7.3 (stock DDN setup). All 47 clients are CentOS 7.4. GPFS mount: # mount | grep gpfs gsfs0 on /srv/gsfs0 type gpfs (rw,relatime) NFS mount: mount | grep $HOME 10.210.15.57:/srv/gsfs0/home/griznog on /home/griznog type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.210.15.57,mountvers=3,mountport=20048,mountproto=tcp,local_lock=none,addr=10.210.15.57) Example script: #!/bin/bash cat pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt > /srv/gsfs0/projects/pipetest.tmp.txt grep L1 /srv/gsfs0/projects/pipetest.tmp.txt | wc -l cat pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt > $HOME/pipetest.tmp.txt grep L1 $HOME/pipetest.tmp.txt | wc -l Example output: # ./pipetest.sh 1 1836 # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 /srv/gsfs0/projects/pipetest.tmp.txt We can "fix" the user case that exposed this by not using a temp file or inserting a sleep, but I'd still like to know why GPFS is behaving this way and make it stop. mmlsconfig below. 
Thanks, jbh mmlsconfig Configuration data for cluster SCG-GS.scg-gs0: ---------------------------------------------- clusterName SCG-GS.scg-gs0 clusterId 8456032987852400706 dmapiFileHandleSize 32 maxblocksize 4096K cnfsSharedRoot /srv/gsfs0/GS-NFS cnfsMountdPort 597 socketMaxListenConnections 1024 fileHeatPeriodMinutes 1440 fileHeatLossPercent 1 pingPeriod 5 minMissedPingTimeout 30 afmHashVersion 1 minReleaseLevel 4.2.0.1 [scg-gs0,scg-gs1,scg-gs2,scg-gs3,scg-gs4,scg-gs5,scg-gs6,scg-gs7] nsdbufspace 70 [common] healthCheckInterval 20 maxStatCache 512 maxFilesToCache 50000 nsdMinWorkerThreads 512 nsdMaxWorkerThreads 1024 deadlockDetectionThreshold 0 deadlockOverloadThreshold 0 prefetchThreads 288 worker1Threads 320 maxMBpS 2000 [scg-gs0,scg-gs1,scg-gs2,scg-gs3,scg-gs4,scg-gs5,scg-gs6,scg-gs7] maxMBpS 24000 [common] atimeDeferredSeconds 300 pitWorkerThreadsPerNode 2 cipherList AUTHONLY pagepool 1G [scg-gs0,scg-gs1,scg-gs2,scg-gs3,scg-gs4,scg-gs5,scg-gs6,scg-gs7] pagepool 8G [common] cnfsNFSDprocs 256 nfsPrefetchStrategy 1 autoload yes adminMode central File systems in cluster SCG-GS.scg-gs0: --------------------------------------- /dev/gsfs0 On Tue, Feb 13, 2018 at 10:53 PM, Luis Bolinches wrote: > Sorry > > With cat > > [root at specscale01 IBM_REPO]# cp test a > [root at specscale01 IBM_REPO]# cat a a a a > test && grep ATAG test | wc -l > && sleep 4 && grep ATAG test | wc -l > 0 > 0 > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > Luis Bolinches > Consultant IT Specialist > Mobile Phone: +358503112585 <+358%2050%203112585> > https://www.youracclaim.com/user/luis-bolinches > > "If you always give you will always have" -- Anonymous > > > > From: Luis Bolinches > To: gpfsug main discussion list > Date: 14/02/2018 08:49 > Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by > grep. > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi > > This seems to be setup specific > > Care to explain a bit more of the setup. Number of nodes GPFS versions, > number of FS, Networking, running from admin node, server / client, number > of NSD, separated meta and data, etc? > > I got interested and run a quick test on a gpfs far from powerful cluster > of 3 nodes on KVM > > [root at specscale01 IBM_REPO]# echo "a a a a a a a a a a" > test && grep > ATAG test | wc -l && sleep 4 && grep ATAG test | wc -l > 0 > 0 > [root at specscale01 IBM_REPO]# > > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > Luis Bolinches > Consultant IT Specialist > Mobile Phone: +358503112585 <+358%2050%203112585> > *https://www.youracclaim.com/user/luis-bolinches* > > > "If you always give you will always have" -- Anonymous > > > > From: John Hanks > To: gpfsug-discuss > Date: 14/02/2018 07:33 > Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi, > > We have a GPFS filesystem mounted on CentOS 7.4 as type gpfs, pretty > straightforward run of the mill stuff. But are seeing this odd behavior. If > I do this in a shell script, given a file called "a" > > cat a a a a a a a a a a > /path/to/gpfs/mount/test > grep ATAG /path/to/gpfs/mount/test | wc -l > sleep 4 > grep ATAG /path/to/gpfs/mount/test | wc -l > > The first grep | wc -l returns 1, because grep outputs "Binary file > /path/to/gpfs/mount/test matches" > > The second grep | wc -l returns the correct count of ATAG in the file. 
> > Why does it take 4 seconds (3 isn't enough) for that file to be properly > recognized as a text file and/or why is it seen as a binary file in the > first place since a is a plain text file? > > Note that I have the same filesystem mounted via NFS and over an NFS mount > it works as expected. > > Any illumination is appreciated, > > jbh_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > *https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=ut35qIIMxjZMX3obFJ2xtUMng4MtGtKz4YHxpkgQbak&s=cNt66GjRD6rVhq7nGcvT76l-0_u2C3UTz9SfwzHf1xw&e=* > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug. > org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r= > 1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=HrR- > mBJ82ubcbtBin7NGVl2VenLj726Fcah6-3XFvDs&s=d5YiAyXz4el9bF0zjGL9gVjnTfbX4z > -qelZodxRqlz0&e= > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Feb 14 15:08:10 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 14 Feb 2018 10:08:10 -0500 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: Message-ID: <11815.1518620890@turing-police.cc.vt.edu> On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > /srv/gsfs0/projects/pipetest.tmp.txt > > We can "fix" the user case that exposed this by not using a temp file or > inserting a sleep, but I'd still like to know why GPFS is behaving this way > and make it stop. May be related to replication, or other behind-the-scenes behavior. Consider this example - 4.2.3.6, data and metadata replication both set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with a full fiberchannel mesh to 3 Dell MD34something arrays. % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test 4096+0 records in 4096+0 records out 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test Notice that the first /bin/ls shouldn't be starting until after the dd has completed - at which point it's only allocated half the blocks needed to hold the 4M of data at one site. 5 seconds later, it's allocated the blocks at both sites and thus shows the full 8M needed for 2 copies. 
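A quick way to watch the same effect on a single file, sketched with standard commands (the path below is made up): compare the allocated blocks with the logical size, and check the file's replication attributes, which is what drives the "half the blocks now, all of them later" allocation described above.

# First column of ls -ls is allocated blocks; the size field is the logical length.
ls -ls /gpfs/fs0/somefile

# mmlsattr -L shows the data/metadata replication factors and storage pool for the file.
mmlsattr -L /gpfs/fs0/somefile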
I've also seen (but haven't replicated it as I write this) a small file (4-8K or so) showing first one full-sized block, then a second full-sized block, and then dropping back to what's needed for 2 1/32nd fragments. That had me scratching my head.

Having said that, that's all metadata fun and games, while your case appears to have some problems with data integrity (which is a whole lot scarier). It would be *really* nice if we understood the problem here. The scariest part is:

> The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/
> gpfs/mount/test matches"

which seems to be implying that we're failing on semantic consistency. Basically, your 'cat' command is completing and closing the file, but then a temporally later open of the same file is reading something other than the just-written data. My first guess is that it's a race condition similar to the following: The cat command is causing a write on one NSD server, and the first grep results in a read from a *different* NSD server, returning the data that *used* to be in the block because the read actually happens before the first NSD server actually completes the write.

It may be interesting to replace the greps with pairs of 'ls -ls / dd' commands to grab the raw data and its size, and check the following:
1) does the size (both blocks allocated and logical length) reported by ls match the amount of data actually read by the dd?
2) Is the file length as actually read equal to the written length, or does it overshoot and read all the way to the next block boundary?
3) If the length is correct, what's wrong with the data that's telling grep that it's a binary file? ( od -cx is your friend here).
4) If it overshoots, is the remainder all-zeros (good) or does it return semi-random "what used to be there" data (bad, due to data exposure issues)?

(It's certainly not the most perplexing data consistency issue I've hit in 4 decades - the winner *has* to be an intermittent data read corruption on a GPFS 3.5 cluster that had us, IBM, SGI, DDN, and at least one vendor of networking gear all chasing our tails for 18 months before we finally tracked it down. :)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available
URL:
From griznog at gmail.com Wed Feb 14 15:21:52 2018
From: griznog at gmail.com (John Hanks)
Date: Wed, 14 Feb 2018 07:21:52 -0800
Subject: [gpfsug-discuss] Odd behavior with cat followed by grep.
In-Reply-To: <11815.1518620890@turing-police.cc.vt.edu>
References: <11815.1518620890@turing-police.cc.vt.edu>
Message-ID:
Hi Valdis, I tried with the grep replaced with 'ls -ls' and 'md5sum'; I don't think this is a data integrity issue, thankfully:

$ ./pipetestls.sh
256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 /srv/gsfs0/projects/pipetest.tmp.txt
0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt

$ ./pipetestmd5.sh
15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt
15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt

And replacing grep with 'file' even properly sees the files as ASCII:
$ ./pipetestfile.sh
/srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines
/home/griznog/pipetest.tmp.txt: ASCII text, with very long lines

I'll poke a little harder at grep next and see what the difference in strace of each reveals.
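Another cheap probe in the same spirit, a sketch reusing the test file from the pipetest script above (the %s and %b format specifiers are GNU coreutils stat): if the allocated-block count lags the logical size right after the write and catches up a few seconds later, the freshly written file is briefly being reported as if it were sparse, which would explain grep's binary-file heuristic firing.

f=/srv/gsfs0/projects/pipetest.tmp.txt
cat pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt pt.txt > "$f"
stat -c 'size=%s allocated_512B_blocks=%b' "$f"   # immediately after the write
sleep 5
stat -c 'size=%s allocated_512B_blocks=%b' "$f"   # after GPFS has settled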
Thanks, jbh On Wed, Feb 14, 2018 at 7:08 AM, wrote: > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 > /home/griznog/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > We can "fix" the user case that exposed this by not using a temp file or > > inserting a sleep, but I'd still like to know why GPFS is behaving this > way > > and make it stop. > > May be related to replication, or other behind-the-scenes behavior. > > Consider this example - 4.2.3.6, data and metadata replication both > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with a full > fiberchannel mesh to 3 Dell MD34something arrays. > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; sleep > 5; ls -ls sync.test; sleep 5; ls -ls sync.test > 4096+0 records in > 4096+0 records out > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > Notice that the first /bin/ls shouldn't be starting until after the dd has > completed - at which point it's only allocated half the blocks needed to > hold > the 4M of data at one site. 5 seconds later, it's allocated the blocks at > both > sites and thus shows the full 8M needed for 2 copies. > > I've also seen (but haven't replicated it as I write this) a small file > (4-8K > or so) showing first one full-sized block, then a second full-sized block, > and > then dropping back to what's needed for 2 1/32nd fragments. That had me > scratching my head > > Having said that, that's all metadata fun and games, while your case > appears to have some problems with data integrity (which is a whole lot > scarier). It would be *really* nice if we understood the problem here. > > The scariest part is: > > > The first grep | wc -l returns 1, because grep outputs "Binary file > /path/to/ > > gpfs/mount/test matches" > > which seems to be implying that we're failing on semantic consistency. > Basically, your 'cat' command is completing and closing the file, but then > a > temporally later open of the same find is reading something other that > only the > just-written data. My first guess is that it's a race condition similar > to the > following: The cat command is causing a write on one NSD server, and the > first > grep results in a read from a *different* NSD server, returning the data > that > *used* to be in the block because the read actually happens before the > first > NSD server actually completes the write. > > It may be interesting to replace the grep's with pairs of 'ls -ls / dd' > commands to grab the > raw data and its size, and check the following: > > 1) does the size (both blocks allocated and logical length) reported by > ls match the amount of data actually read by the dd? > > 2) Is the file length as actually read equal to the written length, or > does it > overshoot and read all the way to the next block boundary? > > 3) If the length is correct, what's wrong with the data that's telling > grep that > it's a binary file? ( od -cx is your friend here). > > 4) If it overshoots, is the remainder all-zeros (good) or does it return > semi-random > "what used to be there" data (bad, due to data exposure issues)? 
> > (It's certainly not the most perplexing data consistency issue I've hit in > 4 decades - the > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > cluster that > had us, IBM, SGI, DDN, and at least one vendor of networking gear all > chasing our > tails for 18 months before we finally tracked it down. :) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Feb 14 15:33:24 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 14 Feb 2018 17:33:24 +0200 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Hi not going to mention much on DDN setups but first thing that makes my eyes blurry a bit is minReleaseLevel 4.2.0.1 when you mention your whole cluster is already on 4.2.3 -- Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous From: John Hanks To: gpfsug main discussion list Date: 14/02/2018 17:22 Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by grep. Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Valdis, I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think this is a data integrity issue, thankfully: $ ./pipetestls.sh 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 /srv/gsfs0/projects/pipetest.tmp.txt 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt $ ./pipetestmd5.sh 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt And replacing grep with 'file' even properly sees the files as ASCII: $ ./pipetestfile.sh /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines I'll poke a little harder at grep next and see what the difference in strace of each reveals. Thanks, jbh On Wed, Feb 14, 2018 at 7:08 AM, wrote: On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > /srv/gsfs0/projects/pipetest.tmp.txt > > We can "fix" the user case that exposed this by not using a temp file or > inserting a sleep, but I'd still like to know why GPFS is behaving this way > and make it stop. May be related to replication, or other behind-the-scenes behavior. Consider this example - 4.2.3.6, data and metadata replication both set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with a full fiberchannel mesh to 3 Dell MD34something arrays. 
% dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test 4096+0 records in 4096+0 records out 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test Notice that the first /bin/ls shouldn't be starting until after the dd has completed - at which point it's only allocated half the blocks needed to hold the 4M of data at one site. 5 seconds later, it's allocated the blocks at both sites and thus shows the full 8M needed for 2 copies. I've also seen (but haven't replicated it as I write this) a small file (4-8K or so) showing first one full-sized block, then a second full-sized block, and then dropping back to what's needed for 2 1/32nd fragments. That had me scratching my head Having said that, that's all metadata fun and games, while your case appears to have some problems with data integrity (which is a whole lot scarier). It would be *really* nice if we understood the problem here. The scariest part is: > The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/ > gpfs/mount/test matches" which seems to be implying that we're failing on semantic consistency. Basically, your 'cat' command is completing and closing the file, but then a temporally later open of the same find is reading something other that only the just-written data. My first guess is that it's a race condition similar to the following: The cat command is causing a write on one NSD server, and the first grep results in a read from a *different* NSD server, returning the data that *used* to be in the block because the read actually happens before the first NSD server actually completes the write. It may be interesting to replace the grep's with pairs of 'ls -ls / dd' commands to grab the raw data and its size, and check the following: 1) does the size (both blocks allocated and logical length) reported by ls match the amount of data actually read by the dd? 2) Is the file length as actually read equal to the written length, or does it overshoot and read all the way to the next block boundary? 3) If the length is correct, what's wrong with the data that's telling grep that it's a binary file? ( od -cx is your friend here). 4) If it overshoots, is the remainder all-zeros (good) or does it return semi-random "what used to be there" data (bad, due to data exposure issues)? (It's certainly not the most perplexing data consistency issue I've hit in 4 decades - the winner *has* to be a intermittent data read corruption on a GPFS 3.5 cluster that had us, IBM, SGI, DDN, and at least one vendor of networking gear all chasing our tails for 18 months before we finally tracked it down. :) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_UFKMxNklx_00YDdSlmEr9lCvnUC9AWFsTVbTn6yAr4&s=JUVyUiTIfln67di06lb-hvwpA8207JNkioGxY1ayAlE&e= Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Feb 14 17:51:04 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 14 Feb 2018 12:51:04 -0500 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Just speculating here (also known as making things up) but I wonder if grep is somehow using the file's size in its determination of binary status. I also see mmap in the strace so maybe there's some issue with mmap where some internal GPFS buffer is getting truncated inappropriately but leaving a bunch of null values which gets returned to grep. -Aaron On 2/14/18 10:21 AM, John Hanks wrote: > Hi Valdis, > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think > this is a data integrity issue, thankfully: > > $ ./pipetestls.sh? > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 > /srv/gsfs0/projects/pipetest.tmp.txt > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt > > $ ./pipetestmd5.sh? > 15cb81a85c9e450bdac8230309453a0a? /srv/gsfs0/projects/pipetest.tmp.txt > 15cb81a85c9e450bdac8230309453a0a? /home/griznog/pipetest.tmp.txt > > And replacing grep with 'file' even properly sees the files as ASCII: > $ ./pipetestfile.sh? > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines > > I'll poke a little harder at grep next and see what the difference in > strace of each reveals. > > Thanks, > > jbh > > > > > On Wed, Feb 14, 2018 at 7:08 AM, > wrote: > > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > #? ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > We can "fix" the user case that exposed this by not using a temp file or > > inserting a sleep, but I'd still like to know why GPFS is behaving this way > > and make it stop. > > May be related to replication, or other behind-the-scenes behavior. > > Consider this example - 4.2.3.6, data and metadata replication both > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with > a full > fiberchannel mesh to 3 Dell MD34something arrays. > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; > sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test > 4096+0 records in > 4096+0 records out > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > Notice that the first /bin/ls shouldn't be starting until after the > dd has > completed - at which point it's only allocated half the blocks > needed to hold > the 4M of data at one site.? 5 seconds later, it's allocated the > blocks at both > sites and thus shows the full 8M needed for 2 copies. > > I've also seen (but haven't replicated it as I write this) a small > file (4-8K > or so) showing first one full-sized block, then a second full-sized > block, and > then dropping back to what's needed for 2 1/32nd fragments.? 
That had me > scratching my head > > Having said that, that's all metadata fun and games, while your case > appears to have some problems with data integrity (which is a whole lot > scarier).? It would be *really* nice if we understood the problem here. > > The scariest part is: > > > The first grep | wc -l returns 1, because grep outputs ?"Binary file /path/to/ > > gpfs/mount/test matches" > > which seems to be implying that we're failing on semantic consistency. > Basically, your 'cat' command is completing and closing the file, > but then a > temporally later open of the same find is reading something other > that only the > just-written data.? My first guess is that it's a race condition > similar to the > following: The cat command is causing a write on one NSD server, and > the first > grep results in a read from a *different* NSD server, returning the > data that > *used* to be in the block because the read actually happens before > the first > NSD server actually completes the write. > > It may be interesting to replace the grep's with pairs of 'ls -ls / > dd' commands to grab the > raw data and its size, and check the following: > > 1) does the size (both blocks allocated and logical length) reported by > ls match the amount of data actually read by the dd? > > 2) Is the file length as actually read equal to the written length, > or does it > overshoot and read all the way to the next block boundary? > > 3) If the length is correct, what's wrong with the data that's > telling grep that > it's a binary file?? ( od -cx is your friend here). > > 4) If it overshoots, is the remainder all-zeros (good) or does it > return semi-random > "what used to be there" data (bad, due to data exposure issues)? > > (It's certainly not the most perplexing data consistency issue I've > hit in 4 decades - the > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > cluster that > had us, IBM, SGI, DDN, and at least one vendor of networking gear > all chasing our > tails for 18 months before we finally tracked it down. :) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From griznog at gmail.com Wed Feb 14 18:30:39 2018 From: griznog at gmail.com (John Hanks) Date: Wed, 14 Feb 2018 10:30:39 -0800 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. 
In-Reply-To: References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Straces are interesting, but don't immediately open my eyes: strace of grep on NFS (works as expected) openat(AT_FDCWD, "/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 ioctl(3, TCGETS, 0x7ffe2c26b0b0) = -1 ENOTTY (Inappropriate ioctl for device) read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 lseek(3, 32768, SEEK_HOLE) = 530721 lseek(3, 32768, SEEK_SET) = 32768 fstat(1, {st_mode=S_IFREG|0644, st_size=5977, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3bf6c43000 write(1, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 8192chr1 strace on GPFS (thinks file is binary) openat(AT_FDCWD, "/srv/gsfs0/projects/pipetest.tmp.txt", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 ioctl(3, TCGETS, 0x7ffc9b52caa0) = -1 ENOTTY (Inappropriate ioctl for device) read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 lseek(3, 32768, SEEK_HOLE) = 262144 lseek(3, 32768, SEEK_SET) = 32768 fstat(1, {st_mode=S_IFREG|0644, st_size=6011, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd45ee88000 close(3) = 0 write(1, "Binary file /srv/gsfs0/projects/"..., 72Binary file /srv/gsfs0/projects/levinson/xwzhu/pipetest.tmp.txt matches ) = 72 Do the lseek() results indicate that the grep on the GPFS mounted version thinks the file is a sparse file? For comparison I strace'd md5sum in place of the grep and it does not lseek() with SEEK_HOLE, it's access in both cases look identical, like: open("/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb7d2c2b000 read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 ...[reads clipped]... read(3, "", 24576) = 0 lseek(3, 0, SEEK_CUR) = 530721 close(3) = 0 jbh On Wed, Feb 14, 2018 at 9:51 AM, Aaron Knister wrote: > Just speculating here (also known as making things up) but I wonder if > grep is somehow using the file's size in its determination of binary > status. I also see mmap in the strace so maybe there's some issue with > mmap where some internal GPFS buffer is getting truncated > inappropriately but leaving a bunch of null values which gets returned > to grep. > > -Aaron > > On 2/14/18 10:21 AM, John Hanks wrote: > > Hi Valdis, > > > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think > > this is a data integrity issue, thankfully: > > > > $ ./pipetestls.sh > > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 > > /srv/gsfs0/projects/pipetest.tmp.txt > > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 > /home/griznog/pipetest.tmp.txt > > > > $ ./pipetestmd5.sh > > 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt > > 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt > > > > And replacing grep with 'file' even properly sees the files as ASCII: > > $ ./pipetestfile.sh > > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines > > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines > > > > I'll poke a little harder at grep next and see what the difference in > > strace of each reveals. 
> > > > Thanks, > > > > jbh > > > > > > > > > > On Wed, Feb 14, 2018 at 7:08 AM, > > wrote: > > > > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt > $HOME/pipetest.tmp.txt > > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 > /home/griznog/pipetest.tmp.txt > > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > > > We can "fix" the user case that exposed this by not using a temp > file or > > > inserting a sleep, but I'd still like to know why GPFS is behaving > this way > > > and make it stop. > > > > May be related to replication, or other behind-the-scenes behavior. > > > > Consider this example - 4.2.3.6, data and metadata replication both > > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with > > a full > > fiberchannel mesh to 3 Dell MD34something arrays. > > > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; > > sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test > > 4096+0 records in > > 4096+0 records out > > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > > > Notice that the first /bin/ls shouldn't be starting until after the > > dd has > > completed - at which point it's only allocated half the blocks > > needed to hold > > the 4M of data at one site. 5 seconds later, it's allocated the > > blocks at both > > sites and thus shows the full 8M needed for 2 copies. > > > > I've also seen (but haven't replicated it as I write this) a small > > file (4-8K > > or so) showing first one full-sized block, then a second full-sized > > block, and > > then dropping back to what's needed for 2 1/32nd fragments. That > had me > > scratching my head > > > > Having said that, that's all metadata fun and games, while your case > > appears to have some problems with data integrity (which is a whole > lot > > scarier). It would be *really* nice if we understood the problem > here. > > > > The scariest part is: > > > > > The first grep | wc -l returns 1, because grep outputs "Binary > file /path/to/ > > > gpfs/mount/test matches" > > > > which seems to be implying that we're failing on semantic > consistency. > > Basically, your 'cat' command is completing and closing the file, > > but then a > > temporally later open of the same find is reading something other > > that only the > > just-written data. My first guess is that it's a race condition > > similar to the > > following: The cat command is causing a write on one NSD server, and > > the first > > grep results in a read from a *different* NSD server, returning the > > data that > > *used* to be in the block because the read actually happens before > > the first > > NSD server actually completes the write. > > > > It may be interesting to replace the grep's with pairs of 'ls -ls / > > dd' commands to grab the > > raw data and its size, and check the following: > > > > 1) does the size (both blocks allocated and logical length) reported > by > > ls match the amount of data actually read by the dd? > > > > 2) Is the file length as actually read equal to the written length, > > or does it > > overshoot and read all the way to the next block boundary? > > > > 3) If the length is correct, what's wrong with the data that's > > telling grep that > > it's a binary file? ( od -cx is your friend here). 
> > > > 4) If it overshoots, is the remainder all-zeros (good) or does it > > return semi-random > > "what used to be there" data (bad, due to data exposure issues)? > > > > (It's certainly not the most perplexing data consistency issue I've > > hit in 4 decades - the > > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > > cluster that > > had us, IBM, SGI, DDN, and at least one vendor of networking gear > > all chasing our > > tails for 18 months before we finally tracked it down. :) > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Wed Feb 14 09:00:10 2018 From: john.hearns at asml.com (John Hearns) Date: Wed, 14 Feb 2018 09:00:10 +0000 Subject: [gpfsug-discuss] Odd d????????? permissions Message-ID: I am sure this is a known behavior and I am going to feel very foolish in a few minutes... We often see this behavior on a GPFS filesystem. I log into a client. [jhearns at pn715 test]$ ls -la ../ ls: cannot access ../..: Permission denied total 160 drwx------ 4 jhearns root 4096 Feb 14 09:46 . d????????? ? ? ? ? ? .. drwxr-xr-x 2 jhearns users 4096 Feb 9 11:13 gpfsperf -rw-r--r-- 1 jhearns users 27336 Feb 9 22:24 iozone.out -rw-r--r-- 1 jhearns users 6083 Feb 9 10:55 IozoneResults.py -rw-r--r-- 1 jhearns users 22959 Feb 9 11:17 iozone.txt -rw-r--r-- 1 jhearns users 2977 Feb 9 10:55 iozone.txtvi -rwxr-xr-x 1 jhearns users 102 Feb 9 10:55 run-iozone.sh drwxr-xr-x 2 jhearns users 4096 Feb 14 09:46 test -r-x------ 1 jhearns users 51504 Feb 9 11:02 tsqosperf This behavior changes after a certain number of minutes, and the .. directory looks normal. For information this filesystem has nfsv4 file locking semantics and ACL semantics set to all -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Wed Feb 14 18:38:41 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 14 Feb 2018 18:38:41 +0000 Subject: [gpfsug-discuss] Odd d????????? permissions In-Reply-To: References: Message-ID: Is it an AFM cache? We see this sort of behaviour occasionally where the cache has an "old" view of the directory. Doing an ls, it evidently goes back to home but by then you already have weird stuff. The next ls is usually fine. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of john.hearns at asml.com [john.hearns at asml.com] Sent: 14 February 2018 09:00 To: gpfsug main discussion list Subject: [gpfsug-discuss] Odd d????????? permissions I am sure this is a known behavior and I am going to feel very foolish in a few minutes? We often see this behavior on a GPFS filesystem. I log into a client. [jhearns at pn715 test]$ ls -la ../ ls: cannot access ../..: Permission denied total 160 drwx------ 4 jhearns root 4096 Feb 14 09:46 . d????????? ? ? ? ? ? .. drwxr-xr-x 2 jhearns users 4096 Feb 9 11:13 gpfsperf -rw-r--r-- 1 jhearns users 27336 Feb 9 22:24 iozone.out -rw-r--r-- 1 jhearns users 6083 Feb 9 10:55 IozoneResults.py -rw-r--r-- 1 jhearns users 22959 Feb 9 11:17 iozone.txt -rw-r--r-- 1 jhearns users 2977 Feb 9 10:55 iozone.txtvi -rwxr-xr-x 1 jhearns users 102 Feb 9 10:55 run-iozone.sh drwxr-xr-x 2 jhearns users 4096 Feb 14 09:46 test -r-x------ 1 jhearns users 51504 Feb 9 11:02 tsqosperf This behavior changes after a certain number of minutes, and the .. directory looks normal. For information this filesystem has nfsv4 file locking semantics and ACL semantics set to all -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From bbanister at jumptrading.com Wed Feb 14 18:48:32 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 14 Feb 2018 18:48:32 +0000 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Hi all, We found this a while back and IBM fixed it. Here?s your answer: http://www-01.ibm.com/support/docview.wss?uid=isg1IV87385 Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hanks Sent: Wednesday, February 14, 2018 12:31 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by grep. 
Note: External Email ________________________________ Straces are interesting, but don't immediately open my eyes: strace of grep on NFS (works as expected) openat(AT_FDCWD, "/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 ioctl(3, TCGETS, 0x7ffe2c26b0b0) = -1 ENOTTY (Inappropriate ioctl for device) read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 lseek(3, 32768, SEEK_HOLE) = 530721 lseek(3, 32768, SEEK_SET) = 32768 fstat(1, {st_mode=S_IFREG|0644, st_size=5977, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3bf6c43000 write(1, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 8192chr1 strace on GPFS (thinks file is binary) openat(AT_FDCWD, "/srv/gsfs0/projects/pipetest.tmp.txt", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 ioctl(3, TCGETS, 0x7ffc9b52caa0) = -1 ENOTTY (Inappropriate ioctl for device) read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 lseek(3, 32768, SEEK_HOLE) = 262144 lseek(3, 32768, SEEK_SET) = 32768 fstat(1, {st_mode=S_IFREG|0644, st_size=6011, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd45ee88000 close(3) = 0 write(1, "Binary file /srv/gsfs0/projects/"..., 72Binary file /srv/gsfs0/projects/levinson/xwzhu/pipetest.tmp.txt matches ) = 72 Do the lseek() results indicate that the grep on the GPFS mounted version thinks the file is a sparse file? For comparison I strace'd md5sum in place of the grep and it does not lseek() with SEEK_HOLE, it's access in both cases look identical, like: open("/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb7d2c2b000 read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 ...[reads clipped]... read(3, "", 24576) = 0 lseek(3, 0, SEEK_CUR) = 530721 close(3) = 0 jbh On Wed, Feb 14, 2018 at 9:51 AM, Aaron Knister > wrote: Just speculating here (also known as making things up) but I wonder if grep is somehow using the file's size in its determination of binary status. I also see mmap in the strace so maybe there's some issue with mmap where some internal GPFS buffer is getting truncated inappropriately but leaving a bunch of null values which gets returned to grep. -Aaron On 2/14/18 10:21 AM, John Hanks wrote: > Hi Valdis, > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think > this is a data integrity issue, thankfully: > > $ ./pipetestls.sh > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 > /srv/gsfs0/projects/pipetest.tmp.txt > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt > > $ ./pipetestmd5.sh > 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt > 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt > > And replacing grep with 'file' even properly sees the files as ASCII: > $ ./pipetestfile.sh > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines > > I'll poke a little harder at grep next and see what the difference in > strace of each reveals. 
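For anyone wanting to check the same thing without wading through strace output, the discrepancy and an interim workaround can be seen with plain shell commands; the path below is a placeholder, and the grep flags are ordinary GNU grep options rather than anything GPFS-specific:

# placeholder path -- use a file freshly written to the GPFS filesystem
f=/gpfs/fs0/projects/pipetest.tmp.txt

# allocated blocks vs. logical size: right after the write the allocated count
# can lag the file size, which is where the SEEK_HOLE probe stops short
ls -ls "$f"

# interim workaround until the fixed code is installed: tell GNU grep to skip
# its binary-file heuristic and treat the file as text
grep --binary-files=text PATTERN "$f"
grep -a PATTERN "$f"    # short form of the same option
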
> > Thanks, > > jbh > > > > > On Wed, Feb 14, 2018 at 7:08 AM, > >> wrote: > > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > We can "fix" the user case that exposed this by not using a temp file or > > inserting a sleep, but I'd still like to know why GPFS is behaving this way > > and make it stop. > > May be related to replication, or other behind-the-scenes behavior. > > Consider this example - 4.2.3.6, data and metadata replication both > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with > a full > fiberchannel mesh to 3 Dell MD34something arrays. > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; > sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test > 4096+0 records in > 4096+0 records out > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > Notice that the first /bin/ls shouldn't be starting until after the > dd has > completed - at which point it's only allocated half the blocks > needed to hold > the 4M of data at one site. 5 seconds later, it's allocated the > blocks at both > sites and thus shows the full 8M needed for 2 copies. > > I've also seen (but haven't replicated it as I write this) a small > file (4-8K > or so) showing first one full-sized block, then a second full-sized > block, and > then dropping back to what's needed for 2 1/32nd fragments. That had me > scratching my head > > Having said that, that's all metadata fun and games, while your case > appears to have some problems with data integrity (which is a whole lot > scarier). It would be *really* nice if we understood the problem here. > > The scariest part is: > > > The first grep | wc -l returns 1, because grep outputs "Binary file /path/to/ > > gpfs/mount/test matches" > > which seems to be implying that we're failing on semantic consistency. > Basically, your 'cat' command is completing and closing the file, > but then a > temporally later open of the same find is reading something other > that only the > just-written data. My first guess is that it's a race condition > similar to the > following: The cat command is causing a write on one NSD server, and > the first > grep results in a read from a *different* NSD server, returning the > data that > *used* to be in the block because the read actually happens before > the first > NSD server actually completes the write. > > It may be interesting to replace the grep's with pairs of 'ls -ls / > dd' commands to grab the > raw data and its size, and check the following: > > 1) does the size (both blocks allocated and logical length) reported by > ls match the amount of data actually read by the dd? > > 2) Is the file length as actually read equal to the written length, > or does it > overshoot and read all the way to the next block boundary? > > 3) If the length is correct, what's wrong with the data that's > telling grep that > it's a binary file? ( od -cx is your friend here). > > 4) If it overshoots, is the remainder all-zeros (good) or does it > return semi-random > "what used to be there" data (bad, due to data exposure issues)? 
> > (It's certainly not the most perplexing data consistency issue I've > hit in 4 decades - the > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > cluster that > had us, IBM, SGI, DDN, and at least one vendor of networking gear > all chasing our > tails for 18 months before we finally tracked it down. :) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From griznog at gmail.com Wed Feb 14 19:17:19 2018 From: griznog at gmail.com (John Hanks) Date: Wed, 14 Feb 2018 11:17:19 -0800 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. In-Reply-To: References: <11815.1518620890@turing-police.cc.vt.edu> Message-ID: Thanks Bryan, mystery solved :) We also stumbled across these related items, in case anyone else wanders into this thread. http://bug-grep.gnu.narkive.com/Y8cfvWDt/bug-27666-grep-on-gpfs-filesystem-seek-hole-problem https://www.ibm.com/developerworks/community/forums/html/topic?id=c2a94433-9ec0-4a4b-abfe-d0a1e721d630 GPFS, the gift that keeps on giving ... me more things to do instead of doing the things I want to be doing. Thanks all, jbh On Wed, Feb 14, 2018 at 10:48 AM, Bryan Banister wrote: > Hi all, > > > > We found this a while back and IBM fixed it. Here?s your answer: > http://www-01.ibm.com/support/docview.wss?uid=isg1IV87385 > > > > Cheers, > > -Bryan > > > > *From:* gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss- > bounces at spectrumscale.org] *On Behalf Of *John Hanks > *Sent:* Wednesday, February 14, 2018 12:31 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Odd behavior with cat followed by grep. 
> > > > *Note: External Email* > ------------------------------ > > Straces are interesting, but don't immediately open my eyes: > > > > strace of grep on NFS (works as expected) > > > > openat(AT_FDCWD, "/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > ioctl(3, TCGETS, 0x7ffe2c26b0b0) = -1 ENOTTY (Inappropriate ioctl > for device) > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > lseek(3, 32768, SEEK_HOLE) = 530721 > > lseek(3, 32768, SEEK_SET) = 32768 > > fstat(1, {st_mode=S_IFREG|0644, st_size=5977, ...}) = 0 > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > 0x7f3bf6c43000 > > write(1, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 8192chr1 > > > > strace on GPFS (thinks file is binary) > > > > openat(AT_FDCWD, "/srv/gsfs0/projects/pipetest.tmp.txt", O_RDONLY) = 3 > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > ioctl(3, TCGETS, 0x7ffc9b52caa0) = -1 ENOTTY (Inappropriate ioctl > for device) > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > lseek(3, 32768, SEEK_HOLE) = 262144 > > lseek(3, 32768, SEEK_SET) = 32768 > > fstat(1, {st_mode=S_IFREG|0644, st_size=6011, ...}) = 0 > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > 0x7fd45ee88000 > > close(3) = 0 > > write(1, "Binary file /srv/gsfs0/projects/"..., 72Binary file > /srv/gsfs0/projects/levinson/xwzhu/pipetest.tmp.txt matches > > ) = 72 > > > > Do the lseek() results indicate that the grep on the GPFS mounted version > thinks the file is a sparse file? For comparison I strace'd md5sum in place > of the grep and it does not lseek() with SEEK_HOLE, it's access in both > cases look identical, like: > > > > open("/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 > > fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > 0x7fb7d2c2b000 > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > ...[reads clipped]... > > read(3, "", 24576) = 0 > > lseek(3, 0, SEEK_CUR) = 530721 > > close(3) = 0 > > > > > > jbh > > > > > > On Wed, Feb 14, 2018 at 9:51 AM, Aaron Knister > wrote: > > Just speculating here (also known as making things up) but I wonder if > grep is somehow using the file's size in its determination of binary > status. I also see mmap in the strace so maybe there's some issue with > mmap where some internal GPFS buffer is getting truncated > inappropriately but leaving a bunch of null values which gets returned > to grep. 
> > -Aaron > > On 2/14/18 10:21 AM, John Hanks wrote: > > Hi Valdis, > > > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think > > this is a data integrity issue, thankfully: > > > > $ ./pipetestls.sh > > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 > > /srv/gsfs0/projects/pipetest.tmp.txt > > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 > /home/griznog/pipetest.tmp.txt > > > > $ ./pipetestmd5.sh > > 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt > > 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt > > > > And replacing grep with 'file' even properly sees the files as ASCII: > > $ ./pipetestfile.sh > > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines > > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines > > > > I'll poke a little harder at grep next and see what the difference in > > strace of each reveals. > > > > Thanks, > > > > jbh > > > > > > > > > > On Wed, Feb 14, 2018 at 7:08 AM, > > > wrote: > > > > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt > $HOME/pipetest.tmp.txt > > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 > /home/griznog/pipetest.tmp.txt > > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > > > We can "fix" the user case that exposed this by not using a temp > file or > > > inserting a sleep, but I'd still like to know why GPFS is behaving > this way > > > and make it stop. > > > > May be related to replication, or other behind-the-scenes behavior. > > > > Consider this example - 4.2.3.6, data and metadata replication both > > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with > > a full > > fiberchannel mesh to 3 Dell MD34something arrays. > > > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; > > sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test > > 4096+0 records in > > 4096+0 records out > > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > > > Notice that the first /bin/ls shouldn't be starting until after the > > dd has > > completed - at which point it's only allocated half the blocks > > needed to hold > > the 4M of data at one site. 5 seconds later, it's allocated the > > blocks at both > > sites and thus shows the full 8M needed for 2 copies. > > > > I've also seen (but haven't replicated it as I write this) a small > > file (4-8K > > or so) showing first one full-sized block, then a second full-sized > > block, and > > then dropping back to what's needed for 2 1/32nd fragments. That > had me > > scratching my head > > > > Having said that, that's all metadata fun and games, while your case > > appears to have some problems with data integrity (which is a whole > lot > > scarier). It would be *really* nice if we understood the problem > here. > > > > The scariest part is: > > > > > The first grep | wc -l returns 1, because grep outputs "Binary > file /path/to/ > > > gpfs/mount/test matches" > > > > which seems to be implying that we're failing on semantic > consistency. > > Basically, your 'cat' command is completing and closing the file, > > but then a > > temporally later open of the same find is reading something other > > that only the > > just-written data. 
My first guess is that it's a race condition > > similar to the > > following: The cat command is causing a write on one NSD server, and > > the first > > grep results in a read from a *different* NSD server, returning the > > data that > > *used* to be in the block because the read actually happens before > > the first > > NSD server actually completes the write. > > > > It may be interesting to replace the grep's with pairs of 'ls -ls / > > dd' commands to grab the > > raw data and its size, and check the following: > > > > 1) does the size (both blocks allocated and logical length) reported > by > > ls match the amount of data actually read by the dd? > > > > 2) Is the file length as actually read equal to the written length, > > or does it > > overshoot and read all the way to the next block boundary? > > > > 3) If the length is correct, what's wrong with the data that's > > telling grep that > > it's a binary file? ( od -cx is your friend here). > > > > 4) If it overshoots, is the remainder all-zeros (good) or does it > > return semi-random > > "what used to be there" data (bad, due to data exposure issues)? > > > > (It's certainly not the most perplexing data consistency issue I've > > hit in 4 decades - the > > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > > cluster that > > had us, IBM, SGI, DDN, and at least one vendor of networking gear > > all chasing our > > tails for 18 months before we finally tracked it down. :) > > > > _______________________________________________ > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Feb 14 20:54:04 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 14 Feb 2018 20:54:04 +0000 Subject: [gpfsug-discuss] Odd d????????? 
permissions In-Reply-To: References: Message-ID: <9C726F78-D870-4E1E-92B6-96F495F53D54@vanderbilt.edu> Hi John, We had a similar incident happen just a week or so ago here, although in our case it was that certain files within a directory showed up with the question marks, while others didn?t. The problem was simply that the node had been run out of RAM and the GPFS daemon couldn?t allocate memory. Killing the offending process(es) and restarting GPFS fixed the issue. We saw hundreds of messages like: 2018-02-07_16:35:13.267-0600: [E] Failed to allocate 92274688 bytes in memory pool, err -1 In the GPFS log when this was happening. HTHAL? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Feb 14, 2018, at 12:38 PM, Simon Thompson (IT Research Support) > wrote: Is it an AFM cache? We see this sort of behaviour occasionally where the cache has an "old" view of the directory. Doing an ls, it evidently goes back to home but by then you already have weird stuff. The next ls is usually fine. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of john.hearns at asml.com [john.hearns at asml.com] Sent: 14 February 2018 09:00 To: gpfsug main discussion list Subject: [gpfsug-discuss] Odd d????????? permissions I am sure this is a known behavior and I am going to feel very foolish in a few minutes? We often see this behavior on a GPFS filesystem. I log into a client. [jhearns at pn715 test]$ ls -la ../ ls: cannot access ../..: Permission denied total 160 drwx------ 4 jhearns root 4096 Feb 14 09:46 . d????????? ? ? ? ? ? .. drwxr-xr-x 2 jhearns users 4096 Feb 9 11:13 gpfsperf -rw-r--r-- 1 jhearns users 27336 Feb 9 22:24 iozone.out -rw-r--r-- 1 jhearns users 6083 Feb 9 10:55 IozoneResults.py -rw-r--r-- 1 jhearns users 22959 Feb 9 11:17 iozone.txt -rw-r--r-- 1 jhearns users 2977 Feb 9 10:55 iozone.txtvi -rwxr-xr-x 1 jhearns users 102 Feb 9 10:55 run-iozone.sh drwxr-xr-x 2 jhearns users 4096 Feb 14 09:46 test -r-x------ 1 jhearns users 51504 Feb 9 11:02 tsqosperf This behavior changes after a certain number of minutes, and the .. directory looks normal. For information this filesystem has nfsv4 file locking semantics and ACL semantics set to all -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
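If you want to check whether a node is hitting the same memory-pool exhaustion Kevin describes, a quick look at the daemon log and the daemon's own memory accounting is usually enough; the log path below is the usual location on current releases but may differ on older installs:

# look for the allocation failures around the time the '?' entries appeared
grep -i "Failed to allocate" /var/adm/ras/mmfs.log.latest

# per-node view of the daemon's memory pools (heap, shared segment, token memory)
mmdiag --memory

# and check whether the node itself was simply out of RAM
free -m
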
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9df4b4d88544447ac29608d573da2d51%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636542303262503651&sdata=v6pnBIEvu6lyP3mGkkRX7hSj58H8vvkUl6R%2FCsq6gmc%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From coetzee.ray at gmail.com Wed Feb 14 20:59:52 2018 From: coetzee.ray at gmail.com (Ray Coetzee) Date: Wed, 14 Feb 2018 20:59:52 +0000 Subject: [gpfsug-discuss] Scale 5, filesystem guidelines Message-ID: Since Scale 5.0 was released I've not seen much guidelines provided on how to make the best of the new filesystem layout. For example, is dedicated metadata SSD's still recommended or does the Scale 5 improvements mean we can just do metadata and data pools now? I'd be interested to hear of anyone's experience so far. Kind regards Ray Coetzee -------------- next part -------------- An HTML attachment was scrubbed... URL: From sxiao at us.ibm.com Wed Feb 14 21:53:17 2018 From: sxiao at us.ibm.com (Steve Xiao) Date: Wed, 14 Feb 2018 16:53:17 -0500 Subject: [gpfsug-discuss] Odd behavior with cat followed by grep. (John Hanks) In-Reply-To: References: Message-ID: This could be related to the following flash: http://www-01.ibm.com/support/docview.wss?uid=ssg1S1012054 You should contact IBM service to obtain the fix for your release. Steve Y. Xiao gpfsug-discuss-bounces at spectrumscale.org wrote on 02/14/2018 02:18:02 PM: > From: gpfsug-discuss-request at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Date: 02/14/2018 02:18 PM > Subject: gpfsug-discuss Digest, Vol 73, Issue 36 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Odd behavior with cat followed by grep. (John Hanks) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 14 Feb 2018 11:17:19 -0800 > From: John Hanks > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by grep. > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > Thanks Bryan, mystery solved :) > > We also stumbled across these related items, in case anyone else wanders > into this thread. > > https://urldefense.proofpoint.com/v2/url? > u=http-3A__bug-2Dgrep.gnu.narkive.com_Y8cfvWDt_bug-2D27666-2Dgrep-2Don-2Dgpfs-2Dfilesystem-2Dseek-2Dhole-2Dproblem&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=FgxYBxqHZ0bHdWirEs1U_B3oDpeHJe8iRd- > TYrXh6FI&e= > > https://www.ibm.com/developerworks/community/forums/html/topic? 
> id=c2a94433-9ec0-4a4b-abfe-d0a1e721d630 > > GPFS, the gift that keeps on giving ... me more things to do instead of > doing the things I want to be doing. > > Thanks all, > > jbh > > On Wed, Feb 14, 2018 at 10:48 AM, Bryan Banister > wrote: > > > Hi all, > > > > > > > > We found this a while back and IBM fixed it. Here?s your answer: > > http://www-01.ibm.com/support/docview.wss?uid=isg1IV87385 > > > > > > > > Cheers, > > > > -Bryan > > > > > > > > *From:* gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss- > > bounces at spectrumscale.org] *On Behalf Of *John Hanks > > *Sent:* Wednesday, February 14, 2018 12:31 PM > > *To:* gpfsug main discussion list > > *Subject:* Re: [gpfsug-discuss] Odd behavior with cat followed by grep. > > > > > > > > *Note: External Email* > > ------------------------------ > > > > Straces are interesting, but don't immediately open my eyes: > > > > > > > > strace of grep on NFS (works as expected) > > > > > > > > openat(AT_FDCWD, "/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > > > ioctl(3, TCGETS, 0x7ffe2c26b0b0) = -1 ENOTTY (Inappropriate ioctl > > for device) > > > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > > > lseek(3, 32768, SEEK_HOLE) = 530721 > > > > lseek(3, 32768, SEEK_SET) = 32768 > > > > fstat(1, {st_mode=S_IFREG|0644, st_size=5977, ...}) = 0 > > > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > > 0x7f3bf6c43000 > > > > write(1, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 8192chr1 > > > > > > > > strace on GPFS (thinks file is binary) > > > > > > > > openat(AT_FDCWD, "/srv/gsfs0/projects/pipetest.tmp.txt", O_RDONLY) = 3 > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > > > ioctl(3, TCGETS, 0x7ffc9b52caa0) = -1 ENOTTY (Inappropriate ioctl > > for device) > > > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > > > lseek(3, 32768, SEEK_HOLE) = 262144 > > > > lseek(3, 32768, SEEK_SET) = 32768 > > > > fstat(1, {st_mode=S_IFREG|0644, st_size=6011, ...}) = 0 > > > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > > 0x7fd45ee88000 > > > > close(3) = 0 > > > > write(1, "Binary file /srv/gsfs0/projects/"..., 72Binary file > > /srv/gsfs0/projects/levinson/xwzhu/pipetest.tmp.txt matches > > > > ) = 72 > > > > > > > > Do the lseek() results indicate that the grep on the GPFS mounted version > > thinks the file is a sparse file? For comparison I strace'd md5sum in place > > of the grep and it does not lseek() with SEEK_HOLE, it's access in both > > cases look identical, like: > > > > > > > > open("/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3 > > > > fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0 > > > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > > 0x7fb7d2c2b000 > > > > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768 > > > > ...[reads clipped]... > > > > read(3, "", 24576) = 0 > > > > lseek(3, 0, SEEK_CUR) = 530721 > > > > close(3) = 0 > > > > > > > > > > > > jbh > > > > > > > > > > > > On Wed, Feb 14, 2018 at 9:51 AM, Aaron Knister > > wrote: > > > > Just speculating here (also known as making things up) but I wonder if > > grep is somehow using the file's size in its determination of binary > > status. 
I also see mmap in the strace so maybe there's some issue with > > mmap where some internal GPFS buffer is getting truncated > > inappropriately but leaving a bunch of null values which gets returned > > to grep. > > > > -Aaron > > > > On 2/14/18 10:21 AM, John Hanks wrote: > > > Hi Valdis, > > > > > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think > > > this is a data integrity issue, thankfully: > > > > > > $ ./pipetestls.sh > > > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 > > > /srv/gsfs0/projects/pipetest.tmp.txt > > > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 > > /home/griznog/pipetest.tmp.txt > > > > > > $ ./pipetestmd5.sh > > > 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt > > > 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt > > > > > > And replacing grep with 'file' even properly sees the files as ASCII: > > > $ ./pipetestfile.sh > > > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines > > > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines > > > > > > I'll poke a little harder at grep next and see what the difference in > > > strace of each reveals. > > > > > > Thanks, > > > > > > jbh > > > > > > > > > > > > > > > On Wed, Feb 14, 2018 at 7:08 AM, > > > > > wrote: > > > > > > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > > > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt > > $HOME/pipetest.tmp.txt > > > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 > > /home/griznog/pipetest.tmp.txt > > > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > > > > > We can "fix" the user case that exposed this by not using a temp > > file or > > > > inserting a sleep, but I'd still like to know why GPFS is behaving > > this way > > > > and make it stop. > > > > > > May be related to replication, or other behind-the-scenes behavior. > > > > > > Consider this example - 4.2.3.6, data and metadata replication both > > > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with > > > a full > > > fiberchannel mesh to 3 Dell MD34something arrays. > > > > > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; > > > sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test > > > 4096+0 records in > > > 4096+0 records out > > > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > > > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > > > > > Notice that the first /bin/ls shouldn't be starting until after the > > > dd has > > > completed - at which point it's only allocated half the blocks > > > needed to hold > > > the 4M of data at one site. 5 seconds later, it's allocated the > > > blocks at both > > > sites and thus shows the full 8M needed for 2 copies. > > > > > > I've also seen (but haven't replicated it as I write this) a small > > > file (4-8K > > > or so) showing first one full-sized block, then a second full-sized > > > block, and > > > then dropping back to what's needed for 2 1/32nd fragments. That > > had me > > > scratching my head > > > > > > Having said that, that's all metadata fun and games, while your case > > > appears to have some problems with data integrity (which is a whole > > lot > > > scarier). It would be *really* nice if we understood the problem > > here. 
> > > > > > The scariest part is: > > > > > > > The first grep | wc -l returns 1, because grep outputs "Binary > > file /path/to/ > > > > gpfs/mount/test matches" > > > > > > which seems to be implying that we're failing on semantic > > consistency. > > > Basically, your 'cat' command is completing and closing the file, > > > but then a > > > temporally later open of the same find is reading something other > > > that only the > > > just-written data. My first guess is that it's a race condition > > > similar to the > > > following: The cat command is causing a write on one NSD server, and > > > the first > > > grep results in a read from a *different* NSD server, returning the > > > data that > > > *used* to be in the block because the read actually happens before > > > the first > > > NSD server actually completes the write. > > > > > > It may be interesting to replace the grep's with pairs of 'ls -ls / > > > dd' commands to grab the > > > raw data and its size, and check the following: > > > > > > 1) does the size (both blocks allocated and logical length) reported > > by > > > ls match the amount of data actually read by the dd? > > > > > > 2) Is the file length as actually read equal to the written length, > > > or does it > > > overshoot and read all the way to the next block boundary? > > > > > > 3) If the length is correct, what's wrong with the data that's > > > telling grep that > > > it's a binary file? ( od -cx is your friend here). > > > > > > 4) If it overshoots, is the remainder all-zeros (good) or does it > > > return semi-random > > > "what used to be there" data (bad, due to data exposure issues)? > > > > > > (It's certainly not the most perplexing data consistency issue I've > > > hit in 4 decades - the > > > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > > > cluster that > > > had us, IBM, SGI, DDN, and at least one vendor of networking gear > > > all chasing our > > > tails for 18 months before we finally tracked it down. :) > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > > > gpfsug-discuss at spectrumscale.org urldefense.proofpoint.com/v2/url? > u=http-3A__spectrumscale.org&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=jUBFb8C9yai1TUTu1BVnNTNcOnJXGxupWiEKkEjT4pM&e= > > > > > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > > > > > > > > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > https://urldefense.proofpoint.com/v2/url? 
> u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > > > > > > ------------------------------ > > > > Note: This email is for the confidential use of the named addressee(s) > > only and may contain proprietary, confidential or privileged information. > > If you are not the intended recipient, you are hereby notified that any > > review, dissemination or copying of this email is strictly prohibited, and > > to please notify the sender immediately and destroy this email and any > > attachments. Email transmission cannot be guaranteed to be secure or > > error-free. The Company, therefore, does not make any guarantees as to the > > completeness or accuracy of this email or any attachments. This email is > > for informational purposes only and does not constitute a recommendation, > > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > > or perform any type of transaction of a financial product. > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20180214_d62fc203_attachment.html&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=nUcKIKr84CRhS0EbxV5vwjSlEr4p3Wf6Is3EDKvOjJg&e= > > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e= > > > End of gpfsug-discuss Digest, Vol 73, Issue 36 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Feb 14 21:54:36 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 14 Feb 2018 21:54:36 +0000 Subject: [gpfsug-discuss] mmchdisk suspend / stop In-Reply-To: References: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu> <1991F4DF-DAC2-4102-9E04-4CB367BBC020@vanderbilt.edu> <1518529381.3326.93.camel@strath.ac.uk> Message-ID: <90827aa7-e03c-7f2c-229a-c9db4c7dc8be@strath.ac.uk> On 13/02/18 15:56, Buterbaugh, Kevin L wrote: > Hi JAB, > > OK, let me try one more time to clarify. I?m not naming the vendor ? 
> they?re a small maker of commodity storage and we?ve been using their > stuff for years and, overall, it?s been very solid. The problem in > this specific case is that a major version firmware upgrade is > required ? if the controllers were only a minor version apart we > could do it live. > That makes more sense, but still do tell which vendor so I can avoid them. It's 2018 I expect never to need to take my storage down for *ANY* firmware upgrade *EVER* - period. Any vendor that falls short of that needs to go on my naughty list, for specific checking that this is no longer the case before I ever purchase any of their kit. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan at buzzard.me.uk Wed Feb 14 21:47:38 2018 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 14 Feb 2018 21:47:38 +0000 Subject: [gpfsug-discuss] Scale 5, filesystem guidelines In-Reply-To: References: Message-ID: On 14/02/18 20:59, Ray Coetzee wrote: > Since Scale 5.0 was released I've not seen much guidelines provided on > how to make the best of the new filesystem layout. > > For example, is dedicated metadata SSD's still recommended or does the > Scale 5 improvements mean we can just do metadata and data?pools now? > > I'd be interested to?hear of anyone's experience so far. > Well given metadata performance is heavily related to random IO performance I would suspect that dedicated metadata SSD's are still recommended. That is unless you have an all SSD based file system :-) JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From kkr at lbl.gov Thu Feb 15 01:47:26 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 14 Feb 2018 17:47:26 -0800 Subject: [gpfsug-discuss] RDMA data from Zimon Message-ID: Hi, Can one of the IBMers tell me if port_xmit_data and port_rcv_data from Zimon can be interpreted as RDMA Bytes/sec? Ideally, also how this data is being collected? I?m looking here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1hlp_monnetworksmetrics.htm But then I also look here: https://community.mellanox.com/docs/DOC-2751 and see "Total number of data octets, divided by 4 (lanes), received on all VLs. This is 64 bit counter.? So I wasn?t sure if some multiplication by 4 was in order. Please advise. Cheers, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Thu Feb 15 09:28:42 2018 From: john.hearns at asml.com (John Hearns) Date: Thu, 15 Feb 2018 09:28:42 +0000 Subject: [gpfsug-discuss] Odd d????????? permissions In-Reply-To: <9C726F78-D870-4E1E-92B6-96F495F53D54@vanderbilt.edu> References: <9C726F78-D870-4E1E-92B6-96F495F53D54@vanderbilt.edu> Message-ID: Simon, Kevin Thankyou for your responses. Simon, indeed we do see this behavior on AFM filesets which have an ?old? view ? and we can watch the AFM fileset change as the information is updated. In this case, this filesystem is not involved with AFM. I Changed the locking semantics from NFSv4 to Posix and the report is that this has solved the problem. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: Wednesday, February 14, 2018 9:54 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Odd d????????? 
permissions Hi John, We had a similar incident happen just a week or so ago here, although in our case it was that certain files within a directory showed up with the question marks, while others didn?t. The problem was simply that the node had been run out of RAM and the GPFS daemon couldn?t allocate memory. Killing the offending process(es) and restarting GPFS fixed the issue. We saw hundreds of messages like: 2018-02-07_16:35:13.267-0600: [E] Failed to allocate 92274688 bytes in memory pool, err -1 In the GPFS log when this was happening. HTHAL? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Feb 14, 2018, at 12:38 PM, Simon Thompson (IT Research Support) > wrote: Is it an AFM cache? We see this sort of behaviour occasionally where the cache has an "old" view of the directory. Doing an ls, it evidently goes back to home but by then you already have weird stuff. The next ls is usually fine. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of john.hearns at asml.com [john.hearns at asml.com] Sent: 14 February 2018 09:00 To: gpfsug main discussion list Subject: [gpfsug-discuss] Odd d????????? permissions I am sure this is a known behavior and I am going to feel very foolish in a few minutes? We often see this behavior on a GPFS filesystem. I log into a client. [jhearns at pn715 test]$ ls -la ../ ls: cannot access ../..: Permission denied total 160 drwx------ 4 jhearns root 4096 Feb 14 09:46 . d????????? ? ? ? ? ? .. drwxr-xr-x 2 jhearns users 4096 Feb 9 11:13 gpfsperf -rw-r--r-- 1 jhearns users 27336 Feb 9 22:24 iozone.out -rw-r--r-- 1 jhearns users 6083 Feb 9 10:55 IozoneResults.py -rw-r--r-- 1 jhearns users 22959 Feb 9 11:17 iozone.txt -rw-r--r-- 1 jhearns users 2977 Feb 9 10:55 iozone.txtvi -rwxr-xr-x 1 jhearns users 102 Feb 9 10:55 run-iozone.sh drwxr-xr-x 2 jhearns users 4096 Feb 14 09:46 test -r-x------ 1 jhearns users 51504 Feb 9 11:02 tsqosperf This behavior changes after a certain number of minutes, and the .. directory looks normal. For information this filesystem has nfsv4 file locking semantics and ACL semantics set to all -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
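For reference, the two filesystem settings mentioned here can be inspected and changed with mmlsfs and mmchfs; a short sketch, with gpfs01 standing in for the real device name:

# -D reports the file locking semantics (nfs4 or posix),
# -k reports the ACL semantics (posix, nfs4 or all)
mmlsfs gpfs01 -D -k

# switch the locking semantics to posix, as was done above; check the mmchfs
# documentation for your release for whether this needs the filesystem unmounted
mmchfs gpfs01 -D posix
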
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9df4b4d88544447ac29608d573da2d51%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636542303262503651&sdata=v6pnBIEvu6lyP3mGkkRX7hSj58H8vvkUl6R%2FCsq6gmc%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Thu Feb 15 09:31:34 2018 From: john.hearns at asml.com (John Hearns) Date: Thu, 15 Feb 2018 09:31:34 +0000 Subject: [gpfsug-discuss] Thankyou - d?????? issue Message-ID: Simon, Kevin Thankyou for your responses. Simon, indeed we do see this behavior on AFM filesets which have an 'old' view - and we can watch the AFM fileset change as the information is updated. In this case, this filesystem is not involved with AFM. I changed the locking semantics from NFSv4 to Posix and the report is that this has solved the problem. Sorry for not replying on the thread. The mailing list software reckons I am not who I say I am. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From secretary at gpfsug.org Thu Feb 15 11:58:05 2018 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Thu, 15 Feb 2018 11:58:05 +0000 Subject: [gpfsug-discuss] Registration open for UK SSUG Message-ID: <8f1e98c75e688acf894fc8bb11fe0335@webmail.gpfsug.org> Dear members, The registration page for the next UK Spectrum Scale user group meeting is now live. 
We're looking forward to seeing you in London on 18th and 19th April where you will have the opportunity to hear the latest Spectrum Scale updates from filesystem experts as well as hear from other users on their experiences. Similar to previous years, we're also holding smaller interactive workshops to allow for more detailed discussion. Thank you for the kind sponsorship from all our sponsors IBM, DDN, E8, Ellexus, Lenovo, NEC, and OCF without which the event would not be possible. To register, please visit the Eventbrite registration page: https://www.eventbrite.com/e/spectrum-scale-gpfs-user-group-2018-registration-41489952565?aff=MailingList [1] We look forward to seeing you in London! -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org Wireless? Wired connection for presenter (for live demo/webcasting?) Are there cameras in the rooms for webcasting at all? Links: ------ [1] https://www.eventbrite.com/e/spectrum-scale-gpfs-user-group-2018-registration-41489952565?aff=MailingList -------------- next part -------------- An HTML attachment was scrubbed... URL: From agar at us.ibm.com Thu Feb 15 17:08:08 2018 From: agar at us.ibm.com (Eric Agar) Date: Thu, 15 Feb 2018 12:08:08 -0500 Subject: [gpfsug-discuss] RDMA data from Zimon In-Reply-To: References: Message-ID: Kristy, I experimented a bit with this some months ago and looked at the ZIMon source code. I came to the conclusion that ZIMon is reporting values obtained from the IB counters (actually, delta values adjusted for time) and that yes, for port_xmit_data and port_rcv_data, one would need to multiply the values by 4 to make sense of them. To obtain a port_xmit_data value, the ZIMon sensor first looks for /sys/class/infiniband//ports//counters_ext/port_xmit_data_64, and if that is not found then looks for /sys/class/infiniband//ports//counters/port_xmit_data. Similarly for other counters/metrics. Full disclosure: I am not an IB expert nor a ZIMon developer. I hope this helps. Eric M. Agar agar at us.ibm.com From: Kristy Kallback-Rose To: gpfsug main discussion list Date: 02/14/2018 08:47 PM Subject: [gpfsug-discuss] RDMA data from Zimon Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Can one of the IBMers tell me if port_xmit_data and port_rcv_data from Zimon can be interpreted as RDMA Bytes/sec? Ideally, also how this data is being collected? I?m looking here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1hlp_monnetworksmetrics.htm But then I also look here: https://community.mellanox.com/docs/DOC-2751 and see "Total number of data octets, divided by 4 (lanes), received on all VLs. This is 64 bit counter.? So I wasn?t sure if some multiplication by 4 was in order. Please advise. Cheers, Kristy_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=zIRb70L9sx_FvvC9IcWVKLOSOOFnx-hIGfjw0kUN7bw&s=D1g4YTG5WeUiHI3rCPr_kkPxbG9V9E-18UGXBeCvfB8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
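Putting Eric's answer into numbers: the counters ZIMon samples count 4-byte lane words, so a per-second rate needs multiplying by 4 to become bytes per second. The same arithmetic done by hand against sysfs looks like this; device and port names are placeholders:

dev=mlx5_0; port=1
ctr=/sys/class/infiniband/$dev/ports/$port/counters_ext/port_xmit_data_64
# fall back to the 32-bit counter if the 64-bit one is absent, as the sensor does
[ -r "$ctr" ] || ctr=/sys/class/infiniband/$dev/ports/$port/counters/port_xmit_data

a=$(cat "$ctr"); sleep 10; b=$(cat "$ctr")
echo "$(( (b - a) * 4 / 10 )) bytes/sec transmitted on $dev port $port"
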
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From G.Horton at bham.ac.uk Fri Feb 16 10:28:48 2018 From: G.Horton at bham.ac.uk (Gareth Horton) Date: Fri, 16 Feb 2018 10:28:48 +0000 Subject: [gpfsug-discuss] Hello Message-ID: <85BF558D-7F13-4059-834E-7D655BD17107@bham.ac.uk> Hi All, A short note to introduce myself to all members My name is Gareth Horton and I work at Birmingham University within the Research Computing 'Architecture, Infrastructure and Systems? team I am new to GPFS and HPC, coming from a general Windows / Unix / Linux sys admin background, before moving into VMware server virtualisation and SAN & NAS storage admin. We use GPFS to provide storage and archiving services to researchers for both traditional HPC and cloud (Openstack) environments I?m currently a GPFS novice and I?m hoping to learn a lot from the experience and knowledge of the group and its members Regards Gareth Horton Architecture, Infrastructure and Systems Research Computing- IT Services Computer Centre G5, Elms Road, University of Birmingham B15 2TT g.horton at bham.ac.uk| www.bear.bham.ac.uk| -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Feb 16 18:17:18 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 16 Feb 2018 10:17:18 -0800 Subject: [gpfsug-discuss] Hello In-Reply-To: <85BF558D-7F13-4059-834E-7D655BD17107@bham.ac.uk> References: <85BF558D-7F13-4059-834E-7D655BD17107@bham.ac.uk> Message-ID: <256528BD-CAAC-4D8B-9DD4-B90992D7EFBC@lbl.gov> Welcome Gareth. As a person coming in with fresh eyes, it would be helpful if you let us know if you run into anything that makes you think ?it would be great if there were ?? ?particular documentation, information about UG events, etc. Thanks, Kristy > On Feb 16, 2018, at 2:28 AM, Gareth Horton wrote: > > Hi All, > > A short note to introduce myself to all members > > My name is Gareth Horton and I work at Birmingham University within the Research Computing 'Architecture, Infrastructure and Systems? team > > I am new to GPFS and HPC, coming from a general Windows / Unix / Linux sys admin background, before moving into VMware server virtualisation and SAN & NAS storage admin. > > We use GPFS to provide storage and archiving services to researchers for both traditional HPC and cloud (Openstack) environments > > I?m currently a GPFS novice and I?m hoping to learn a lot from the experience and knowledge of the group and its members > > Regards > > Gareth Horton > > Architecture, Infrastructure and Systems > Research Computing- IT Services > Computer Centre G5, > Elms Road, University of Birmingham > B15 2TT > g.horton at bham.ac.uk | www.bear.bham.ac.uk | > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From luke.raimbach at googlemail.com Mon Feb 19 12:16:43 2018 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Mon, 19 Feb 2018 12:16:43 +0000 Subject: [gpfsug-discuss] GUI reports erroneous NIC errors Message-ID: Hi GUI whizzes, I have a couple of AFM nodes in my cluster with dual-port MLX cards for RDMA. 
Only the first port on the card is connected to the fabric and the cluster configuration seems correct to me:

# mmlsconfig
---8<---
[nsdNodes]
verbsPorts mlx5_1/1
[afm]
verbsPorts mlx4_1/1
[afm,nsdNodes]
verbsRdma enable
--->8---

The cluster is working fine, and the mmfs.log shows me what I expect, i.e. RDMA connections being made over the correct interfaces. Nevertheless the GUI tells me such lies as "Node Degraded" and "ib_rdma_nic_unrecognised" for the second port on the card (which is not explicitly used). Event details are:

Event name: ib_rdma_nic_unrecognized
Component: Network
Entity type: Node
Entity name: afm01
Event time: 19/02/18 12:53:39
Message: IB RDMA NIC mlx4_1/2 was not recognized
Description: The specified IB RDMA NIC was not correctly recognized for usage by Spectrum Scale
Cause: The specified IB RDMA NIC is not reported in 'mmfsadm dump verbs'
User action: N/A
Reporting node: afm01
Event type: Active health state of an entity which is monitored by the system.

Naturally the GUI is for those who like to see reports and this incorrect entry would likely generate a high volume of unwanted questions from such report viewers. How can I bring the GUI reporting back in line with reality? Thanks, Luke. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Feb 19 14:00:49 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 19 Feb 2018 09:00:49 -0500 Subject: [gpfsug-discuss] Configuration advice In-Reply-To: <20180212151155.GD23944@cefeid.wcss.wroc.pl> References: <20180212151155.GD23944@cefeid.wcss.wroc.pl> Message-ID: As I think you understand we can only provide general guidance as regards your questions. If you want a detailed examination of your requirements and a proposal for a solution you will need to engage the appropriate IBM services team. My personal recommendation is to use as few file systems as possible, preferably just one. The reason is that it makes general administration, and storage management, easier. If you do use filesets I suggest you use independent filesets because they offer more administrative control than dependent filesets. As for the number of nodes in the cluster, that depends on your requirements for performance and availability. If you do have only 2 then you will need a tiebreaker disk to resolve quorum issues should the network between the nodes have problems. If you intend to continue to use HSM I would suggest you use the GPFS policy engine to drive the migrations because it should be more efficient than using HSM directly. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.
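For illustration only, a minimal sketch of the policy-driven migration suggested above; the external pool name, the HSM exec script path, the thresholds and the file system device (gpfs0) are assumptions, not a tested configuration:

# write a small policy file that defines TSM/HSM as an external pool and
# migrates the coldest files once the data pool passes 90% full
cat > /tmp/hsm.policy <<'EOF'
RULE EXTERNAL POOL 'hsm' EXEC '/path/to/hsm/exec/script' OPTS '-v'
RULE 'MigrateCold' MIGRATE FROM POOL 'system' THRESHOLD(90,70)
     WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
     TO POOL 'hsm'
EOF

# -I test only reports what would be migrated; rerun with -I yes to let the
# policy engine drive the actual HSM migration
mmapplypolicy gpfs0 -P /tmp/hsm.policy -I test

The test run makes it easy to sanity-check the rule before anything is actually moved out to tape.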
From: Pawel Dziekonski To: gpfsug-discuss at spectrumscale.org Date: 02/12/2018 10:18 AM Subject: [gpfsug-discuss] Configuration advice Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I inherited from previous admin 2 separate gpfs machines. All hardware+software is old so I want to switch to new servers, new disk arrays, new gpfs version and new gpfs "design". Each machine has 4 gpfs filesystems and runs a TSM HSM client that migrates data to tapes using separate TSM servers: GPFS+HSM no 1 -> TSM server no 1 -> tapes GPFS+HSM no 2 -> TSM server no 2 -> tapes Migration is done by HSM (not GPFS policies). All filesystems are used for archiving results from HPC system and other files (a kind of backup - don't ask...). Data is written by users via nfs shares. There are 8 nfs mount points corresponding to 8 gpfs filesystems, but there is no real reason for that. 4 filesystems are large and heavily used, 4 remaining are almost not used. The question is how to configure new gpfs infrastructure? My initial impression is that I should create a GPFS cluster of 2+ nodes and export NFS using CES. The most important question is how many filesystem do I need? Maybe just 2 and 8 filesets? Or how to do that in a flexible way and not to lock myself in stupid configuration? any hints? thanks, Pawel ps. I will recall all data and copy it to new infrastructure. Yes, that's the way I want to do that. :) -- Pawel Dziekonski , https://urldefense.proofpoint.com/v2/url?u=http-3A__www.wcss.pl&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=-wyO42O-5SDJQfYoGpqeObZNSlFzduC9mlXhsZb65HI&s=__3QSrBGRtG4Rja-QzbpqALX2o8l-67gtrqePi0NrfE&e= Wroclaw Centre for Networking & Supercomputing, HPC Department _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=-wyO42O-5SDJQfYoGpqeObZNSlFzduC9mlXhsZb65HI&s=32gAuk8HDIPkjMjY4L7DB1tFqmJxeaP4ZWIYA_Ya3ts&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Wed Feb 21 09:01:39 2018 From: john.hearns at asml.com (John Hearns) Date: Wed, 21 Feb 2018 09:01:39 +0000 Subject: [gpfsug-discuss] GPFS Downloads Message-ID: Would someone else kindly go to this webpage: https://www.ibm.com/support/home/product/10000060/IBM%20Spectrum%20Scale Click on Downloads then confirm you get a choice of two identical Spectrum Scale products. Neither of which has a version fix level you can select on the check box below. I have tried this in Internet Explorer and Chrome. My apology if this is stupidity on my part, but I really would like to download the latest 4.2.3 version with the APAR we need. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. 
Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Feb 21 09:23:10 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 21 Feb 2018 09:23:10 +0000 Subject: [gpfsug-discuss] GPFS Downloads In-Reply-To: References: Message-ID: Same for me. What I normally do is just go straight to Fix Central and navigate from there. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 21 February 2018 09:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] GPFS Downloads Would someone else kindly go to this webpage: https://www.ibm.com/support/home/product/10000060/IBM%20Spectrum%20Scale Click on Downloads then confirm you get a choice of two identical Spectrum Scale products. Neither of which has a version fix level you can select on the check box below. I have tried this in Internet Explorer and Chrome. My apology if this is stupidity on my part, but I really would like to download the latest 4.2.3 version with the APAR we need. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Wed Feb 21 08:54:41 2018 From: john.hearns at asml.com (John Hearns) Date: Wed, 21 Feb 2018 08:54:41 +0000 Subject: [gpfsug-discuss] Finding all bulletins and APARs Message-ID: Firstly, let me apologise for not thanking people who hav ereplied to me on this list with help. I have indeed replied and thanked you - however the list software has taken a dislike to my email address. I am currently on the myibm support site. I am looking for a specific APAR on Spectrum Scale. However I want to be able to get a list of all APARs and bulletins for Spectrum Scale, right up to date. I do get email alerts but somehow I suspect I am not getting them all, and it is a pain to search back in your email. Thanks John H -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. 
To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Wed Feb 21 09:31:25 2018 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 21 Feb 2018 09:31:25 +0000 Subject: [gpfsug-discuss] GPFS Downloads In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From anencizo at us.ibm.com Wed Feb 21 17:19:09 2018 From: anencizo at us.ibm.com (Angela Encizo) Date: Wed, 21 Feb 2018 17:19:09 +0000 Subject: [gpfsug-discuss] GPFS Downloads In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14571047317701.png Type: image/png Size: 6645 bytes Desc: not available URL: From carlz at us.ibm.com Wed Feb 21 19:54:31 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 21 Feb 2018 19:54:31 +0000 Subject: [gpfsug-discuss] GPFS Downloads In-Reply-To: References: Message-ID: It does look like that link is broken, thanks for letting us know. If you click on the Menu dropdown at the top of the page that says "Downloads" you'll see a link to Fix Central that takes you to the right place. Carl Zetie Offering Manager for Spectrum Scale, IBM (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From valdis.kletnieks at vt.edu Wed Feb 21 20:20:16 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 21 Feb 2018 15:20:16 -0500 Subject: [gpfsug-discuss] GPFS and Wireshark.. Message-ID: <51481.1519244416@turing-police.cc.vt.edu> Has anybody out there done a Wireshark protocol filter for GPFS? Or know where to find enough documentation of the on-the-wire data formats to write even a basic one? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From juantellez at mx1.ibm.com Wed Feb 21 21:20:44 2018 From: juantellez at mx1.ibm.com (Juan Ignacio Tellez Vilchis) Date: Wed, 21 Feb 2018 21:20:44 +0000 Subject: [gpfsug-discuss] SOBAR restore Message-ID: An HTML attachment was scrubbed... URL: From lgayne at us.ibm.com Wed Feb 21 21:23:50 2018 From: lgayne at us.ibm.com (Lyle Gayne) Date: Wed, 21 Feb 2018 16:23:50 -0500 Subject: [gpfsug-discuss] SOBAR restore In-Reply-To: References: Message-ID: April Brown should be able to assist. Lyle From: "Juan Ignacio Tellez Vilchis" To: gpfsug-discuss at spectrumscale.org Date: 02/21/2018 04:21 PM Subject: [gpfsug-discuss] SOBAR restore Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, Is there anybody that has some experience with GPFS filesystem restore using SOBAR? I already back filesystem out using SOBAR, but having some troubles with dsmc restore command. Any help would be appreciated! Juan Ignacio Tellez Vilchis Storage Consultant Lab. 
Services IBM Systems Hardware Phone: 52-55-5270-3218 | Mobile: 52-55-10160692 IBM E-mail: juantellez at mx1.ibm.com Find me on: LinkedIn: http://mx.linkedin.com/in/Ignaciotellez1and within IBM on: IBM Connections: Alfonso Napoles Gandara https://w3-connections.ibm.com/profiles/html/profileView.do?key=2ce9da3f-33ae-4262-9e22-50433170ea46 3111 Mexico City, DIF 01210 Mexico _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=F5mU6o96aI7N9_U21xmoWIM5YmGNLLIi66Drt1r75UY&s=C_BZnOZwvJjElYiXC-xlyQLCNkoD3tUr4qZ2SdPfxok&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 57328677.jpg Type: image/jpeg Size: 518 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 57450745.jpg Type: image/jpeg Size: 1208 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 57307813.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From john.hearns at asml.com Wed Feb 21 16:11:54 2018 From: john.hearns at asml.com (John Hearns) Date: Wed, 21 Feb 2018 16:11:54 +0000 Subject: [gpfsug-discuss] mmfind will not exec Message-ID: I would dearly like to use mmfind in a project I am working on (version 4.2.3.4 at the moment) mmfind /hpc/bscratch -type f work fine mmfind /hpc/bscratch -type f -exec /bin/ls {}\ ; crashes and burns I know there are supposed to be problems with exec and mmfind, and this is sample software shipped without warranty etc. But why let me waste hours on this when it won't work? There is even an example in the README for mmfind ./mmfind /encFS -type f -exec /bin/readMyFile {} \; But in the help for mmfind: -exec COMMANDs are terminated by a standalone ';' or by the string '{} +' So which is it? The normal find version {} \; or {} + -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From scale at us.ibm.com Thu Feb 22 01:26:22 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 21 Feb 2018 20:26:22 -0500 Subject: [gpfsug-discuss] mmfind will not exec In-Reply-To: References: Message-ID: Looking at the mmfind.README it indicates that it only supports the format you used with the semi-colon. Did you capture any output of the problem? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: John Hearns To: gpfsug main discussion list Date: 02/21/2018 06:45 PM Subject: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org I would dearly like to use mmfind in a project I am working on (version 4.2.3.4 at the moment) mmfind /hpc/bscratch -type f work fine mmfind /hpc/bscratch -type f -exec /bin/ls {}\ ; crashes and burns I know there are supposed to be problems with exec and mmfind, and this is sample software shipped without warranty etc. But why let me waste hours on this when it won?t work? There is even an example in the README for mmfind ./mmfind /encFS -type f -exec /bin/readMyFile {} \; But in the help for mmfind: -exec COMMANDs are terminated by a standalone ';' or by the string '{} +? So which is it? The normal find version {} \; or {} + -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OC7XNZeulP0vmS8Fq-RJuun5wOqFPootm0QHxBXUfKg&s=LUvpk53AaNcHSGQgDgH8FAiOOsH1H0OPOV9MFGMIi9E&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Wed Feb 21 16:22:07 2018 From: john.hearns at asml.com (John Hearns) Date: Wed, 21 Feb 2018 16:22:07 +0000 Subject: [gpfsug-discuss] mmfind - a ps. Message-ID: Ps. 
Here is how to get mmfind to run some operation on the files it finds. (I installed mmfind in /usr/local/bin) I find this very hacky, though I suppose it is idiomatic bash

#!/bin/bash
while read filename
do
  echo -n $filename " "
done <<< "`/usr/local/bin/mmfind /hpc/bscratch -type f`"

-- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Ola.Pontusson at kb.se Thu Feb 22 06:23:37 2018 From: Ola.Pontusson at kb.se (Ola Pontusson) Date: Thu, 22 Feb 2018 06:23:37 +0000 Subject: [gpfsug-discuss] SOBAR restore In-Reply-To: References: Message-ID: Hi The SOBAR procedure is documented with Spectrum Scale on IBM's website and if you follow those instructions there should be no problem (unless you bump into some of the errors in SOBAR). Have you done your mmimgbackup with TSM and sent the image to TSM and that's why you try the dsmc restore? The only time I used dsmc restore is if I send the image to TSM. If you don't send to TSM the image is where you put it and can be moved where you want it. The whole point of SOBAR is to use dsmmigrate so all files are HSM-migrated out to TSM, not backed up. Just one question: if you do a mmlsfs filesystem -V, which version is your filesystem created with, and what level is your Spectrum Scale running where you try to perform the restore? Sincerely, Ola Pontusson IT-Specialist National Library of Sweden Från: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] För Juan Ignacio Tellez Vilchis Skickat: den 21 februari 2018 22:21 Till: gpfsug-discuss at spectrumscale.org Ämne: [gpfsug-discuss] SOBAR restore Hello, Is there anybody that has some experience with GPFS filesystem restore using SOBAR? I already back filesystem out using SOBAR, but having some troubles with dsmc restore command. Any help would be appreciated! Juan Ignacio Tellez Vilchis Storage Consultant Lab. Services IBM Systems Hardware ________________________________ Phone: 52-55-5270-3218 | Mobile: 52-55-10160692 E-mail: juantellez at mx1.ibm.com Find me on: [LinkedIn: http://mx.linkedin.com/in/Ignaciotellez1] and within IBM on: [IBM Connections: https://w3-connections.ibm.com/profiles/html/profileView.do?key=2ce9da3f-33ae-4262-9e22-50433170ea46] [IBM] Alfonso Napoles Gandara 3111 Mexico City, DIF 01210 Mexico -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Thu Feb 22 09:01:43 2018 From: john.hearns at asml.com (John Hearns) Date: Thu, 22 Feb 2018 09:01:43 +0000 Subject: [gpfsug-discuss] mmfind will not exec In-Reply-To: References: Message-ID: Stupid me.
The space between the {} and \; is significant. /usr/local/bin/mmfind /hpc/bscratch -type f -exec /bin/ls {} \; Still would be nice to have the documentation clarified please. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Thursday, February 22, 2018 2:26 AM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Subject: Re: [gpfsug-discuss] mmfind will not exec Looking at the mmfind.README it indicates that it only supports the format you used with the semi-colon. Did you capture any output of the problem? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: John Hearns > To: gpfsug main discussion list > Date: 02/21/2018 06:45 PM Subject: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I would dearly like to use mmfind in a project I am working on (version 4.2.3.4 at the moment) mmfind /hpc/bscratch -type f work fine mmfind /hpc/bscratch -type f -exec /bin/ls {}\ ; crashes and burns I know there are supposed to be problems with exec and mmfind, and this is sample software shipped without warranty etc. But why let me waste hours on this when it won?t work? There is even an example in the README for mmfind ./mmfind /encFS -type f -exec /bin/readMyFile {} \; But in the help for mmfind: -exec COMMANDs are terminated by a standalone ';' or by the string '{} +? So which is it? The normal find version {} \; or {} + -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OC7XNZeulP0vmS8Fq-RJuun5wOqFPootm0QHxBXUfKg&s=LUvpk53AaNcHSGQgDgH8FAiOOsH1H0OPOV9MFGMIi9E&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Feb 22 14:20:32 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 22 Feb 2018 09:20:32 -0500 Subject: [gpfsug-discuss] mmfind -ls In-Reply-To: References: Message-ID: Leaving aside the -exec option, and whether you choose classic find or mmfind, why not just use the -ls option - same output, less overhead... mmfind pathname -type f -ls From: John Hearns To: gpfsug main discussion list Cc: "gpfsug-discuss-bounces at spectrumscale.org" Date: 02/22/2018 04:03 AM Subject: Re: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org Stupid me. The space between the {} and \; is significant. /usr/local/bin/mmfind /hpc/bscratch -type f -exec /bin/ls {} \; Still would be nice to have the documentation clarified please. From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Thursday, February 22, 2018 2:26 AM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Subject: Re: [gpfsug-discuss] mmfind will not exec Looking at the mmfind.README it indicates that it only supports the format you used with the semi-colon. Did you capture any output of the problem? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
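For later readers of this thread, a hedged illustration of the two -exec terminators mentioned in the mmfind help text, reusing the sample path from the thread; the mmfind.README example only shows the ';' form, so treat the '{} +' variant as untested here:

mmfind /hpc/bscratch -type f -exec /bin/ls -l {} \;   # one ls invocation per file; the space before \; is significant
mmfind /hpc/bscratch -type f -exec /bin/ls -l {} +    # find-style batching terminator listed in the help, untested here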
From: John Hearns To: gpfsug main discussion list Date: 02/21/2018 06:45 PM Subject: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org I would dearly like to use mmfind in a project I am working on (version 4.2.3.4 at the moment) mmfind /hpc/bscratch -type f work fine mmfind /hpc/bscratch -type f -exec /bin/ls {}\ ; crashes and burns I know there are supposed to be problems with exec and mmfind, and this is sample software shipped without warranty etc. But why let me waste hours on this when it won?t work? There is even an example in the README for mmfind ./mmfind /encFS -type f -exec /bin/readMyFile {} \; But in the help for mmfind: -exec COMMANDs are terminated by a standalone ';' or by the string '{} +? So which is it? The normal find version {} \; or {} + -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OC7XNZeulP0vmS8Fq-RJuun5wOqFPootm0QHxBXUfKg&s=LUvpk53AaNcHSGQgDgH8FAiOOsH1H0OPOV9MFGMIi9E&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=77Whh54a5VWNFaaczlMhEzn7B802MGX9m-C2xj4sP1k&s=L4bZlOcrZLwkyth7maRTEmms7Ftarchh_DkBvdTEF7w&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From makaplan at us.ibm.com Thu Feb 22 14:27:28 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 22 Feb 2018 09:27:28 -0500 Subject: [gpfsug-discuss] mmfind - Use mmfind ... -xargs In-Reply-To: References: Message-ID: More recent versions of mmfind support an -xargs option... Run mmfind --help and see: -xargs [-L maxlines] [-I rplstr] COMMAND Similar to find ... | xargs [-L x] [-I r] COMMAND but COMMAND executions may run in parallel. This is preferred to -exec. With -xargs mmfind will run the COMMANDs in phase subject to mmapplypolicy options -m, -B, -N. Must be the last option to mmfind This gives you the fully parallelized power of mmapplypolicy without having to write SQL rules nor scripts. From: John Hearns To: gpfsug main discussion list Date: 02/21/2018 11:00 PM Subject: [gpfsug-discuss] mmfind - a ps. Sent by: gpfsug-discuss-bounces at spectrumscale.org Ps. Her is how to get mmfind to run some operation on the files it finds. (I installed mmfind in /usr/local/bin) I find this very hacky, though I suppose it is idiomatic bash #!/bin/bash while read filename do echo -n $filename " " done <<< "`/usr/local/bin/mmfind /hpc/bscratch -type f`" -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=vbcae5NoH6gMQCovOqRVJVgj9jJ2USmq47GHxVn6En8&s=F_GqjJRzSzubUSXpcjysWCwCjhVKO9YrbUdzjusY0SY&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Thu Feb 22 19:58:48 2018 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Thu, 22 Feb 2018 14:58:48 -0500 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage Message-ID: Hi All, I am trying to figure out a GPFS tiering architecture with flash storage in front end and near line storage as backend, for Supercomputing The Backend storage will be a GPFS storage on near line of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough. I was thinking if it would be possible to use a GPFS flash cluster or GPFS SSD cluster in front end that uses AFM and acts as a cache cluster with the backend GPFS cluster. At the end of this .. the workflow that i am targeting is where: ? 
If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring up all of the files from the backend to the faster SSD/Flash GPFS cluster. The working set might be about 100T, at a time which i want to be on a faster/low latency tier, and the rest of the files to be in slower tier until they are read by the compute nodes. ? I do not want to use GPFS policies to achieve the above, is because i am not sure - if policies could be written in a way, that files are moved from the slower tier to faster tier depending on how the jobs interact with the files. I know that the policies could be written depending on the heat, and size/format but i don?t think thes policies work in a similar way as above. I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster before the near line storage. However the AFM cluster was really really slow, It took it about few hours to copy the files from near line storage to AFM cache cluster. I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should. I have tried LROC too, but it does not behave the same way as i guess AFM works. Has anyone tried or know if GPFS supports an architecture - where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes reads few blocks from these files? I understand that with respect to hardware - the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster. Please do also let me know, if the above workflow can be done using GPFS policies and be as fast as it is needed to be. Regards, Lohit -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Thu Feb 22 20:26:58 2018 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Thu, 22 Feb 2018 15:26:58 -0500 Subject: [gpfsug-discuss] GPFS, MMAP and Pagepool Message-ID: Hi all, I wanted to know, how does mmap interact with GPFS pagepool with respect to filesystem block-size? Does the efficiency depend on the mmap read size and the block-size of the filesystem even if all the data is cached in pagepool? GPFS 4.2.3.2 and CentOS7. Here is what i observed: I was testing a user script that uses mmap to read from 100M to 500MB files. The above files are stored on 3 different filesystems. Compute nodes - 10G pagepool and 5G seqdiscardthreshold. 1. 4M block size GPFS filesystem, with separate metadata and data. Data on Near line and metadata on SSDs 2. 1M block size GPFS filesystem as a AFM cache cluster, "with all the required files fully cached" from the above GPFS cluster as home. Data and Metadata together on SSDs 3. 16M block size GPFS filesystem, with separate metadata and data. Data on Near line and metadata on SSDs When i run the script first time for ?each" filesystem: I see that GPFS reads from the files, and caches into the pagepool as it reads, from mmdiag -- iohist When i run the second time, i see that there are no IO requests from the compute node to GPFS NSD servers, which is expected since all the data from the 3 filesystems is cached. However - the time taken for the script to run for the files in the 3 different filesystems is different - although i know that they are just "mmapping"/reading from pagepool/cache and not from disk. 
Here is the difference in time, for IO just from pagepool: 20s 4M block size 15s 1M block size 40S 16M block size. Why do i see a difference when trying to mmap reads from different block-size filesystems, although i see that the IO requests are not hitting disks and just the pagepool? I am willing to share the strace output and mmdiag outputs if needed. Thanks, Lohit -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Thu Feb 22 20:59:27 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 22 Feb 2018 20:59:27 +0000 Subject: [gpfsug-discuss] GPFS, MMAP and Pagepool In-Reply-To: References: Message-ID: Hi Lohit, i am working with ray on a mmap performance improvement right now, which most likely has the same root cause as yours , see --> http://gpfsug.org/pipermail/gpfsug-discuss/2018-January/004411.html the thread above is silent after a couple of back and rorth, but ray and i have active communication in the background and will repost as soon as there is something new to share. i am happy to look at this issue after we finish with ray's workload if there is something missing, but first let's finish his, get you try the same fix and see if there is something missing. btw. if people would share their use of MMAP , what applications they use (home grown, just use lmdb which uses mmap under the cover, etc) please let me know so i get a better picture on how wide the usage is with GPFS. i know a lot of the ML/DL workloads are using it, but i would like to know what else is out there i might not think about. feel free to drop me a personal note, i might not reply to it right away, but eventually. thx. sven On Thu, Feb 22, 2018 at 12:33 PM wrote: > Hi all, > > I wanted to know, how does mmap interact with GPFS pagepool with respect > to filesystem block-size? > Does the efficiency depend on the mmap read size and the block-size of the > filesystem even if all the data is cached in pagepool? > > GPFS 4.2.3.2 and CentOS7. > > Here is what i observed: > > I was testing a user script that uses mmap to read from 100M to 500MB > files. > > The above files are stored on 3 different filesystems. > > Compute nodes - 10G pagepool and 5G seqdiscardthreshold. > > 1. 4M block size GPFS filesystem, with separate metadata and data. Data on > Near line and metadata on SSDs > 2. 1M block size GPFS filesystem as a AFM cache cluster, "with all the > required files fully cached" from the above GPFS cluster as home. Data and > Metadata together on SSDs > 3. 16M block size GPFS filesystem, with separate metadata and data. Data > on Near line and metadata on SSDs > > When i run the script first time for ?each" filesystem: > I see that GPFS reads from the files, and caches into the pagepool as it > reads, from mmdiag -- iohist > > When i run the second time, i see that there are no IO requests from the > compute node to GPFS NSD servers, which is expected since all the data from > the 3 filesystems is cached. > > However - the time taken for the script to run for the files in the 3 > different filesystems is different - although i know that they are just > "mmapping"/reading from pagepool/cache and not from disk. > > Here is the difference in time, for IO just from pagepool: > > 20s 4M block size > 15s 1M block size > 40S 16M block size. > > Why do i see a difference when trying to mmap reads from different > block-size filesystems, although i see that the IO requests are not hitting > disks and just the pagepool? 
> > I am willing to share the strace output and mmdiag outputs if needed. > > Thanks, > Lohit > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Feb 22 21:08:06 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 22 Feb 2018 16:08:06 -0500 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: References: Message-ID: I do not think AFM is intended to solve the problem you are trying to solve. If I understand your scenario correctly you state that you are placing metadata on NL-SAS storage. If that is true that would not be wise especially if you are going to do many metadata operations. I suspect your performance issues are partially due to the fact that metadata is being stored on NL-SAS storage. You stated that you did not think the file heat feature would do what you intended but have you tried to use it to see if it could solve your problem? I would think having metadata on SSD/flash storage combined with a all flash storage pool for your heavily used files would perform well. If you expect IO usage will be such that there will be far more reads than writes then LROC should be beneficial to your overall performance. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: valleru at cbio.mskcc.org To: gpfsug main discussion list Date: 02/22/2018 03:11 PM Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I am trying to figure out a GPFS tiering architecture with flash storage in front end and near line storage as backend, for Supercomputing The Backend storage will be a GPFS storage on near line of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough. I was thinking if it would be possible to use a GPFS flash cluster or GPFS SSD cluster in front end that uses AFM and acts as a cache cluster with the backend GPFS cluster. At the end of this .. the workflow that i am targeting is where: ? If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring up all of the files from the backend to the faster SSD/Flash GPFS cluster. The working set might be about 100T, at a time which i want to be on a faster/low latency tier, and the rest of the files to be in slower tier until they are read by the compute nodes. ? 
I do not want to use GPFS policies to achieve the above, is because i am not sure - if policies could be written in a way, that files are moved from the slower tier to faster tier depending on how the jobs interact with the files. I know that the policies could be written depending on the heat, and size/format but i don?t think thes policies work in a similar way as above. I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster before the near line storage. However the AFM cluster was really really slow, It took it about few hours to copy the files from near line storage to AFM cache cluster. I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should. I have tried LROC too, but it does not behave the same way as i guess AFM works. Has anyone tried or know if GPFS supports an architecture - where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes reads few blocks from these files? I understand that with respect to hardware - the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster. Please do also let me know, if the above workflow can be done using GPFS policies and be as fast as it is needed to be. Regards, Lohit _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s=AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Thu Feb 22 21:19:08 2018 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Thu, 22 Feb 2018 16:19:08 -0500 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: References: Message-ID: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> Thank you. I am sorry if i was not clear, but the metadata pool is all on SSDs in the GPFS clusters that we use. Its just the data pool that is on Near-Line Rotating disks. I understand that AFM might not be able to solve the issue, and I will try and see if file heat works for migrating the files to flash tier. You mentioned an all flash storage pool for heavily used files - so you mean a different GPFS cluster just with flash storage, and to manually copy the files to flash storage whenever needed? The IO performance that i am talking is prominently for reads, so you mention that LROC can work in the way i want it to? that is prefetch all the files into LROC cache, after only few headers/stubs of data are read from those files? I thought LROC only keeps that block of data that is prefetched from the disk, and will not prefetch the whole file if a stub of data is read. Please do let me know, if i understood it wrong. On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale , wrote: > I do not think AFM is intended to solve the problem you are trying to solve. ?If I understand your scenario correctly you state that you are placing metadata on NL-SAS storage. ?If that is true that would not be wise especially if you are going to do many metadata operations. ?I suspect your performance issues are partially due to the fact that metadata is being stored on NL-SAS storage. 
?You stated that you did not think the file heat feature would do what you intended but have you tried to use it to see if it could solve your problem? ?I would think having metadata on SSD/flash storage combined with a all flash storage pool for your heavily used files would perform well. ?If you expect IO usage will be such that there will be far more reads than writes then LROC should be beneficial to your overall performance. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of ?Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact ?1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: ? ? ? ?valleru at cbio.mskcc.org > To: ? ? ? ?gpfsug main discussion list > Date: ? ? ? ?02/22/2018 03:11 PM > Subject: ? ? ? ?[gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage > Sent by: ? ? ? ?gpfsug-discuss-bounces at spectrumscale.org > > > > Hi All, > > I am trying to figure out a GPFS tiering architecture with flash storage in front end and near line storage as backend, for Supercomputing > > The Backend storage will be a GPFS storage on near line of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough. > > I was thinking if it would be possible to use a GPFS flash cluster or GPFS SSD cluster in front end that uses AFM and acts as a cache cluster with the backend GPFS cluster. > > At the end of this .. the workflow that i am targeting is where: > > > ? > If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring up all of the files from the backend to the faster SSD/Flash GPFS cluster. > The working set might be about 100T, at a time which i want to be on a faster/low latency tier, and the rest of the files to be in slower tier until they are read by the compute nodes. > ? > > > I do not want to use GPFS policies to achieve the above, is because i am not sure - if policies could be written in a way, that files are moved from the slower tier to faster tier depending on how the jobs interact with the files. > I know that the policies could be written depending on the heat, and size/format but i don?t think thes policies work in a similar way as above. > > I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster before the near line storage. However the AFM cluster was really really slow, It took it about few hours to copy the files from near line storage to AFM cache cluster. > I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should. > > I have tried LROC too, but it does not behave the same way as i guess AFM works. 
> > Has anyone tried or know if GPFS supports an architecture - where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes reads few blocks from these files? > I understand that with respect to hardware - the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster. > > Please do also let me know, if the above workflow can be done using GPFS policies and be as fast as it is needed to be. > > Regards, > Lohit > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s=AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Feb 22 21:52:01 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 22 Feb 2018 16:52:01 -0500 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> References: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> Message-ID: My apologies for not being more clear on the flash storage pool. I meant that this would be just another GPFS storage pool in the same cluster, so no separate AFM cache cluster. You would then use the file heat feature to ensure more frequently accessed files are migrated to that all flash storage pool. As for LROC could you please clarify what you mean by a few headers/stubs of the file? In reading the LROC documentation and the LROC variables available in the mmchconfig command I think you might want to take a look a the lrocDataStubFileSize variable since it seems to apply to your situation. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: valleru at cbio.mskcc.org To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Date: 02/22/2018 04:21 PM Subject: Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage Sent by: gpfsug-discuss-bounces at spectrumscale.org Thank you. I am sorry if i was not clear, but the metadata pool is all on SSDs in the GPFS clusters that we use. Its just the data pool that is on Near-Line Rotating disks. I understand that AFM might not be able to solve the issue, and I will try and see if file heat works for migrating the files to flash tier. 
You mentioned an all flash storage pool for heavily used files - so you mean a different GPFS cluster just with flash storage, and to manually copy the files to flash storage whenever needed? The IO performance that i am talking is prominently for reads, so you mention that LROC can work in the way i want it to? that is prefetch all the files into LROC cache, after only few headers/stubs of data are read from those files? I thought LROC only keeps that block of data that is prefetched from the disk, and will not prefetch the whole file if a stub of data is read. Please do let me know, if i understood it wrong. On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale , wrote: I do not think AFM is intended to solve the problem you are trying to solve. If I understand your scenario correctly you state that you are placing metadata on NL-SAS storage. If that is true that would not be wise especially if you are going to do many metadata operations. I suspect your performance issues are partially due to the fact that metadata is being stored on NL-SAS storage. You stated that you did not think the file heat feature would do what you intended but have you tried to use it to see if it could solve your problem? I would think having metadata on SSD/flash storage combined with a all flash storage pool for your heavily used files would perform well. If you expect IO usage will be such that there will be far more reads than writes then LROC should be beneficial to your overall performance. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: valleru at cbio.mskcc.org To: gpfsug main discussion list Date: 02/22/2018 03:11 PM Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I am trying to figure out a GPFS tiering architecture with flash storage in front end and near line storage as backend, for Supercomputing The Backend storage will be a GPFS storage on near line of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough. I was thinking if it would be possible to use a GPFS flash cluster or GPFS SSD cluster in front end that uses AFM and acts as a cache cluster with the backend GPFS cluster. At the end of this .. the workflow that i am targeting is where: ? If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring up all of the files from the backend to the faster SSD/Flash GPFS cluster. The working set might be about 100T, at a time which i want to be on a faster/low latency tier, and the rest of the files to be in slower tier until they are read by the compute nodes. ? 
I do not want to use GPFS policies to achieve the above, is because i am not sure - if policies could be written in a way, that files are moved from the slower tier to faster tier depending on how the jobs interact with the files. I know that the policies could be written depending on the heat, and size/format but i don?t think thes policies work in a similar way as above. I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster before the near line storage. However the AFM cluster was really really slow, It took it about few hours to copy the files from near line storage to AFM cache cluster. I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should. I have tried LROC too, but it does not behave the same way as i guess AFM works. Has anyone tried or know if GPFS supports an architecture - where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes reads few blocks from these files? I understand that with respect to hardware - the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster. Please do also let me know, if the above workflow can be done using GPFS policies and be as fast as it is needed to be. Regards, Lohit _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s=AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=DuqESC-4ycoY5GoHpYeH1T8baq0JWY8QfkN8z6b8jPw&s=zNUAH3mFyzxcvXtrep_OroKiwR88QouIrcdN8TLJK8M&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Fri Feb 23 00:48:12 2018 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Thu, 22 Feb 2018 19:48:12 -0500 Subject: [gpfsug-discuss] GPFS, MMAP and Pagepool In-Reply-To: References: Message-ID: Thanks a lot Sven. I was trying out all the scenarios that Ray mentioned, with respect to lroc and all flash GPFS cluster and nothing seemed to be effective. As of now, we are deploying a new test cluster on GPFS 5.0 and it would be good to know the respective features that could be enabled and see if it improves anything. On the other side, i have seen various cases in my past 6 years with GPFS, where different tools do frequently use mmap. This dates back to 2013..?http://www.spectrumscale.org/pipermail/gpfsug-discuss/2013-May/000253.html?when one of my colleagues asked the same question. At that time, it was a homegrown application that was using mmap, along with few other genomic pipelines. An year ago, we had issue with mmap and lot of threads where GPFS would just hang without any traces or logs, which was fixed recently. 
That was related to Relion: https://sbgrid.org/software/titles/relion

The issue that we are seeing now is with ML/DL workloads, and is related to integrating external tools such as openslide (http://openslide.org/) and pytorch (http://pytorch.org/), with the field of application being deep learning over thousands of image patches.

The IO is really slow when accessed from hard disk, and thus I was trying out other options such as LROC and a flash cluster/AFM cluster. But everything has a limitation, as Ray mentioned.

Thanks,
Lohit

On Feb 22, 2018, 3:59 PM -0500, Sven Oehme wrote:
> Hi Lohit,
>
> I am working with Ray on a mmap performance improvement right now, which most likely has the same root cause as yours; see --> http://gpfsug.org/pipermail/gpfsug-discuss/2018-January/004411.html
> The thread above is silent after a couple of back and forth, but Ray and I have active communication in the background and will repost as soon as there is something new to share.
> I am happy to look at this issue after we finish with Ray's workload if there is something missing, but first let's finish his, get you to try the same fix, and see if there is something missing.
>
> By the way, if people would share their use of MMAP, what applications they use (home grown, just use lmdb which uses mmap under the cover, etc.), please let me know so I get a better picture of how wide the usage is with GPFS. I know a lot of the ML/DL workloads are using it, but I would like to know what else is out there I might not think about. Feel free to drop me a personal note; I might not reply to it right away, but eventually.
>
> thx. sven
>
> On Thu, Feb 22, 2018 at 12:33 PM wrote:
> > > Hi all,
> > > I wanted to know, how does mmap interact with the GPFS pagepool with respect to filesystem block size?
> > > Does the efficiency depend on the mmap read size and the block size of the filesystem, even if all the data is cached in the pagepool?
> > > GPFS 4.2.3.2 and CentOS7.
> > > Here is what I observed:
> > > I was testing a user script that uses mmap to read from 100MB to 500MB files.
> > > The above files are stored on 3 different filesystems.
> > > Compute nodes - 10G pagepool and 5G seqdiscardthreshold.
> > > 1. 4M block size GPFS filesystem, with separate metadata and data. Data on near-line and metadata on SSDs
> > > 2. 1M block size GPFS filesystem as an AFM cache cluster, "with all the required files fully cached" from the above GPFS cluster as home. Data and metadata together on SSDs
> > > 3. 16M block size GPFS filesystem, with separate metadata and data. Data on near-line and metadata on SSDs
> > > When I run the script the first time for each filesystem:
> > > I see that GPFS reads from the files and caches into the pagepool as it reads, from mmdiag --iohist
> > > When I run it the second time, I see that there are no IO requests from the compute node to the GPFS NSD servers, which is expected since all the data from the 3 filesystems is cached.
> > > However, the time taken for the script to run for the files in the 3 different filesystems is different - although I know that they are just "mmapping"/reading from pagepool/cache and not from disk.
> > > Here is the difference in time, for IO just from the pagepool:
> > > 20s 4M block size
> > > 15s 1M block size
> > > 40s 16M block size
> > > Why do I see a difference when trying to mmap reads from different block-size filesystems, although I see that the IO requests are not hitting disks and just the pagepool?
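
(A rough way to capture the comparison being described; the mount points and script name below are placeholders, and /usr/bin/time -v assumes GNU time is installed.)

    # Time the same mmap-heavy script against each filesystem and record
    # page-fault counts plus the GPFS I/O history before and after each run.
    for fs in /gpfs/fs4m /gpfs/fs1m /gpfs/fs16m; do
        name=$(basename "$fs")
        mmdiag --iohist > /tmp/iohist.before.$name
        /usr/bin/time -v ./user_script.py "$fs" 2> /tmp/time.$name.txt
        mmdiag --iohist > /tmp/iohist.after.$name   # should show no new disk reads on a warm run
    done
    grep -E "Elapsed|page faults" /tmp/time.*.txt   # wall time plus major/minor fault counts
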
> > > > > > I am willing to share the strace output and mmdiag outputs if needed. > > > > > > Thanks, > > > Lohit > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Fri Feb 23 01:27:58 2018 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Thu, 22 Feb 2018 20:27:58 -0500 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: References: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> Message-ID: <0c21b5b2-95ff-4cbf-9b07-e23594f58c87@Spark> Thanks, I will try the file heat feature but i am really not sure, if it would work - since the code can access cold files too, and not necessarily files recently accessed/hot files. With respect to LROC. Let me explain as below: The use case is that - The code initially reads headers (small region of data) from thousands of files as the first step. For example about 30,000 of them with each about 300MB to 500MB in size. After the first step, with the help of those headers - it mmaps/seeks across various regions of a set of files in parallel. Since its all small IOs and it was really slow at reading from GPFS over the network directly from disks - Our idea was to use AFM which i believe fetches all file data into flash/ssds, once the initial few blocks of the files are read. But again - AFM seems to not solve the problem, so i want to know if LROC behaves in the same way as AFM, where all of the file data is prefetched in full block size utilizing all the worker threads ?- if few blocks of the file is read initially. Thanks, Lohit On Feb 22, 2018, 4:52 PM -0500, IBM Spectrum Scale , wrote: > My apologies for not being more clear on the flash storage pool. ?I meant that this would be just another GPFS storage pool in the same cluster, so no separate AFM cache cluster. ?You would then use the file heat feature to ensure more frequently accessed files are migrated to that all flash storage pool. > > As for LROC could you please clarify what you mean by a few headers/stubs of the file? ?In reading the LROC documentation and the LROC variables available in the mmchconfig command I think you might want to take a look a the lrocDataStubFileSize variable since it seems to apply to your situation. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of ?Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact ?1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: ? ? ? ?valleru at cbio.mskcc.org > To: ? ? ? ?gpfsug main discussion list > Cc: ? ? ? 
?gpfsug-discuss-bounces at spectrumscale.org > Date: ? ? ? ?02/22/2018 04:21 PM > Subject: ? ? ? ?Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage > Sent by: ? ? ? ?gpfsug-discuss-bounces at spectrumscale.org > > > > Thank you. > > I am sorry if i was not clear, but the metadata pool is all on SSDs in the GPFS clusters that we use. Its just the data pool that is on Near-Line Rotating disks. > I understand that AFM might not be able to solve the issue, and I will try and see if file heat works for migrating the files to flash tier. > You mentioned an all flash storage pool for heavily used files - so you mean a different GPFS cluster just with flash storage, and to manually copy the files to flash storage whenever needed? > The IO performance that i am talking is prominently for reads, so you mention that LROC can work in the way i want it to? that is prefetch all the files into LROC cache, after only few headers/stubs of data are read from those files? > I thought LROC only keeps that block of data that is prefetched from the disk, and will not prefetch the whole file if a stub of data is read. > Please do let me know, if i understood it wrong. > > On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale , wrote: > I do not think AFM is intended to solve the problem you are trying to solve. ?If I understand your scenario correctly you state that you are placing metadata on NL-SAS storage. ?If that is true that would not be wise especially if you are going to do many metadata operations. ?I suspect your performance issues are partially due to the fact that metadata is being stored on NL-SAS storage. ?You stated that you did not think the file heat feature would do what you intended but have you tried to use it to see if it could solve your problem? ?I would think having metadata on SSD/flash storage combined with a all flash storage pool for your heavily used files would perform well. ?If you expect IO usage will be such that there will be far more reads than writes then LROC should be beneficial to your overall performance. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of ?Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact ?1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: ? ? ? ?valleru at cbio.mskcc.org > To: ? ? ? ?gpfsug main discussion list > Date: ? ? ? ?02/22/2018 03:11 PM > Subject: ? ? ? ?[gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage > Sent by: ? ? ? ?gpfsug-discuss-bounces at spectrumscale.org > > > > Hi All, > > I am trying to figure out a GPFS tiering architecture with flash storage in front end and near line storage as backend, for Supercomputing > > The Backend storage will be a GPFS storage on near line of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough. 
> > I was thinking if it would be possible to use a GPFS flash cluster or GPFS SSD cluster in front end that uses AFM and acts as a cache cluster with the backend GPFS cluster. > > At the end of this .. the workflow that i am targeting is where: > > > ? > If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring up all of the files from the backend to the faster SSD/Flash GPFS cluster. > The working set might be about 100T, at a time which i want to be on a faster/low latency tier, and the rest of the files to be in slower tier until they are read by the compute nodes. > ? > > > I do not want to use GPFS policies to achieve the above, is because i am not sure - if policies could be written in a way, that files are moved from the slower tier to faster tier depending on how the jobs interact with the files. > I know that the policies could be written depending on the heat, and size/format but i don?t think thes policies work in a similar way as above. > > I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster before the near line storage. However the AFM cluster was really really slow, It took it about few hours to copy the files from near line storage to AFM cache cluster. > I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should. > > I have tried LROC too, but it does not behave the same way as i guess AFM works. > > Has anyone tried or know if GPFS supports an architecture - where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes reads few blocks from these files? > I understand that with respect to hardware - the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster. > > Please do also let me know, if the above workflow can be done using GPFS policies and be as fast as it is needed to be. > > Regards, > Lohit > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s=AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=DuqESC-4ycoY5GoHpYeH1T8baq0JWY8QfkN8z6b8jPw&s=zNUAH3mFyzxcvXtrep_OroKiwR88QouIrcdN8TLJK8M&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aaron.s.knister at nasa.gov Fri Feb 23 03:17:26 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 22 Feb 2018 22:17:26 -0500 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory Message-ID: I've been exploring the idea for a while of writing a SLURM SPANK plugin to allow users to dynamically change the pagepool size on a node. Every now and then we have some users who would benefit significantly from a much larger pagepool on compute nodes but by default keep it on the smaller side to make as much physmem available as possible to batch work. In testing, though, it seems as though reducing the pagepool doesn't quite release all of the memory. I don't really understand it because I've never before seen memory that was previously resident become un-resident but still maintain the virtual memory allocation. Here's what I mean. Let's take a node with 128G and a 1G pagepool. If I do the following to simulate what might happen as various jobs tweak the pagepool: - tschpool 64G - tschpool 1G - tschpool 32G - tschpool 1G - tschpool 32G I end up with this: mmfsd thinks there's 32G resident but 64G virt # ps -o vsz,rss,comm -p 24397 VSZ RSS COMMAND 67589400 33723236 mmfsd however, linux thinks there's ~100G used # free -g total used free shared buffers cached Mem: 125 100 25 0 0 0 -/+ buffers/cache: 98 26 Swap: 7 0 7 I can jump back and forth between 1G and 32G *after* allocating 64G pagepool and the overall amount of memory in use doesn't balloon but I can't seem to shed that original 64G. I don't understand what's going on... :) Any ideas? This is with Scale 4.2.3.6. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Fri Feb 23 03:24:00 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 22 Feb 2018 22:24:00 -0500 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: References: Message-ID: This is also interesting (although I don't know what it really means). Looking at pmap run against mmfsd I can see what happens after each step: # baseline 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020000000000 1048576K 1048576K 1048576K 1048576K 0K rwxp [anon] Total: 1613580K 1191020K 1189650K 1171836K 0K # tschpool 64G 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020000000000 67108864K 67108864K 67108864K 67108864K 0K rwxp [anon] Total: 67706636K 67284108K 67282625K 67264920K 0K # tschpool 1G 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020001400000 139264K 139264K 139264K 139264K 0K rwxp [anon] 0000020fc9400000 897024K 897024K 897024K 897024K 0K rwxp [anon] 0000020009c00000 66052096K 0K 0K 0K 0K rwxp [anon] Total: 67706636K 1223820K 1222451K 1204632K 0K Even though mmfsd has that 64G chunk allocated there's none of it *used*. I wonder why Linux seems to be accounting it as allocated. -Aaron On 2/22/18 10:17 PM, Aaron Knister wrote: > I've been exploring the idea for a while of writing a SLURM SPANK plugin > to allow users to dynamically change the pagepool size on a node. Every > now and then we have some users who would benefit significantly from a > much larger pagepool on compute nodes but by default keep it on the > smaller side to make as much physmem available as possible to batch work. 
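
(For reference, the documented way to change the pagepool on a running node is mmchconfig with an immediate flag; the node name and sizes below are placeholders. As noted later in the thread, only increases take effect on a live daemon, and shrinking the pagepool requires restarting mmfsd on that node.)

    # Grow the pagepool on one node for the duration of a job; -I makes the change
    # immediate but not persistent across a daemon restart.
    mmchconfig pagepool=64G -I -N compute001

    # A smaller value is recorded with -i but only takes effect after the daemon
    # on that node is recycled.
    mmchconfig pagepool=4G -i -N compute001
    mmshutdown -N compute001 && mmstartup -N compute001
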
> > In testing, though, it seems as though reducing the pagepool doesn't > quite release all of the memory. I don't really understand it because > I've never before seen memory that was previously resident become > un-resident but still maintain the virtual memory allocation. > > Here's what I mean. Let's take a node with 128G and a 1G pagepool. > > If I do the following to simulate what might happen as various jobs > tweak the pagepool: > > - tschpool 64G > - tschpool 1G > - tschpool 32G > - tschpool 1G > - tschpool 32G > > I end up with this: > > mmfsd thinks there's 32G resident but 64G virt > # ps -o vsz,rss,comm -p 24397 > ?? VSZ?? RSS COMMAND > 67589400 33723236 mmfsd > > however, linux thinks there's ~100G used > > # free -g > ???????????? total?????? used?????? free???? shared??? buffers???? cached > Mem:?????????? 125??????? 100???????? 25????????? 0????????? 0????????? 0 > -/+ buffers/cache:???????? 98???????? 26 > Swap:??????????? 7????????? 0????????? 7 > > I can jump back and forth between 1G and 32G *after* allocating 64G > pagepool and the overall amount of memory in use doesn't balloon but I > can't seem to shed that original 64G. > > I don't understand what's going on... :) Any ideas? This is with Scale > 4.2.3.6. > > -Aaron > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From john.hearns at asml.com Fri Feb 23 09:37:08 2018 From: john.hearns at asml.com (John Hearns) Date: Fri, 23 Feb 2018 09:37:08 +0000 Subject: [gpfsug-discuss] mmfind -ls In-Reply-To: References: Message-ID: Hi. I hope this reply comes through. I often get bounced when replying here. In fact the reason is because I am not running ls. This was just an example. I am running mmgetlocation to get the chunks allocation on each NSD of a file. Secondly my problem is that a space is needed: mmfind /mountpoint -type f -exec mmgetlocation -D myproblemnsd -f {} \; Note the space before the \ TO my shame this is the same as in the normal find command From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, February 22, 2018 3:21 PM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Subject: Re: [gpfsug-discuss] mmfind -ls Leaving aside the -exec option, and whether you choose classic find or mmfind, why not just use the -ls option - same output, less overhead... mmfind pathname -type f -ls From: John Hearns > To: gpfsug main discussion list > Cc: "gpfsug-discuss-bounces at spectrumscale.org" > Date: 02/22/2018 04:03 AM Subject: Re: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Stupid me. The space between the {} and \; is significant. /usr/local/bin/mmfind /hpc/bscratch -type f -exec /bin/ls {} \; Still would be nice to have the documentation clarified please. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Thursday, February 22, 2018 2:26 AM To: gpfsug main discussion list > Cc: gpfsug-discuss-bounces at spectrumscale.org Subject: Re: [gpfsug-discuss] mmfind will not exec Looking at the mmfind.README it indicates that it only supports the format you used with the semi-colon. Did you capture any output of the problem? 
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: John Hearns > To: gpfsug main discussion list > Date: 02/21/2018 06:45 PM Subject: [gpfsug-discuss] mmfind will not exec Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I would dearly like to use mmfind in a project I am working on (version 4.2.3.4 at the moment) mmfind /hpc/bscratch -type f work fine mmfind /hpc/bscratch -type f -exec /bin/ls {}\ ; crashes and burns I know there are supposed to be problems with exec and mmfind, and this is sample software shipped without warranty etc. But why let me waste hours on this when it won?t work? There is even an example in the README for mmfind ./mmfind /encFS -type f -exec /bin/readMyFile {} \; But in the help for mmfind: -exec COMMANDs are terminated by a standalone ';' or by the string '{} +? So which is it? The normal find version {} \; or {} + -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OC7XNZeulP0vmS8Fq-RJuun5wOqFPootm0QHxBXUfKg&s=LUvpk53AaNcHSGQgDgH8FAiOOsH1H0OPOV9MFGMIi9E&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. 
If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=77Whh54a5VWNFaaczlMhEzn7B802MGX9m-C2xj4sP1k&s=L4bZlOcrZLwkyth7maRTEmms7Ftarchh_DkBvdTEF7w&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Fri Feb 23 14:35:41 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 23 Feb 2018 09:35:41 -0500 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: References: Message-ID: AFAIK you can increase the pagepool size dynamically but you cannot shrink it dynamically. To shrink it you must restart the GPFS daemon. Also, could you please provide the actual pmap commands you executed? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Aaron Knister To: Date: 02/22/2018 10:30 PM Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all memory Sent by: gpfsug-discuss-bounces at spectrumscale.org This is also interesting (although I don't know what it really means). 
Looking at pmap run against mmfsd I can see what happens after each step: # baseline 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020000000000 1048576K 1048576K 1048576K 1048576K 0K rwxp [anon] Total: 1613580K 1191020K 1189650K 1171836K 0K # tschpool 64G 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020000000000 67108864K 67108864K 67108864K 67108864K 0K rwxp [anon] Total: 67706636K 67284108K 67282625K 67264920K 0K # tschpool 1G 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] 0000020001400000 139264K 139264K 139264K 139264K 0K rwxp [anon] 0000020fc9400000 897024K 897024K 897024K 897024K 0K rwxp [anon] 0000020009c00000 66052096K 0K 0K 0K 0K rwxp [anon] Total: 67706636K 1223820K 1222451K 1204632K 0K Even though mmfsd has that 64G chunk allocated there's none of it *used*. I wonder why Linux seems to be accounting it as allocated. -Aaron On 2/22/18 10:17 PM, Aaron Knister wrote: > I've been exploring the idea for a while of writing a SLURM SPANK plugin > to allow users to dynamically change the pagepool size on a node. Every > now and then we have some users who would benefit significantly from a > much larger pagepool on compute nodes but by default keep it on the > smaller side to make as much physmem available as possible to batch work. > > In testing, though, it seems as though reducing the pagepool doesn't > quite release all of the memory. I don't really understand it because > I've never before seen memory that was previously resident become > un-resident but still maintain the virtual memory allocation. > > Here's what I mean. Let's take a node with 128G and a 1G pagepool. > > If I do the following to simulate what might happen as various jobs > tweak the pagepool: > > - tschpool 64G > - tschpool 1G > - tschpool 32G > - tschpool 1G > - tschpool 32G > > I end up with this: > > mmfsd thinks there's 32G resident but 64G virt > # ps -o vsz,rss,comm -p 24397 > VSZ RSS COMMAND > 67589400 33723236 mmfsd > > however, linux thinks there's ~100G used > > # free -g > total used free shared buffers cached > Mem: 125 100 25 0 0 0 > -/+ buffers/cache: 98 26 > Swap: 7 0 7 > > I can jump back and forth between 1G and 32G *after* allocating 64G > pagepool and the overall amount of memory in use doesn't balloon but I > can't seem to shed that original 64G. > > I don't understand what's going on... :) Any ideas? This is with Scale > 4.2.3.6. > > -Aaron > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OrZQeEmI6chBdguG-h4YPHsxXZ4gTU3CtIuN4e3ijdY&s=hvVIRG5kB1zom2Iql2_TOagchsgl99juKiZfJt5S1tM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Fri Feb 23 14:44:21 2018 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Fri, 23 Feb 2018 15:44:21 +0100 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: References: Message-ID: <6c9df2df-dbdd-e3c1-07c7-f9906b0d666d@ugent.be> hi all, we had the same idea long ago, afaik the issue we had was due to the pinned memory the pagepool uses when RDMA is enabled. 
at some point we restarted gpfs on the compute nodes for each job, similar to the way we do swapoff/swapon; but in certain scenarios gpfs really did not like it; so we gave up on it. the other issue that needs to be resolved is that the pagepool needs to be numa aware, so the pagepool is nicely allocated across all numa domains, instead of using the first ones available. otherwise compute jobs might start that only do non-local doamin memeory access. stijn On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote: > AFAIK you can increase the pagepool size dynamically but you cannot shrink > it dynamically. To shrink it you must restart the GPFS daemon. Also, > could you please provide the actual pmap commands you executed? > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 > . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Aaron Knister > To: > Date: 02/22/2018 10:30 PM > Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all > memory > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > This is also interesting (although I don't know what it really means). > Looking at pmap run against mmfsd I can see what happens after each step: > > # baseline > 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > 0000020000000000 1048576K 1048576K 1048576K 1048576K 0K rwxp [anon] > Total: 1613580K 1191020K 1189650K 1171836K 0K > > # tschpool 64G > 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > 0000020000000000 67108864K 67108864K 67108864K 67108864K 0K rwxp > [anon] > Total: 67706636K 67284108K 67282625K 67264920K 0K > > # tschpool 1G > 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > 0000020001400000 139264K 139264K 139264K 139264K 0K rwxp [anon] > 0000020fc9400000 897024K 897024K 897024K 897024K 0K rwxp [anon] > 0000020009c00000 66052096K 0K 0K 0K 0K rwxp [anon] > Total: 67706636K 1223820K 1222451K 1204632K 0K > > Even though mmfsd has that 64G chunk allocated there's none of it > *used*. I wonder why Linux seems to be accounting it as allocated. > > -Aaron > > On 2/22/18 10:17 PM, Aaron Knister wrote: >> I've been exploring the idea for a while of writing a SLURM SPANK plugin > >> to allow users to dynamically change the pagepool size on a node. Every >> now and then we have some users who would benefit significantly from a >> much larger pagepool on compute nodes but by default keep it on the >> smaller side to make as much physmem available as possible to batch > work. >> >> In testing, though, it seems as though reducing the pagepool doesn't >> quite release all of the memory. I don't really understand it because >> I've never before seen memory that was previously resident become >> un-resident but still maintain the virtual memory allocation. 
>> >> Here's what I mean. Let's take a node with 128G and a 1G pagepool. >> >> If I do the following to simulate what might happen as various jobs >> tweak the pagepool: >> >> - tschpool 64G >> - tschpool 1G >> - tschpool 32G >> - tschpool 1G >> - tschpool 32G >> >> I end up with this: >> >> mmfsd thinks there's 32G resident but 64G virt >> # ps -o vsz,rss,comm -p 24397 >> VSZ RSS COMMAND >> 67589400 33723236 mmfsd >> >> however, linux thinks there's ~100G used >> >> # free -g >> total used free shared buffers > cached >> Mem: 125 100 25 0 0 > 0 >> -/+ buffers/cache: 98 26 >> Swap: 7 0 7 >> >> I can jump back and forth between 1G and 32G *after* allocating 64G >> pagepool and the overall amount of memory in use doesn't balloon but I >> can't seem to shed that original 64G. >> >> I don't understand what's going on... :) Any ideas? This is with Scale >> 4.2.3.6. >> >> -Aaron >> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From makaplan at us.ibm.com Fri Feb 23 16:53:26 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 23 Feb 2018 11:53:26 -0500 Subject: [gpfsug-discuss] mmfind -ls, -exec but use -xargs wherever you can. In-Reply-To: References: Message-ID: So much the more reasons to use mmfind ... -xargs ... Which, for large number of files, gives you a very much more performant and parallelized execution of the classic find ... | xargs ... The difference is exec is run in line with the evaluation of the other find conditionals (like -type f) but spawns a new command shell for each evaluation of exec... Whereas -xargs is run after the pathnames of all of the (matching) files are discovered ... Like classic xargs, if your command can take a list of files, you save overhead there BUT -xargs also runs multiple instances of your command in multiple parallel processes on multiple nodes. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Feb 23 23:41:52 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 23 Feb 2018 15:41:52 -0800 Subject: [gpfsug-discuss] Call for presentations - US Spring 2018 Meeting - Boston, May 16-17th In-Reply-To: References: Message-ID: Agenda work for the US Spring meeting is still underway and in addition to Bob?s request below, let me ask you to comment on what you?d like to hear about from IBM developers, and/or other topics of interest. Even if you can?t attend the event, feel free to contribute ideas as the talks will be posted online after the event. Just reply to the list to generate any follow-on discussion or brainstorming about topics. Best, Kristy Kristy Kallback-Rose Sr HPC Storage Systems Analyst NERSC/LBL > On Feb 8, 2018, at 12:34 PM, Oesterlin, Robert wrote: > > We?re finalizing the details for the Spring 2018 User Group meeting, and we need your help! > > I?ve you?re interested in presenting at this meeting (it will be a full 2 days), then contact me and let me know what?s you?d like to talk about. We?re always looking for presentations on how you are using Scale (GPFS) in your business or project, tools that help you do your job, performance challenges/solutions ? or anything else. Also looking for ideas on breakout sessions. We?re probably looking at talks of about 30 mins each. > > Drop me a note if you?d like to present. Exact details on the event location will be available in a few weeks. 
We?re hoping to keep it as close to BioIT World in downtown Boston. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > SSUG Co-principal > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sat Feb 24 12:01:08 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 24 Feb 2018 12:01:08 +0000 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: <0c21b5b2-95ff-4cbf-9b07-e23594f58c87@Spark> References: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> <0c21b5b2-95ff-4cbf-9b07-e23594f58c87@Spark> Message-ID: On 23/02/18 01:27, valleru at cbio.mskcc.org wrote: > Thanks, I will try the file heat feature but i am really not sure, if it > would work - since the code can access cold files too, and not > necessarily files recently accessed/hot files. > > With respect to LROC. Let me explain as below: > > The use case is that - > The code initially reads headers (small region of data) from thousands > of files as the first step. For example about 30,000 of them with each > about 300MB to 500MB in size. > After the first step, with the help of those headers - it mmaps/seeks > across various regions of a set of files in parallel. > Since its all small IOs and it was really slow at reading from GPFS over > the network directly from disks - Our idea was to use AFM which i > believe fetches all file data into flash/ssds, once the initial few > blocks of the files are read. > But again - AFM seems to not solve the problem, so i want to know if > LROC behaves in the same way as AFM, where all of the file data is > prefetched in full block size utilizing all the worker threads ?- if few > blocks of the file is read initially. > Imagine a single GPFS file system, metadata in SSD, a fast data pool and a slow data pool (fast and slow being really good names to avoid the 8 character nonsense). Now if your fast data pool is appropriately sized then your slow data pool will normally be doing diddly squat. We are talking under 10 I/O's per second. Frankly under 5 I/O's per second is more like it from my experience. If your slow pool is 8-10PB in size, then it has thousands of spindles in it, and should be able to absorb the start of the job without breaking sweat. For numbers a 7.2K RPM disk can do around 120 random I/O's per second, so using RAID6 and 8TB disks that's 130 LUN's so around 15,000 random I/O's per second spare overhead, more if it's not random. It should take all of around 1-2s to read in those headers. Therefore unless these jobs only run for a few seconds or you have dozens of them starting every minute it should not be an issue. Finally if GPFS is taking ages to read the files over the network, then it sounds like your network needs an upgrade or GPFS needs tuning which may or may not require a larger fast storage pool. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From aaron.s.knister at nasa.gov Sun Feb 25 16:45:10 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 25 Feb 2018 11:45:10 -0500 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: References: Message-ID: <65453649-77df-2efa-8776-eb2775ca9efa@nasa.gov> Hmm...interesting. 
It sure seems to try :) The pmap command was this: pmap $(pidof mmfsd) | sort -n -k3 | tail -Aaron On 2/23/18 9:35 AM, IBM Spectrum Scale wrote: > AFAIK you can increase the pagepool size dynamically but you cannot > shrink it dynamically. ?To shrink it you must restart the GPFS daemon. > Also, could you please provide the actual pmap commands you executed? > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of ?Spectrum > Scale (GPFS), then please post it to the public IBM developerWroks Forum > at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) and you have an IBM software maintenance contract please contact > ?1-800-237-5511 in the United States or your local IBM Service Center > in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Aaron Knister > To: > Date: 02/22/2018 10:30 PM > Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all memory > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > This is also interesting (although I don't know what it really means). > Looking at pmap run against mmfsd I can see what happens after each step: > > # baseline > 00007fffe4639000 ?59164K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 00007fffd837e000 ?61960K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 0000020000000000 1048576K 1048576K 1048576K 1048576K ? ? ?0K rwxp [anon] > Total: ? ? ? ? ? 1613580K 1191020K 1189650K 1171836K ? ? ?0K > > # tschpool 64G > 00007fffe4639000 ?59164K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 00007fffd837e000 ?61960K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 0000020000000000 67108864K 67108864K 67108864K 67108864K ?0K rwxp [anon] > Total: ? ? ? ? ? 67706636K 67284108K 67282625K 67264920K ? ? ?0K > > # tschpool 1G > 00007fffe4639000 ?59164K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 00007fffd837e000 ?61960K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K ---p [anon] > 0000020001400000 139264K 139264K 139264K 139264K ? ? ?0K rwxp [anon] > 0000020fc9400000 897024K 897024K 897024K 897024K ? ? ?0K rwxp [anon] > 0000020009c00000 66052096K ? ? ?0K ? ? ?0K ? ? ?0K ? ? ?0K rwxp [anon] > Total: ? ? ? ? ? 67706636K 1223820K 1222451K 1204632K ? ? ?0K > > Even though mmfsd has that 64G chunk allocated there's none of it > *used*. I wonder why Linux seems to be accounting it as allocated. > > -Aaron > > On 2/22/18 10:17 PM, Aaron Knister wrote: > > I've been exploring the idea for a while of writing a SLURM SPANK plugin > > to allow users to dynamically change the pagepool size on a node. Every > > now and then we have some users who would benefit significantly from a > > much larger pagepool on compute nodes but by default keep it on the > > smaller side to make as much physmem available as possible to batch work. > > > > In testing, though, it seems as though reducing the pagepool doesn't > > quite release all of the memory. I don't really understand it because > > I've never before seen memory that was previously resident become > > un-resident but still maintain the virtual memory allocation. > > > > Here's what I mean. Let's take a node with 128G and a 1G pagepool. 
> > > > If I do the following to simulate what might happen as various jobs > > tweak the pagepool: > > > > - tschpool 64G > > - tschpool 1G > > - tschpool 32G > > - tschpool 1G > > - tschpool 32G > > > > I end up with this: > > > > mmfsd thinks there's 32G resident but 64G virt > > # ps -o vsz,rss,comm -p 24397 > > ??? VSZ?? RSS COMMAND > > 67589400 33723236 mmfsd > > > > however, linux thinks there's ~100G used > > > > # free -g > > total?????? used free???? shared??? buffers cached > > Mem:?????????? 125 100???????? 25 0????????? 0 0 > > -/+ buffers/cache: 98???????? 26 > > Swap: 7????????? 0 7 > > > > I can jump back and forth between 1G and 32G *after* allocating 64G > > pagepool and the overall amount of memory in use doesn't balloon but I > > can't seem to shed that original 64G. > > > > I don't understand what's going on... :) Any ideas? This is with Scale > > 4.2.3.6. > > > > -Aaron > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=OrZQeEmI6chBdguG-h4YPHsxXZ4gTU3CtIuN4e3ijdY&s=hvVIRG5kB1zom2Iql2_TOagchsgl99juKiZfJt5S1tM&e= > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Sun Feb 25 16:54:06 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 25 Feb 2018 11:54:06 -0500 Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory In-Reply-To: <6c9df2df-dbdd-e3c1-07c7-f9906b0d666d@ugent.be> References: <6c9df2df-dbdd-e3c1-07c7-f9906b0d666d@ugent.be> Message-ID: Hi Stijn, Thanks for sharing your experiences-- I'm glad I'm not the only one whose had the idea (and come up empty handed). About the pagpool and numa awareness, I'd remembered seeing something about that somewhere and I did some googling and found there's a parameter called numaMemoryInterleave that "starts mmfsd with numactl --interleave=all". Do you think that provides the kind of numa awareness you're looking for? -Aaron On 2/23/18 9:44 AM, Stijn De Weirdt wrote: > hi all, > > we had the same idea long ago, afaik the issue we had was due to the > pinned memory the pagepool uses when RDMA is enabled. > > at some point we restarted gpfs on the compute nodes for each job, > similar to the way we do swapoff/swapon; but in certain scenarios gpfs > really did not like it; so we gave up on it. > > the other issue that needs to be resolved is that the pagepool needs to > be numa aware, so the pagepool is nicely allocated across all numa > domains, instead of using the first ones available. otherwise compute > jobs might start that only do non-local doamin memeory access. > > stijn > > On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote: >> AFAIK you can increase the pagepool size dynamically but you cannot shrink >> it dynamically. To shrink it you must restart the GPFS daemon. Also, >> could you please provide the actual pmap commands you executed? 
>> >> Regards, The Spectrum Scale (GPFS) team >> >> ------------------------------------------------------------------------------------------------------------------ >> If you feel that your question can benefit other users of Spectrum Scale >> (GPFS), then please post it to the public IBM developerWroks Forum at >> https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 >> . >> >> If your query concerns a potential software error in Spectrum Scale (GPFS) >> and you have an IBM software maintenance contract please contact >> 1-800-237-5511 in the United States or your local IBM Service Center in >> other countries. >> >> The forum is informally monitored as time permits and should not be used >> for priority messages to the Spectrum Scale (GPFS) team. >> >> >> >> From: Aaron Knister >> To: >> Date: 02/22/2018 10:30 PM >> Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all >> memory >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> This is also interesting (although I don't know what it really means). >> Looking at pmap run against mmfsd I can see what happens after each step: >> >> # baseline >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] >> 0000020000000000 1048576K 1048576K 1048576K 1048576K 0K rwxp [anon] >> Total: 1613580K 1191020K 1189650K 1171836K 0K >> >> # tschpool 64G >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] >> 0000020000000000 67108864K 67108864K 67108864K 67108864K 0K rwxp >> [anon] >> Total: 67706636K 67284108K 67282625K 67264920K 0K >> >> # tschpool 1G >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] >> 0000020001400000 139264K 139264K 139264K 139264K 0K rwxp [anon] >> 0000020fc9400000 897024K 897024K 897024K 897024K 0K rwxp [anon] >> 0000020009c00000 66052096K 0K 0K 0K 0K rwxp [anon] >> Total: 67706636K 1223820K 1222451K 1204632K 0K >> >> Even though mmfsd has that 64G chunk allocated there's none of it >> *used*. I wonder why Linux seems to be accounting it as allocated. >> >> -Aaron >> >> On 2/22/18 10:17 PM, Aaron Knister wrote: >>> I've been exploring the idea for a while of writing a SLURM SPANK plugin >> >>> to allow users to dynamically change the pagepool size on a node. Every >>> now and then we have some users who would benefit significantly from a >>> much larger pagepool on compute nodes but by default keep it on the >>> smaller side to make as much physmem available as possible to batch >> work. >>> >>> In testing, though, it seems as though reducing the pagepool doesn't >>> quite release all of the memory. I don't really understand it because >>> I've never before seen memory that was previously resident become >>> un-resident but still maintain the virtual memory allocation. >>> >>> Here's what I mean. Let's take a node with 128G and a 1G pagepool. 
>>> >>> If I do the following to simulate what might happen as various jobs >>> tweak the pagepool: >>> >>> - tschpool 64G >>> - tschpool 1G >>> - tschpool 32G >>> - tschpool 1G >>> - tschpool 32G >>> >>> I end up with this: >>> >>> mmfsd thinks there's 32G resident but 64G virt >>> # ps -o vsz,rss,comm -p 24397 >>> VSZ RSS COMMAND >>> 67589400 33723236 mmfsd >>> >>> however, linux thinks there's ~100G used >>> >>> # free -g >>> total used free shared buffers >> cached >>> Mem: 125 100 25 0 0 >> 0 >>> -/+ buffers/cache: 98 26 >>> Swap: 7 0 7 >>> >>> I can jump back and forth between 1G and 32G *after* allocating 64G >>> pagepool and the overall amount of memory in use doesn't balloon but I >>> can't seem to shed that original 64G. >>> >>> I don't understand what's going on... :) Any ideas? This is with Scale >>> 4.2.3.6. >>> >>> -Aaron >>> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Sun Feb 25 16:59:45 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 25 Feb 2018 11:59:45 -0500 Subject: [gpfsug-discuss] [non-nasa source] Re: pagepool shrink doesn't release all memory In-Reply-To: References: <6c9df2df-dbdd-e3c1-07c7-f9906b0d666d@ugent.be> Message-ID: <79885b2d-947d-4098-89bd-09b764635847@nasa.gov> Oh, and I think you're absolutely right about the rdma interaction. If I stop the infiniband service on a node and try the same exercise again, I can jump between 100G and 1G several times and the free'd memory is actually released. -Aaron On 2/25/18 11:54 AM, Aaron Knister wrote: > Hi Stijn, > > Thanks for sharing your experiences-- I'm glad I'm not the only one > whose had the idea (and come up empty handed). > > About the pagpool and numa awareness, I'd remembered seeing something > about that somewhere and I did some googling and found there's a > parameter called numaMemoryInterleave that "starts mmfsd with numactl > --interleave=all". Do you think that provides the kind of numa awareness > you're looking for? > > -Aaron > > On 2/23/18 9:44 AM, Stijn De Weirdt wrote: >> hi all, >> >> we had the same idea long ago, afaik the issue we had was due to the >> pinned memory the pagepool uses when RDMA is enabled. >> >> at some point we restarted gpfs on the compute nodes for each job, >> similar to the way we do swapoff/swapon; but in certain scenarios gpfs >> really did not like it; so we gave up on it. >> >> the other issue that needs to be resolved is that the pagepool needs to >> be numa aware, so the pagepool is nicely allocated across all numa >> domains, instead of using the first ones available. otherwise compute >> jobs might start that only do non-local doamin memeory access. >> >> stijn >> >> On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote: >>> AFAIK you can increase the pagepool size dynamically but you cannot >>> shrink >>> it dynamically.? To shrink it you must restart the GPFS daemon.?? Also, >>> could you please provide the actual pmap commands you executed? 
>>>
>>> Regards, The Spectrum Scale (GPFS) team
>>>
>>> ------------------------------------------------------------------------------------------------------------------
>>>
>>> If you feel that your question can benefit other users of Spectrum
>>> Scale
>>> (GPFS), then please post it to the public IBM developerWroks Forum at
>>> https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479
>>>
>>> .
>>>
>>> If your query concerns a potential software error in Spectrum Scale
>>> (GPFS)
>>> and you have an IBM software maintenance contract please contact
>>> 1-800-237-5511 in the United States or your local IBM Service Center in
>>> other countries.
>>>
>>> The forum is informally monitored as time permits and should not be used
>>> for priority messages to the Spectrum Scale (GPFS) team.
>>>
>>>
>>>
>>> From:   Aaron Knister
>>> To:
>>> Date:   02/22/2018 10:30 PM
>>> Subject:        Re: [gpfsug-discuss] pagepool shrink doesn't release all
>>> memory
>>> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
>>>
>>>
>>>
>>> This is also interesting (although I don't know what it really means).
>>> Looking at pmap run against mmfsd I can see what happens after each
>>> step:
>>>
>>> # baseline
>>> 00007fffe4639000  59164K      0K      0K      0K      0K ---p [anon]
>>> 00007fffd837e000  61960K      0K      0K      0K      0K ---p [anon]
>>> 0000020000000000 1048576K 1048576K 1048576K 1048576K      0K rwxp [anon]
>>> Total:           1613580K 1191020K 1189650K 1171836K      0K
>>>
>>> # tschpool 64G
>>> 00007fffe4639000  59164K      0K      0K      0K      0K ---p [anon]
>>> 00007fffd837e000  61960K      0K      0K      0K      0K ---p [anon]
>>> 0000020000000000 67108864K 67108864K 67108864K 67108864K      0K rwxp
>>> [anon]
>>> Total:           67706636K 67284108K 67282625K 67264920K      0K
>>>
>>> # tschpool 1G
>>> 00007fffe4639000  59164K      0K      0K      0K      0K ---p [anon]
>>> 00007fffd837e000  61960K      0K      0K      0K      0K ---p [anon]
>>> 0000020001400000 139264K 139264K 139264K 139264K      0K rwxp [anon]
>>> 0000020fc9400000 897024K 897024K 897024K 897024K      0K rwxp [anon]
>>> 0000020009c00000 66052096K      0K      0K      0K      0K rwxp [anon]
>>> Total:           67706636K 1223820K 1222451K 1204632K      0K
>>>
>>> Even though mmfsd has that 64G chunk allocated there's none of it
>>> *used*. I wonder why Linux seems to be accounting it as allocated.
>>>
>>> -Aaron
>>>
>>> On 2/22/18 10:17 PM, Aaron Knister wrote:
>>>> I've been exploring the idea for a while of writing a SLURM SPANK
>>>> plugin
>>>
>>>> to allow users to dynamically change the pagepool size on a node. Every
>>>> now and then we have some users who would benefit significantly from a
>>>> much larger pagepool on compute nodes but by default keep it on the
>>>> smaller side to make as much physmem available as possible to batch
>>> work.
>>>>
>>>> In testing, though, it seems as though reducing the pagepool doesn't
>>>> quite release all of the memory. I don't really understand it because
>>>> I've never before seen memory that was previously resident become
>>>> un-resident but still maintain the virtual memory allocation.
>>>>
>>>> Here's what I mean. Let's take a node with 128G and a 1G pagepool.
>>>>
>>>> If I do the following to simulate what might happen as various jobs
>>>> tweak the pagepool:
>>>>
>>>> - tschpool 64G
>>>> - tschpool 1G
>>>> - tschpool 32G
>>>> - tschpool 1G
>>>> - tschpool 32G
>>>>
>>>> I end up with this:
>>>>
>>>> mmfsd thinks there's 32G resident but 64G virt
>>>> # ps -o vsz,rss,comm -p 24397
>>>>      VSZ   RSS COMMAND
>>>> 67589400 33723236 mmfsd
>>>>
>>>> however, linux thinks there's ~100G used
>>>>
>>>> # free -g
>>>>                total       used       free     shared    buffers
>>> cached
>>>> Mem:           125        100         25          0          0
>>> 0
>>>> -/+ buffers/cache:         98         26
>>>> Swap:            7          0          7
>>>>
>>>> I can jump back and forth between 1G and 32G *after* allocating 64G
>>>> pagepool and the overall amount of memory in use doesn't balloon but I
>>>> can't seem to shed that original 64G.
>>>>
>>>> I don't understand what's going on... :) Any ideas? This is with Scale
>>>> 4.2.3.6.
>>>>
>>>> -Aaron
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> gpfsug-discuss mailing list
>>> gpfsug-discuss at spectrumscale.org
>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>>
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>

--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776

From oehmes at gmail.com  Sun Feb 25 17:49:38 2018
From: oehmes at gmail.com (Sven Oehme)
Date: Sun, 25 Feb 2018 17:49:38 +0000
Subject: [gpfsug-discuss] pagepool shrink doesn't release all memory
In-Reply-To: 
References: <6c9df2df-dbdd-e3c1-07c7-f9906b0d666d@ugent.be>
Message-ID: 

Hi,

i guess you saw that in some of my presentations about communication code
overhaul. we started in 4.2.X and since then added more and more numa
awareness to GPFS. Version 5.0 also has enhancements in this space.

sven


On Sun, Feb 25, 2018 at 8:54 AM Aaron Knister wrote:

> Hi Stijn,
>
> Thanks for sharing your experiences-- I'm glad I'm not the only one
> whose had the idea (and come up empty handed).
>
> About the pagpool and numa awareness, I'd remembered seeing something
> about that somewhere and I did some googling and found there's a
> parameter called numaMemoryInterleave that "starts mmfsd with numactl
> --interleave=all". Do you think that provides the kind of numa awareness
> you're looking for?
>
> -Aaron
>
> On 2/23/18 9:44 AM, Stijn De Weirdt wrote:
> > hi all,
> >
> > we had the same idea long ago, afaik the issue we had was due to the
> > pinned memory the pagepool uses when RDMA is enabled.
> >
> > at some point we restarted gpfs on the compute nodes for each job,
> > similar to the way we do swapoff/swapon; but in certain scenarios gpfs
> > really did not like it; so we gave up on it.
> >
> > the other issue that needs to be resolved is that the pagepool needs to
> > be numa aware, so the pagepool is nicely allocated across all numa
> > domains, instead of using the first ones available. otherwise compute
> > jobs might start that only do non-local doamin memeory access.
> >
> > stijn
> >
> > On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote:
> >> AFAIK you can increase the pagepool size dynamically but you cannot
> shrink
> >> it dynamically. To shrink it you must restart the GPFS daemon. Also,
> >> could you please provide the actual pmap commands you executed?
> >> > >> Regards, The Spectrum Scale (GPFS) team > >> > >> > ------------------------------------------------------------------------------------------------------------------ > >> If you feel that your question can benefit other users of Spectrum > Scale > >> (GPFS), then please post it to the public IBM developerWroks Forum at > >> > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 > >> . > >> > >> If your query concerns a potential software error in Spectrum Scale > (GPFS) > >> and you have an IBM software maintenance contract please contact > >> 1-800-237-5511 <(800)%20237-5511> in the United States or your local > IBM Service Center in > >> other countries. > >> > >> The forum is informally monitored as time permits and should not be used > >> for priority messages to the Spectrum Scale (GPFS) team. > >> > >> > >> > >> From: Aaron Knister > >> To: > >> Date: 02/22/2018 10:30 PM > >> Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all > >> memory > >> Sent by: gpfsug-discuss-bounces at spectrumscale.org > >> > >> > >> > >> This is also interesting (although I don't know what it really means). > >> Looking at pmap run against mmfsd I can see what happens after each > step: > >> > >> # baseline > >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > >> 0000020000000000 1048576K 1048576K 1048576K 1048576K 0K rwxp [anon] > >> Total: 1613580K 1191020K 1189650K 1171836K 0K > >> > >> # tschpool 64G > >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > >> 0000020000000000 67108864K 67108864K 67108864K 67108864K 0K rwxp > >> [anon] > >> Total: 67706636K 67284108K 67282625K 67264920K 0K > >> > >> # tschpool 1G > >> 00007fffe4639000 59164K 0K 0K 0K 0K ---p [anon] > >> 00007fffd837e000 61960K 0K 0K 0K 0K ---p [anon] > >> 0000020001400000 139264K 139264K 139264K 139264K 0K rwxp [anon] > >> 0000020fc9400000 897024K 897024K 897024K 897024K 0K rwxp [anon] > >> 0000020009c00000 66052096K 0K 0K 0K 0K rwxp [anon] > >> Total: 67706636K 1223820K 1222451K 1204632K 0K > >> > >> Even though mmfsd has that 64G chunk allocated there's none of it > >> *used*. I wonder why Linux seems to be accounting it as allocated. > >> > >> -Aaron > >> > >> On 2/22/18 10:17 PM, Aaron Knister wrote: > >>> I've been exploring the idea for a while of writing a SLURM SPANK > plugin > >> > >>> to allow users to dynamically change the pagepool size on a node. Every > >>> now and then we have some users who would benefit significantly from a > >>> much larger pagepool on compute nodes but by default keep it on the > >>> smaller side to make as much physmem available as possible to batch > >> work. > >>> > >>> In testing, though, it seems as though reducing the pagepool doesn't > >>> quite release all of the memory. I don't really understand it because > >>> I've never before seen memory that was previously resident become > >>> un-resident but still maintain the virtual memory allocation. > >>> > >>> Here's what I mean. Let's take a node with 128G and a 1G pagepool. 
> >>> > >>> If I do the following to simulate what might happen as various jobs > >>> tweak the pagepool: > >>> > >>> - tschpool 64G > >>> - tschpool 1G > >>> - tschpool 32G > >>> - tschpool 1G > >>> - tschpool 32G > >>> > >>> I end up with this: > >>> > >>> mmfsd thinks there's 32G resident but 64G virt > >>> # ps -o vsz,rss,comm -p 24397 > >>> VSZ RSS COMMAND > >>> 67589400 33723236 mmfsd > >>> > >>> however, linux thinks there's ~100G used > >>> > >>> # free -g > >>> total used free shared buffers > >> cached > >>> Mem: 125 100 25 0 0 > >> 0 > >>> -/+ buffers/cache: 98 26 > >>> Swap: 7 0 7 > >>> > >>> I can jump back and forth between 1G and 32G *after* allocating 64G > >>> pagepool and the overall amount of memory in use doesn't balloon but I > >>> can't seem to shed that original 64G. > >>> > >>> I don't understand what's going on... :) Any ideas? This is with Scale > >>> 4.2.3.6. > >>> > >>> -Aaron > >>> > >> > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Feb 26 12:20:52 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 26 Feb 2018 20:20:52 +0800 Subject: [gpfsug-discuss] Finding all bulletins and APARs In-Reply-To: References: Message-ID: Hi John, For all Flashes, alerts and bulletins for IBM Spectrum Scale, please check this link: https://www.ibm.com/support/home/search-results/10000060/system_storage/storage_software/software_defined_storage/ibm_spectrum_scale?filter=DC.Type_avl:CT792,CT555,CT755&sortby=-dcdate_sortrange&ct=fab For any other content which you got in the notification, please check this link: https://www.ibm.com/support/home/search-results/10000060/IBM_Spectrum_Scale?docOnly=true&sortby=-dcdate_sortrange&ct=rc Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: John Hearns To: gpfsug main discussion list Date: 02/21/2018 05:28 PM Subject: [gpfsug-discuss] Finding all bulletins and APARs Sent by: gpfsug-discuss-bounces at spectrumscale.org Firstly, let me apologise for not thanking people who hav ereplied to me on this list with help. I have indeed replied and thanked you ? 
however the list software has taken a dislike to my email address. I am currently on the myibm support site. I am looking for a specific APAR on Spectrum Scale. However I want to be able to get a list of all APARs and bulletins for Spectrum Scale, right up to date. I do get email alerts but somehow I suspect I am not getting them all, and it is a pain to search back in your email. Thanks John H -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=v0fVzSMP-N6VctcEcAQKTLJlrvu0WUry8rSo41ia-mY&s=_zoOdAst7NdP-PByM7WrniXyNLofARAf9hayK0BF5rU&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jan.sundermann at kit.edu Mon Feb 26 16:38:46 2018 From: jan.sundermann at kit.edu (Sundermann, Jan Erik (SCC)) Date: Mon, 26 Feb 2018 16:38:46 +0000 Subject: [gpfsug-discuss] Problems with remote mount via routed IB Message-ID: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> Dear all we are currently trying to remote mount a file system in a routed Infiniband test setup and face problems with dropped RDMA connections. The setup is the following: - Spectrum Scale Cluster 1 is setup on four servers which are connected to the same infiniband network. Additionally they are connected to a fast ethernet providing ip communication in the network 192.168.11.0/24. - Spectrum Scale Cluster 2 is setup on four additional servers which are connected to a second infiniband network. These servers have IPs on their IB interfaces in the network 192.168.12.0/24. - IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated machine. - We have a dedicated IB hardware router connected to both IB subnets. We tested that the routing, both IP and IB, is working between the two clusters without problems and that RDMA is working fine both for internal communication inside cluster 1 and cluster 2 When trying to remote mount a file system from cluster 1 in cluster 2, RDMA communication is not working as expected. 
Instead we see error messages on the remote host (cluster 2) 2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 1 2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1 2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 1 2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 0 2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0 2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 0 2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 2 2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 and in the cluster with the file system (cluster 1) 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to 
192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:35.398+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:55.600+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 Any advice on how to configure the setup in a way that would allow the remote mount via routed IB would be very appreciated. Thank you and best regards Jan Erik -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5252 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Mon Feb 26 19:16:34 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 26 Feb 2018 14:16:34 -0500 Subject: [gpfsug-discuss] Problems with remote mount via routed IB In-Reply-To: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> References: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> Message-ID: Hi Jan Erik, It was my understanding that the IB hardware router required RDMA CM to work. By default GPFS doesn't use the RDMA Connection Manager but it can be enabled (e.g. verbsRdmaCm=enable). I think this requires a restart on clients/servers (in both clusters) to take effect. Maybe someone else on the list can comment in more detail-- I've been told folks have successfully deployed IB routers with GPFS. -Aaron On 2/26/18 11:38 AM, Sundermann, Jan Erik (SCC) wrote: > > Dear all > > we are currently trying to remote mount a file system in a routed Infiniband test setup and face problems with dropped RDMA connections. The setup is the following: > > - Spectrum Scale Cluster 1 is setup on four servers which are connected to the same infiniband network. Additionally they are connected to a fast ethernet providing ip communication in the network 192.168.11.0/24. 
> > - Spectrum Scale Cluster 2 is setup on four additional servers which are connected to a second infiniband network. These servers have IPs on their IB interfaces in the network 192.168.12.0/24. > > - IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated machine. > > - We have a dedicated IB hardware router connected to both IB subnets. > > > We tested that the routing, both IP and IB, is working between the two clusters without problems and that RDMA is working fine both for internal communication inside cluster 1 and cluster 2 > > When trying to remote mount a file system from cluster 1 in cluster 2, RDMA communication is not working as expected. Instead we see error messages on the remote host (cluster 2) > > > 2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > 2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 > 2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 > 2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > 2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 1 > 2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > 2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1 > 2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 1 > 2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 0 > 2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0 > 2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 0 > 2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 2 > 2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > 2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 > 2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 > 2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > > > and in the cluster with the file system (cluster 1) > > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 
192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:35.398+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:55.600+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 > > > > Any advice on how to configure the setup in a way that would allow the remote mount via routed IB would be very appreciated. 
> > > Thank you and best regards > Jan Erik > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From john.hearns at asml.com Tue Feb 27 09:17:36 2018 From: john.hearns at asml.com (John Hearns) Date: Tue, 27 Feb 2018 09:17:36 +0000 Subject: [gpfsug-discuss] Problems with remote mount via routed IB In-Reply-To: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> References: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> Message-ID: Jan Erik, Can you clarify if you are routing IP traffic between the two Infiniband networks. Or are you routing Infiniband traffic? If I can be of help I manage an Infiniband network which connects to other IP networks using Mellanox VPI gateways, which proxy arp between IB and Ethernet. But I am not running GPFS traffic over these. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sundermann, Jan Erik (SCC) Sent: Monday, February 26, 2018 5:39 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Problems with remote mount via routed IB Dear all we are currently trying to remote mount a file system in a routed Infiniband test setup and face problems with dropped RDMA connections. The setup is the following: - Spectrum Scale Cluster 1 is setup on four servers which are connected to the same infiniband network. Additionally they are connected to a fast ethernet providing ip communication in the network 192.168.11.0/24. - Spectrum Scale Cluster 2 is setup on four additional servers which are connected to a second infiniband network. These servers have IPs on their IB interfaces in the network 192.168.12.0/24. - IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated machine. - We have a dedicated IB hardware router connected to both IB subnets. We tested that the routing, both IP and IB, is working between the two clusters without problems and that RDMA is working fine both for internal communication inside cluster 1 and cluster 2 When trying to remote mount a file system from cluster 1 in cluster 2, RDMA communication is not working as expected. 
Instead we see error messages on the remote host (cluster 2) 2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 1 2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1 2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 1 2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 0 2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0 2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 0 2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 2 2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2 2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 index 3 2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 and in the cluster with the file system (cluster 1) 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to 
192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:35.398+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:48:55.600+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA closed connection to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA accepted and connected to 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3 Any advice on how to configure the setup in a way that would allow the remote mount via routed IB would be very appreciated. Thank you and best regards Jan Erik -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From alex at calicolabs.com Tue Feb 27 22:25:30 2018 From: alex at calicolabs.com (Alex Chekholko) Date: Tue, 27 Feb 2018 14:25:30 -0800 Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage In-Reply-To: <0c21b5b2-95ff-4cbf-9b07-e23594f58c87@Spark> References: <522a6dc0-4652-416e-b019-54e2af98191a@Spark> <0c21b5b2-95ff-4cbf-9b07-e23594f58c87@Spark> Message-ID: Hi, My experience has been that you could spend the same money to just make your main pool more performant. 
Instead of doing two data transfers (one from cold pool to AFM or hot pools, one from AFM/hot to client), you can just make the direct access of the data faster by adding more resources to your main pool. Regards, Alex On Thu, Feb 22, 2018 at 5:27 PM, wrote: > Thanks, I will try the file heat feature but i am really not sure, if it > would work - since the code can access cold files too, and not necessarily > files recently accessed/hot files. > > With respect to LROC. Let me explain as below: > > The use case is that - > The code initially reads headers (small region of data) from thousands of > files as the first step. For example about 30,000 of them with each about > 300MB to 500MB in size. > After the first step, with the help of those headers - it mmaps/seeks > across various regions of a set of files in parallel. > Since its all small IOs and it was really slow at reading from GPFS over > the network directly from disks - Our idea was to use AFM which i believe > fetches all file data into flash/ssds, once the initial few blocks of the > files are read. > But again - AFM seems to not solve the problem, so i want to know if LROC > behaves in the same way as AFM, where all of the file data is prefetched in > full block size utilizing all the worker threads - if few blocks of the > file is read initially. > > Thanks, > Lohit > > On Feb 22, 2018, 4:52 PM -0500, IBM Spectrum Scale , > wrote: > > My apologies for not being more clear on the flash storage pool. I meant > that this would be just another GPFS storage pool in the same cluster, so > no separate AFM cache cluster. You would then use the file heat feature to > ensure more frequently accessed files are migrated to that all flash > storage pool. > > As for LROC could you please clarify what you mean by a few headers/stubs > of the file? In reading the LROC documentation and the LROC variables > available in the mmchconfig command I think you might want to take a look a > the lrocDataStubFileSize variable since it seems to apply to your situation. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: valleru at cbio.mskcc.org > To: gpfsug main discussion list > Cc: gpfsug-discuss-bounces at spectrumscale.org > Date: 02/22/2018 04:21 PM > Subject: Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered > storage > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Thank you. > > I am sorry if i was not clear, but the metadata pool is all on SSDs in the > GPFS clusters that we use. Its just the data pool that is on Near-Line > Rotating disks. > I understand that AFM might not be able to solve the issue, and I will try > and see if file heat works for migrating the files to flash tier. 
> You mentioned an all flash storage pool for heavily used files - so you > mean a different GPFS cluster just with flash storage, and to manually copy > the files to flash storage whenever needed? > The IO performance that i am talking is prominently for reads, so you > mention that LROC can work in the way i want it to? that is prefetch all > the files into LROC cache, after only few headers/stubs of data are read > from those files? > I thought LROC only keeps that block of data that is prefetched from the > disk, and will not prefetch the whole file if a stub of data is read. > Please do let me know, if i understood it wrong. > > On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale , > wrote: > I do not think AFM is intended to solve the problem you are trying to > solve. If I understand your scenario correctly you state that you are > placing metadata on NL-SAS storage. If that is true that would not be wise > especially if you are going to do many metadata operations. I suspect your > performance issues are partially due to the fact that metadata is being > stored on NL-SAS storage. You stated that you did not think the file heat > feature would do what you intended but have you tried to use it to see if > it could solve your problem? I would think having metadata on SSD/flash > storage combined with a all flash storage pool for your heavily used files > would perform well. If you expect IO usage will be such that there will be > far more reads than writes then LROC should be beneficial to your overall > performance. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > *https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479* > > . > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > Service Center in other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: valleru at cbio.mskcc.org > To: gpfsug main discussion list > Date: 02/22/2018 03:11 PM > Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi All, > > I am trying to figure out a GPFS tiering architecture with flash storage > in front end and near line storage as backend, for Supercomputing > > The Backend storage will be a GPFS storage on near line of about 8-10PB. > The backend storage will/can be tuned to give out large streaming bandwidth > and enough metadata disks to make the stat of all these files fast enough. > > I was thinking if it would be possible to use a GPFS flash cluster or GPFS > SSD cluster in front end that uses AFM and acts as a cache cluster with the > backend GPFS cluster. > > At the end of this .. the workflow that i am targeting is where: > > > ? > If the compute nodes read headers of thousands of large files ranging from > 100MB to 1GB, the AFM cluster should be able to bring up enough threads to > bring up all of the files from the backend to the faster SSD/Flash GPFS > cluster. 
> The working set might be about 100T, at a time which i want to be on a > faster/low latency tier, and the rest of the files to be in slower tier > until they are read by the compute nodes. > ? > > > I do not want to use GPFS policies to achieve the above, is because i am > not sure - if policies could be written in a way, that files are moved from > the slower tier to faster tier depending on how the jobs interact with the > files. > I know that the policies could be written depending on the heat, and > size/format but i don?t think thes policies work in a similar way as above. > > I did try the above architecture, where an SSD GPFS cluster acts as an AFM > cache cluster before the near line storage. However the AFM cluster was > really really slow, It took it about few hours to copy the files from near > line storage to AFM cache cluster. > I am not sure if AFM is not designed to work this way, or if AFM is not > tuned to work as fast as it should. > > I have tried LROC too, but it does not behave the same way as i guess AFM > works. > > Has anyone tried or know if GPFS supports an architecture - where the fast > tier can bring up thousands of threads and copy the files almost > instantly/asynchronously from the slow tier, whenever the jobs from compute > nodes reads few blocks from these files? > I understand that with respect to hardware - the AFM cluster should be > really fast, as well as the network between the AFM cluster and the backend > cluster. > > Please do also let me know, if the above workflow can be done using GPFS > policies and be as fast as it is needed to be. > > Regards, > Lohit > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > *https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s=AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e=* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______ > ________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug. > org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r= > IbxtjdkPAM2Sbon4Lbbi4w&m=DuqESC-4ycoY5GoHpYeH1T8baq0JWY8QfkN8z > 6b8jPw&s=zNUAH3mFyzxcvXtrep_OroKiwR88QouIrcdN8TLJK8M&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From coetzee.ray at gmail.com Tue Feb 27 23:54:17 2018 From: coetzee.ray at gmail.com (Ray Coetzee) Date: Tue, 27 Feb 2018 23:54:17 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 73, Issue 60 In-Reply-To: References: Message-ID: Hi Lohit Using mmap based applications against GPFS has a number of challenges. For me the main challenge is that mmap threads can fragment the IO into multiple strided reads at random offsets which defeats GPFS's attempts in prefetching the file contents. 
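To make that concrete, here is a tiny, generic sketch of the access pattern I mean (a toy
written for this explanation, not code from any of the applications discussed in this
thread). A plain read() loop walks the file front to back and lets the pagepool prefetcher
stream it in, whereas an mmap reader that only touches widely separated offsets turns every
page fault into its own small IO that prefetch cannot coalesce:

/* mmap_touch.c - toy illustration only; assumes a large existing file.
 * Build with: cc -O2 mmap_touch.c -o mmap_touch */
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0) { perror(argv[1]); return 1; }

    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Touch one byte every 16 MiB. Each touch faults in a single page at an
     * offset far from the previous one, so the file system sees a series of
     * small strided reads instead of one sequential stream. */
    const off_t stride = 16L * 1024 * 1024;
    unsigned long sum = 0;
    for (off_t off = 0; off < st.st_size; off += stride)
        sum += (unsigned char)p[off];

    printf("checksum %lu\n", sum);
    munmap(p, st.st_size);
    close(fd);
    return 0;
}

If you run something like this against a multi-GB file and watch the IO on the client
(mmdiag --iohist, for example), you should see scattered small reads rather than
full-block sequential ones, which is why sequential prefetch buys you very little for
this kind of workload.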
LROC, as the name implies, is only a Local Read Only Cache and functions as an extension of your local page pool on the client. You would only see a performance improvement if the file(s) have been read into the local pagepool on a previous occasion. Depending on the dataset size & the NVMe/SSDs you have for LROC, you could look at using a pre-job to read the file(s) in their entirety on the compute node before the mmap process starts, as this would ensure the relevant data blocks are in the local pagepool or LROC. Another solution I've seen is to stage the dataset into tmpfs. Sven is working on improvements for mmap on GPFS that may make it into a production release so keep an eye out for an update. Kind regards Ray Coetzee On Tue, Feb 27, 2018 at 10:25 PM, wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Problems with remote mount via routed IB (John Hearns) > 2. Re: GPFS and Flash/SSD Storage tiered storage (Alex Chekholko) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 27 Feb 2018 09:17:36 +0000 > From: John Hearns > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Problems with remote mount via routed IB > Message-ID: > eurprd02.prod.outlook.com> > > Content-Type: text/plain; charset="us-ascii" > > Jan Erik, > Can you clarify if you are routing IP traffic between the two > Infiniband networks. > Or are you routing Infiniband traffic? > > > If I can be of help I manage an Infiniband network which connects to other > IP networks using Mellanox VPI gateways, which proxy arp between IB and > Ethernet. But I am not running GPFS traffic over these. > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss- > bounces at spectrumscale.org] On Behalf Of Sundermann, Jan Erik (SCC) > Sent: Monday, February 26, 2018 5:39 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] Problems with remote mount via routed IB > > > Dear all > > we are currently trying to remote mount a file system in a routed > Infiniband test setup and face problems with dropped RDMA connections. The > setup is the following: > > - Spectrum Scale Cluster 1 is setup on four servers which are connected to > the same infiniband network. Additionally they are connected to a fast > ethernet providing ip communication in the network 192.168.11.0/24. > > - Spectrum Scale Cluster 2 is setup on four additional servers which are > connected to a second infiniband network. These servers have IPs on their > IB interfaces in the network 192.168.12.0/24. > > - IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated > machine. > > - We have a dedicated IB hardware router connected to both IB subnets. 
> > > We tested that the routing, both IP and IB, is working between the two > clusters without problems and that RDMA is working fine both for internal > communication inside cluster 1 and cluster 2 > > When trying to remote mount a file system from cluster 1 in cluster 2, > RDMA communication is not working as expected. Instead we see error > messages on the remote host (cluster 2) > > > 2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > 2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 2 > 2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to > 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > fabnum 0 error 733 index 3 > 2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > 2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to > 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > fabnum 0 error 733 index 1 > 2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 3 > 2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 > (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1 > 2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 > (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 1 > 2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to > 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > fabnum 0 error 733 index 0 > 2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 > (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0 > 2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 > (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 0 > 2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to > 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > fabnum 0 error 733 index 2 > 2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > 2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 2 > 2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to > 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > fabnum 0 error 733 index 3 > 2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > index 3 > > > and in the cluster with the file system (cluster 1) > > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 due to RDMA read error 
IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 sl 0 index 3 > 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 sl 0 index 3 > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA rdma read error > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA closed connection to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:35.398+0100: [I] VERBS RDMA accepted and connected to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 sl 0 index 3 > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA rdma read error > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA closed connection to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:48:55.600+0100: [I] VERBS RDMA accepted and connected to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 sl 0 index 3 > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA rdma read error > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA closed connection to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA accepted and connected to > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > fabnum 0 sl 0 index 3 > > > > Any advice on how to configure the setup in a way that would allow the > remote mount via routed IB would be very appreciated. > > > Thank you and best regards > Jan Erik > > > -- The information contained in this communication and any attachments is > confidential and may be privileged, and is for the sole use of the intended > recipient(s). Any unauthorized review, use, disclosure or distribution is > prohibited. Unless explicitly stated otherwise in the body of this > communication or the attachment thereto (if any), the information is > provided on an AS-IS basis without any express or implied warranties or > liabilities. To the extent you are relying on this information, you are > doing so at your own risk. If you are not the intended recipient, please > notify the sender immediately by replying to this message and destroy all > copies of this message and any attachments. 
Neither the sender nor the > company/group of companies he or she represents shall be liable for the > proper and complete transmission of the information contained in this > communication, or for any delay in its receipt. > > > ------------------------------ > > Message: 2 > Date: Tue, 27 Feb 2018 14:25:30 -0800 > From: Alex Chekholko > To: gpfsug main discussion list > Cc: gpfsug-discuss-bounces at spectrumscale.org > Subject: Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered > storage > Message-ID: > mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > Hi, > > My experience has been that you could spend the same money to just make > your main pool more performant. Instead of doing two data transfers (one > from cold pool to AFM or hot pools, one from AFM/hot to client), you can > just make the direct access of the data faster by adding more resources to > your main pool. > > Regards, > Alex > > On Thu, Feb 22, 2018 at 5:27 PM, wrote: > > > Thanks, I will try the file heat feature but i am really not sure, if it > > would work - since the code can access cold files too, and not > necessarily > > files recently accessed/hot files. > > > > With respect to LROC. Let me explain as below: > > > > The use case is that - > > The code initially reads headers (small region of data) from thousands of > > files as the first step. For example about 30,000 of them with each about > > 300MB to 500MB in size. > > After the first step, with the help of those headers - it mmaps/seeks > > across various regions of a set of files in parallel. > > Since its all small IOs and it was really slow at reading from GPFS over > > the network directly from disks - Our idea was to use AFM which i believe > > fetches all file data into flash/ssds, once the initial few blocks of the > > files are read. > > But again - AFM seems to not solve the problem, so i want to know if LROC > > behaves in the same way as AFM, where all of the file data is prefetched > in > > full block size utilizing all the worker threads - if few blocks of the > > file is read initially. > > > > Thanks, > > Lohit > > > > On Feb 22, 2018, 4:52 PM -0500, IBM Spectrum Scale , > > wrote: > > > > My apologies for not being more clear on the flash storage pool. I meant > > that this would be just another GPFS storage pool in the same cluster, so > > no separate AFM cache cluster. You would then use the file heat feature > to > > ensure more frequently accessed files are migrated to that all flash > > storage pool. > > > > As for LROC could you please clarify what you mean by a few headers/stubs > > of the file? In reading the LROC documentation and the LROC variables > > available in the mmchconfig command I think you might want to take a > look a > > the lrocDataStubFileSize variable since it seems to apply to your > situation. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------ > > ------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale > > (GPFS), then please post it to the public IBM developerWroks Forum at > > https://www.ibm.com/developerworks/community/ > > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) > > and you have an IBM software maintenance contract please contact > > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > > Service Center in other countries. 
> > > > The forum is informally monitored as time permits and should not be used > > for priority messages to the Spectrum Scale (GPFS) team. > > > > > > > > From: valleru at cbio.mskcc.org > > To: gpfsug main discussion list > > > Cc: gpfsug-discuss-bounces at spectrumscale.org > > Date: 02/22/2018 04:21 PM > > Subject: Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered > > storage > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------ > > > > > > > > Thank you. > > > > I am sorry if i was not clear, but the metadata pool is all on SSDs in > the > > GPFS clusters that we use. Its just the data pool that is on Near-Line > > Rotating disks. > > I understand that AFM might not be able to solve the issue, and I will > try > > and see if file heat works for migrating the files to flash tier. > > You mentioned an all flash storage pool for heavily used files - so you > > mean a different GPFS cluster just with flash storage, and to manually > copy > > the files to flash storage whenever needed? > > The IO performance that i am talking is prominently for reads, so you > > mention that LROC can work in the way i want it to? that is prefetch all > > the files into LROC cache, after only few headers/stubs of data are read > > from those files? > > I thought LROC only keeps that block of data that is prefetched from the > > disk, and will not prefetch the whole file if a stub of data is read. > > Please do let me know, if i understood it wrong. > > > > On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale , > > wrote: > > I do not think AFM is intended to solve the problem you are trying to > > solve. If I understand your scenario correctly you state that you are > > placing metadata on NL-SAS storage. If that is true that would not be > wise > > especially if you are going to do many metadata operations. I suspect > your > > performance issues are partially due to the fact that metadata is being > > stored on NL-SAS storage. You stated that you did not think the file > heat > > feature would do what you intended but have you tried to use it to see if > > it could solve your problem? I would think having metadata on SSD/flash > > storage combined with a all flash storage pool for your heavily used > files > > would perform well. If you expect IO usage will be such that there will > be > > far more reads than writes then LROC should be beneficial to your overall > > performance. > > > > Regards, The Spectrum Scale (GPFS) team > > > > ------------------------------------------------------------ > > ------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale > > (GPFS), then please post it to the public IBM developerWroks Forum at > > *https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479* > > forums/html/forum?id=11111111-0000-0000-0000-000000000479> > > . > > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) > > and you have an IBM software maintenance contract please contact > > 1-800-237-5511 <(800)%20237-5511> in the United States or your local IBM > > Service Center in other countries. > > > > The forum is informally monitored as time permits and should not be used > > for priority messages to the Spectrum Scale (GPFS) team. 
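As a rough illustration of the file heat suggestion above, a minimal sketch follows. The pool names ('nlsas' and 'flash'), the file system device 'gpfs0', the policy file path, and the tuning values are placeholders rather than anything confirmed in this thread, so treat it as a starting point and dry-run it first:

    # turn on file heat tracking (values shown are only illustrative)
    mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

    # one-rule policy: migrate the hottest files into the flash pool,
    # stopping once that pool is roughly 90% full
    cat > /tmp/heat.policy <<'EOF'
    RULE 'to_flash' MIGRATE FROM POOL 'nlsas'
      WEIGHT(FILE_HEAT)
      TO POOL 'flash'
      LIMIT(90)
    EOF

    # test run first, then apply
    mmapplypolicy gpfs0 -P /tmp/heat.policy -I test
    mmapplypolicy gpfs0 -P /tmp/heat.policy -I yes

Because the migration only happens when mmapplypolicy is run (for example from cron), this helps with recurring access patterns; it does not prefetch a file the first time a job touches it, which is the limitation being discussed in this thread.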
> > > > > > From: valleru at cbio.mskcc.org > > To: gpfsug main discussion list > > > Date: 02/22/2018 03:11 PM > > Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered > storage > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------ > > > > > > > > Hi All, > > > > I am trying to figure out a GPFS tiering architecture with flash storage > > in front end and near line storage as backend, for Supercomputing > > > > The Backend storage will be a GPFS storage on near line of about 8-10PB. > > The backend storage will/can be tuned to give out large streaming > bandwidth > > and enough metadata disks to make the stat of all these files fast > enough. > > > > I was thinking if it would be possible to use a GPFS flash cluster or > GPFS > > SSD cluster in front end that uses AFM and acts as a cache cluster with > the > > backend GPFS cluster. > > > > At the end of this .. the workflow that i am targeting is where: > > > > > > > > If the compute nodes read headers of thousands of large files ranging > from > > 100MB to 1GB, the AFM cluster should be able to bring up enough threads > to > > bring up all of the files from the backend to the faster SSD/Flash GPFS > > cluster. > > The working set might be about 100T, at a time which i want to be on a > > faster/low latency tier, and the rest of the files to be in slower tier > > until they are read by the compute nodes. > > > > > > > > I do not want to use GPFS policies to achieve the above, is because i am > > not sure - if policies could be written in a way, that files are moved > from > > the slower tier to faster tier depending on how the jobs interact with > the > > files. > > I know that the policies could be written depending on the heat, and > > size/format but i don't think these policies work in a similar way as > above. > > > > I did try the above architecture, where an SSD GPFS cluster acts as an > AFM > > cache cluster before the near line storage. However the AFM cluster was > > really really slow, It took it about few hours to copy the files from > near > > line storage to AFM cache cluster. > > I am not sure if AFM is not designed to work this way, or if AFM is not > > tuned to work as fast as it should. > > > > I have tried LROC too, but it does not behave the same way as i guess AFM > > works. > > > > Has anyone tried or know if GPFS supports an architecture - where the > fast > > tier can bring up thousands of threads and copy the files almost > > instantly/asynchronously from the slow tier, whenever the jobs from > compute > > nodes reads few blocks from these files? > > I understand that with respect to hardware - the AFM cluster should be > > really fast, as well as the network between the AFM cluster and the > backend > > cluster. > > > > Please do also let me know, if the above workflow can be done using GPFS > > policies and be as fast as it is needed to be.
> > > > Regards, > > Lohit > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > > > *https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_ > listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r= > IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s= > AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e=* > > listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r= > IbxtjdkPAM2Sbon4Lbbi4w&m=kMYZhGPhwadAbNHucw79NJgyYAJAMgxyFZKEW-kMeqk&s= > AT1gb89TzzE7nt58h8DYyhYkybvBY8mbXvdPjtaRRpU&e=> > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______ > > ________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug. > > org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_ > iaSHvJObTbx-siA1ZOg&r= > > IbxtjdkPAM2Sbon4Lbbi4w&m=DuqESC-4ycoY5GoHpYeH1T8baq0JWY8QfkN8z > > 6b8jPw&s=zNUAH3mFyzxcvXtrep_OroKiwR88QouIrcdN8TLJK8M&e= > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: 20180227/be7c09c4/attachment.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 73, Issue 60 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuartb at 4gh.net Wed Feb 28 17:49:47 2018 From: stuartb at 4gh.net (Stuart Barkley) Date: Wed, 28 Feb 2018 12:49:47 -0500 (EST) Subject: [gpfsug-discuss] Problems with remote mount via routed IB In-Reply-To: References: <471B111F-5DAA-4912-829C-9AA75DCB76FA@kit.edu> Message-ID: The problem with CM is that it seems to require configuring IP over Infiniband. I'm rather strongly opposed to IP over IB. We did run IPoIB years ago, but pulled it out of our environment as adding unneeded complexity. It requires provisioning IP addresses across the Infiniband infrastructure and possibly adding routers to other portions of the IP infrastructure. It was also confusing some users due to multiple IPs on the compute infrastructure. We have recently been in discussions with a vendor about their support for GPFS over IB and they kept directing us to using CM (which still didn't work). CM wasn't necessary once we found out about the actual problem (we needed the undocumented verbsRdmaUseGidIndexZero configuration option among other things due to their use of SR-IOV based virtual IB interfaces). We don't use routed Infiniband and it might be that CM and IPoIB is required for IB routing, but I doubt it. It sounds like the OP is keeping IB and IP infrastructure separate. 
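As a rough sketch of the knobs being discussed here (none of which is anything confirmed in this thread): the node class 'ibclients' is a placeholder, the yes/no value for the undocumented GID-index option is an assumption, and verbsRdmaCm only takes effect after GPFS is restarted on the affected nodes:

    # inspect the current setting
    mmlsconfig verbsRdmaCm

    # enable the RDMA connection manager on a set of nodes
    mmchconfig verbsRdmaCm=enable -N ibclients

    # if SR-IOV virtual IB interfaces are involved, the undocumented option
    # mentioned above may also be needed (value syntax is an assumption):
    # mmchconfig verbsRdmaUseGidIndexZero=yes -N ibclients

    # restart GPFS on those nodes for the change to take effect
    mmshutdown -N ibclients && mmstartup -N ibclients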
Stuart Barkley On Mon, 26 Feb 2018 at 14:16 -0000, Aaron Knister wrote: > Date: Mon, 26 Feb 2018 14:16:34 > From: Aaron Knister > Reply-To: gpfsug main discussion list > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Problems with remote mount via routed IB > > Hi Jan Erik, > > It was my understanding that the IB hardware router required RDMA CM to work. > By default GPFS doesn't use the RDMA Connection Manager but it can be enabled > (e.g. verbsRdmaCm=enable). I think this requires a restart on clients/servers > (in both clusters) to take effect. Maybe someone else on the list can comment > in more detail-- I've been told folks have successfully deployed IB routers > with GPFS. > > -Aaron > > On 2/26/18 11:38 AM, Sundermann, Jan Erik (SCC) wrote: > > > > Dear all > > > > we are currently trying to remote mount a file system in a routed Infiniband > > test setup and face problems with dropped RDMA connections. The setup is the > > following: > > > > - Spectrum Scale Cluster 1 is setup on four servers which are connected to > > the same infiniband network. Additionally they are connected to a fast > > ethernet providing ip communication in the network 192.168.11.0/24. > > > > - Spectrum Scale Cluster 2 is setup on four additional servers which are > > connected to a second infiniband network. These servers have IPs on their IB > > interfaces in the network 192.168.12.0/24. > > > > - IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated > > machine. > > > > - We have a dedicated IB hardware router connected to both IB subnets. > > > > > > We tested that the routing, both IP and IB, is working between the two > > clusters without problems and that RDMA is working fine both for internal > > communication inside cluster 1 and cluster 2 > > > > When trying to remote mount a file system from cluster 1 in cluster 2, RDMA > > communication is not working as expected. 
Instead we see error messages on > > the remote host (cluster 2) > > > > > > 2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 > > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > > 2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 > > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 2 > > 2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to > > 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > > fabnum 0 error 733 index 3 > > 2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 > > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > > 2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to > > 192.168.11.3 (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > > fabnum 0 error 733 index 1 > > 2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 > > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 3 > > 2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 > > (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1 > > 2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 > > (iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 1 > > 2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to > > 192.168.11.2 (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > > fabnum 0 error 733 index 0 > > 2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 > > (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0 > > 2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 > > (iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 0 > > 2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to > > 192.168.11.4 (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > > fabnum 0 error 733 index 2 > > 2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 > > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2 > > 2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 > > (iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 2 > > 2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to > > 192.168.11.1 (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 > > fabnum 0 error 733 index 3 > > 2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 > > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3 > > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 > > (iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 > > index 3 > > > > > > and in the cluster with the file system (cluster 1) > > > > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error > > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > > 2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > > 2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 sl 0 index 3 > > 
2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error > > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > > 2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > > 2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 sl 0 index 3 > > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA rdma read error > > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > > 2018-02-23_13:48:32.523+0100: [E] VERBS RDMA closed connection to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > > 2018-02-23_13:48:35.398+0100: [I] VERBS RDMA accepted and connected to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 sl 0 index 3 > > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA rdma read error > > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > > 2018-02-23_13:48:53.135+0100: [E] VERBS RDMA closed connection to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > > 2018-02-23_13:48:55.600+0100: [I] VERBS RDMA accepted and connected to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 sl 0 index 3 > > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA rdma read error > > IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in > > gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 > > 2018-02-23_13:49:11.577+0100: [E] VERBS RDMA closed connection to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 3 > > 2018-02-23_13:49:11.939+0100: [I] VERBS RDMA accepted and connected to > > 192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 > > fabnum 0 sl 0 index 3 > > > > > > > > Any advice on how to configure the setup in a way that would allow the > > remote mount via routed IB would be very appreciated. > > > > > > Thank you and best regards > > Jan Erik > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone
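Circling back to the pre-job warm-up idea from earlier in this digest (read the dataset once in full before the mmap-heavy phase so its blocks are already in the pagepool or LROC), a minimal sketch is below. The file list name, parallelism, and block size are placeholders, and the tmpfs variant assumes the working set fits in memory on the compute node:

    # warm the pagepool/LROC by reading each file fully, 8 files at a time
    xargs -a filelist.txt -P 8 -I{} dd if={} of=/dev/null bs=8M status=none

    # alternative: stage the working set into tmpfs instead
    mkdir -p /dev/shm/dataset
    xargs -a filelist.txt -P 8 -I{} cp {} /dev/shm/dataset/

Whether the first variant helps depends on the pagepool and LROC sizes relative to the working set, as noted earlier in the thread.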