From Kevin.Buterbaugh at Vanderbilt.Edu Wed Apr 5 17:40:30 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 5 Apr 2017 16:40:30 +0000 Subject: [gpfsug-discuss] Can't delete filesystem Message-ID: <20E4B082-2BBB-478B-B1E1-2BC8125FE50F@vanderbilt.edu> Hi All, First off, I can open a PMR on this if I need to... I am trying to delete a GPFS filesystem but mmdelfs is telling me that the filesystem is still mounted on 14 nodes and therefore can't be deleted. 10 of those nodes are my 10 GPFS servers and they have an "internal mount" still mounted. IIRC, it's the other 4 (client) nodes I need to concentrate on - i.e. once those other 4 clients no longer have it mounted the internal mounts will resolve themselves. Correct me if I'm wrong on that, please. So, I have gone to all of the 4 clients and none of them say they have it mounted according to either "df" or "mount". I've gone ahead and run both "mmunmount" and "umount -l" on the filesystem anyway, but the mmdelfs still fails saying that they have it mounted. What do I need to do to resolve this issue on those 4 clients? Thanks... Kevin -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Apr 5 17:47:36 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 5 Apr 2017 16:47:36 +0000 Subject: [gpfsug-discuss] Can't delete filesystem In-Reply-To: <20E4B082-2BBB-478B-B1E1-2BC8125FE50F@vanderbilt.edu> References: <20E4B082-2BBB-478B-B1E1-2BC8125FE50F@vanderbilt.edu> Message-ID: Do you have ILM (dsmrecalld and friends) running? They can also stop the filesystem being released (e.g. mmshutdown fails if they are up). Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 05 April 2017 17:40 To: gpfsug main discussion list Subject: [gpfsug-discuss] Can't delete filesystem Hi All, First off, I can open a PMR on this if I need to... I am trying to delete a GPFS filesystem but mmdelfs is telling me that the filesystem is still mounted on 14 nodes and therefore can't be deleted. 10 of those nodes are my 10 GPFS servers and they have an "internal mount" still mounted. IIRC, it's the other 4 (client) nodes I need to concentrate on - i.e. once those other 4 clients no longer have it mounted the internal mounts will resolve themselves. Correct me if I'm wrong on that, please. So, I have gone to all of the 4 clients and none of them say they have it mounted according to either "df" or "mount". I've gone ahead and run both "mmunmount" and "umount -l" on the filesystem anyway, but the mmdelfs still fails saying that they have it mounted. What do I need to do to resolve this issue on those 4 clients? Thanks... Kevin --
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 From valdis.kletnieks at vt.edu Wed Apr 5 17:54:16 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 05 Apr 2017 12:54:16 -0400 Subject: [gpfsug-discuss] Can't delete filesystem In-Reply-To: <20E4B082-2BBB-478B-B1E1-2BC8125FE50F@vanderbilt.edu> References: <20E4B082-2BBB-478B-B1E1-2BC8125FE50F@vanderbilt.edu> Message-ID: <7103.1491411256@turing-police.cc.vt.edu> On Wed, 05 Apr 2017 16:40:30 -0000, "Buterbaugh, Kevin L" said: > So, I have gone to all of the 4 clients and none of them say they have it > mounted according to either "df" or "mount". I've gone ahead and run both > "mmunmount" and "umount -l" on the filesystem anyway, but the mmdelfs still > fails saying that they have it mounted. I've over the years seen this a few times. Doing an 'mmshutdown/mmstartup' pair on the offending nodes has always cleared it up. I probably should have opened a PMR, but it always seems to happen when I'm up to in alligators with other issues. (Am I the only person who wonders why all complex software packages contain alligator-detector routines? :) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Apr 5 17:54:14 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 5 Apr 2017 16:54:14 +0000 Subject: [gpfsug-discuss] Can't delete filesystem In-Reply-To: References: <20E4B082-2BBB-478B-B1E1-2BC8125FE50F@vanderbilt.edu> Message-ID: <0F877E25-6C58-4790-86CD-7E2108EC8EB5@vanderbilt.edu> Hi Simon, No, I do not. Let me also add that this is a filesystem that I migrated users off of and to another GPFS filesystem. I moved the last users this morning and then ran an "mmunmount" across the whole cluster via mmdsh. Therefore, if the simple solution is to use the "-p" option to mmdelfs I'm fine with that. I'm just not sure what the right course of action is at this point. Thanks again... Kevin > On Apr 5, 2017, at 11:47 AM, Simon Thompson (Research Computing - IT Services) wrote: > > Do you have ILM (dsmrecalld and friends) running? > > They can also stop the filesystem being released (e.g. mmshutdown fails if they are up). > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] > Sent: 05 April 2017 17:40 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Can't delete filesystem > > Hi All, > > First off, I can open a PMR on this if I need to... > > I am trying to delete a GPFS filesystem but mmdelfs is telling me that the filesystem is still mounted on 14 nodes and therefore can't be deleted. 10 of those nodes are my 10 GPFS servers and they have an "internal mount" still mounted. IIRC, it's the other 4 (client) nodes I need to concentrate on - i.e. once those other 4 clients no longer have it mounted the internal mounts will resolve themselves. Correct me if I'm wrong on that, please. > > So, I have gone to all of the 4 clients and none of them say they have it mounted according to either "df" or "mount". I've gone ahead and run both "mmunmount" and "umount -l"
on the filesystem anyway, but the mmdelfs still fails saying that they have it mounted. > > What do I need to do to resolve this issue on those 4 clients? Thanks? > > Kevin > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From janfrode at tanso.net Wed Apr 5 22:51:15 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 05 Apr 2017 21:51:15 +0000 Subject: [gpfsug-discuss] Can't delete filesystem In-Reply-To: <0F877E25-6C58-4790-86CD-7E2108EC8EB5@vanderbilt.edu> References: <20E4B082-2BBB-478B-B1E1-2BC8125FE50F@vanderbilt.edu> <0F877E25-6C58-4790-86CD-7E2108EC8EB5@vanderbilt.edu> Message-ID: Maybe try mmumount -f on the remaining 4 nodes? -jf ons. 5. apr. 2017 kl. 18.54 skrev Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu>: > Hi Simon, > > No, I do not. > > Let me also add that this is a filesystem that I migrated users off of and > to another GPFS filesystem. I moved the last users this morning and then > ran an ?mmunmount? across the whole cluster via mmdsh. Therefore, if the > simple solution is to use the ?-p? option to mmdelfs I?m fine with that. > I?m just not sure what the right course of action is at this point. > > Thanks again? > > Kevin > > > On Apr 5, 2017, at 11:47 AM, Simon Thompson (Research Computing - IT > Services) wrote: > > > > Do you have ILM (dsmrecalld and friends) running? > > > > They can also stop the filesystem being released (e.g. mmshutdown fails > if they are up). > > > > Simon > > ________________________________________ > > From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin > L [Kevin.Buterbaugh at Vanderbilt.Edu] > > Sent: 05 April 2017 17:40 > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] Can't delete filesystem > > > > Hi All, > > > > First off, I can open a PMR on this if I need to? > > > > I am trying to delete a GPFS filesystem but mmdelfs is telling me that > the filesystem is still mounted on 14 nodes and therefore can?t be > deleted. 10 of those nodes are my 10 GPFS servers and they have an > ?internal mount? still mounted. IIRC, it?s the other 4 (client) nodes I > need to concentrate on ? i.e. once those other 4 clients no longer have it > mounted the internal mounts will resolve themselves. Correct me if I?m > wrong on that, please. > > > > So, I have gone to all of the 4 clients and none of them say they have > it mounted according to either ?df? or ?mount?. I?ve gone ahead and run > both ?mmunmount? and ?umount -l? on the filesystem anyway, but the mmdelfs > still fails saying that they have it mounted. > > > > What do I need to do to resolve this issue on those 4 clients? Thanks? > > > > Kevin > > > > ? 
> > Kevin Buterbaugh - Senior System Administrator > > Vanderbilt University - Advanced Computing Center for Research and Education > > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Apr 6 02:54:07 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Thu, 6 Apr 2017 01:54:07 +0000 Subject: [gpfsug-discuss] AFM misunderstanding Message-ID: When I set up an AFM relationship (let's just say I'm doing RO), does prefetch bring bits of the actual file over to the cache or is it only ever metadata? I know there is a --metadata-only switch but it appears that if I try a mmafmctl prefetch operation and then I do a ls -ltrs on the cache it's still 0 bytes. I do see the queue increasing when I do a mmafmctl getstate. I realize that the data truly only flows once the file is requested (I just do a dd if=mycachedfile of=/dev/null). But this is just my test env. How do I get the bits to flow before I request them, assuming that I will at some point need them? Or do I just misunderstand AFM altogether? I'm more used to mirroring so maybe that's my frame of reference and it's not the AFM architecture. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Apr 6 09:20:31 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 6 Apr 2017 08:20:31 +0000 Subject: [gpfsug-discuss] Spectrum Scale Encryption Message-ID: We are currently looking at adding encryption to our deployment for some of our data sets and for some of our nodes. Apologies in advance if some of this is a bit vague, we're not yet at the point where we can test this stuff out, so maybe some of it will become clear when we try it out. For a node that we don't want to have access to any encrypted data, what do we need to set up? According to the docs: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adv_encryption_prep.htm "After the file system is configured with encryption policy rules, the file system is considered encrypted. From that point on, each node that has access to that file system must have an RKM.conf file present.
Otherwise, the file system might not be mounted or might become unmounted." So on a node which I don't want to have access to any encrypted files, do I just need to have an empty RKM.conf file? (If this is the case, would be good to have this added to the docs) Secondly ... (and maybe I'm misunderstanding the docs here) For the Policy https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adv_encryptionpolicyrules.htm KEYS ('Keyname'[, 'Keyname', ... ]) KeyId:RkmId RkmId should match the stanza name in RKM.conf? If so, it would be useful if the docs used the same names in the examples (RKMKMIP3 vs rkmname3) And KeyId should match a "Key UUID" in SKLM? Third. My understanding from talking to various IBM people is that we need ISKLM entitlements for NSD Servers, Protocol nodes and AFM gateways (probably), do we have to do any kind of node registration in ISKLM? Or is this purely based on the certificates being distributed to clients and keys are mapped in ISKLM to the client cert to determine if the node is able to request the key? Thanks Simon From vpuvvada at in.ibm.com Thu Apr 6 11:45:37 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Thu, 6 Apr 2017 16:15:37 +0530 Subject: [gpfsug-discuss] AFM misunderstanding In-Reply-To: References: Message-ID: Could you explain "bits of actual file" mentioned below? Prefetch with --metadata-only pulls everything (xattrs, ACLs etc..) except data. Doing "ls -ltrs" shows file allocation size as zero if data prefetch has not yet completed on them. ~Venkat (vpuvvada at in.ibm.com) From: Mark Bush To: gpfsug main discussion list Date: 04/06/2017 07:24 AM Subject: [gpfsug-discuss] AFM misunderstanding Sent by: gpfsug-discuss-bounces at spectrumscale.org When I set up an AFM relationship (let's just say I'm doing RO), does prefetch bring bits of the actual file over to the cache or is it only ever metadata? I know there is a --metadata-only switch but it appears that if I try a mmafmctl prefetch operation and then I do a ls -ltrs on the cache it's still 0 bytes. I do see the queue increasing when I do a mmafmctl getstate. I realize that the data truly only flows once the file is requested (I just do a dd if=mycachedfile of=/dev/null). But this is just my test env. How do I get the bits to flow before I request them, assuming that I will at some point need them? Or do I just misunderstand AFM altogether? I'm more used to mirroring so maybe that's my frame of reference and it's not the AFM architecture. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you.
Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Apr 6 13:28:40 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Thu, 6 Apr 2017 12:28:40 +0000 Subject: [gpfsug-discuss] AFM misunderstanding In-Reply-To: References: Message-ID: <425C32E7-B752-4B61-BDF5-83C219D89ADB@siriuscom.com> I think I was missing a key piece in that I thought that just doing a mmafmctl fs1 prefetch -j cache would start grabbing everything (data and metadata) but it appears that the --list-file myfiles.txt is the trigger for the prefetch to work properly. I mistakenly assumed that omitting the --list-file switch would prefetch all the data in the fileset. From: on behalf of Venkateswara R Puvvada Reply-To: gpfsug main discussion list Date: Thursday, April 6, 2017 at 5:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM misunderstanding Could you explain "bits of actual file" mentioned below? Prefetch with --metadata-only pulls everything (xattrs, ACLs etc..) except data. Doing "ls -ltrs" shows file allocation size as zero if data prefetch has not yet completed on them. ~Venkat (vpuvvada at in.ibm.com) From: Mark Bush To: gpfsug main discussion list Date: 04/06/2017 07:24 AM Subject: [gpfsug-discuss] AFM misunderstanding Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ When I set up an AFM relationship (let's just say I'm doing RO), does prefetch bring bits of the actual file over to the cache or is it only ever metadata? I know there is a --metadata-only switch but it appears that if I try a mmafmctl prefetch operation and then I do a ls -ltrs on the cache it's still 0 bytes. I do see the queue increasing when I do a mmafmctl getstate. I realize that the data truly only flows once the file is requested (I just do a dd if=mycachedfile of=/dev/null). But this is just my test env. How do I get the bits to flow before I request them, assuming that I will at some point need them? Or do I just misunderstand AFM altogether? I'm more used to mirroring so maybe that's my frame of reference and it's not the AFM architecture. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL:
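As a rough sketch of that workflow (the list-file path below is just a placeholder; fs1 and cache are the filesystem and fileset names used in the messages above), the prefetch can be driven from a generated file list and the queue watched with getstate:

    # build a list of files to prefetch, one path per line
    find /gpfs/fs1/cache -type f > /tmp/prefetch.list

    # queue data prefetch for everything in the list, then watch the queue drain
    mmafmctl fs1 prefetch -j cache --list-file /tmp/prefetch.list
    mmafmctl fs1 getstate -j cache

The exact prefetch options vary between Spectrum Scale releases, so the mmafmctl man page for the installed level is the reference to trust.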
From Kevin.Buterbaugh at Vanderbilt.Edu Thu Apr 6 15:33:18 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 6 Apr 2017 14:33:18 +0000 Subject: Re: [gpfsug-discuss] Can't delete filesystem In-Reply-To: References: <20E4B082-2BBB-478B-B1E1-2BC8125FE50F@vanderbilt.edu> <0F877E25-6C58-4790-86CD-7E2108EC8EB5@vanderbilt.edu> Message-ID: Hi JF, I actually tried that - to no effect. Yesterday evening I rebooted the 4 clients and, as expected, the 10 servers released their internal mounts as well - and then I was able to delete the filesystem successfully. Thanks for the suggestions, all... Kevin On Apr 5, 2017, at 4:51 PM, Jan-Frode Myklebust > wrote: Maybe try mmumount -f on the remaining 4 nodes? -jf On Wed, 5 Apr 2017 at 18:54, Buterbaugh, Kevin L > wrote: Hi Simon, No, I do not. Let me also add that this is a filesystem that I migrated users off of and to another GPFS filesystem. I moved the last users this morning and then ran an "mmunmount" across the whole cluster via mmdsh. Therefore, if the simple solution is to use the "-p" option to mmdelfs I'm fine with that. I'm just not sure what the right course of action is at this point. Thanks again... Kevin > On Apr 5, 2017, at 11:47 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > Do you have ILM (dsmrecalld and friends) running? > > They can also stop the filesystem being released (e.g. mmshutdown fails if they are up). > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] > Sent: 05 April 2017 17:40 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Can't delete filesystem > > Hi All, > > First off, I can open a PMR on this if I need to... > > I am trying to delete a GPFS filesystem but mmdelfs is telling me that the filesystem is still mounted on 14 nodes and therefore can't be deleted. 10 of those nodes are my 10 GPFS servers and they have an "internal mount" still mounted. IIRC, it's the other 4 (client) nodes I need to concentrate on - i.e. once those other 4 clients no longer have it mounted the internal mounts will resolve themselves. Correct me if I'm wrong on that, please. > > So, I have gone to all of the 4 clients and none of them say they have it mounted according to either "df" or "mount". I've gone ahead and run both "mmunmount" and "umount -l" on the filesystem anyway, but the mmdelfs still fails saying that they have it mounted. > > What do I need to do to resolve this issue on those 4 clients? Thanks... > > Kevin > > -- > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL:
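Before falling back to a reboot, it can also be worth asking GPFS itself which nodes it still counts as having the filesystem mounted, since df and mount on the clients will not show the internal mounts. A minimal check, with gpfs1 and the client node names as placeholders, might look like:

    # show every node that still holds a mount, including internal mounts on the NSD servers
    mmlsmount gpfs1 -L

    # force the unmount on any stragglers, then retry the delete
    mmumount gpfs1 -N client1,client2
    mmdelfs gpfs1

If a mount still refuses to release, an mmshutdown/mmstartup cycle on the offending nodes (as suggested earlier in the thread) or a reboot remains the fallback.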
From ewahl at osc.edu Thu Apr 6 15:54:42 2017 From: ewahl at osc.edu (Wahl, Edward) Date: Thu, 6 Apr 2017 14:54:42 +0000 Subject: Re: [gpfsug-discuss] Spectrum Scale Encryption In-Reply-To: References: Message-ID: <9DA9EC7A281AC7428A9618AFDC490499591F4BDB@CIO-KRC-D1MBX02.osuad.osu.edu> This is rather dependent on SS version. So what used to happen before 4.2.2.* is that a client would be unable to mount the filesystem in question and would give an error in the mmfs.log.latest for an SGPanic. In 4.2.2.* it appears it will now mount the file system and then give errors on file access instead. (just tested this on 4.2.2.3) I'll have to read through the changelogs looking for this one. Depending on your policy for encryption then, this might be exactly what you want, but I REALLY REALLY dislike this behaviour. To me this means clients can now mount an encrypted FS and then fail during operation. If I get a client node that comes up improperly, user work will start, and it will fail with "Operation not permitted" errors on file access. I imagine my batch system could run through a massive amount of jobs on a bad client without anyone noticing immediately. Yet another thing we have to monitor now I guess. *shrug* A couple other gotchas we've seen with Encryption: Encrypted file systems do not store data in large MD blocks. Makes sense. This means large MD blocks aren't as useful as they are in unencrypted FS, if you are using this. Having at least one backup SKLM server is a good idea. "kmipServerUri[N+1]" in the conf. While the documentation claims the FS can continue operation once it caches the MEK if an SKLM server goes away, in operation this does NOT work as you may expect. Your users still need access to the FEKs for the files your clients work on. Logs will fill with 'Key could not be fetched' errors. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Simon Thompson (Research Computing - IT Services) [S.J.Thompson at bham.ac.uk] Sent: Thursday, April 06, 2017 4:20 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Spectrum Scale Encryption We are currently looking at adding encryption to our deployment for some of our data sets and for some of our nodes. Apologies in advance if some of this is a bit vague, we're not yet at the point where we can test this stuff out, so maybe some of it will become clear when we try it out. For a node that we don't want to have access to any encrypted data, what do we need to set up? According to the docs: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adv_encryption_prep.htm "After the file system is configured with encryption policy rules, the file system is considered encrypted. From that point on, each node that has access to that file system must have an RKM.conf file present. Otherwise, the file system might not be mounted or might become unmounted." So on a node which I don't want to have access to any encrypted files, do I just need to have an empty RKM.conf file? (If this is the case, would be good to have this added to the docs) Secondly ... (and maybe I'm misunderstanding the docs here) For the Policy https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adv_encryptionpolicyrules.htm KEYS ('Keyname'[, 'Keyname', ... ]) KeyId:RkmId RkmId should match the stanza name in RKM.conf?
If so, it would be useful if the docs used the same names in the examples (RKMKMIP3 vs rkmname3) And KeyId should match a "Key UUID" in SKLM? Third. My understanding from talking to various IBM people is that we need ISKLM entitlements for NSD Servers, Protocol nodes and AFM gateways (probably), do we have to do any kind of node registration in ISKLM? Or is this purely based on the certificates being distributed to clients and keys are mapped in ISKLM to the client cert to determine if the node is able to request the key? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Thu Apr 6 16:11:38 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 6 Apr 2017 15:11:38 +0000 Subject: [gpfsug-discuss] Spectrum Scale Encryption Message-ID: Hi Ed, Thanks. We already have several SKLM servers (tape backups). For me, we plan to encrypt specific parts of the FS (probably by file-set), so as long as all that is needed is an empty RKM.conf file, sounds like it will work. I suppose I could have an MEK that is granted to all clients, but then never actually use it for encryption if RKM.conf needs at least one key (hack hack hack). (We are at 4.2.2-2 (mostly) or higher (a few nodes)). I *thought* the FEK was wrapped in the metadata with the MEK (possibly multiple times with different MEKs), so what the docs say about operation continuing with no SKLM server sounds sensible, but of course that might not be what actually happens I guess... Simon On 06/04/2017, 15:54, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Wahl, Edward" wrote: >This is rather dependant on SS version. > >So what used to happen before 4.2.2.* is that a client would be unable to >mount the filesystem in question and would give an error in the >mmfs.log.latest for an SGPanic, In 4.2.2.* It appears it will now mount >the file system and then give errors on file access instead. (just >tested this on 4.2.2.3) I'll have to read through the changelogs looking >for this one. > >Depending on your policy for encryption then, this might be exactly what >you want, but I REALLY REALLY dislike this behaviour. > >To me this means clients can now mount an encrypted FS now and then fail >during operation. If I get a client node that comes up improperly, user >work will start, and it will fail with "Operation not permitted" errors >on file access. I imagine my batch system could run through a massive >amount of jobs on a bad client without anyone noticing immeadiately. Yet >another thing we now have to monitor now I guess. *shrug* > >A couple other gotcha's we've seen with Encryption: > >Encrypted file systems do not store data in large MD blocks. Makes >sense. This means large MD blocks aren't as useful as they are in >unencrypted FS, if you are using this. > >Having at least one backup SKLM server is a good idea. >"kmipServerUri[N+1]" in the conf. > >While the documentation claims the FS can continue operation once it >caches the MEK if an SKLM server goes away, in operation this does NOT >work as you may expect. Your users still need access to the FEKs for the >files your clients work on. Logs will fill with Key could not be >fetched. errors. 
> >Ed Wahl >OSC > >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org >[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Simon Thompson >(Research Computing - IT Services) [S.J.Thompson at bham.ac.uk] >Sent: Thursday, April 06, 2017 4:20 AM >To: gpfsug-discuss at spectrumscale.org >Subject: [gpfsug-discuss] Spectrum Scale Encryption > >We are currently looking at adding encryption to our deployment for some >of our data sets and for some of our nodes. Apologies in advance if some >of this is a bit vague, we're not yet at the point where we can test this >stuff out, so maybe some of it will become clear when we try it out. > > >For a node that we don't want to have access to any encrypted data, what >do we need to set up? > >According to the docs: >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum. >s >cale.v4r22.doc/bl1adv_encryption_prep.htm > > >"After the file system is configured with encryption policy rules, the >file system is considered encrypted. From that point on, each node that >has access to that file system must have an RKM.conf file present. >Otherwise, the file system might not be mounted or might become >unmounted." > >So on a node which I don't want to have access to any encrypted files, do >I just need to have an empty RKM.conf file? > >(If this is the case, would be good to have this added to the docs) > > >Secondly ... (and maybe I'm misunderstanding the docs here) > >For the Policy >https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectr >u >m.scale.v4r22.doc/bl1adv_encryptionpolicyrules.htm > > >KEYS ('Keyname'[, 'Keyname', ... ]) > > >KeyId:RkmId > > >RkmId should match the stanza name in RKM.conf? > >If so, it would be useful if the docs used the same names in the examples >(RKMKMIP3 vs rkmname3) > >And KeyId should match a "Key UUID" in SKLM? > > >Third. My understanding from talking to various IBM people is that we need >ISKLM entitlements for NSD Servers, Protocol nodes and AFM gateways >(probably), do we have to do any kind of node registration in ISKLM? Or is >this purely based on the certificates being distributed to clients and >keys are mapped in ISKLM to the client cert to determine if the node is >able to request the key? > >Thanks > >Simon > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Jon.Edwards at newbase.com.au Fri Apr 7 05:56:33 2017 From: Jon.Edwards at newbase.com.au (Jon Edwards) Date: Fri, 7 Apr 2017 04:56:33 +0000 Subject: [gpfsug-discuss] Spectrum scale sending cluster traffic across the management network Message-ID: <7929c064d6df4d7b88065b4d882daa98@newbase.com.au> Hi All, Just getting started with spectrum scale, Just wondering if anyone has come across the issue where when doing a mmcrfs or mmdelfs you get the error Failed to connect to file system daemon: Connection timed out mmdelfs: tsdelfs failed. mmdelfs: Command failed. Examine previous error messages to determine cause. 
When viewing the logs in /var/mmfs/gen/mmfslog on a node other than the one I am running the command on, I get: 2017-04-07_14:03:13.354+1000: [N] Filtered log entry: 'connect to node 192.168.0.1:1191' occurred 10 times between 2017-04-07_11:38:19.058+1000 and 2017-04-07_11:54:58.649+1000 192.168.0.0/24 in this case is the management network configured on eth0 of all the nodes. It is failing because port 1191 is not allowed on this interface. The DNS and hostname for each node resolves to a dedicated cluster network, let's say 10.0.0.0/24 (ETH1). For some reason when I run the mmcrfs or mmdelfs it tries to talk back over the management network instead of the cluster network, which fails to connect due to the firewall blocking cluster traffic over management. Anyone seen this before? Kind Regards, Jon Edwards Senior Systems Engineer NewBase Email: jon.edwards at newbase.com.au Ph: + 61 7 3216 0776 Fax: + 61 7 3216 0779 http://www.newbase.com.au Opinions contained in this e-mail do not necessarily reflect the opinions of NewBase Computer Services Pty Ltd. This e-mail is for the exclusive use of the addressee and should not be disseminated further or copied without permission of the sender. If you have received this message in error, please immediately notify the sender and delete the message from your computer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jon.Edwards at newbase.com.au Fri Apr 7 06:26:56 2017 From: Jon.Edwards at newbase.com.au (Jon Edwards) Date: Fri, 7 Apr 2017 05:26:56 +0000 Subject: [gpfsug-discuss] Spectrum scale sending cluster traffic across the management network Message-ID: <6e02ed91cb404d46b7b5cd3515ad8fe9@newbase.com.au> Please disregard, found the solution. Found the subnets= parameter for the cluster config mmchconfig subnets="192.168.0.0/24 192.168.1.0/24" Which forces it to use this subnet. Kind Regards, Jon Edwards | Senior Systems Engineer NewBase Ph: + 61 7 3216 0776 | Email: jon.edwards at newbase.com.au http://www.newbase.com.au From: Jon Edwards Sent: Friday, 7 April 2017 2:56 PM To: 'gpfsug-discuss at spectrumscale.org' Cc: 'Andrew Beattie' Subject: Spectrum scale sending cluster traffic across the management network Hi All, Just getting started with spectrum scale, Just wondering if anyone has come across the issue where when doing a mmcrfs or mmdelfs you get the error Failed to connect to file system daemon: Connection timed out mmdelfs: tsdelfs failed. mmdelfs: Command failed. Examine previous error messages to determine cause. When viewing the logs in /var/mmfs/gen/mmfslog on a node other than the one I am running the command on, I get: 2017-04-07_14:03:13.354+1000: [N] Filtered log entry: 'connect to node 192.168.0.1:1191' occurred 10 times between 2017-04-07_11:38:19.058+1000 and 2017-04-07_11:54:58.649+1000 192.168.0.0/24 in this case is the management network configured on eth0 of all the nodes. It is failing because port 1191 is not allowed on this interface. The DNS and hostname for each node resolves to a dedicated cluster network, let's say 10.0.0.0/24 (ETH1). For some reason when I run the mmcrfs or mmdelfs it tries to talk back over the management network instead of the cluster network, which fails to connect due to the firewall blocking cluster traffic over management. Anyone seen this before? Kind Regards, Jon Edwards Senior Systems Engineer NewBase Email: jon.edwards at newbase.com.au Ph: + 61 7 3216 0776 Fax: + 61 7 3216 0779 http://www.newbase.com.au Opinions contained in this e-mail do not necessarily reflect the opinions of NewBase Computer Services Pty Ltd. This e-mail is for the exclusive use of the addressee and should not be disseminated further or copied without permission of the sender. If you have received this message in error, please immediately notify the sender and delete the message from your computer. -------------- next part -------------- An HTML attachment was scrubbed... URL:
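For anyone applying the same fix, a quick sanity check after the mmchconfig (the node and network names are whatever your cluster uses) is to confirm the setting and then look at which addresses the daemon connections actually use:

    # confirm the subnets setting is in place
    mmlsconfig subnets

    # after restarting GPFS on the affected nodes, list active daemon connections and their IP addresses
    mmdiag --network

The subnets setting only influences connections established after the daemon has been restarted, so connections opened before the restart may still show the old addresses.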
From knop at us.ibm.com Fri Apr 7 15:00:09 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 7 Apr 2017 10:00:09 -0400 Subject: Re: [gpfsug-discuss] Spectrum Scale Encryption In-Reply-To: <9DA9EC7A281AC7428A9618AFDC490499591F4BDB@CIO-KRC-D1MBX02.osuad.osu.edu> References: <9DA9EC7A281AC7428A9618AFDC490499591F4BDB@CIO-KRC-D1MBX02.osuad.osu.edu> Message-ID: All, A few comments on the topics raised below. 1) All nodes that mount an encrypted file system, and also the nodes with management roles on the file system, will need access to the keys and must have the proper setup (RKM.conf, etc). Edward is correct that there was some change in behavior, introduced in 4.2.1. Before the change, a mount would fail unless RKM.conf is present on the node. In addition, once a policy with encryption rules was applied, nodes without the proper encryption setup would unmount the file system. With the change, the error gets delayed to when encrypted files are accessed. The change in behavior was introduced based on feedback that unmounting the file system was too drastic in that scenario. >> So on a node which I don't want to have access to any encrypted files, do I just need to have an empty RKM.conf file? All nodes which mount an encrypted file system should have the proper setup for encryption, even a node from which only unencrypted files are being accessed. 2) >> Encrypted file systems do not store data in large MD blocks. Makes sense. This means large MD blocks aren't as useful as they are in unencrypted FS, if you are using this. Correct. Data is not stored in the inode for encrypted files. On the other hand, since encryption metadata is stored as an extended attribute in the inode, 4K inodes are still recommended -- especially in cases where a more complicated encryption policy is used. 3) >> Having at least one backup SKLM server is a good idea. "kmipServerUri[N+1]" in the conf. While the documentation claims the FS can continue operation once it caches the MEK if an SKLM server goes away, in operation this does NOT work as you may expect. Your users still need access to the FEKs for the files your clients work on. Logs will fill with 'Key could not be fetched' errors. Using a backup key server is strongly recommended. While it's true that the files may still be accessed for a while if the key server becomes unreachable, this was not something to be counted on. First, because keys (MEKs) may expire at any time, requiring the key to be retrieved from the key server again. Second, because a key may be needed for a file that has not been cached before. 4) >> Secondly ... (and maybe I'm misunderstanding the docs here) For the Policy https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adv_encryptionpolicyrules.htm KEYS ('Keyname'[, 'Keyname', ... ]) KeyId:RkmId RkmId should match the stanza name in RKM.conf? Correct.
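To make that naming concrete, a hypothetical RKM.conf stanza and matching policy rules (the stanza name, key UUID, server name, paths and fileset pattern below are all invented for illustration) could look something like:

    rkmname3 {
        type = ISKLM
        kmipServerUri = tls://sklm01.example.com:5696
        keyStore = /var/mmfs/etc/RKMcerts/ISKLM.proj3.p12
        passphrase = a_password
        clientCertLabel = a_label
        tenantName = GPFS_Tenant_Proj3
    }

    RULE 'encRule1' ENCRYPTION 'E1' IS
        ALGO 'DEFAULTNISTSP800131A'
        KEYS('KEY-326a1906-be46-4983-a63e-29f005fb3a15:rkmname3')
    RULE 'encryptProjects' SET ENCRYPTION 'E1'
        WHERE FILESET_NAME LIKE 'proj%'

Here 'rkmname3' is the RkmId (the RKM.conf stanza name) and the KEY-... string stands in for the key UUID as shown in SKLM, so the KEYS('KeyId:RkmId') pair in the rule is what ties the two together.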
>> If so, it would be useful if the docs used the same names in the examples (RKMKMIP3 vs rkmname3) And KeyId should match a "Key UUID" in SKLM? Correct. We'll review the documentation to ensure that the meaning of the RkmId in the examples is clear. 5) >> Third. My understanding from talking to various IBM people is that we need ISKLM entitlements for NSD Servers, Protocol nodes and AFM gateways (probably), do we have to do any kind of node registration in ISKLM? Or is this purely based on the certificates being distributed to clients and keys are mapped in ISKLM to the client cert to determine if the node is able to request the key? I'll work on getting clarifications from the ISKLM folks on this aspect. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Wahl, Edward" To: gpfsug main discussion list Date: 04/06/2017 10:55 AM Subject: Re: [gpfsug-discuss] Spectrum Scale Encryption Sent by: gpfsug-discuss-bounces at spectrumscale.org This is rather dependant on SS version. So what used to happen before 4.2.2.* is that a client would be unable to mount the filesystem in question and would give an error in the mmfs.log.latest for an SGPanic, In 4.2.2.* It appears it will now mount the file system and then give errors on file access instead. (just tested this on 4.2.2.3) I'll have to read through the changelogs looking for this one. Depending on your policy for encryption then, this might be exactly what you want, but I REALLY REALLY dislike this behaviour. To me this means clients can now mount an encrypted FS now and then fail during operation. If I get a client node that comes up improperly, user work will start, and it will fail with "Operation not permitted" errors on file access. I imagine my batch system could run through a massive amount of jobs on a bad client without anyone noticing immeadiately. Yet another thing we now have to monitor now I guess. *shrug* A couple other gotcha's we've seen with Encryption: Encrypted file systems do not store data in large MD blocks. Makes sense. This means large MD blocks aren't as useful as they are in unencrypted FS, if you are using this. Having at least one backup SKLM server is a good idea. "kmipServerUri[N+1]" in the conf. While the documentation claims the FS can continue operation once it caches the MEK if an SKLM server goes away, in operation this does NOT work as you may expect. Your users still need access to the FEKs for the files your clients work on. Logs will fill with Key could not be fetched. errors. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Simon Thompson (Research Computing - IT Services) [S.J.Thompson at bham.ac.uk] Sent: Thursday, April 06, 2017 4:20 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Spectrum Scale Encryption We are currently looking at adding encryption to our deployment for some of our data sets and for some of our nodes. Apologies in advance if some of this is a bit vague, we're not yet at the point where we can test this stuff out, so maybe some of it will become clear when we try it out. For a node that we don't want to have access to any encrypted data, what do we need to set up? 
According to the docs: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_encryption_prep.htm "After the file system is configured with encryption policy rules, the file system is considered encrypted. From that point on, each node that has access to that file system must have an RKM.conf file present. Otherwise, the file system might not be mounted or might become unmounted." So on a node which I don't want to have access to any encrypted files, do I just need to have an empty RKM.conf file? (If this is the case, would be good to have this added to the docs) Secondly ... (and maybe I'm misunderstanding the docs here) For the Policy https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectru m.scale.v4r22.doc/bl1adv_encryptionpolicyrules.htm KEYS ('Keyname'[, 'Keyname', ... ]) KeyId:RkmId RkmId should match the stanza name in RKM.conf? If so, it would be useful if the docs used the same names in the examples (RKMKMIP3 vs rkmname3) And KeyId should match a "Key UUID" in SKLM? Third. My understanding from talking to various IBM people is that we need ISKLM entitlements for NSD Servers, Protocol nodes and AFM gateways (probably), do we have to do any kind of node registration in ISKLM? Or is this purely based on the certificates being distributed to clients and keys are mapped in ISKLM to the client cert to determine if the node is able to request the key? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Fri Apr 7 15:58:29 2017 From: mweil at wustl.edu (Matt Weil) Date: Fri, 7 Apr 2017 09:58:29 -0500 Subject: [gpfsug-discuss] AFM gateways Message-ID: Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From vpuvvada at in.ibm.com Mon Apr 10 11:56:16 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 10 Apr 2017 16:26:16 +0530 Subject: [gpfsug-discuss] AFM gateways In-Reply-To: References: Message-ID: It is not recommended to make NSD servers as gateway nodes for native GPFS protocol. Unresponsive remote cluster mount might cause gateway node to hang on synchronous operations (ex. Lookup, Read, Open etc..), this will affect NSD server functionality. More information is documented @ https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_NFSvsGPFSAFM.htm ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil To: gpfsug main discussion list Date: 04/07/2017 08:28 PM Subject: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? 
Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sandra.McLaughlin at astrazeneca.com Mon Apr 10 12:20:53 2017 From: Sandra.McLaughlin at astrazeneca.com (McLaughlin, Sandra M) Date: Mon, 10 Apr 2017 11:20:53 +0000 Subject: [gpfsug-discuss] AFM gateways In-Reply-To: References: Message-ID: Hi, I agree with Venkat. I did exactly what you said below, enabled my NSD servers as gateways to get additional throughput (with both native gpfs protocol and NFS protocol), which worked well; we definitely got the increased traffic. However, I wouldn't do it again through choice. As Venkat says, if there is a problem with the remote cluster, that can affect any of the gateway nodes (if using gpfs protocol), but also, we had a problem with one of the gateway nodes, where it kept crashing (which is now resolved) and then all filesets for which that node was the gateway had to failover to other gateway servers and this really messes everything up while the failover is taking place. I am also, stupidly, serving NFS and samba from the NSD servers (via ctdb) which I also, would not do again ! It would be nice if there was a way to specify which gateway server is the primary gateway for a specific fileset. Regards, Sandra From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: 10 April 2017 11:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM gateways It is not recommended to make NSD servers as gateway nodes for native GPFS protocol. Unresponsive remote cluster mount might cause gateway node to hang on synchronous operations (ex. Lookup, Read, Open etc..), this will affect NSD server functionality. More information is documented @ https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_NFSvsGPFSAFM.htm ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil > To: gpfsug main discussion list > Date: 04/07/2017 08:28 PM Subject: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From Christian.Fey at sva.de Mon Apr 10 17:04:31 2017 From: Christian.Fey at sva.de (Fey, Christian) Date: Mon, 10 Apr 2017 16:04:31 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: Message-ID: <455e54150cd04cd8808619acbf7d8d2b@sva.de> Hi, I'm just dealing with a maybe similar issue that also seems to be related to the output of "tsctl shownodes up" (before CES I actually never had to deal with this command). In my case the output of "mmlscluster", for example, shows the nodes like "node1.acme.local", but in "tsctl shownodes up" they are displayed as "node1.acme.local.acme.local", for example. This may be what causes a fresh CES implementation in an existing GPFS cluster to also not spread IP addresses. It instead loops in the same way as it did in your case @jonathon. I think it tries to search for "node1.acme.local" but doesn't find it since tsctl shows it with a doubled suffix. Can anyone explain where "tsctl shownodes up" reads its data from? Additionally, does anyone have an idea why the DNS suffix is doubled? Kind regards Christian -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: Thursday, 23 March 2017 16:02 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Attention! The sender address may be forged. Please check the plausibility of the email and be especially careful with any attachments and links it contains. If in doubt, contact the CIT at cit at sva.de or 06122 536 350. (Keyword: DKIM test failed) ---------------------------------------------------------------------------------------------------------------- Thanks! I'm looking forward to upgrading our CES nodes and resuming work on the project. ~jonathon On 3/23/17, 8:24 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser" wrote: the issue is fixed, an APAR will be released soon - IV93100 From: Olaf Weiser/Germany/IBM at IBMDE To: "gpfsug main discussion list" Cc: "gpfsug main discussion list" Date: 01/31/2017 11:47 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________________ Yeah... depending on the #nodes you're affected or not. ..... So if your remote ces cluster is small enough in terms of the #nodes ...
you'll neuer hit into this issue Gesendet von IBM Verse Simon Thompson (Research Computing - IT Services) --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von:"Simon Thompson (Research Computing - IT Services)" An:"gpfsug main discussion list" Datum:Di. 31.01.2017 21:07Betreff:Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________________ We use multicluster for our environment, storage systems in a separate cluster to hpc nodes on a separate cluster from protocol nodes. According to the docs, this isn't supported, but we haven't seen any issues. Note unsupported as opposed to broken. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 31 January 2017 17:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Yeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it?s only in CES. I suspect there just haven?t been that many people exporting CES out of an HPC cluster environment. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 10:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes I ll open a pmr here for my env ... the issue may hurt you in a ces env. only... but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von: "Jonathon A Anderson" An: "gpfsug main discussion list" Datum: Di. 31.01.2017 17:32 Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________ No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. 
[root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? 
by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] &&set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
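[Aside: a minimal diagnostic sketch along the lines discussed in this thread, not taken from the original posts. It compares the CES view of the cluster with the full member list to spot the truncated "tsctl shownodes up" symptom; the commands are standard GPFS/CES utilities, but the awk pattern for counting mmlscluster members is an assumption and may need adjusting to your output format.]
---
#!/bin/bash
# Sketch: quick checks when CES addresses stay unassigned.
PATH=$PATH:/usr/lpp/mmfs/bin

mmces node list        # which nodes are CES-enabled, and their state flags
mmces address list     # which CES addresses exist, and where they are assigned

# Compare the daemon's view of "up" nodes against the full member list; a count
# far below the cluster size, or a last hostname that is cut off mid-name, is
# the truncation symptom investigated earlier in this thread.
tsctl shownodes up | tr ',' '\n' | wc -l
tsctl shownodes up | tr ',' '\n' | tail -n 1
mmlscluster | awk '/^ +[0-9]+ /{n++} END{print n, "nodes listed by mmlscluster"}'
---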
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5467 bytes Desc: not available URL: From service at metamodul.com Mon Apr 10 17:47:41 2017 From: service at metamodul.com (Hans-Joachim Ehlers) Date: Mon, 10 Apr 2017 18:47:41 +0200 (CEST) Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network Message-ID: <788130355.197989.1491842861235@email.1und1.de> An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Mon Apr 10 17:58:36 2017 From: eric.wonderley at vt.edu (J. 
Eric Wonderley) Date: Mon, 10 Apr 2017 12:58:36 -0400 Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network In-Reply-To: <788130355.197989.1491842861235@email.1und1.de> References: <788130355.197989.1491842861235@email.1und1.de> Message-ID: 1) You want more than one quorum node on your server cluster. The non-quorum node does need a daemon network interface exposed to the client cluster, as do the quorum nodes. 2) No. The admin network is for intra-cluster communications... not inter-cluster (between clusters). The daemon interface (port 1191) is used for communications between clusters. I think there is little benefit gained by having a designated admin network... maybe someone can point out the benefits of an admin network. Eric Wonderley On Mon, Apr 10, 2017 at 12:47 PM, Hans-Joachim Ehlers wrote: > My understanding of the GPFS networks is not quite clear. > > For a GPFS setup I would like to use 2 networks: > > 1 Daemon (data) network using port 1191, using for example 10.1.1.0/24 > > 2 Admin network using for example the 192.168.1.0/24 network > > Questions > > 1) Thus in a 2+1 cluster ( 2 GPFS servers + 1 quorum server ) config - > does the tiebreaker node need to have access to the daemon (data) 10.1.1. > network, or is it sufficient for the tiebreaker node to be configured as > part of the admin 192.168.1 network ? > > 2) Does a remote cluster need access to the GPFS admin 192.168.1 > network, or is it sufficient for the remote cluster to access the 10.1.1 > network ? If so I assume that remote cluster commands and ping to/from the > remote cluster go via the daemon network ? > > Note: > > I am aware of and have read https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/GPFS%20Network%20Communication%20Overview > > -- > Unix Systems Engineer > -------------------------------------------------- > MetaModul GmbH > Süderstr. 12 > 25336 Elmshorn > HRB: 11873 PI > UstID: DE213701983 > Mobil: + 49 177 4393994 > Mail: service at metamodul.com From laurence at qsplace.co.uk Mon Apr 10 18:13:08 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Mon, 10 Apr 2017 18:13:08 +0100 Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network In-Reply-To: References: <788130355.197989.1491842861235@email.1und1.de> Message-ID: <3a8f72c6-407a-0f4d-cf3c-f4698ca7b8e5@qsplace.co.uk> All nodes in a GPFS cluster need to be able to communicate over the data and admin networks, with the exception of remote clusters, which can have their own separate admin network (for the cluster they are a member of) but still require communications over the daemon network. The networks can be routed and on different subnets; however, each member of the cluster will need to be able to communicate with every other member. With this in mind: 1) The quorum node will need to be accessible on both the 10.1.1.0/24 and 192.168.1.0/24 networks; however, again, the network that the quorum node is on could be routed. 2) Remote clusters don't need access to the home cluster's admin network, as they will use their own cluster's admin network.
As Eric has mentioned I would double check your 2+1 cluster suggestion, do you mean 2 x Servers with NSD's (with a quorum role) and 1 quorum node without NSD's? which gives you 3 quorum, or are you only going to have 1 quorum? If the latter that I would suggest using all 3 servers for quorum as they should be licensed as GPFS servers anyway due to their roles. -- Lauz On 10/04/2017 17:58, J. Eric Wonderley wrote: > 1) You want more that one quorum node on your server cluster. The > non-quorum node does need a daemon network interface exposed to the > client cluster as does the quorum nodes. > > 2) No. Admin network is for intra cluster communications...not inter > cluster(between clusters). Daemon interface(port 1191) is used for > communications between clusters. I think there is little benefit > gained by having designated an admin network...maybe someone can point > out benefits of an admin network. > > > > Eric Wonderley > > On Mon, Apr 10, 2017 at 12:47 PM, Hans-Joachim Ehlers > > wrote: > > My understanding of the GPFS networks is not quite clear. > > For an GPFS setup i would like to use 2 Networks > > 1 Daemon (data) network using port 1191 using for example. > 10.1.1.0/24 > > 2 Admin Network using for example: 192.168.1.0/24 > network > > Questions > > 1) Thus in a 2+1 Cluster ( 2 GPFS Server + 1 Quorum Server ) > Config - Does the Tiebreaker Node needs to have access to the > daemon(data) 10.1.1. network or is it sufficient for the > tiebreaker node to be configured as part of the admin 192.168.1 > network ? > > 2) Does a remote cluster needs access to the GPFS Admin 192.168.1 > network or is it sufficient for the remote cluster to access the > 10.1.1 network ? If so i assume that remotecluster commands and > ping to/from remote cluster are going via the Daemon network ? > > Note: > > I am aware and read > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/GPFS%20Network%20Communication%20Overview > > > -- > Unix Systems Engineer > -------------------------------------------------- > MetaModul GmbH > S?derstr. 12 > 25336 Elmshorn > HRB: 11873 PI > UstID: DE213701983 > Mobil: + 49 177 4393994 > Mail: service at metamodul.com > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Apr 10 18:26:42 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 10 Apr 2017 17:26:42 +0000 Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network In-Reply-To: References: <788130355.197989.1491842861235@email.1und1.de>, Message-ID: If you have network congestion, then a separate admin network is of benefit. Maybe less important if you have 10GbE networks, but if (for example), you normally rely on IB to talk data, and gpfs fails back to the Ethernet (which may be only 1GbE), then you may have cluster issues, for example missing gpfs pings. Having a separate physical admin network can protect you from this. Having been bitten by this several years back, it's a good idea IMHO to have a separate admin network. 
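[Aside: a minimal sketch, not from the original posts, of how a separate admin network can be designated per node with mmchnode. It assumes each node has a second hostname such as nsd1-admin resolving onto the admin subnet; those names are illustrative only.]
---
# Sketch: point admin (command) traffic at a dedicated interface, leaving the
# daemon (data) interface untouched. Depending on the release, mmchnode may
# require GPFS to be down on the nodes being changed -- check the man page.
mmchnode --admin-interface=nsd1-admin -N nsd1
mmchnode --admin-interface=nsd2-admin -N nsd2
mmchnode --admin-interface=quorum1-admin -N quorum1

# Verify: mmlscluster should now show different "Admin node name" and
# "Daemon node name" values for each node.
mmlscluster
---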
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of J. Eric Wonderley [eric.wonderley at vt.edu] Sent: 10 April 2017 17:58 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network 1) You want more that one quorum node on your server cluster. The non-quorum node does need a daemon network interface exposed to the client cluster as does the quorum nodes. 2) No. Admin network is for intra cluster communications...not inter cluster(between clusters). Daemon interface(port 1191) is used for communications between clusters. I think there is little benefit gained by having designated an admin network...maybe someone can point out benefits of an admin network. Eric Wonderley On Mon, Apr 10, 2017 at 12:47 PM, Hans-Joachim Ehlers > wrote: My understanding of the GPFS networks is not quite clear. For an GPFS setup i would like to use 2 Networks 1 Daemon (data) network using port 1191 using for example. 10.1.1.0/24 2 Admin Network using for example: 192.168.1.0/24 network Questions 1) Thus in a 2+1 Cluster ( 2 GPFS Server + 1 Quorum Server ) Config - Does the Tiebreaker Node needs to have access to the daemon(data) 10.1.1. network or is it sufficient for the tiebreaker node to be configured as part of the admin 192.168.1 network ? 2) Does a remote cluster needs access to the GPFS Admin 192.168.1 network or is it sufficient for the remote cluster to access the 10.1.1 network ? If so i assume that remotecluster commands and ping to/from remote cluster are going via the Daemon network ? Note: I am aware and read https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/GPFS%20Network%20Communication%20Overview -- Unix Systems Engineer -------------------------------------------------- MetaModul GmbH S?derstr. 12 25336 Elmshorn HRB: 11873 PI UstID: DE213701983 Mobil: + 49 177 4393994 Mail: service at metamodul.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From service at metamodul.com Mon Apr 10 18:44:47 2017 From: service at metamodul.com (Hans-Joachim Ehlers) Date: Mon, 10 Apr 2017 19:44:47 +0200 (CEST) Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network In-Reply-To: References: <788130355.197989.1491842861235@email.1und1.de>, Message-ID: <795203366.199195.1491846287405@email.1und1.de> An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Mon Apr 10 19:02:30 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 10 Apr 2017 21:02:30 +0300 Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network In-Reply-To: <795203366.199195.1491846287405@email.1und1.de> References: <788130355.197989.1491842861235@email.1und1.de>, <795203366.199195.1491846287405@email.1und1.de> Message-ID: Hi Out of curiosity. Are you using Failure groups and doing replication of data/metadata too? If you you do need to deal with the file system descriptors as well on the 3rd node. Thanks From: Hans-Joachim Ehlers To: gpfsug main discussion list Date: 10/04/2017 20:44 Subject: Re: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network Sent by: gpfsug-discuss-bounces at spectrumscale.org Sorry for not being clear. 
The setup is of course a 3 Node Cluster where each node is a quorum node - 2 NSD Server and 1 TieBreaker/Quorum Buster node. For me it was not clear if the Tiebreaker/Quorum Buster node - which does nothing in terms of data serving - must be part of the daemon/data network or not. So i get the understanding that a Tiebreaker Node must be also part of the Daemon network. Thx a lot to all Hajo "Simon Thompson (IT Research Support)" hat am 10. April 2017 um 19:26 geschrieben: If you have network congestion, then a separate admin network is of benefit. Maybe less important if you have 10GbE networks, but if (for example), you normally rely on IB to talk data, and gpfs fails back to the Ethernet (which may be only 1GbE), then you may have cluster issues, for example missing gpfs pings. Having a separate physical admin network can protect you from this. Having been bitten by this several years back, it's a good idea IMHO to have a separate admin network. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of J. Eric Wonderley [eric.wonderley at vt.edu] Sent: 10 April 2017 17:58 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network 1) You want more that one quorum node on your server cluster. The non-quorum node does need a daemon network interface exposed to the client cluster as does the quorum nodes. 2) No. Admin network is for intra cluster communications...not inter cluster(between clusters). Daemon interface(port 1191) is used for communications between clusters. I think there is little benefit gained by having designated an admin network...maybe someone can point out benefits of an admin network. Eric Wonderley On Mon, Apr 10, 2017 at 12:47 PM, Hans-Joachim Ehlers > wrote: My understanding of the GPFS networks is not quite clear. For an GPFS setup i would like to use 2 Networks 1 Daemon (data) network using port 1191 using for example. 10.1.1.0/24< http://10.1.1.0/24> 2 Admin Network using for example: 192.168.1.0/24 network Questions 1) Thus in a 2+1 Cluster ( 2 GPFS Server + 1 Quorum Server ) Config - Does the Tiebreaker Node needs to have access to the daemon(data) 10.1.1. network or is it sufficient for the tiebreaker node to be configured as part of the admin 192.168.1 network ? 2) Does a remote cluster needs access to the GPFS Admin 192.168.1 network or is it sufficient for the remote cluster to access the 10.1.1 network ? If so i assume that remotecluster commands and ping to/from remote cluster are going via the Daemon network ? Note: I am aware and read https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/GPFS%20Network%20Communication%20Overview -- Unix Systems Engineer -------------------------------------------------- MetaModul GmbH S?derstr. 12 25336 Elmshorn HRB: 11873 PI UstID: DE213701983 Mobil: + 49 177 4393994 Mail: service at metamodul.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Mon Apr 10 21:15:38 2017 From: mweil at wustl.edu (Matt Weil) Date: Mon, 10 Apr 2017 15:15:38 -0500 Subject: [gpfsug-discuss] AFM gateways In-Reply-To: References: Message-ID: <524d253e-b825-4e6a-7cbf-884af394ddc5@wustl.edu> Thanks for the answers.. For fail over I believe we will want to keep it separate then. Next question. Is it licensed as a client or a server? On 4/10/17 6:20 AM, McLaughlin, Sandra M wrote: Hi, I agree with Venkat. I did exactly what you said below, enabled my NSD servers as gateways to get additional throughput (with both native gpfs protocol and NFS protocol), which worked well; we definitely got the increased traffic. However, I wouldn?t do it again through choice. As Venkat says, if there is a problem with the remote cluster, that can affect any of the gateway nodes (if using gpfs protocol), but also, we had a problem with one of the gateway nodes, where it kept crashing (which is now resolved) and then all filesets for which that node was the gateway had to failover to other gateway servers and this really messes everything up while the failover is taking place. I am also, stupidly, serving NFS and samba from the NSD servers (via ctdb) which I also, would not do again ! It would be nice if there was a way to specify which gateway server is the primary gateway for a specific fileset. Regards, Sandra From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: 10 April 2017 11:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM gateways It is not recommended to make NSD servers as gateway nodes for native GPFS protocol. Unresponsive remote cluster mount might cause gateway node to hang on synchronous operations (ex. Lookup, Read, Open etc..), this will affect NSD server functionality. More information is documented @ https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_NFSvsGPFSAFM.htm ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil > To: gpfsug main discussion list > Date: 04/07/2017 08:28 PM Subject: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. 
From mitsugi at linux.vnet.ibm.com Tue Apr 11 05:29:16 2017 From: mitsugi at linux.vnet.ibm.com (Masanori Mitsugi) Date: Tue, 11 Apr 2017 13:29:16 +0900 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM Message-ID: Hello, Does anyone have experience running mmapplypolicy against a billion files for ILM/HSM? Currently I'm planning/designing * 1 Scale filesystem (5-10 PB) * 10-20 filesets which include 1 billion files each And our biggest concern is "How long does it take for an mmapplypolicy policy scan against a billion files?" I know it depends on how the policy is written, but I have no experience with policy scans at the billion-file scale, so I'd like to know the order of time (minutes/hours/days...). It would be helpful if anyone with experience scanning such a large number of files could let me know any considerations or points for policy design. -- Masanori Mitsugi mitsugi at linux.vnet.ibm.com From zgiles at gmail.com Tue Apr 11 05:49:10 2017 From: zgiles at gmail.com (Zachary Giles) Date: Tue, 11 Apr 2017 00:49:10 -0400 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: It's definitely doable, and these days not too hard. Flash for metadata is the key. The basics of it are: * Latest GPFS for performance benefits. * A few 10's of TBs of flash (or more!) set up in a good design: lots of SAS, well-balanced RAID that can consume the flash fully, tuned for IOPS, and available in parallel from multiple servers. * Tune up mmapplypolicy with -g somewhere-on-gpfs; --choice-algorithm fast; -a, -m and -n set to reasonable values (number of cores on the servers); -A to ~1000. * Test first on a smaller fileset to confirm you like it. -I test should work well and be around the same speed minus the migration phase. * Then throw ~8 well-tuned Infiniband-attached nodes at it using -N. If they're the same as the NSD servers serving the flash, even better. You should be able to do 1B in 5-30 minutes depending on the idiosyncrasies of the above choices. Even 60 minutes isn't bad and quite respectable if less gear is used or if the system is busy while the policy is running. Parallel metadata, it's a beautiful thing.
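[Aside: not part of the original posts -- pulling Zachary's flag list into one command line gives something like the sketch below. The file system path, policy file, work directory and thread counts are placeholders to be tuned for the actual hardware; -N nsdNodes assumes the scan should run on the NSD servers that see the metadata flash directly.]
---
# Sketch: parallel policy scan in test mode, following the tuning hints above.
mmapplypolicy /gpfs/fs1 \
    -P /gpfs/fs1/policies/list_rules.pol \
    -I test \
    -N nsdNodes \
    -g /gpfs/fs1/tmp/policy-workdir \
    --choice-algorithm fast \
    -a 16 -m 16 -n 24 \
    -A 1000 \
    -L 1
---
Once -I test shows an acceptable scan time, the same options carry over to the real run.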
On Tue, Apr 11, 2017 at 12:29 AM, Masanori Mitsugi wrote: > Hello, > > Does anyone have experience to do mmapplypolicy against billion files for > ILM/HSM? > > Currently I'm planning/designing > > * 1 Scale filesystem (5-10 PB) > * 10-20 filesets which includes 1 billion files each > > And our biggest concern is "How log does it take for mmapplypolicy policy > scan against billion files?" > > I know it depends on how to write the policy, > but I don't have no billion files policy scan experience, > so I'd like to know the order of time (min/hour/day...). > > It would be helpful if anyone has experience of such large number of files > scan and let me know any considerations or points for policy design. > > -- > Masanori Mitsugi > mitsugi at linux.vnet.ibm.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com From olaf.weiser at de.ibm.com Tue Apr 11 07:51:48 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 11 Apr 2017 08:51:48 +0200 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: <455e54150cd04cd8808619acbf7d8d2b@sva.de> References: <455e54150cd04cd8808619acbf7d8d2b@sva.de> Message-ID: An HTML attachment was scrubbed... URL: From ckrafft at de.ibm.com Tue Apr 11 09:24:35 2017 From: ckrafft at de.ibm.com (Christoph Krafft) Date: Tue, 11 Apr 2017 10:24:35 +0200 Subject: [gpfsug-discuss] Does SVC / Spectrum Virtualize support IBM Spectrum Scale with SCSI-3 Persistent Reservations? Message-ID: Hi folks, there is a list of storage devices that support SCSI-3 PR in the GPFS FAQ Doc (see Answer 4.5). https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html#scsi3 Since this list contains IBM V-model storage subsystems that include Storage Virtualization - I was wondering if SVC / Spectrum Virtualize can also support SCSI-3 PR (although not explicitly on the list)? Any hints and help is warmla welcome - thank you in advance. Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Am Weiher 24 Email: ckrafft at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Nicole Reimer, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Stefan Lutz Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A788784.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From p.childs at qmul.ac.uk Tue Apr 11 09:57:44 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 11 Apr 2017 08:57:44 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories Message-ID: This is a curious issue which I'm trying to get to the bottom of. We currently have two Spectrum Scale file systems, both are running GPFS 4.2.1-1 some of the servers have been upgraded to 4.2.1-2. 
The older one, which was upgraded from GPFS 3.5, works fine: creating a directory is always fast and no issue. The new one, which has nice new SSD for metadata and hence should be faster, can take up to 30 seconds to create a directory but usually takes less than a second. The longer directory creates usually happen on busy nodes that have not used the new storage in a while (it's new, so we've not moved much of the data over yet), but it can also happen randomly anywhere, including from the NSD servers themselves (times of 3-4 seconds from the NSD servers have been seen on a single directory create). We've been pointed at the network and asked to check all network settings, and it's been suggested to build an admin network, but I'm not sure I entirely understand why and how this would help. It's a mixed 1G/10G network with the NSD servers connected at 40G with an MTU of 9000. However, as I say, the older filesystem is fine, and it does not matter if the nodes are connected to the old GPFS cluster or the new one (although the delay is worst on the old gpfs cluster), so I'm really playing spot the difference, and the network is not really an obvious difference. It's been suggested to look at a trace when it occurs, but as it's difficult to recreate, collecting one is difficult. Any ideas would be most helpful. Thanks Peter Childs ITS Research Infrastructure Queen Mary, University of London From jonathan at buzzard.me.uk Tue Apr 11 11:21:05 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 11 Apr 2017 11:21:05 +0100 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: <1491906065.4102.87.camel@buzzard.me.uk> On Tue, 2017-04-11 at 00:49 -0400, Zachary Giles wrote: [SNIP] > * Then throw ~8 well tuned Infiniband attached nodes at it using -N, > If they're the same as the NSD servers serving the flash, even better. > Exactly how much are you going to gain from Infiniband over 40Gbps or even 100Gbps Ethernet? Not a lot I would have thought. Even with flash, all your latency is going to be in the flash, not the Ethernet. Unless you have a compute cluster and need Infiniband for the MPI traffic, it is surely better to stick to Ethernet. Infiniband is rather esoteric, what I call a minority sport best avoided if at all possible. Even if you have an Infiniband fabric, I would argue that given current core counts and price points for 10Gbps Ethernet, you are actually better off keeping your storage traffic on the Ethernet, and reserving the Infiniband for MPI duties. That is 10Gbps Ethernet to the compute nodes and 40/100Gbps Ethernet on the storage nodes. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From zgiles at gmail.com Tue Apr 11 12:50:26 2017 From: zgiles at gmail.com (Zachary Giles) Date: Tue, 11 Apr 2017 07:50:26 -0400 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: <1491906065.4102.87.camel@buzzard.me.uk> References: <1491906065.4102.87.camel@buzzard.me.uk> Message-ID: Yeah, that can be true. I was just trying to show the size/shape that can achieve this. There's a good chance 10G or 40G ethernet would yield similar results, especially if you're running the policy on the NSD servers.
On Tue, Apr 11, 2017 at 6:21 AM, Jonathan Buzzard wrote: > On Tue, 2017-04-11 at 00:49 -0400, Zachary Giles wrote: > > [SNIP] > >> * Then throw ~8 well tuned Infiniband attached nodes at it using -N, >> If they're the same as the NSD servers serving the flash, even better. >> > > Exactly how much are you going to gain from Infiniband over 40Gbps or > even 100Gbps Ethernet? Not a lot I would have thought. Even with flash > all your latency is going to be in the flash not the Ethernet. > > Unless you have a compute cluster and need Infiniband for the MPI > traffic, it is surely better to stick to Ethernet. Infiniband is rather > esoteric, what I call a minority sport best avoided if at all possible. > > Even if you have an Infiniband fabric, I would argue that give current > core counts and price points for 10Gbps Ethernet, that actually you are > better off keeping your storage traffic on the Ethernet, and reserving > the Infiniband for MPI duties. That is 10Gbps Ethernet to the compute > nodes and 40/100Gbps Ethernet on the storage nodes. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com From stockf at us.ibm.com Tue Apr 11 12:53:33 2017 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 11 Apr 2017 07:53:33 -0400 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: As Zachary noted the location of your metadata is the key and for the scanning you have planned flash is necessary. If you have the resources you may consider setting up your flash in a mirrored RAID configuration (RAID1/RAID10) and have GPFS only keep one copy of metadata since the underlying storage is replicating it via the RAID. This should improve metadata write performance but likely has little impact on your scanning, assuming you are just reading through the metadata. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Zachary Giles To: gpfsug main discussion list Date: 04/11/2017 12:49 AM Subject: Re: [gpfsug-discuss] Policy scan against billion files for ILM/HSM Sent by: gpfsug-discuss-bounces at spectrumscale.org It's definitely doable, and these days not too hard. Flash for metadata is the key. The basics of it are: * Latest GPFS for performance benefits. * A few 10's of TBs of flash ( or more ! ) setup in a good design.. lots of SAS, well balanced RAID that can consume the flash fully, tuned for IOPs, and available in parallel from multiple servers. * Tune up mmapplypolicy with -g somewhere-on-gpfs; --choice-algorithm fast; -a, -m and -n to reasonable values ( number of cores on the servers ); -A to ~1000 * Test first on a smaller fileset to confirm you like it. -I test should work well and be around the same speed minus the migration phase. * Then throw ~8 well tuned Infiniband attached nodes at it using -N, If they're the same as the NSD servers serving the flash, even better. Should be able to do 1B in 5-30m depending on the idiosyncrasies of above choices. Even 60m isn't bad and quite respectable if less gear is used or if they system is busy while the policy is running. Parallel metadata, it's a beautiful thing. 
On Tue, Apr 11, 2017 at 12:29 AM, Masanori Mitsugi wrote: > Hello, > > Does anyone have experience to do mmapplypolicy against billion files for > ILM/HSM? > > Currently I'm planning/designing > > * 1 Scale filesystem (5-10 PB) > * 10-20 filesets which includes 1 billion files each > > And our biggest concern is "How log does it take for mmapplypolicy policy > scan against billion files?" > > I know it depends on how to write the policy, > but I don't have no billion files policy scan experience, > so I'd like to know the order of time (min/hour/day...). > > It would be helpful if anyone has experience of such large number of files > scan and let me know any considerations or points for policy design. > > -- > Masanori Mitsugi > mitsugi at linux.vnet.ibm.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Tue Apr 11 16:18:01 2017 From: chair at spectrumscale.org (Spectrum Scale UG Chair (Simon Thompson)) Date: Tue, 11 Apr 2017 16:18:01 +0100 Subject: [gpfsug-discuss] May Meeting Registration Message-ID: Hi all, Just a reminder that the next UK user group meeting is taking place on 9th/10th May. If you are planning on attending, please do register at: https://www.eventbrite.com/e/spectrum-scalegpfs-user-group-spring-2017-regi stration-32113696932 (or try https://goo.gl/tRptru ) As last year, this is a 2 day event and we're planning a fun evening event on the Tuesday night at Manchester Museum of Science. Thanks to our sponsors Arcastream, DDN, Ellexus, Lenovo, IBM, Mellanox, OCF and Seagate for helping make this happen! We also still have some customer talk slots to fill, so please let me know if you are interested in speaking. Thanks Simon From bbanister at jumptrading.com Tue Apr 11 16:29:25 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Apr 2017 15:29:25 +0000 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: <1e86aa0c2e4344f19cb5eedf8f03efa9@jumptrading.com> A word of caution, be careful about where you run this kind of policy scan as the sort process can consume all memory on your hosts and that could lead to issues with the OS deciding to kill off GPFS or other similar bad things can occur. I recommend restricting the ILM policy scan to a subset of servers, no quorum nodes, and ensuring at least one NSD server is available for all NSDs in the file system(s). Watch the memory consumption on your nodes during the sort operations to see if you need to tune that down in the mmapplypolicy options. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Frederick Stock Sent: Tuesday, April 11, 2017 6:54 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Policy scan against billion files for ILM/HSM As Zachary noted the location of your metadata is the key and for the scanning you have planned flash is necessary. 
If you have the resources you may consider setting up your flash in a mirrored RAID configuration (RAID1/RAID10) and have GPFS only keep one copy of metadata since the underlying storage is replicating it via the RAID. This should improve metadata write performance but likely has little impact on your scanning, assuming you are just reading through the metadata. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Zachary Giles > To: gpfsug main discussion list > Date: 04/11/2017 12:49 AM Subject: Re: [gpfsug-discuss] Policy scan against billion files for ILM/HSM Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ It's definitely doable, and these days not too hard. Flash for metadata is the key. The basics of it are: * Latest GPFS for performance benefits. * A few 10's of TBs of flash ( or more ! ) setup in a good design.. lots of SAS, well balanced RAID that can consume the flash fully, tuned for IOPs, and available in parallel from multiple servers. * Tune up mmapplypolicy with -g somewhere-on-gpfs; --choice-algorithm fast; -a, -m and -n to reasonable values ( number of cores on the servers ); -A to ~1000 * Test first on a smaller fileset to confirm you like it. -I test should work well and be around the same speed minus the migration phase. * Then throw ~8 well tuned Infiniband attached nodes at it using -N, If they're the same as the NSD servers serving the flash, even better. Should be able to do 1B in 5-30m depending on the idiosyncrasies of above choices. Even 60m isn't bad and quite respectable if less gear is used or if they system is busy while the policy is running. Parallel metadata, it's a beautiful thing. On Tue, Apr 11, 2017 at 12:29 AM, Masanori Mitsugi > wrote: > Hello, > > Does anyone have experience to do mmapplypolicy against billion files for > ILM/HSM? > > Currently I'm planning/designing > > * 1 Scale filesystem (5-10 PB) > * 10-20 filesets which includes 1 billion files each > > And our biggest concern is "How log does it take for mmapplypolicy policy > scan against billion files?" > > I know it depends on how to write the policy, > but I don't have no billion files policy scan experience, > so I'd like to know the order of time (min/hour/day...). > > It would be helpful if anyone has experience of such large number of files > scan and let me know any considerations or points for policy design. > > -- > Masanori Mitsugi > mitsugi at linux.vnet.ibm.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From k.leach at ed.ac.uk Tue Apr 11 16:32:41 2017 From: k.leach at ed.ac.uk (Kieran Leach) Date: Tue, 11 Apr 2017 16:32:41 +0100 Subject: [gpfsug-discuss] May Meeting Registration In-Reply-To: References: Message-ID: <275b54d9-6779-774e-69bb-d26fead278a2@ed.ac.uk> Hi Simon, would you be interested in a customer talk about the RDF (http://rdf.ac.uk/). We manage the RDF at EPCC, providing a 23PB filestore to complement ARCHER (the national research HPC service) and other UK Research HPC services. This is of course a GPFS system. If you've any questions or want more info please let me know but I thought I'd get an email off to you while I remember. Cheers Kieran On 11/04/17 16:18, Spectrum Scale UG Chair (Simon Thompson) wrote: > Hi all, > > Just a reminder that the next UK user group meeting is taking place on > 9th/10th May. If you are planning on attending, please do register at: > > https://www.eventbrite.com/e/spectrum-scalegpfs-user-group-spring-2017-regi > stration-32113696932 > > > (or try https://goo.gl/tRptru ) > > As last year, this is a 2 day event and we're planning a fun evening event > on the Tuesday night at Manchester Museum of Science. > > Thanks to our sponsors Arcastream, DDN, Ellexus, Lenovo, IBM, Mellanox, > OCF and Seagate for helping make this happen! > > We also still have some customer talk slots to fill, so please let me know > if you are interested in speaking. > > Thanks > > Simon > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From k.leach at ed.ac.uk Tue Apr 11 16:33:29 2017 From: k.leach at ed.ac.uk (Kieran Leach) Date: Tue, 11 Apr 2017 16:33:29 +0100 Subject: [gpfsug-discuss] May Meeting Registration In-Reply-To: <275b54d9-6779-774e-69bb-d26fead278a2@ed.ac.uk> References: <275b54d9-6779-774e-69bb-d26fead278a2@ed.ac.uk> Message-ID: Apologies all, wrong reply button. Cheers Kieran On 11/04/17 16:32, Kieran Leach wrote: > Hi Simon, > would you be interested in a customer talk about the RDF > (http://rdf.ac.uk/). We manage the RDF at EPCC, providing a 23PB > filestore to complement ARCHER (the national research HPC service) and > other UK Research HPC services. This is of course a GPFS system. If > you've any questions or want more info please let me know but I > thought I'd get an email off to you while I remember. > > Cheers > > Kieran > > On 11/04/17 16:18, Spectrum Scale UG Chair (Simon Thompson) wrote: >> Hi all, >> >> Just a reminder that the next UK user group meeting is taking place on >> 9th/10th May. If you are planning on attending, please do register at: >> >> https://www.eventbrite.com/e/spectrum-scalegpfs-user-group-spring-2017-regi >> >> stration-32113696932 >> >> >> (or try https://goo.gl/tRptru ) >> >> As last year, this is a 2 day event and we're planning a fun evening >> event >> on the Tuesday night at Manchester Museum of Science. >> >> Thanks to our sponsors Arcastream, DDN, Ellexus, Lenovo, IBM, Mellanox, >> OCF and Seagate for helping make this happen! 
>> >> We also still have some customer talk slots to fill, so please let me >> know >> if you are interested in speaking. >> >> Thanks >> >> Simon >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From makaplan at us.ibm.com Tue Apr 11 16:36:47 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 11 Apr 2017 11:36:47 -0400 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: As primary developer of mmapplypolicy, please allow me to comment: 1) Fast access to metadata in system pool is most important, as several have commented on. These days SSD is the favorite, but you can still go with "spinning" media. If you do go with disks, it's extremely important to spread your metadata over independent disk "arms" -- so you can have many concurrent seeks in progress at the same time. IOW, if there is a virtualization/mapping layer, watchout that your logical disks don't get mapped to the same physical disk. 2) Crucial to use both -g and -N :: -g /gpfs-not-necessarily-the-same-fs-as-Im-scanning/tempdir and -N several-nodes-that-will-be-accessing-the-system-pool 3a) If at all possible, encourage your data and application designers to "pack" their directories with lots of files. Keep in mind that, mmapplypolicy will read every directory. The more directories, the more seeks, more time spent waiting for IO. OTOH, in more typical Unix/Linux usage, we tend to low average number of files per directory. 3b) As admin, you may not be able to change your data design to pack hundreds of files per directory, BUT you can make sure you are running a sufficiently modern release of Spectrum Scale that supports "data in inode" -- "Data in inode" also means "directory entries in inode" -- which means practically any small directory, up to a few hundred files, will fit in an an inode -- which means mmapplypolicy can read small directories with one seek, instead of two. (Someone will please remind us of the release number that first supported "directories in inode".) 4) Sorry, Fred, but the recommendation to use RAID mirroring of metadata on SSD, is not necessarily, important for metadata scanning. In fact it may work against you. If you use GPFS replication of metadata - that can work for you -- since then GPFS can direct read operations to either copy, preferring a locally attached copy, depending on how storage is attached to node, etc, etc. Choice of how to replicate metadata - either using GPFS replication or the RAID controller - is probably best made based on reliability and recoverability requirements. 5) YMMV - We'd love to hear/see your performance results for mmapplypolicy, especially if they're good. Even if they're bad, come back here for more tuning tips! -- marc of Spectrum Scale (ne GPFS) -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Tue Apr 11 16:51:56 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Apr 2017 15:51:56 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories In-Reply-To: References: Message-ID: There are so many things to look at and many tools for doing so (iostat, htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). 
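[Aside: a rough sketch, not from the original posts, of one way to apply those tools to the intermittent slow creates -- time mkdir in a loop and dump the GPFS waiters whenever a create crosses a threshold. The paths and the 2-second threshold are arbitrary placeholders.]
---
#!/bin/bash
# Sketch: catch slow directory creates and record what GPFS was waiting on.
target=/gpfs/newfs/tmp/mkdirtest   # placeholder path on the affected filesystem
threshold=2                        # seconds; placeholder
log=/tmp/slow-mkdir.log
mkdir -p "$target"

for i in $(seq 1 1000); do
    start=$(date +%s.%N)
    mkdir "$target/dir.$i"
    elapsed=$(echo "$(date +%s.%N) - $start" | bc)
    if (( $(echo "$elapsed > $threshold" | bc -l) )); then
        echo "$(date): dir.$i took ${elapsed}s" >> "$log"
        # Long waiters at this moment usually point at the node, disk or RPC involved.
        /usr/lpp/mmfs/bin/mmdiag --waiters >> "$log"
        /usr/lpp/mmfs/bin/mmdiag --network >> "$log"
    fi
    sleep 1
done
---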
I would recommend a review of the presentation that Yuri gave at the most recent GPFS User Group: https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: Tuesday, April 11, 2017 3:58 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories This is a curious issue which I'm trying to get to the bottom of. We currently have two Spectrum Scale file systems, both are running GPFS 4.2.1-1 some of the servers have been upgraded to 4.2.1-2. The older one which was upgraded from GPFS 3.5 works find create a directory is always fast and no issue. The new one, which has nice new SSD for metadata and hence should be faster. can take up to 30 seconds to create a directory but usually takes less than a second, The longer directory creates usually happen on busy nodes that have not used the new storage in a while. (Its new so we've not moved much of the data over yet) But it can also happen randomly anywhere, including from the NSD servers them selves. (times of 3-4 seconds from the NSD servers have been seen, on a single directory create) We've been pointed at the network and suggested we check all network settings, and its been suggested to build an admin network, but I'm not sure I entirely understand why and how this would help. Its a mixed 1G/10G network with the NSD servers connected at 40G with an MTU of 9000. However as I say, the older filesystem is fine, and it does not matter if the nodes are connected to the old GPFS cluster or the new one, (although the delay is worst on the old gpfs cluster), So I'm really playing spot the difference. and the network is not really an obvious difference. Its been suggested to look at a trace when it occurs but as its difficult to recreate collecting one is difficult. Any ideas would be most helpful. Thanks Peter Childs ITS Research Infrastructure Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From S.J.Thompson at bham.ac.uk Tue Apr 11 16:55:35 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 11 Apr 2017 15:55:35 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories Message-ID: We actually saw this for a while on one of our clusters which was new. But by the time I'd got round to looking deeper, it had gone, maybe we were using the NSDs more heavily, or possibly we'd upgraded. 
We are at 4.2.2-2, so might be worth trying to bump the version and see if it goes away. We saw it on the NSD servers directly as well, so not some client trying to talk to it, so maybe there was some buggy code? Simon On 11/04/2017, 16:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: >There are so many things to look at and many tools for doing so (iostat, >htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). I would >recommend a review of the presentation that Yuri gave at the most recent >GPFS User Group: >https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs > >Cheers, >-Bryan > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter >Childs >Sent: Tuesday, April 11, 2017 3:58 AM >To: gpfsug main discussion list >Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories > >This is a curious issue which I'm trying to get to the bottom of. > >We currently have two Spectrum Scale file systems, both are running GPFS >4.2.1-1 some of the servers have been upgraded to 4.2.1-2. > >The older one which was upgraded from GPFS 3.5 works find create a >directory is always fast and no issue. > >The new one, which has nice new SSD for metadata and hence should be >faster. can take up to 30 seconds to create a directory but usually takes >less than a second, The longer directory creates usually happen on busy >nodes that have not used the new storage in a while. (Its new so we've >not moved much of the data over yet) But it can also happen randomly >anywhere, including from the NSD servers them selves. (times of 3-4 >seconds from the NSD servers have been seen, on a single directory create) > >We've been pointed at the network and suggested we check all network >settings, and its been suggested to build an admin network, but I'm not >sure I entirely understand why and how this would help. Its a mixed >1G/10G network with the NSD servers connected at 40G with an MTU of 9000. > >However as I say, the older filesystem is fine, and it does not matter if >the nodes are connected to the old GPFS cluster or the new one, (although >the delay is worst on the old gpfs cluster), So I'm really playing spot >the difference. and the network is not really an obvious difference. > >Its been suggested to look at a trace when it occurs but as its difficult >to recreate collecting one is difficult. > >Any ideas would be most helpful. > >Thanks > > > >Peter Childs >ITS Research Infrastructure >Queen Mary, University of London >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >________________________________ > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. 
This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Tue Apr 11 16:56:56 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 11 Apr 2017 15:56:56 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories Message-ID: Bryan, That looks like a really useful set of presentation slides! Thanks for sharing! Which one in particular is the one Yuri gave that you?re referring to? ~jonathon On 4/11/17, 9:51 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: There are so many things to look at and many tools for doing so (iostat, htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). I would recommend a review of the presentation that Yuri gave at the most recent GPFS User Group: https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: Tuesday, April 11, 2017 3:58 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories This is a curious issue which I'm trying to get to the bottom of. We currently have two Spectrum Scale file systems, both are running GPFS 4.2.1-1 some of the servers have been upgraded to 4.2.1-2. The older one which was upgraded from GPFS 3.5 works find create a directory is always fast and no issue. The new one, which has nice new SSD for metadata and hence should be faster. can take up to 30 seconds to create a directory but usually takes less than a second, The longer directory creates usually happen on busy nodes that have not used the new storage in a while. (Its new so we've not moved much of the data over yet) But it can also happen randomly anywhere, including from the NSD servers them selves. (times of 3-4 seconds from the NSD servers have been seen, on a single directory create) We've been pointed at the network and suggested we check all network settings, and its been suggested to build an admin network, but I'm not sure I entirely understand why and how this would help. Its a mixed 1G/10G network with the NSD servers connected at 40G with an MTU of 9000. However as I say, the older filesystem is fine, and it does not matter if the nodes are connected to the old GPFS cluster or the new one, (although the delay is worst on the old gpfs cluster), So I'm really playing spot the difference. and the network is not really an obvious difference. Its been suggested to look at a trace when it occurs but as its difficult to recreate collecting one is difficult. Any ideas would be most helpful. Thanks Peter Childs ITS Research Infrastructure Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Tue Apr 11 16:59:51 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Apr 2017 15:59:51 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories In-Reply-To: References: Message-ID: Problem Determination and GPFS Internals. My security group won't let me go to the google docs site from my work compute... I'm sure there is malicious malware on that site!! j/k, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: Tuesday, April 11, 2017 10:57 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Slow to create directories Bryan, That looks like a really useful set of presentation slides! Thanks for sharing! Which one in particular is the one Yuri gave that you?re referring to? ~jonathon On 4/11/17, 9:51 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: There are so many things to look at and many tools for doing so (iostat, htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). I would recommend a review of the presentation that Yuri gave at the most recent GPFS User Group: https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: Tuesday, April 11, 2017 3:58 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories This is a curious issue which I'm trying to get to the bottom of. We currently have two Spectrum Scale file systems, both are running GPFS 4.2.1-1 some of the servers have been upgraded to 4.2.1-2. The older one which was upgraded from GPFS 3.5 works find create a directory is always fast and no issue. The new one, which has nice new SSD for metadata and hence should be faster. can take up to 30 seconds to create a directory but usually takes less than a second, The longer directory creates usually happen on busy nodes that have not used the new storage in a while. (Its new so we've not moved much of the data over yet) But it can also happen randomly anywhere, including from the NSD servers them selves. (times of 3-4 seconds from the NSD servers have been seen, on a single directory create) We've been pointed at the network and suggested we check all network settings, and its been suggested to build an admin network, but I'm not sure I entirely understand why and how this would help. Its a mixed 1G/10G network with the NSD servers connected at 40G with an MTU of 9000. 
However as I say, the older filesystem is fine, and it does not matter if the nodes are connected to the old GPFS cluster or the new one, (although the delay is worst on the old gpfs cluster), So I'm really playing spot the difference. and the network is not really an obvious difference. Its been suggested to look at a trace when it occurs but as its difficult to recreate collecting one is difficult. Any ideas would be most helpful. Thanks Peter Childs ITS Research Infrastructure Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From p.childs at qmul.ac.uk Tue Apr 11 20:35:40 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 11 Apr 2017 19:35:40 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories In-Reply-To: References: Message-ID: Can you remember what version you were running? Don't worry if you can't remember. It looks like ibm may have withdrawn 4.2.1 from fix central and wish to forget its existences. Never a good sign, 4.2.0, 4.2.2, 4.2.3 and even 3.5, so maybe upgrading is worth a try. I've looked at all the standard trouble shouting guides and got nowhere hence why I asked. But another set of slides always helps. Thank-you for the help, still head scratching.... Which only makes the issue more random. 
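(For anyone else following along: a quick way to double-check what each node is actually running before and after a version bump - assuming an RPM-based install and that mmdsh is usable in your cluster - is something like:

mmdsh -N all 'rpm -q gpfs.base'   # installed package level on every node
mmdiag --version                  # build level of the running daemon on one node

so a node still running old code after a rolling upgrade stands out straight away.)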
Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Simon Thompson (IT Research Support) wrote ---- We actually saw this for a while on one of our clusters which was new. But by the time I'd got round to looking deeper, it had gone, maybe we were using the NSDs more heavily, or possibly we'd upgraded. We are at 4.2.2-2, so might be worth trying to bump the version and see if it goes away. We saw it on the NSD servers directly as well, so not some client trying to talk to it, so maybe there was some buggy code? Simon On 11/04/2017, 16:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: >There are so many things to look at and many tools for doing so (iostat, >htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). I would >recommend a review of the presentation that Yuri gave at the most recent >GPFS User Group: >https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs > >Cheers, >-Bryan > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter >Childs >Sent: Tuesday, April 11, 2017 3:58 AM >To: gpfsug main discussion list >Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories > >This is a curious issue which I'm trying to get to the bottom of. > >We currently have two Spectrum Scale file systems, both are running GPFS >4.2.1-1 some of the servers have been upgraded to 4.2.1-2. > >The older one which was upgraded from GPFS 3.5 works find create a >directory is always fast and no issue. > >The new one, which has nice new SSD for metadata and hence should be >faster. can take up to 30 seconds to create a directory but usually takes >less than a second, The longer directory creates usually happen on busy >nodes that have not used the new storage in a while. (Its new so we've >not moved much of the data over yet) But it can also happen randomly >anywhere, including from the NSD servers them selves. (times of 3-4 >seconds from the NSD servers have been seen, on a single directory create) > >We've been pointed at the network and suggested we check all network >settings, and its been suggested to build an admin network, but I'm not >sure I entirely understand why and how this would help. Its a mixed >1G/10G network with the NSD servers connected at 40G with an MTU of 9000. > >However as I say, the older filesystem is fine, and it does not matter if >the nodes are connected to the old GPFS cluster or the new one, (although >the delay is worst on the old gpfs cluster), So I'm really playing spot >the difference. and the network is not really an obvious difference. > >Its been suggested to look at a trace when it occurs but as its difficult >to recreate collecting one is difficult. > >Any ideas would be most helpful. > >Thanks > > > >Peter Childs >ITS Research Infrastructure >Queen Mary, University of London >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >________________________________ > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. 
Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mitsugi at linux.vnet.ibm.com Wed Apr 12 02:51:03 2017 From: mitsugi at linux.vnet.ibm.com (Masanori Mitsugi) Date: Wed, 12 Apr 2017 10:51:03 +0900 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: <0851d194-088e-d93a-303d-ceb0de3dbaa8@linux.vnet.ibm.com> Marc, Zachary, Fred, Bryan, Thank you for providing great advice! It's pretty useful for me to tune our policy with best performance. As for "directories in inode", we plan to use latest version, so I believe we can leverage this function. -- Masanori Mitsugi mitsugi at linux.vnet.ibm.com From vpuvvada at in.ibm.com Wed Apr 12 10:53:25 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Wed, 12 Apr 2017 15:23:25 +0530 Subject: [gpfsug-discuss] AFM gateways In-Reply-To: <524d253e-b825-4e6a-7cbf-884af394ddc5@wustl.edu> References: <524d253e-b825-4e6a-7cbf-884af394ddc5@wustl.edu> Message-ID: Gateway node requires server license. ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil To: Date: 04/11/2017 01:46 AM Subject: Re: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks for the answers.. For fail over I believe we will want to keep it separate then. Next question. Is it licensed as a client or a server? On 4/10/17 6:20 AM, McLaughlin, Sandra M wrote: Hi, I agree with Venkat. I did exactly what you said below, enabled my NSD servers as gateways to get additional throughput (with both native gpfs protocol and NFS protocol), which worked well; we definitely got the increased traffic. However, I wouldn?t do it again through choice. As Venkat says, if there is a problem with the remote cluster, that can affect any of the gateway nodes (if using gpfs protocol), but also, we had a problem with one of the gateway nodes, where it kept crashing (which is now resolved) and then all filesets for which that node was the gateway had to failover to other gateway servers and this really messes everything up while the failover is taking place. I am also, stupidly, serving NFS and samba from the NSD servers (via ctdb) which I also, would not do again ! It would be nice if there was a way to specify which gateway server is the primary gateway for a specific fileset. Regards, Sandra From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: 10 April 2017 11:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM gateways It is not recommended to make NSD servers as gateway nodes for native GPFS protocol. Unresponsive remote cluster mount might cause gateway node to hang on synchronous operations (ex. 
Lookup, Read, Open etc..), this will affect NSD server functionality. More information is documented @ https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_NFSvsGPFSAFM.htm ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil To: gpfsug main discussion list Date: 04/07/2017 08:28 PM Subject: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Apr 12 15:52:48 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 12 Apr 2017 09:52:48 -0500 Subject: [gpfsug-discuss] AFM gateways In-Reply-To: References: <524d253e-b825-4e6a-7cbf-884af394ddc5@wustl.edu> Message-ID: yes it tells you that when you attempt to make the node a gateway and is does not have a server license designation. On 4/12/17 4:53 AM, Venkateswara R Puvvada wrote: Gateway node requires server license. ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil To: Date: 04/11/2017 01:46 AM Subject: Re: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Thanks for the answers.. For fail over I believe we will want to keep it separate then. Next question. Is it licensed as a client or a server? 
On 4/10/17 6:20 AM, McLaughlin, Sandra M wrote: Hi, I agree with Venkat. I did exactly what you said below, enabled my NSD servers as gateways to get additional throughput (with both native gpfs protocol and NFS protocol), which worked well; we definitely got the increased traffic. However, I wouldn?t do it again through choice. As Venkat says, if there is a problem with the remote cluster, that can affect any of the gateway nodes (if using gpfs protocol), but also, we had a problem with one of the gateway nodes, where it kept crashing (which is now resolved) and then all filesets for which that node was the gateway had to failover to other gateway servers and this really messes everything up while the failover is taking place. I am also, stupidly, serving NFS and samba from the NSD servers (via ctdb) which I also, would not do again ! It would be nice if there was a way to specify which gateway server is the primary gateway for a specific fileset. Regards, Sandra From: gpfsug-discuss-bounces at spectrumscale.org[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: 10 April 2017 11:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM gateways It is not recommended to make NSD servers as gateway nodes for native GPFS protocol. Unresponsive remote cluster mount might cause gateway node to hang on synchronous operations (ex. Lookup, Read, Open etc..), this will affect NSD server functionality. More information is documented @ https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_NFSvsGPFSAFM.htm ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil > To: gpfsug main discussion list > Date: 04/07/2017 08:28 PM Subject: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. 
For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chekh at stanford.edu Wed Apr 12 22:01:45 2017 From: chekh at stanford.edu (Alex Chekholko) Date: Wed, 12 Apr 2017 14:01:45 -0700 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: <284246a2-b14b-0a73-6dad-4c73caef58c9@stanford.edu> On 4/11/17 8:36 AM, Marc A Kaplan wrote: > > 5) YMMV - We'd love to hear/see your performance results for > mmapplypolicy, especially if they're good. Even if they're bad, come > back here for more tuning tips! I have a filesystem that currently has 267919775 (roughly quarter billion, 250 million) used inodes. The metadata is on SSD behind a DDN 12K. We do use 4K inodes, and files smaller than 4K fit into the inodes. Here is the command I use to apply a policy: mmapplypolicy gsfs0 -P policy.txt -N scg-gs0,scg-gs1,scg-gs2,scg-gs3,scg-gs4,scg-gs5,scg-gs6,scg-gs7 -g /srv/gsfs0/admin_stuff/ -I test -B 500 -A 61 -a 4 That takes approximately 10 minutes to do the whole scan. The "-B 500 -A 61 -a 4" numbers we determined just by trying different values with the same policy file and seeing the resulting scan duration. 10mins is short enough to do almost "interactive" type of file list policies and look at the results. E.g. list all files over 1TB in size. This was a couple of years ago, probably on a different GPFS version, but on same storage and NSD hardware, so now I just copy those parameters. You should probably not just copy them but try some other values yourself. 
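(For those "interactive" list-type runs, a minimal policy file can be as small as the sketch below - just the general shape, with made-up rule and list names, and note that KB_ALLOCATED is in KB, so 1 TB is roughly 2^30 KB:

RULE EXTERNAL LIST 'over1TB' EXEC ''
RULE 'bigfiles' LIST 'over1TB' WHERE KB_ALLOCATED > 1073741824

Run with "-I defer -f /some/prefix" and the matching pathnames are simply written out to list files instead of being handed to a script.)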
Regards, Alex From makaplan at us.ibm.com Wed Apr 12 23:43:20 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 12 Apr 2017 18:43:20 -0400 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: <284246a2-b14b-0a73-6dad-4c73caef58c9@stanford.edu> References: <284246a2-b14b-0a73-6dad-4c73caef58c9@stanford.edu> Message-ID: >>>Here is the command I use to apply a policy: mmapplypolicy gsfs0 -P policy.txt -N scg-gs0,scg-gs1,scg-gs2,scg-gs3,scg-gs4,scg-gs5,scg-gs6,scg-gs7 -g /srv/gsfs0/admin_stuff/ -I test -B 500 -A 61 -a 4 That takes approximately 10 minutes to do the whole scan. The "-B 500 -A 61 -a 4" numbers we determined just by trying different values with the same policy file and seeing the resulting scan duration. <<< That's pretty good. BUT, FYI, the -A number-of-buckets parameter should be scaled with the total number of files you expect to find in the argument filesystem or directory. If you don't set it the command will default to number-of-inodes-allocated / million, but capped at a minimum of 7 and a maximum of 4096. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Thu Apr 13 11:35:19 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 13 Apr 2017 10:35:19 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories In-Reply-To: References: , Message-ID: After a load more debugging, and switching off the quota's the issue looks to be quota related. in that the issue has gone away since I switched quota's off. I will need to switch them back on, but at least we know the issue is not the network and is likely to be fixed by upgrading..... Peter Childs ITS Research Infrastructure Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Peter Childs Sent: Tuesday, April 11, 2017 8:35:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Slow to create directories Can you remember what version you were running? Don't worry if you can't remember. It looks like ibm may have withdrawn 4.2.1 from fix central and wish to forget its existences. Never a good sign, 4.2.0, 4.2.2, 4.2.3 and even 3.5, so maybe upgrading is worth a try. I've looked at all the standard trouble shouting guides and got nowhere hence why I asked. But another set of slides always helps. Thank-you for the help, still head scratching.... Which only makes the issue more random. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Simon Thompson (IT Research Support) wrote ---- We actually saw this for a while on one of our clusters which was new. But by the time I'd got round to looking deeper, it had gone, maybe we were using the NSDs more heavily, or possibly we'd upgraded. We are at 4.2.2-2, so might be worth trying to bump the version and see if it goes away. We saw it on the NSD servers directly as well, so not some client trying to talk to it, so maybe there was some buggy code? Simon On 11/04/2017, 16:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: >There are so many things to look at and many tools for doing so (iostat, >htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). 
I would >recommend a review of the presentation that Yuri gave at the most recent >GPFS User Group: >https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs > >Cheers, >-Bryan > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter >Childs >Sent: Tuesday, April 11, 2017 3:58 AM >To: gpfsug main discussion list >Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories > >This is a curious issue which I'm trying to get to the bottom of. > >We currently have two Spectrum Scale file systems, both are running GPFS >4.2.1-1 some of the servers have been upgraded to 4.2.1-2. > >The older one which was upgraded from GPFS 3.5 works find create a >directory is always fast and no issue. > >The new one, which has nice new SSD for metadata and hence should be >faster. can take up to 30 seconds to create a directory but usually takes >less than a second, The longer directory creates usually happen on busy >nodes that have not used the new storage in a while. (Its new so we've >not moved much of the data over yet) But it can also happen randomly >anywhere, including from the NSD servers them selves. (times of 3-4 >seconds from the NSD servers have been seen, on a single directory create) > >We've been pointed at the network and suggested we check all network >settings, and its been suggested to build an admin network, but I'm not >sure I entirely understand why and how this would help. Its a mixed >1G/10G network with the NSD servers connected at 40G with an MTU of 9000. > >However as I say, the older filesystem is fine, and it does not matter if >the nodes are connected to the old GPFS cluster or the new one, (although >the delay is worst on the old gpfs cluster), So I'm really playing spot >the difference. and the network is not really an obvious difference. > >Its been suggested to look at a trace when it occurs but as its difficult >to recreate collecting one is difficult. > >Any ideas would be most helpful. > >Thanks > > > >Peter Childs >ITS Research Infrastructure >Queen Mary, University of London >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >________________________________ > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. 
>_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From scale at us.ibm.com Fri Apr 14 08:34:06 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 14 Apr 2017 15:34:06 +0800 Subject: [gpfsug-discuss] Does SVC / Spectrum Virtualize support IBM Spectrum Scale with SCSI-3 Persistent Reservations? In-Reply-To: References: Message-ID: If you can run "mmchconfig usePersistentReserve=yes" successfully, then it is supported; the command checks the compatibility for you. You can also use "tsprinquiry device" (no /dev prefix) to check the vendor output. Thanks. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Christoph Krafft" To: "gpfsug main discussion list" Cc: Achim Christ , Petra Christ Date: 04/11/2017 04:25 PM Subject: [gpfsug-discuss] Does SVC / Spectrum Virtualize support IBM Spectrum Scale with SCSI-3 Persistent Reservations? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi folks, there is a list of storage devices that support SCSI-3 PR in the GPFS FAQ Doc (see Answer 4.5). https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html#scsi3 Since this list contains IBM V-model storage subsystems that include Storage Virtualization - I was wondering if SVC / Spectrum Virtualize can also support SCSI-3 PR (although not explicitly on the list)? Any hints and help are warmly welcome - thank you in advance. Mit freundlichen Grüßen / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Am Weiher 24 Email: ckrafft at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Geschäftsführung: Martina Koederitz (Vorsitzende), Nicole Reimer, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Stefan Lutz Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: 1A696179.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: 1A223532.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Sun Apr 16 14:47:20 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Sun, 16 Apr 2017 13:47:20 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Message-ID: Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ... I have an mmapplypolicy job that didn't migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke:
[I] Qos 'maintenance' configured as inf
[I] GPFS Current Data Pool Utilization in KB and %
Pool_Name KB_Occupied KB_Total Percent_Occupied
gpfs23capacity 55365193728 124983549952 44.297984614%
gpfs23data 166747037696 343753326592 48.507759721%
system 0 0 0.000000000% (no user data)
[I] 75142046 of 209715200 inodes used: 35.830520%.
[I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules.
RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584))
RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14)
And then the log shows it scanning all the directories and then says, "OK, here's what I'm going to do":
[I] Summary of Rule Applicability and File Choices:
Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule
0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.)
1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.)
[I] Filesystem objects with no applicable rules: 414911602.
[I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates;
Predicted Data Pool Utilization in KB and %:
Pool_Name KB_Occupied KB_Total Percent_Occupied
gpfs23capacity 122483878944 124983549952 97.999999993%
gpfs23data 104742360032 343753326592 30.470209865%
system 0 0 0.000000000% (no user data)
Notice that it says it's only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that's all it did:
[I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors.
And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full:
Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB)
eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%)
eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%)
------------- -------------------- -------------------
(pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%)
I don't understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance...
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Sun Apr 16 17:20:15 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Sun, 16 Apr 2017 16:20:15 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Message-ID: <252ABBB2-7E94-41F6-AD76-B6D836E5C916@nuance.com> I think the first thing I would do is turn up the ?-L? level to a large value (like ?6?) and see what it tells you about files that are being chosen and which ones aren?t being migrated and why. You could run it in test mode, write the output to a file and see what it says. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Sunday, April 16, 2017 at 8:47 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Sun Apr 16 20:15:40 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 16 Apr 2017 15:15:40 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! 
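(On the illplaced/illreplicated question, a quick spot check on a few of the files you expected to move is something like

mmlsattr -L /gpfs23/path/to/some/file

with the path obviously a placeholder - then look at the "storage pool name:" and "flags:" fields; "illplaced" there means the file's data has not yet been physically moved to the pool its placement or migration rule assigned.)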
From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From makaplan at us.ibm.com Sun Apr 16 20:39:21 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 16 Apr 2017 15:39:21 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: Correction: So that's why it chooses to migrate "only" 67TB.... (67000 GB) -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Apr 17 16:24:02 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 17 Apr 2017 15:24:02 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: Hi Marc, I do understand what you?re saying about mmapplypolicy deciding it only needed to move ~1.8 million files to fill the capacity pool to ~98% full. However, it is now more than 24 hours since the mmapplypolicy finished ?successfully? and: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.66T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.66T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.33T ( 51%) 128.8G ( 0%) And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the partially redacted command line: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And here?s that policy file: define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE (access_age < 14) The one thing that has changed is that formerly I only ran the migration in one direction at a time ? i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that? I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it?s done the other filesystem migration kicks off. I don?t like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-( And yes, there is something else going on ? 
The one thing that has changed is that formerly I only ran the migration in one direction at a time ... i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that?

I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it's done the other filesystem migration kicks off. I don't like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-(

And yes, there is something else going on ... well, was going on - the network switch crash killed this too ... I have been running an rsync on one particular ~80TB directory tree from the old filesystem to gpfs23. I understand that the migration wouldn't know about those files and that's fine ... I just don't understand why mmapplypolicy said it was going to fill the capacity pool to 98% but didn't do it ... wait, mmapplypolicy hasn't gone into politics, has it?!? ;-)

Thanks - and again, if I should open a PMR for this please let me know...

Kevin

On Apr 16, 2017, at 2:15 PM, Marc A Kaplan <makaplan at us.ibm.com> wrote:

Let's look at how mmapplypolicy does the reckoning. Before it starts, it sees your pools as:

[I] GPFS Current Data Pool Utilization in KB and %
Pool_Name           KB_Occupied      KB_Total     Percent_Occupied
gpfs23capacity      55365193728   124983549952    44.297984614%
gpfs23data         166747037696   343753326592    48.507759721%
system                        0              0     0.000000000% (no user data)
[I] 75142046 of 209715200 inodes used: 35.830520%.

Your rule says you want to migrate data to gpfs23capacity, up to 98% full:

RULE 'OldStuff'
  MIGRATE FROM POOL 'gpfs23data'
  TO POOL 'gpfs23capacity'
  LIMIT(98) WHERE ...

We scan your files and find and reckon...

[I] Summary of Rule Applicability and File Choices:
 Rule#   Hit_Cnt        KB_Hit     Chosen      KB_Chosen   KB_Ill  Rule
     0   5255960  237675081344    1868858    67355430720       0  RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.)

So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule) then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message.

Predicted Data Pool Utilization in KB and %:
Pool_Name           KB_Occupied      KB_Total     Percent_Occupied
gpfs23capacity     122483878944   124983549952    97.999999993%
gpfs23data         104742360032   343753326592    30.470209865%

So that's why it chooses to migrate "only" 67GB....

See?  Makes sense to me.

Questions:
Did you run with -I yes or -I defer ?

Were some of the files illreplicated or illplaced?

Did you give the cluster-wide space reckoning protocols time to see the changes?  mmdf is usually "behind" by some non-negligible amount of time.

What else is going on?
If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that!

Run it again!
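(A follow-on to the illreplicated/illplaced question: the KB_Ill column in the [I] Summary lines is where ill-placed data shows up in the policy report, and mmlsattr -L on an individual file lists it in the flags line. If a significant amount of data does turn out to be sitting in the wrong pool, one way to re-drive it outside of mmapplypolicy is a placement repair, sketched here and best run only when the extra I/O load is acceptable:

    mmrestripefs gpfs23 -p

which migrates ill-placed file data back to its assigned storage pool.)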
From: "Buterbaugh, Kevin L"
To: gpfsug main discussion list
Date: 04/16/2017 09:47 AM
Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?
Sent by: gpfsug-discuss-bounces at spectrumscale.org
________________________________

Hi All,

First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ... I have an mmapplypolicy job that didn't migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke:

[I] Qos 'maintenance' configured as inf
[I] GPFS Current Data Pool Utilization in KB and %
Pool_Name           KB_Occupied      KB_Total     Percent_Occupied
gpfs23capacity      55365193728   124983549952    44.297984614%
gpfs23data         166747037696   343753326592    48.507759721%
system                        0              0     0.000000000% (no user data)
[I] 75142046 of 209715200 inodes used: 35.830520%.
[I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy.
Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC
Parsed 2 policy rules.

RULE 'OldStuff'
  MIGRATE FROM POOL 'gpfs23data'
  TO POOL 'gpfs23capacity'
  LIMIT(98)
  WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584))

RULE 'INeedThatAfterAll'
  MIGRATE FROM POOL 'gpfs23capacity'
  TO POOL 'gpfs23data'
  LIMIT(75)
  WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14)

And then the log shows it scanning all the directories and then says, "OK, here's what I'm going to do":

[I] Summary of Rule Applicability and File Choices:
 Rule#   Hit_Cnt        KB_Hit     Chosen      KB_Chosen   KB_Ill  Rule
     0   5255960  237675081344    1868858    67355430720       0  RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.)
     1       611     236745504        611      236745504       0  RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.)

[I] Filesystem objects with no applicable rules: 414911602.

[I] GPFS Policy Decisions and File Choice Totals:
 Chose to migrate 67592176224KB: 1869469 of 5256571 candidates;
Predicted Data Pool Utilization in KB and %:
Pool_Name           KB_Occupied      KB_Total     Percent_Occupied
gpfs23capacity     122483878944   124983549952    97.999999993%
gpfs23data         104742360032   343753326592    30.470209865%
system                        0              0     0.000000000% (no user data)

Notice that it says it's only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that's all it did:

[I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;
        0 'skipped' files and/or errors.

And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full:

Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB)
eon35Ansd       58.2T   35  No  Yes    29.54T ( 51%)    63.93G ( 0%)
eon35Dnsd       58.2T   35  No  Yes    29.54T ( 51%)    64.39G ( 0%)
                -------------          --------------------  -------------------
(pool total)    116.4T                 59.08T ( 51%)         128.3G ( 0%)

I don't understand why it only migrated a small subset of what it could / should have?

We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance...

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From chekh at stanford.edu  Mon Apr 17 19:49:12 2017
From: chekh at stanford.edu (Alex Chekholko)
Date: Mon, 17 Apr 2017 11:49:12 -0700
Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?
In-Reply-To:
References:
Message-ID: <09e154ef-15ed-3217-db65-51e693e28faa@stanford.edu>

Hi Kevin,

IMHO, safe to just run it again. You can also run it with '-I test -L 6' again and look through the output. But I don't think you can "break" anything by having it scan and/or move data.

Can you post the full command line that you use to run it?

The behavior you describe is odd; you say it prints out the "files migrated successfully" message, but the files didn't actually get migrated?

Turn up the debug param and have it print every file as it is moving it or something.
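For example, a dry run along those lines might look something like the following (the node list and global work directory are placeholders for whatever the real invocation uses):

    mmapplypolicy gpfs23 -P ~/gpfs/gpfs23_migration.policy -I test -L 3 \
        -N nsd01,nsd02 -g /gpfs23/policytmp 2>&1 | tee /tmp/gpfs23.policy.test.out

With -I test nothing is actually migrated, and at -L 3 each candidate file is listed along with the rule it matched, so the resulting log can be searched afterwards to see exactly which files were and were not chosen.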
Regards, Alex On 4/17/17 8:24 AM, Buterbaugh, Kevin L wrote: > Hi Marc, > > I do understand what you?re saying about mmapplypolicy deciding it only > needed to move ~1.8 million files to fill the capacity pool to ~98% > full. However, it is now more than 24 hours since the mmapplypolicy > finished ?successfully? and: > > Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) > eon35Ansd 58.2T 35 No Yes 29.66T ( > 51%) 64.16G ( 0%) > eon35Dnsd 58.2T 35 No Yes 29.66T ( > 51%) 64.61G ( 0%) > ------------- > -------------------- ------------------- > (pool total) 116.4T 59.33T ( > 51%) 128.8G ( 0%) > > And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the > partially redacted command line: > > /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g another gpfs filesystem> -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy > -N some,list,of,NSD,server,nodes > > And here?s that policy file: > > define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) > define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) > > RULE 'OldStuff' > MIGRATE FROM POOL 'gpfs23data' > TO POOL 'gpfs23capacity' > LIMIT(98) > WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) > > RULE 'INeedThatAfterAll' > MIGRATE FROM POOL 'gpfs23capacity' > TO POOL 'gpfs23data' > LIMIT(75) > WHERE (access_age < 14) > > The one thing that has changed is that formerly I only ran the migration > in one direction at a time ? i.e. I used to have those two rules in two > separate files and would run an mmapplypolicy using the OldStuff rule > the 1st weekend of the month and run the other rule the other weekends > of the month. This is the 1st weekend that I attempted to run an > mmapplypolicy that did both at the same time. Did I mess something up > with that? > > I have not run it again yet because we also run migrations on the other > filesystem that we are still in the process of migrating off of. So > gpfs23 goes 1st and as soon as it?s done the other filesystem migration > kicks off. I don?t like to run two migrations simultaneously if at all > possible. The 2nd migration ran until this morning, when it was > unfortunately terminated by a network switch crash that has also had me > tied up all morning until now. :-( > > And yes, there is something else going on ? well, was going on - the > network switch crash killed this too ? I have been running an rsync on > one particular ~80TB directory tree from the old filesystem to gpfs23. > I understand that the migration wouldn?t know about those files and > that?s fine ? I just don?t understand why mmapplypolicy said it was > going to fill the capacity pool to 98% but didn?t do it ? wait, > mmapplypolicy hasn?t gone into politics, has it?!? ;-) > > Thanks - and again, if I should open a PMR for this please let me know... > > Kevin > >> On Apr 16, 2017, at 2:15 PM, Marc A Kaplan > > wrote: >> >> Let's look at how mmapplypolicy does the reckoning. >> Before it starts, it see your pools as: >> >> [I] GPFS Current Data Pool Utilization in KB and % >> Pool_Name KB_Occupied KB_Total Percent_Occupied >> gpfs23capacity 55365193728 124983549952 44.297984614% >> gpfs23data 166747037696 343753326592 48.507759721% >> system 0 0 >> 0.000000000% (no user data) >> [I] 75142046 of 209715200 inodes used: 35.830520%. >> >> Your rule says you want to migrate data to gpfs23capacity, up to 98% full: >> >> RULE 'OldStuff' >> MIGRATE FROM POOL 'gpfs23data' >> TO POOL 'gpfs23capacity' >> LIMIT(98) WHERE ... >> >> We scan your files and find and reckon... 
>> [I] Summary of Rule Applicability and File Choices: >> Rule# Hit_Cnt KB_Hit Chosen KB_Chosen >> KB_Ill Rule >> 0 5255960 237675081344 1868858 67355430720 >> 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO >> POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) >> >> So yes, 5.25Million files match the rule, but the utility chooses >> 1.868Million files that add up to 67,355GB and figures that if it >> migrates those to gpfs23capacity, >> (and also figuring the other migrations by your second rule)then >> gpfs23 will end up 97.9999% full. >> We show you that with our "predictions" message. >> >> Predicted Data Pool Utilization in KB and %: >> Pool_Name KB_Occupied KB_Total Percent_Occupied >> gpfs23capacity 122483878944 124983549952 97.999999993% >> gpfs23data 104742360032 343753326592 30.470209865% >> >> So that's why it chooses to migrate "only" 67GB.... >> >> See? Makes sense to me. >> >> Questions: >> Did you run with -I yes or -I defer ? >> >> Were some of the files illreplicated or illplaced? >> >> Did you give the cluster-wide space reckoning protocols time to see >> the changes? mmdf is usually "behind" by some non-neglible amount of >> time. >> >> What else is going on? >> If you're moving or deleting or creating data by other means while >> mmapplypolicy is running -- it doesn't "know" about that! >> >> Run it again! >> >> >> >> >> >> From: "Buterbaugh, Kevin L" > > >> To: gpfsug main discussion list >> > > >> Date: 04/16/2017 09:47 AM >> Subject: [gpfsug-discuss] mmapplypolicy didn't migrate >> everything it should have - why not? >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> ------------------------------------------------------------------------ >> >> >> >> Hi All, >> >> First off, I can open a PMR for this if I need to. Second, I am far >> from an mmapplypolicy guru. With that out of the way ? I have an >> mmapplypolicy job that didn?t migrate anywhere close to what it could >> / should have. From the log file I have it create, here is the part >> where it shows the policies I told it to invoke: >> >> [I] Qos 'maintenance' configured as inf >> [I] GPFS Current Data Pool Utilization in KB and % >> Pool_Name KB_Occupied KB_Total Percent_Occupied >> gpfs23capacity 55365193728 124983549952 44.297984614% >> gpfs23data 166747037696 343753326592 48.507759721% >> system 0 0 >> 0.000000000% (no user data) >> [I] 75142046 of 209715200 inodes used: 35.830520%. >> [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. >> Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC >> Parsed 2 policy rules. >> >> RULE 'OldStuff' >> MIGRATE FROM POOL 'gpfs23data' >> TO POOL 'gpfs23capacity' >> LIMIT(98) >> WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND >> (KB_ALLOCATED > 3584)) >> >> RULE 'INeedThatAfterAll' >> MIGRATE FROM POOL 'gpfs23capacity' >> TO POOL 'gpfs23data' >> LIMIT(75) >> WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) >> >> And then the log shows it scanning all the directories and then says, >> "OK, here?s what I?m going to do": >> >> [I] Summary of Rule Applicability and File Choices: >> Rule# Hit_Cnt KB_Hit Chosen KB_Chosen >> KB_Ill Rule >> 0 5255960 237675081344 1868858 67355430720 >> 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO >> POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) >> 1 611 236745504 611 236745504 >> 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL >> 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) >> >> [I] Filesystem objects with no applicable rules: 414911602. 
>> >> [I] GPFS Policy Decisions and File Choice Totals: >> Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; >> Predicted Data Pool Utilization in KB and %: >> Pool_Name KB_Occupied KB_Total Percent_Occupied >> gpfs23capacity 122483878944 124983549952 97.999999993% >> gpfs23data 104742360032 343753326592 30.470209865% >> system 0 0 >> 0.000000000% (no user data) >> >> Notice that it says it?s only going to migrate less than 2 million of >> the 5.25 million candidate files!! And sure enough, that?s all it did: >> >> [I] A total of 1869469 files have been migrated, deleted or processed >> by an EXTERNAL EXEC/script; >> 0 'skipped' files and/or errors. >> >> And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere >> near 98% full: >> >> Disks in storage pool: gpfs23capacity (Maximum disk size allowed is >> 519 TB) >> eon35Ansd 58.2T 35 No Yes 29.54T ( >> 51%) 63.93G ( 0%) >> eon35Dnsd 58.2T 35 No Yes 29.54T ( >> 51%) 64.39G ( 0%) >> ------------- >> -------------------- ------------------- >> (pool total) 116.4T 59.08T ( >> 51%) 128.3G ( 0%) >> >> I don?t understand why it only migrated a small subset of what it >> could / should have? >> >> We are doing a migration from one filesystem (gpfs21) to gpfs23 and I >> really need to stuff my gpfs23capacity pool as full of data as I can >> to keep the migration going. Any ideas anyone? Thanks in advance? >> >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and >> Education >> _Kevin.Buterbaugh at vanderbilt.edu_ >> - (615)875-9633 From makaplan at us.ibm.com Mon Apr 17 21:11:18 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 17 Apr 2017 16:11:18 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: Kevin, 1. Running with both fairly simple rules so that you migrate "in both directions" is fine. It was designed to do that! 2. Glad you understand the logic of "rules hit" vs "files chosen". 3. To begin to understand "what the hxxx is going on" (as our fearless leader liked to say before he was in charge ;-) ) I suggest: (a) Run mmapplypolicy on directory of just a few files `mmapplypolicy /gpfs23/test-directory -I test ...` and check that the [I] ... Current data pool utilization message is consistent with the output of `mmdf gpfs23`. They should be, but if they're not, that's a weird problem right there since they're supposed to be looking at the same metadata! You can do this anytime, should complete almost instantly... (b) When time and resources permit, re-run mmapplypolicy on the full FS with your desired migration policy. Again, do the "Current", "Chosen" and "Predicted" messages make sense, and "add up"? Do the file counts seem reasonable, considering that you recently did migrations/deletions that should have changed the counts compared to previous runs of mmapplypolicy? If you just want to look and not actually change anything, use `-I test` which will skip the migration steps. If you want to see the list of files chosen (c) If you continue to see significant discrepancies between mmapplypolicy and mmdf, let us know. (d) Also at some point you may consider running mmrestripefs with options to make sure every file has its data blocks where they are supposed to be and is replicated as you have specified. Let's see where those steps take us... -- marc of Spectrum Scale (n? 
GPFS) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 04/17/2017 11:25 AM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, I do understand what you?re saying about mmapplypolicy deciding it only needed to move ~1.8 million files to fill the capacity pool to ~98% full. However, it is now more than 24 hours since the mmapplypolicy finished ?successfully? and: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.66T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.66T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.33T ( 51%) 128.8G ( 0%) And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the partially redacted command line: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And here?s that policy file: define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE (access_age < 14) The one thing that has changed is that formerly I only ran the migration in one direction at a time ? i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that? I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it?s done the other filesystem migration kicks off. I don?t like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-( And yes, there is something else going on ? well, was going on - the network switch crash killed this too ? I have been running an rsync on one particular ~80TB directory tree from the old filesystem to gpfs23. I understand that the migration wouldn?t know about those files and that?s fine ? I just don?t understand why mmapplypolicy said it was going to fill the capacity pool to 98% but didn?t do it ? wait, mmapplypolicy hasn?t gone into politics, has it?!? ;-) Thanks - and again, if I should open a PMR for this please let me know... Kevin On Apr 16, 2017, at 2:15 PM, Marc A Kaplan wrote: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... 
[I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. 
[I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Apr 17 21:18:42 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 17 Apr 2017 16:18:42 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: Oops... If you want to see the list of what would be migrated '-I test -L 2' If you want to migrate and see each file migrated '-I yes -L 2' I don't recommend -L 4 or higher, unless you want to see the files that do not match your rules. -L 3 will show you all the files that match the rules, including those that are NOT chosen for migration. See the command gu -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Apr 17 22:16:57 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 17 Apr 2017 21:16:57 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> Hi Marc, Alex, all, Thank you for the responses. To answer Alex?s questions first ? 
the full command line I used (except for some stuff I?m redacting but you don?t need the exact details anyway) was: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And yes, it printed out the very normal, ?Hey, I migrated all 1.8 million files I said I would successfully, so I?m done here? message: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. Marc - I ran what you suggest in your response below - section 3a. The output of a ?test? mmapplypolicy and mmdf was very consistent. Therefore, I?m moving on to 3b and running against the full filesystem again ? the only difference between the command line above and what I?m doing now is that I?m running with ?-L 2? this time around. I?m not fond of doing this during the week but I need to figure out what?s going on and I *really* need to get some stuff moved from my ?data? pool to my ?capacity? pool. I will respond back on the list again where there?s something to report. Thanks again, all? Kevin On Apr 17, 2017, at 3:11 PM, Marc A Kaplan > wrote: Kevin, 1. Running with both fairly simple rules so that you migrate "in both directions" is fine. It was designed to do that! 2. Glad you understand the logic of "rules hit" vs "files chosen". 3. To begin to understand "what the hxxx is going on" (as our fearless leader liked to say before he was in charge ;-) ) I suggest: (a) Run mmapplypolicy on directory of just a few files `mmapplypolicy /gpfs23/test-directory -I test ...` and check that the [I] ... Current data pool utilization message is consistent with the output of `mmdf gpfs23`. They should be, but if they're not, that's a weird problem right there since they're supposed to be looking at the same metadata! You can do this anytime, should complete almost instantly... (b) When time and resources permit, re-run mmapplypolicy on the full FS with your desired migration policy. Again, do the "Current", "Chosen" and "Predicted" messages make sense, and "add up"? Do the file counts seem reasonable, considering that you recently did migrations/deletions that should have changed the counts compared to previous runs of mmapplypolicy? If you just want to look and not actually change anything, use `-I test` which will skip the migration steps. If you want to see the list of files chosen (c) If you continue to see significant discrepancies between mmapplypolicy and mmdf, let us know. (d) Also at some point you may consider running mmrestripefs with options to make sure every file has its data blocks where they are supposed to be and is replicated as you have specified. Let's see where those steps take us... -- marc of Spectrum Scale (n? GPFS) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/17/2017 11:25 AM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Marc, I do understand what you?re saying about mmapplypolicy deciding it only needed to move ~1.8 million files to fill the capacity pool to ~98% full. However, it is now more than 24 hours since the mmapplypolicy finished ?successfully? 
and: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.66T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.66T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.33T ( 51%) 128.8G ( 0%) And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the partially redacted command line: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And here?s that policy file: define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE (access_age < 14) The one thing that has changed is that formerly I only ran the migration in one direction at a time ? i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that? I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it?s done the other filesystem migration kicks off. I don?t like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-( And yes, there is something else going on ? well, was going on - the network switch crash killed this too ? I have been running an rsync on one particular ~80TB directory tree from the old filesystem to gpfs23. I understand that the migration wouldn?t know about those files and that?s fine ? I just don?t understand why mmapplypolicy said it was going to fill the capacity pool to 98% but didn?t do it ? wait, mmapplypolicy hasn?t gone into politics, has it?!? ;-) Thanks - and again, if I should open a PMR for this please let me know... Kevin On Apr 16, 2017, at 2:15 PM, Marc A Kaplan > wrote: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. 
Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. 
And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full:

Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB)
eon35Ansd       58.2T   35  No  Yes    29.54T ( 51%)    63.93G ( 0%)
eon35Dnsd       58.2T   35  No  Yes    29.54T ( 51%)    64.39G ( 0%)
                -------------          --------------------  -------------------
(pool total)    116.4T                 59.08T ( 51%)         128.3G ( 0%)

I don't understand why it only migrated a small subset of what it could / should have?

We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance...

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

From Kevin.Buterbaugh at Vanderbilt.Edu  Tue Apr 18 14:31:20 2017
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Tue, 18 Apr 2017 13:31:20 +0000
Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?
In-Reply-To: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu>
References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu>
Message-ID: <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu>

Hi All, but especially Marc,

I ran the mmapplypolicy again last night and, unfortunately, it again did not fill the capacity pool like it said it would. From the log file:

[I] Summary of Rule Applicability and File Choices:
 Rule#   Hit_Cnt        KB_Hit     Chosen      KB_Chosen   KB_Ill  Rule
     0   3632859  181380873184    1620175    61434283936       0  RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.)
     1        88      99230048         88       99230048       0  RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.)

[I] Filesystem objects with no applicable rules: 442962867.

[I] GPFS Policy Decisions and File Choice Totals:
 Chose to migrate 61533513984KB: 1620263 of 3632947 candidates;
Predicted Data Pool Utilization in KB and %:
Pool_Name           KB_Occupied      KB_Total     Percent_Occupied
gpfs23capacity     122483878464   124983549952    97.999999609%
gpfs23data         128885076416   343753326592    37.493477574%
system                        0              0     0.000000000% (no user data)
[I] 2017-04-18 at 02:52:48.402 Policy execution. 0 files dispatched.

And the tail end of the log file says that it moved those files:

[I] 2017-04-18 at 09:06:51.124 Policy execution. 1620263 files dispatched.
[I] A total of 1620263 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;
        0 'skipped' files and/or errors.
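(A quick way to watch whether the target pool is really filling while a run like this is in flight is to poll just its section of the mmdf report, for example:

    mmdf gpfs23 | grep -A 5 'gpfs23capacity'

bearing in mind, as noted earlier in the thread, that mmdf's numbers can lag the true allocation by a non-trivial amount of time.)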
But mmdf (and how quickly the mmapplypolicy itself ran) say otherwise: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.73T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.73T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.45T ( 51%) 128.8G ( 0%) Ideas? Or is it time for me to open a PMR? Thanks? Kevin On Apr 17, 2017, at 4:16 PM, Buterbaugh, Kevin L > wrote: Hi Marc, Alex, all, Thank you for the responses. To answer Alex?s questions first ? the full command line I used (except for some stuff I?m redacting but you don?t need the exact details anyway) was: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And yes, it printed out the very normal, ?Hey, I migrated all 1.8 million files I said I would successfully, so I?m done here? message: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. Marc - I ran what you suggest in your response below - section 3a. The output of a ?test? mmapplypolicy and mmdf was very consistent. Therefore, I?m moving on to 3b and running against the full filesystem again ? the only difference between the command line above and what I?m doing now is that I?m running with ?-L 2? this time around. I?m not fond of doing this during the week but I need to figure out what?s going on and I *really* need to get some stuff moved from my ?data? pool to my ?capacity? pool. I will respond back on the list again where there?s something to report. Thanks again, all? Kevin On Apr 17, 2017, at 3:11 PM, Marc A Kaplan > wrote: Kevin, 1. Running with both fairly simple rules so that you migrate "in both directions" is fine. It was designed to do that! 2. Glad you understand the logic of "rules hit" vs "files chosen". 3. To begin to understand "what the hxxx is going on" (as our fearless leader liked to say before he was in charge ;-) ) I suggest: (a) Run mmapplypolicy on directory of just a few files `mmapplypolicy /gpfs23/test-directory -I test ...` and check that the [I] ... Current data pool utilization message is consistent with the output of `mmdf gpfs23`. They should be, but if they're not, that's a weird problem right there since they're supposed to be looking at the same metadata! You can do this anytime, should complete almost instantly... (b) When time and resources permit, re-run mmapplypolicy on the full FS with your desired migration policy. Again, do the "Current", "Chosen" and "Predicted" messages make sense, and "add up"? Do the file counts seem reasonable, considering that you recently did migrations/deletions that should have changed the counts compared to previous runs of mmapplypolicy? If you just want to look and not actually change anything, use `-I test` which will skip the migration steps. If you want to see the list of files chosen (c) If you continue to see significant discrepancies between mmapplypolicy and mmdf, let us know. (d) Also at some point you may consider running mmrestripefs with options to make sure every file has its data blocks where they are supposed to be and is replicated as you have specified. Let's see where those steps take us... -- marc of Spectrum Scale (n? GPFS) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/17/2017 11:25 AM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? 
Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Marc, I do understand what you?re saying about mmapplypolicy deciding it only needed to move ~1.8 million files to fill the capacity pool to ~98% full. However, it is now more than 24 hours since the mmapplypolicy finished ?successfully? and: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.66T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.66T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.33T ( 51%) 128.8G ( 0%) And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the partially redacted command line: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And here?s that policy file: define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE (access_age < 14) The one thing that has changed is that formerly I only ran the migration in one direction at a time ? i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that? I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it?s done the other filesystem migration kicks off. I don?t like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-( And yes, there is something else going on ? well, was going on - the network switch crash killed this too ? I have been running an rsync on one particular ~80TB directory tree from the old filesystem to gpfs23. I understand that the migration wouldn?t know about those files and that?s fine ? I just don?t understand why mmapplypolicy said it was going to fill the capacity pool to 98% but didn?t do it ? wait, mmapplypolicy hasn?t gone into politics, has it?!? ;-) Thanks - and again, if I should open a PMR for this please let me know... Kevin On Apr 16, 2017, at 2:15 PM, Marc A Kaplan > wrote: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... 
[I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. 
[I] GPFS Policy Decisions and File Choice Totals:
 Chose to migrate 67592176224KB: 1869469 of 5256571 candidates;
Predicted Data Pool Utilization in KB and %:
Pool_Name           KB_Occupied      KB_Total     Percent_Occupied
gpfs23capacity     122483878944   124983549952    97.999999993%
gpfs23data         104742360032   343753326592    30.470209865%
system                        0              0     0.000000000% (no user data)

Notice that it says it's only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that's all it did:

[I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;
        0 'skipped' files and/or errors.

And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full:

Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB)
eon35Ansd       58.2T   35  No  Yes    29.54T ( 51%)    63.93G ( 0%)
eon35Dnsd       58.2T   35  No  Yes    29.54T ( 51%)    64.39G ( 0%)
                -------------          --------------------  -------------------
(pool total)    116.4T                 59.08T ( 51%)         128.3G ( 0%)

I don't understand why it only migrated a small subset of what it could / should have?

We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance...

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From zgiles at gmail.com  Tue Apr 18 14:56:43 2017
From: zgiles at gmail.com (Zachary Giles)
Date: Tue, 18 Apr 2017 09:56:43 -0400
Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?
In-Reply-To: <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu>
References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu>
Message-ID:

Kevin,

Here's a silly theory: Have you tried putting a weight value in? I wonder if during migration it hits some large file that would go over the threshold and stops. With a weight flag you could move all small files in first or by lack of heat etc to pack the tier more tightly. Just something else to try before the PMR process.

Zach

On Apr 18, 2017 9:32 AM, "Buterbaugh, Kevin L" <Kevin.Buterbaugh at vanderbilt.edu> wrote:

Hi All, but especially Marc,

I ran the mmapplypolicy again last night and, unfortunately, it again did not fill the capacity pool like it said it would.
From the log file: [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 3632859 181380873184 1620175 61434283936 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 88 99230048 88 99230048 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 442962867. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 61533513984KB: 1620263 of 3632947 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878464 124983549952 97.999999609% gpfs23data 128885076416 343753326592 37.493477574% system 0 0 0.000000000% (no user data) [I] 2017-04-18 at 02:52:48.402 Policy execution. 0 files dispatched. And the tail end of the log file says that it moved those files: [I] 2017-04-18 at 09:06:51.124 Policy execution. 1620263 files dispatched. [I] A total of 1620263 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. But mmdf (and how quickly the mmapplypolicy itself ran) say otherwise: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.73T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.73T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.45T ( 51%) 128.8G ( 0%) Ideas? Or is it time for me to open a PMR? Thanks? Kevin On Apr 17, 2017, at 4:16 PM, Buterbaugh, Kevin L < Kevin.Buterbaugh at Vanderbilt.Edu> wrote: Hi Marc, Alex, all, Thank you for the responses. To answer Alex?s questions first ? the full command line I used (except for some stuff I?m redacting but you don?t need the exact details anyway) was: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And yes, it printed out the very normal, ?Hey, I migrated all 1.8 million files I said I would successfully, so I?m done here? message: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. Marc - I ran what you suggest in your response below - section 3a. The output of a ?test? mmapplypolicy and mmdf was very consistent. Therefore, I?m moving on to 3b and running against the full filesystem again ? the only difference between the command line above and what I?m doing now is that I?m running with ?-L 2? this time around. I?m not fond of doing this during the week but I need to figure out what?s going on and I *really* need to get some stuff moved from my ?data? pool to my ?capacity? pool. I will respond back on the list again where there?s something to report. Thanks again, all? Kevin On Apr 17, 2017, at 3:11 PM, Marc A Kaplan wrote: Kevin, 1. Running with both fairly simple rules so that you migrate "in both directions" is fine. It was designed to do that! 2. Glad you understand the logic of "rules hit" vs "files chosen". 3. To begin to understand "what the hxxx is going on" (as our fearless leader liked to say before he was in charge ;-) ) I suggest: (a) Run mmapplypolicy on directory of just a few files `mmapplypolicy /gpfs23/test-directory -I test ...` and check that the [I] ... Current data pool utilization message is consistent with the output of `mmdf gpfs23`. 
They should be, but if they're not, that's a weird problem right there since they're supposed to be looking at the same metadata! You can do this anytime, should complete almost instantly... (b) When time and resources permit, re-run mmapplypolicy on the full FS with your desired migration policy. Again, do the "Current", "Chosen" and "Predicted" messages make sense, and "add up"? Do the file counts seem reasonable, considering that you recently did migrations/deletions that should have changed the counts compared to previous runs of mmapplypolicy? If you just want to look and not actually change anything, use `-I test` which will skip the migration steps. If you want to see the list of files chosen (c) If you continue to see significant discrepancies between mmapplypolicy and mmdf, let us know. (d) Also at some point you may consider running mmrestripefs with options to make sure every file has its data blocks where they are supposed to be and is replicated as you have specified. Let's see where those steps take us... -- marc of Spectrum Scale (n? GPFS) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 04/17/2017 11:25 AM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org ------------------------------ Hi Marc, I do understand what you?re saying about mmapplypolicy deciding it only needed to move ~1.8 million files to fill the capacity pool to ~98% full. However, it is now more than 24 hours since the mmapplypolicy finished ?successfully? and: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.66T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.66T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.33T ( 51%) 128.8G ( 0%) And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the partially redacted command line: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And here?s that policy file: define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE (access_age < 14) The one thing that has changed is that formerly I only ran the migration in one direction at a time ? i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that? I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it?s done the other filesystem migration kicks off. I don?t like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-( And yes, there is something else going on ? well, was going on - the network switch crash killed this too ? 
I have been running an rsync on one particular ~80TB directory tree from the old filesystem to gpfs23. I understand that the migration wouldn?t know about those files and that?s fine ? I just don?t understand why mmapplypolicy said it was going to fill the capacity pool to 98% but didn?t do it ? wait, mmapplypolicy hasn?t gone into politics, has it?!? ;-) Thanks - and again, if I should open a PMR for this please let me know... Kevin On Apr 16, 2017, at 2:15 PM, Marc A Kaplan <*makaplan at us.ibm.com* > wrote: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! From: "Buterbaugh, Kevin L" <*Kevin.Buterbaugh at Vanderbilt.Edu* > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: *gpfsug-discuss-bounces at spectrumscale.org* ------------------------------ Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. >From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. 
RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education *Kevin.Buterbaugh at vanderbilt.edu* - (615)875-9633 <(615)%20875-9633> _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at *spectrumscale.org* *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at *spectrumscale.org* *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 <(615)%20875-9633> _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Apr 18 16:11:19 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 18 Apr 2017 11:11:19 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> Message-ID: ANYONE else reading this saga? Who uses mmapplypolicy to migrate files within multi-TB file systems? Problems? Or all working as expected? ------ Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose total size is 61 Terabytes and migrating those will bring the occupancy of gpfs23capacity pool to 98% and then we're done. So now I'm wondering where this is going wrong. Is there some bug in the reckoning inside of mmapplypolicy or somewhere else in GPFS? Sure you can put in an PMR, and probably should. I'm guessing whoever picks up the PMR will end up calling or emailing me ... but maybe she can do some of the clerical work for us... While we're waiting for that... Here's what I suggest next. Add a clause ... SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK)) before the WHERE clause to each of your rules. Re-run the command with options '-I test -L 2' and collect the output. We're not actually going to move any data, but we're going to look at the files and file sizes that are "chosen"... You should see 1.6 million lines that look kind of like this: /yy/dat/bigC RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1) Run a script over the output to add up all the SHOW() values in the lines that contain TO POOL 'gpfs23capacity' and verify that they do indeed add up to 61TB... (The show is in KB so the SHOW numbers should add up to 61 billion). That sanity checks the policy arithmetic. Let's assume that's okay. Then the next question is whether the individual numbers are correct... Zach Giles made a suggestion... which I'll interpret as find some of the biggest of those files and check that they really are that big.... At this point, I really don't know, but I'm guessing there's some discrepances in the reported KB_ALLOCATED numbers for many of the files... and/or they are "illplaced" - the data blocks aren't all in the pool FROM POOL ... HMMMM.... I just thought about this some more and added the NLINK statistic. It would be unusual for this to be a big problem, but files that are hard linked are not recognized by mmapplypolicy as sharing storage... This has not come to my attention as a significant problem -- does the file system in question have significant GBs of hard linked files? The truth is that you're the first customer/user/admin in a long time to question/examine how mmapplypolicy does its space reckoning ... Optimistically that means it works fine for most customers... 
So sorry, something unusual about your installation or usage... -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Tue Apr 18 16:31:12 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Tue, 18 Apr 2017 11:31:12 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> Message-ID: <4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> I have an observation, which may merely serve to show my ignorance: Is it significant that the words "EXTERNAL EXEC/script? are seen below? If migrating between storage pools within the cluster, I would expect the PIT engine to do the migration. When doing HSM (off cluster, tape libraries, etc) is where I would expect to need a script to actually do the work. > [I] 2017-04-18 at 09:06:51.124 Policy execution. 1620263 files dispatched. > [I] A total of 1620263 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; > 0 'skipped' files and/or errors. ? ddj Dave Johnson Brown University > On Apr 18, 2017, at 11:11 AM, Marc A Kaplan wrote: > > ANYONE else reading this saga? Who uses mmapplypolicy to migrate files within multi-TB file systems? Problems? Or all working as expected? > > ------ > > Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose total size is 61 Terabytes and migrating those will bring the occupancy of gpfs23capacity pool to 98% and then we're done. > > So now I'm wondering where this is going wrong. Is there some bug in the reckoning inside of mmapplypolicy or somewhere else in GPFS? > > Sure you can put in an PMR, and probably should. I'm guessing whoever picks up the PMR will end up calling or emailing me ... but maybe she can do some of the clerical work for us... > > While we're waiting for that... Here's what I suggest next. > > Add a clause ... > > SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK)) > > before the WHERE clause to each of your rules. > > Re-run the command with options '-I test -L 2' and collect the output. > > We're not actually going to move any data, but we're going to look at the files and file sizes that are "chosen"... > > You should see 1.6 million lines that look kind of like this: > > /yy/dat/bigC RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1) > > Run a script over the output to add up all the SHOW() values in the lines that contain TO POOL 'gpfs23capacity' and verify that they do indeed > add up to 61TB... (The show is in KB so the SHOW numbers should add up to 61 billion). > > That sanity checks the policy arithmetic. Let's assume that's okay. > > Then the next question is whether the individual numbers are correct... Zach Giles made a suggestion... which I'll interpret as > find some of the biggest of those files and check that they really are that big.... > > At this point, I really don't know, but I'm guessing there's some discrepances in the reported KB_ALLOCATED numbers for many of the files... > and/or they are "illplaced" - the data blocks aren't all in the pool FROM POOL ... > > HMMMM.... I just thought about this some more and added the NLINK statistic. It would be unusual for this to be a big problem, but files that are hard linked are > not recognized by mmapplypolicy as sharing storage... 
> This has not come to my attention as a significant problem -- does the file system in question have significant GBs of hard linked files? > > The truth is that you're the first customer/user/admin in a long time to question/examine how mmapplypolicy does its space reckoning ... > Optimistically that means it works fine for most customers... > > So sorry, something unusual about your installation or usage... > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Apr 18 17:06:16 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 18 Apr 2017 12:06:16 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: <4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> <4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> Message-ID: That is a summary message. It says one way or another, the command has dealt with 1.6 million files. For the case under discussion there are no EXTERNAL pools, nor any DELETions, just intra-GPFS MIGRATions. [I] A total of 1620263 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Apr 18 17:32:24 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 18 Apr 2017 16:32:24 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> Message-ID: <968C356B-8FDD-44F8-9814-F3D2470369B0@Vanderbilt.Edu> Hi Marc, Two things: 1. I have a PMR open now. 2. You *may* have identified the problem ? I?m still checking ? but files with hard links may be our problem. I wrote a simple Perl script to interate over the log file I had mmapplypolicy create. Here?s the code (don?t laugh, I?m a SysAdmin, not a programmer, and I whipped this out in < 5 minutes ? and yes, I realize the fact that I used Perl instead of Python shows my age as well ): #!/usr/bin/perl # use strict; use warnings; my $InputFile = "/tmp/mmapplypolicy.gpfs23.log"; my $TotalFiles = 0; my $TotalLinks = 0; my $TotalSize = 0; open INPUT, $InputFile or die "Couldn\'t open $InputFile for read: $!\n"; while () { next unless /MIGRATED/; $TotalFiles++; my $FileName = (split / /)[3]; if ( -f $FileName ) { # some files may have been deleted since mmapplypolicy ran my ($NumLinks, $FileSize) = (stat($FileName))[3,7]; $TotalLinks += $NumLinks; $TotalSize += $FileSize; } } close INPUT; print "Number of files / links = $TotalFiles / $TotalLinks, Total size = $TotalSize\n"; exit 0; And here?s what it kicked out: Number of files / links = 1620263 / 80818483, Total size = 53966202814094 1.6 million files but 80 million hard links!!! I?m doing some checking right now, but it appears that it is one particular group - and therefore one particular fileset - that is responsible for this ? they?ve got thousands of files with 50 or more hard links each ? and they?re not inconsequential in size. 
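Marc's SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK)) clause makes the same tally possible straight from the '-I test -L 2' output, without re-stat()ing every file. A rough sketch, assuming the chosen-file lines look like Marc's /yy/dat/bigC example and treating the log path as a placeholder:

# Sum the SHOW( <kb> n=<links> ) fields on the lines headed for the capacity pool.
grep "TO POOL 'gpfs23capacity'" /tmp/mmapplypolicy.gpfs23.list | \
awk '{ for (i = 1; i <= NF - 2; i++)
         if ($i == "SHOW(") {
           kb += $(i + 1)                        # KB_ALLOCATED
           n = $(i + 2); gsub(/[^0-9]/, "", n)   # strip the "n=" and ")"
           links += n
         }
     }
     END { printf "KB chosen: %.0f   hard links counted: %.0f\n", kb, links }'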
IIRC (and keep in mind I?m far from a GPFS policy guru), there is a way to say something to the effect of ?and the path does not contain /gpfs23/fileset/path? ? may need a little help getting that right. I?ll post this information to the ticket as well but wanted to update the list. This wouldn?t be the first time we were an ?edge case? for something in GPFS? ;-) Thanks... Kevin On Apr 18, 2017, at 10:11 AM, Marc A Kaplan > wrote: ANYONE else reading this saga? Who uses mmapplypolicy to migrate files within multi-TB file systems? Problems? Or all working as expected? ------ Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose total size is 61 Terabytes and migrating those will bring the occupancy of gpfs23capacity pool to 98% and then we're done. So now I'm wondering where this is going wrong. Is there some bug in the reckoning inside of mmapplypolicy or somewhere else in GPFS? Sure you can put in an PMR, and probably should. I'm guessing whoever picks up the PMR will end up calling or emailing me ... but maybe she can do some of the clerical work for us... While we're waiting for that... Here's what I suggest next. Add a clause ... SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK)) before the WHERE clause to each of your rules. Re-run the command with options '-I test -L 2' and collect the output. We're not actually going to move any data, but we're going to look at the files and file sizes that are "chosen"... You should see 1.6 million lines that look kind of like this: /yy/dat/bigC RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1) Run a script over the output to add up all the SHOW() values in the lines that contain TO POOL 'gpfs23capacity' and verify that they do indeed add up to 61TB... (The show is in KB so the SHOW numbers should add up to 61 billion). That sanity checks the policy arithmetic. Let's assume that's okay. Then the next question is whether the individual numbers are correct... Zach Giles made a suggestion... which I'll interpret as find some of the biggest of those files and check that they really are that big.... At this point, I really don't know, but I'm guessing there's some discrepances in the reported KB_ALLOCATED numbers for many of the files... and/or they are "illplaced" - the data blocks aren't all in the pool FROM POOL ... HMMMM.... I just thought about this some more and added the NLINK statistic. It would be unusual for this to be a big problem, but files that are hard linked are not recognized by mmapplypolicy as sharing storage... This has not come to my attention as a significant problem -- does the file system in question have significant GBs of hard linked files? The truth is that you're the first customer/user/admin in a long time to question/examine how mmapplypolicy does its space reckoning ... Optimistically that means it works fine for most customers... So sorry, something unusual about your installation or usage... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Apr 18 17:56:11 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 18 Apr 2017 12:56:11 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? hard links! 
A workaround In-Reply-To: <968C356B-8FDD-44F8-9814-F3D2470369B0@Vanderbilt.Edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> <968C356B-8FDD-44F8-9814-F3D2470369B0@Vanderbilt.Edu> Message-ID: Kevin, Wow. Never underestimate the power of ... Anyhow try this as a fix. Add the clause SIZE(KB_ALLOCATED/NLINK) to your MIGRATE rules. This spreads the total actual size over each hardlink... From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 04/18/2017 12:33 PM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, Two things: 1. I have a PMR open now. 2. You *may* have identified the problem ? I?m still checking ? but files with hard links may be our problem. I wrote a simple Perl script to interate over the log file I had mmapplypolicy create. Here?s the code (don?t laugh, I?m a SysAdmin, not a programmer, and I whipped this out in < 5 minutes ? and yes, I realize the fact that I used Perl instead of Python shows my age as well ): #!/usr/bin/perl # use strict; use warnings; my $InputFile = "/tmp/mmapplypolicy.gpfs23.log"; my $TotalFiles = 0; my $TotalLinks = 0; my $TotalSize = 0; open INPUT, $InputFile or die "Couldn\'t open $InputFile for read: $!\n"; while () { next unless /MIGRATED/; $TotalFiles++; my $FileName = (split / /)[3]; if ( -f $FileName ) { # some files may have been deleted since mmapplypolicy ran my ($NumLinks, $FileSize) = (stat($FileName))[3,7]; $TotalLinks += $NumLinks; $TotalSize += $FileSize; } } close INPUT; print "Number of files / links = $TotalFiles / $TotalLinks, Total size = $TotalSize\n"; exit 0; And here?s what it kicked out: Number of files / links = 1620263 / 80818483, Total size = 53966202814094 1.6 million files but 80 million hard links!!! I?m doing some checking right now, but it appears that it is one particular group - and therefore one particular fileset - that is responsible for this ? they?ve got thousands of files with 50 or more hard links each ? and they?re not inconsequential in size. IIRC (and keep in mind I?m far from a GPFS policy guru), there is a way to say something to the effect of ?and the path does not contain /gpfs23/fileset/path? ? may need a little help getting that right. I?ll post this information to the ticket as well but wanted to update the list. This wouldn?t be the first time we were an ?edge case? for something in GPFS? ;-) Thanks... Kevin On Apr 18, 2017, at 10:11 AM, Marc A Kaplan wrote: ANYONE else reading this saga? Who uses mmapplypolicy to migrate files within multi-TB file systems? Problems? Or all working as expected? ------ Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose total size is 61 Terabytes and migrating those will bring the occupancy of gpfs23capacity pool to 98% and then we're done. So now I'm wondering where this is going wrong. Is there some bug in the reckoning inside of mmapplypolicy or somewhere else in GPFS? Sure you can put in an PMR, and probably should. I'm guessing whoever picks up the PMR will end up calling or emailing me ... but maybe she can do some of the clerical work for us... While we're waiting for that... Here's what I suggest next. Add a clause ... SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK)) before the WHERE clause to each of your rules. Re-run the command with options '-I test -L 2' and collect the output. 
We're not actually going to move any data, but we're going to look at the files and file sizes that are "chosen"... You should see 1.6 million lines that look kind of like this: /yy/dat/bigC RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1) Run a script over the output to add up all the SHOW() values in the lines that contain TO POOL 'gpfs23capacity' and verify that they do indeed add up to 61TB... (The show is in KB so the SHOW numbers should add up to 61 billion). That sanity checks the policy arithmetic. Let's assume that's okay. Then the next question is whether the individual numbers are correct... Zach Giles made a suggestion... which I'll interpret as find some of the biggest of those files and check that they really are that big.... At this point, I really don't know, but I'm guessing there's some discrepances in the reported KB_ALLOCATED numbers for many of the files... and/or they are "illplaced" - the data blocks aren't all in the pool FROM POOL ... HMMMM.... I just thought about this some more and added the NLINK statistic. It would be unusual for this to be a big problem, but files that are hard linked are not recognized by mmapplypolicy as sharing storage... This has not come to my attention as a significant problem -- does the file system in question have significant GBs of hard linked files? The truth is that you're the first customer/user/admin in a long time to question/examine how mmapplypolicy does its space reckoning ... Optimistically that means it works fine for most customers... So sorry, something unusual about your installation or usage... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Apr 19 14:12:16 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 19 Apr 2017 13:12:16 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: <4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> <4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> Message-ID: <458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> Hi All, I think we *may* be able to wrap this saga up? ;-) Dave - in regards to your question, all I know is that the tail end of the log file is ?normal? for all the successful pool migrations I?ve done in the past few years. It looks like the hard links were the problem. We have one group with a fileset on our filesystem that they use for backing up Linux boxes in their lab. That one fileset has thousands and thousands (I haven?t counted, but based on the output of that Perl script I wrote it could well be millions) of files with anywhere from 50 to 128 hard links each ? those files ranged from a few KB to a few MB in size. From what Marc said, my understanding is that with the way I had my policy rule written mmapplypolicy was seeing each of those as separate files and therefore thinking it was moving 50 to 128 times as much space to the gpfs23capacity pool as it really was for those files. Marc can correct me or clarify further if necessary. 
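A back-of-the-envelope illustration of that inflation, using made-up numbers rather than anything from the actual logs:

  one 10 MB file with 100 hard links
    what mmapplypolicy books:  100 candidate paths x 10 MB = 1000 MB
    what migrating the file actually moves                 =   10 MB

So the planner believes the capacity pool will fill roughly 100 times faster than it really does, hits its 98% prediction early, and stops choosing candidates while the pool is still half empty.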
He directed me to add: SIZE(KB_ALLOCATED/NLINK) to both of my migrate rules in my policy file. I did so and kicked off another mmapplypolicy last night, which is still running. However, the prediction section now says: [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 40050141920KB: 2051495 of 2051495 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 104098980256 124983549952 83.290145220% gpfs23data 168478368352 343753326592 49.011414674% system 0 0 0.000000000% (no user data) So now it?s going to move every file it can that matches my policies because it?s figured out that a lot of those are hard links ? and I don?t have enough files matching the criteria to fill the gpfs23capacity pool to the 98% limit like mmapplypolicy thought I did before. According to the log file, it?s happily chugging along migrating files, and mmdf agrees that my gpfs23capacity pool is gradually getting more full (I have it QOSed, of course): Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 25.33T ( 44%) 68.13G ( 0%) eon35Dnsd 58.2T 35 No Yes 25.33T ( 44%) 68.49G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 50.66T ( 44%) 136.6G ( 0%) My sincere thanks to all who took the time to respond to my questions. Of course, that goes double for Marc. We (Vanderbilt) seem to have a long tradition of finding some edge cases in GPFS going all the way back to when we originally moved off of an NFS server to GPFS (2.2, 2.3?) back in 2005. I was creating individual tarballs of each users? home directory on the NFS server, copying the tarball to one of the NSD servers, and untarring it there (don?t remember why we weren?t rsync?ing, but there was a reason). Everything was working just fine except for one user. Every time I tried to untar her home directory on GPFS it barfed part of the way thru ? turns out that until then IBM hadn?t considered that someone would want to put 6 million files in one directory. Gotta love those users! ;-) Kevin On Apr 18, 2017, at 10:31 AM, David D. Johnson > wrote: I have an observation, which may merely serve to show my ignorance: Is it significant that the words "EXTERNAL EXEC/script? are seen below? If migrating between storage pools within the cluster, I would expect the PIT engine to do the migration. When doing HSM (off cluster, tape libraries, etc) is where I would expect to need a script to actually do the work. [I] 2017-04-18 at 09:06:51.124 Policy execution. 1620263 files dispatched. [I] A total of 1620263 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. ? ddj Dave Johnson Brown University On Apr 18, 2017, at 11:11 AM, Marc A Kaplan > wrote: ANYONE else reading this saga? Who uses mmapplypolicy to migrate files within multi-TB file systems? Problems? Or all working as expected? ------ Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose total size is 61 Terabytes and migrating those will bring the occupancy of gpfs23capacity pool to 98% and then we're done. So now I'm wondering where this is going wrong. Is there some bug in the reckoning inside of mmapplypolicy or somewhere else in GPFS? Sure you can put in an PMR, and probably should. I'm guessing whoever picks up the PMR will end up calling or emailing me ... but maybe she can do some of the clerical work for us... While we're waiting for that... Here's what I suggest next. 
Add a clause ... SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK)) before the WHERE clause to each of your rules. Re-run the command with options '-I test -L 2' and collect the output. We're not actually going to move any data, but we're going to look at the files and file sizes that are "chosen"... You should see 1.6 million lines that look kind of like this: /yy/dat/bigC RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1) Run a script over the output to add up all the SHOW() values in the lines that contain TO POOL 'gpfs23capacity' and verify that they do indeed add up to 61TB... (The show is in KB so the SHOW numbers should add up to 61 billion). That sanity checks the policy arithmetic. Let's assume that's okay. Then the next question is whether the individual numbers are correct... Zach Giles made a suggestion... which I'll interpret as find some of the biggest of those files and check that they really are that big.... At this point, I really don't know, but I'm guessing there's some discrepances in the reported KB_ALLOCATED numbers for many of the files... and/or they are "illplaced" - the data blocks aren't all in the pool FROM POOL ... HMMMM.... I just thought about this some more and added the NLINK statistic. It would be unusual for this to be a big problem, but files that are hard linked are not recognized by mmapplypolicy as sharing storage... This has not come to my attention as a significant problem -- does the file system in question have significant GBs of hard linked files? The truth is that you're the first customer/user/admin in a long time to question/examine how mmapplypolicy does its space reckoning ... Optimistically that means it works fine for most customers... So sorry, something unusual about your installation or usage... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Apr 19 15:37:29 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 19 Apr 2017 10:37:29 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: <458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu><4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> <458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> Message-ID: Well I'm glad we followed Mr. S. Holmes dictum which I'll paraphrase... eliminate the impossible and what remains, even if it seems improbable, must hold. BTW - you may want to look at mmclone. Personally, I find the doc and terminology confusing, but mmclone was designed to efficiently store copies and near-copies of large (virtual machine) images. Uses copy-on-write strategy, similar to GPFS snapshots, but at a file by file granularity. BBTW - we fixed directories - they can now be huge (up to about 2^30 files) and automagically, efficiently grow and shrink in size. Also small directories can be stored efficiently in the inode. The last major improvement was just a few years ago. Before that they could be huge, but would never shrink. 
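For anyone who wants to try what Marc describes, a minimal mmclone sketch (the file names are made up; check the mmclone man page on your release for exact syntax and restrictions):

# Freeze an existing image as a read-only clone parent, then cut
# copy-on-write clones from it; unmodified blocks are shared.
mmclone snap /gpfs23/images/rhel7-master.img
mmclone copy /gpfs23/images/rhel7-master.img /gpfs23/images/vm001.img
mmclone copy /gpfs23/images/rhel7-master.img /gpfs23/images/vm002.img
mmclone show /gpfs23/images/vm001.img    # report the clone parent/child relationship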
-------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Apr 19 17:18:50 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 19 Apr 2017 16:18:50 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu><4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> <458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> Message-ID: Hey Marc, I'm having some issues where a simple ILM list policy never completes, but I have yet to open a PMR or enable additional logging. But I was wondering if there are known reasons that this would not complete, such as when there is a symbolic link that creates a loop within the directory structure or something simple like that. Do you know of any cases like this, Marc, that I should try to find in my file systems? Thanks in advance! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, April 19, 2017 9:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Well I'm glad we followed Mr. S. Holmes dictum which I'll paraphrase... eliminate the impossible and what remains, even if it seems improbable, must hold. BTW - you may want to look at mmclone. Personally, I find the doc and terminology confusing, but mmclone was designed to efficiently store copies and near-copies of large (virtual machine) images. Uses copy-on-write strategy, similar to GPFS snapshots, but at a file by file granularity. BBTW - we fixed directories - they can now be huge (up to about 2^30 files) and automagically, efficiently grow and shrink in size. Also small directories can be stored efficiently in the inode. The last major improvement was just a few years ago. Before that they could be huge, but would never shrink. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From YARD at il.ibm.com Wed Apr 19 17:23:12 2017 From: YARD at il.ibm.com (Yaron Daniel) Date: Wed, 19 Apr 2017 19:23:12 +0300 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu><4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu><458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> Message-ID: Hi Maybe the temp list file - fill the FS that they build on. Try to monitor the FS where the temp filelist is created. 
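If that turns out to be it, mmapplypolicy can be pointed at roomier work space. A sketch, with the device, policy file and paths as placeholders:

# -s sets the local work directory (default /tmp); -g sets a global work
# directory shared by the helper nodes. Both need room for the file lists.
mmapplypolicy gpfs23 -P /root/gpfs/list.policy \
    -s /scratch/policytmp -g /gpfs23/.policytmp
# Watch them while the scan runs:
df -h /scratch/policytmp /gpfs23/.policytmp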
Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Bryan Banister To: gpfsug main discussion list Date: 04/19/2017 07:19 PM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hey Marc, I?m having some issues where a simple ILM list policy never completes, but I have yet to open a PMR or enable additional logging. But I was wondering if there are known reasons that this would not complete, such as when there is a symbolic link that creates a loop within the directory structure or something simple like that. Do you know of any cases like this, Marc, that I should try to find in my file systems? Thanks in advance! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, April 19, 2017 9:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Well I'm glad we followed Mr. S. Holmes dictum which I'll paraphrase... eliminate the impossible and what remains, even if it seems improbable, must hold. BTW - you may want to look at mmclone. Personally, I find the doc and terminology confusing, but mmclone was designed to efficiently store copies and near-copies of large (virtual machine) images. Uses copy-on-write strategy, similar to GPFS snapshots, but at a file by file granularity. BBTW - we fixed directories - they can now be huge (up to about 2^30 files) and automagically, efficiently grow and shrink in size. Also small directories can be stored efficiently in the inode. The last major improvement was just a few years ago. Before that they could be huge, but would never shrink. Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From makaplan at us.ibm.com Wed Apr 19 18:10:28 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 19 Apr 2017 13:10:28 -0400 Subject: [gpfsug-discuss] mmapplypolicy not terminating properly? 
In-Reply-To: References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu><4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu><458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> Message-ID: (Bryan B asked...) Open a PMR. The first response from me will be ... Run the mmapplypolicy command again, except with additional option `-d 017` and collect output with something equivalent to `2>&1 | tee /tmp/save-all-command-output-here-to-be-passed-along-to-IBM-service ` If you are convinced that mmapplypolicy is "looping" or "hung" - wait another 2 minutes, terminate, and then pass along the saved-all-command-output. -d 017 will dump a lot of additional diagnostics -- If you want to narrow it by baby steps we could try `-d 03` first and see if there are enough clues in that. To answer two of your questions: 1. mmapplypolicy does not follow symlinks, so no "infinite loop" possible with symlinks. 2a. loops in directory are file system bugs in GPFS, (in fact in any posixish file system), (mm)fsck! 2b. mmapplypolicy does impose a limit on total length of pathnames, so even if there is a loop in the directory, mmapplypolicy will "trim" the directory walk. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Apr 19 20:53:42 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 19 Apr 2017 19:53:42 +0000 Subject: [gpfsug-discuss] RAID config for SSD's used for data Message-ID: Hi All, We currently have what I believe is a fairly typical setup ? metadata for our GPFS filesystems is the only thing in the system pool and it?s on SSD, while data is on spinning disk (RAID 6 LUNs). Everything connected via 8 Gb FC SAN. 8 NSD servers. Roughly 1 PB usable space. Now lets just say that you have a little bit of money to spend. Your I/O demands aren?t great - in fact, they?re way on the low end ? typical (cumulative) usage is 200 - 600 MB/sec read, less than that for writes. But while GPFS has always been great and therefore you don?t need to Make GPFS Great Again, you do want to provide your users with the best possible environment. So you?re considering the purchase of a dual-controller FC storage array with 12 or so 1.8 TB SSD?s in it, with the idea being that that storage would be in its? own storage pool and that pool would be the default location for I/O for your main filesystem ? at least for smaller files. You intend to use mmapplypolicy nightly to move data to / from this pool and the spinning disk pools. Given all that ? would you configure those disks as 6 RAID 1 mirrors and have 6 different primary NSD servers or would it be feasible to configure one big RAID 6 LUN? I?m thinking the latter is not a good idea as there could only be one primary NSD server for that one LUN, but given that: 1) I have no experience with this, and 2) I have been wrong once or twice before (), I?m looking for advice. Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Apr 19 20:59:18 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 19 Apr 2017 19:59:18 +0000 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: Message-ID: Hi I'll give my opinion. Worth what you pay for. 
Do as many as you can, six in this case for the good reason you mentioned. But play with the callbacks so the migration happens on watermarks when it happens. Otherwise you might hit no space till your next policy run. The second is well documented on the redbook AFAIK Cheers -- Cheers > On 19 Apr 2017, at 22.54, Buterbaugh, Kevin L wrote: > > Hi All, > > We currently have what I believe is a fairly typical setup ? metadata for our GPFS filesystems is the only thing in the system pool and it?s on SSD, while data is on spinning disk (RAID 6 LUNs). Everything connected via 8 Gb FC SAN. 8 NSD servers. Roughly 1 PB usable space. > > Now lets just say that you have a little bit of money to spend. Your I/O demands aren?t great - in fact, they?re way on the low end ? typical (cumulative) usage is 200 - 600 MB/sec read, less than that for writes. But while GPFS has always been great and therefore you don?t need to Make GPFS Great Again, you do want to provide your users with the best possible environment. > > So you?re considering the purchase of a dual-controller FC storage array with 12 or so 1.8 TB SSD?s in it, with the idea being that that storage would be in its? own storage pool and that pool would be the default location for I/O for your main filesystem ? at least for smaller files. You intend to use mmapplypolicy nightly to move data to / from this pool and the spinning disk pools. > > Given all that ? would you configure those disks as 6 RAID 1 mirrors and have 6 different primary NSD servers or would it be feasible to configure one big RAID 6 LUN? I?m thinking the latter is not a good idea as there could only be one primary NSD server for that one LUN, but given that: 1) I have no experience with this, and 2) I have been wrong once or twice before (), I?m looking for advice. Thanks! > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Apr 19 21:05:49 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 19 Apr 2017 20:05:49 +0000 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: References: Message-ID: By having many LUNs, you get many IO queues for Linux to play with. Also the raid6 overhead can be quite significant, so it might be better to go with raid1 anyway depending on the controller... And if only gpfs had some sort of auto tier back up the pools for hot or data caching :-) Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 19 April 2017 20:53 To: gpfsug main discussion list Subject: [gpfsug-discuss] RAID config for SSD's used for data Hi All, We currently have what I believe is a fairly typical setup ? metadata for our GPFS filesystems is the only thing in the system pool and it?s on SSD, while data is on spinning disk (RAID 6 LUNs). Everything connected via 8 Gb FC SAN. 8 NSD servers. Roughly 1 PB usable space. Now lets just say that you have a little bit of money to spend. 
Your I/O demands aren?t great - in fact, they?re way on the low end ? typical (cumulative) usage is 200 - 600 MB/sec read, less than that for writes. But while GPFS has always been great and therefore you don?t need to Make GPFS Great Again, you do want to provide your users with the best possible environment. So you?re considering the purchase of a dual-controller FC storage array with 12 or so 1.8 TB SSD?s in it, with the idea being that that storage would be in its? own storage pool and that pool would be the default location for I/O for your main filesystem ? at least for smaller files. You intend to use mmapplypolicy nightly to move data to / from this pool and the spinning disk pools. Given all that ? would you configure those disks as 6 RAID 1 mirrors and have 6 different primary NSD servers or would it be feasible to configure one big RAID 6 LUN? I?m thinking the latter is not a good idea as there could only be one primary NSD server for that one LUN, but given that: 1) I have no experience with this, and 2) I have been wrong once or twice before (), I?m looking for advice. Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 From aaron.s.knister at nasa.gov Wed Apr 19 21:13:14 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 19 Apr 2017 16:13:14 -0400 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: References: Message-ID: On 4/19/17 4:05 PM, Simon Thompson (IT Research Support) wrote: > By having many LUNs, you get many IO queues for Linux to play with. Also the raid6 overhead can be quite significant, so it might be better to go with raid1 anyway depending on the controller... > > And if only gpfs had some sort of auto tier back up the pools for hot or data caching :-) You mean like HAWC but for writes larger than 64K? ;-) Or I guess "HARC" as it might be called for a read cache... -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From luis.bolinches at fi.ibm.com Wed Apr 19 21:20:20 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 19 Apr 2017 20:20:20 +0000 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: Message-ID: I assume you are making the joke of external LROC. But not sure I would use external storage for LROC, as the whole point is to have really fast storage as close to the node (L for local) as possible. Maybe those SSD that will get replaced with the fancy external storage? -- Cheers > On 19 Apr 2017, at 23.13, Aaron Knister wrote: > > > >> On 4/19/17 4:05 PM, Simon Thompson (IT Research Support) wrote: >> By having many LUNs, you get many IO queues for Linux to play with. Also the raid6 overhead can be quite significant, so it might be better to go with raid1 anyway depending on the controller... >> >> And if only gpfs had some sort of auto tier back up the pools for hot or data caching :-) > > You mean like HAWC but for writes larger than 64K? ;-) > > Or I guess "HARC" as it might be called for a read cache... > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Apr 19 21:49:56 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 19 Apr 2017 16:49:56 -0400 Subject: [gpfsug-discuss] RAID config for SSD's - potential pitfalls In-Reply-To: References: Message-ID: As I've mentioned before, RAID choices for GPFS are not so simple. Here are a couple points to consider, I'm sure there's more. And if I'm wrong, someone will please correct me - but I believe the two biggest pitfalls are: Some RAID configurations (classically 5 and 6) work best with large, full block writes. When the file system does a partial block write, RAID may have to read a full "stripe" from several devices, compute the differences and then write back the modified data to several devices. This is certainly true with RAID that is configured over several storage devices, with error correcting codes. SO, you do NOT want to put GPFS metadata (system pool!) on RAID configured with large stripes and error correction. This is the Read-Modify-Write Raid pitfall. GPFS has built-in replication features - consider using those instead of RAID replication (classically Raid-1). GPFS replication can work with storage devices that are in different racks, separated by significant physical space, and from different manufacturers. This can be more robust than RAID in a single box or single rack. Consider a fire scenario, or exploding power supply or similar physical disaster. Consider that storage devices and controllers from the same manufacturer may have the same bugs, defects, failures. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Apr 19 22:12:35 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 19 Apr 2017 21:12:35 +0000 Subject: [gpfsug-discuss] RAID config for SSD's - potential pitfalls In-Reply-To: References: Message-ID: Hi Marc, But the limitation on GPFS replication is that I can set replication separately for metadata and data, but no matter whether I have one data pool or ten data pools they all must have the same replication, correct? And believe me I *love* GPFS replication ? I would hope / imagine that I am one of the few people on this mailing list who has actually gotten to experience a ?fire scenario? ? electrical fire, chemical suppressant did it?s thing, and everything in the data center had a nice layer of soot, ash, and chemical suppressant on and in it and therefore had to be professionally cleaned. Insurance bought us enough disk space that we could (temporarily) turn on GPFS data replication and clean storage arrays one at a time! But in my current hypothetical scenario I?m stretching the budget just to get that one storage array with 12 x 1.8 TB SSD?s in it. Two are out of the question. My current metadata that I?ve got on SSDs is on RAID 1 mirrors and has GPFS replication set to 2. I thought the multiple RAID 1 mirrors approach was the way to go for SSDs for data as well, as opposed to one big RAID 6 LUN, but wanted to get the advice of those more knowledgeable than me. Thanks! Kevin On Apr 19, 2017, at 3:49 PM, Marc A Kaplan > wrote: As I've mentioned before, RAID choices for GPFS are not so simple. Here are a couple points to consider, I'm sure there's more. 
And if I'm wrong, someone will please correct me - but I believe the two biggest pitfalls are: * Some RAID configurations (classically 5 and 6) work best with large, full block writes. When the file system does a partial block write, RAID may have to read a full "stripe" from several devices, compute the differences and then write back the modified data to several devices. This is certainly true with RAID that is configured over several storage devices, with error correcting codes. SO, you do NOT want to put GPFS metadata (system pool!) on RAID configured with large stripes and error correction. This is the Read-Modify-Write Raid pitfall. * GPFS has built-in replication features - consider using those instead of RAID replication (classically Raid-1). GPFS replication can work with storage devices that are in different racks, separated by significant physical space, and from different manufacturers. This can be more robust than RAID in a single box or single rack. Consider a fire scenario, or exploding power supply or similar physical disaster. Consider that storage devices and controllers from the same manufacturer may have the same bugs, defects, failures. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chekh at stanford.edu Wed Apr 19 22:23:15 2017 From: chekh at stanford.edu (Alex Chekholko) Date: Wed, 19 Apr 2017 14:23:15 -0700 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: References: Message-ID: <4f16617c-0ae9-18ef-bfb5-206507762fd9@stanford.edu> On 04/19/2017 12:53 PM, Buterbaugh, Kevin L wrote: > > So you?re considering the purchase of a dual-controller FC storage array > with 12 or so 1.8 TB SSD?s in it, with the idea being that that storage > would be in its? own storage pool and that pool would be the default > location for I/O for your main filesystem ? at least for smaller files. > You intend to use mmapplypolicy nightly to move data to / from this > pool and the spinning disk pools. We did this and failed in interesting (but in retrospect obvious) ways. You will want to ensure that your users cannot fill your write target pool within a day. The faster the storage, the more likely that is to happen. Or else your users will get ENOSPC. You will want to ensure that your pools can handle the additional I/O from the migration in aggregate with all the user I/O. Or else your users will see worse performance from the fast pool than the slow pool while the migration is running. You will want to make sure that the write throughput of your slow pool is faster than the read throughput of your fast pool. In our case, the fast pool was undersized in capacity, and oversized in terms of performance. And overall the filesystem was oversubscribed (~100 10GbE clients, 8 x 10GbE NSD servers) So the fast pool would fill very quickly. Then I would switch the placement policy to the big slow pool and performance would drop dramatically, and then if I ran a migration it would either (depending on parameters) take up all the I/O to the slow pool (leaving none for the users), or else take forever (weeks) because the user I/O was maxing out the slow pool. 
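As a minimal sketch of the kind of nightly flush being discussed here, assuming placeholder pool names 'fast' and 'slow', a file system called 'gpfs0', and a made-up policy file path (none of these are taken from the thread):

RULE 'flush' MIGRATE FROM POOL 'fast' THRESHOLD(90,70) WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME) TO POOL 'slow'

# trial run first to see what would be moved, then the real thing
mmapplypolicy gpfs0 -P /var/mmfs/etc/flush.pol -I test
mmapplypolicy gpfs0 -P /var/mmfs/etc/flush.pol -I yes

Here THRESHOLD(90,70) only starts migrating once the fast pool is more than 90% full and stops once it is back down to 70%, and the WEIGHT clause pushes the least recently accessed files out first.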
Things should be better today with the QoS stuff, but your relative pool capacities (in our case it was like 1% fast, 99% slow) and your relative pool performance (in our case, slow pool had fewer IOPS than fast pool) are still going to matter a lot. -- Alex Chekholko chekh at stanford.edu From makaplan at us.ibm.com Wed Apr 19 22:58:24 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 19 Apr 2017 17:58:24 -0400 Subject: [gpfsug-discuss] RAID config for SSD's - potential pitfalls In-Reply-To: References: Message-ID: Kevin asked: " ... data pools they all must have the same replication, correct?" Actually no! You can use policy RULE ... SET POOL 'x' REPLICATE(2) to set the replication factor when a file is created. Use mmchattr or mmapplypolicy to change the replication factor after creation. You specify the maximum data replication factor when you create the file system (1, 2, or 3), but any given file can have its replication factor set to 1, 2, or 3. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From kums at us.ibm.com Wed Apr 19 23:03:33 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Wed, 19 Apr 2017 18:03:33 -0400 Subject: [gpfsug-discuss] RAID config for SSD's - potential pitfalls In-Reply-To: References: Message-ID: Hi, >> As I've mentioned before, RAID choices for GPFS are not so simple. Here are a couple points to consider, I'm sure there's more. And if I'm wrong, someone will please correct me - but I believe the two biggest pitfalls are: >>Some RAID configurations (classically 5 and 6) work best with large, full block writes. When the file system does a partial block write, RAID may have to read a full "stripe" from several devices, compute the differences and then write back the modified data to several devices. >>This is certainly true with RAID that is configured over several storage devices, with error correcting codes. SO, you do NOT want to put GPFS metadata (system pool!) on RAID configured with large stripes and error correction. This is the Read-Modify-Write Raid pitfall. As you pointed out, the RAID choices for GPFS may not be simple, and we need to take into consideration factors such as the storage subsystem's configuration and capabilities and whether all drives are homogeneous or there is a mix of drive types. If all the drives are homogeneous, then create dataAndMetadata NSDs across RAID-6, and if the storage controller supports write cache plus write-cache mirroring (WC + WM), enable it; WC + WM can alleviate read-modify-write for small writes (typical of metadata). If there is a mix of SSD and HDD (e.g. 15K RPM), then we need to take into consideration the aggregate IOPS of the RAID-1 SSD volumes vs. the RAID-6 HDD volumes before separating data and metadata onto separate media. For example, if the storage subsystem has 2 x SSDs and ~300 x 15K RPM or NL_SAS HDDs, then most likely the aggregate IOPS of the RAID-6 HDD volumes will be higher than that of the RAID-1 SSD volumes. It would also be recommended to assess the I/O performance of the different configurations (dataAndMetadata vs dataOnly/metadataOnly NSDs) with some application workload and production scenarios before deploying the final solution. >> GPFS has built-in replication features - consider using those instead of RAID replication (classically Raid-1).
GPFS replication can work with storage devices that are in different racks, separated by significant physical space, and from different manufacturers. This can be more >>robust than RAID in a single box or single rack. Consider a fire scenario, or exploding power supply or similar physical disaster. Consider that storage devices and controllers from the same manufacturer may have the same bugs, defects, failures. For high-resiliency (for e.g. metadataOnly) and if there are multiple storage across different failure domains (different racks/rooms/DC etc), it will be good to enable BOTH hardware RAID-1 as well as GPFS metadata replication enabled (at the minimum, -m 2). If there is single shared storage for GPFS file-system storage and metadata is separated from data, then RAID-1 would minimize administrative overhead compared to GPFS replication in the event of drive failure (since with GPFS replication across single SSD would require mmdeldisk/mmdelnsd/mmcrnsd/mmadddisk every time disk goes faulty and needs to be replaced). Best, -Kums From: Marc A Kaplan/Watson/IBM at IBMUS To: gpfsug main discussion list Date: 04/19/2017 04:50 PM Subject: Re: [gpfsug-discuss] RAID config for SSD's - potential pitfalls Sent by: gpfsug-discuss-bounces at spectrumscale.org As I've mentioned before, RAID choices for GPFS are not so simple. Here are a couple points to consider, I'm sure there's more. And if I'm wrong, someone will please correct me - but I believe the two biggest pitfalls are: Some RAID configurations (classically 5 and 6) work best with large, full block writes. When the file system does a partial block write, RAID may have to read a full "stripe" from several devices, compute the differences and then write back the modified data to several devices. This is certainly true with RAID that is configured over several storage devices, with error correcting codes. SO, you do NOT want to put GPFS metadata (system pool!) on RAID configured with large stripes and error correction. This is the Read-Modify-Write Raid pitfall. GPFS has built-in replication features - consider using those instead of RAID replication (classically Raid-1). GPFS replication can work with storage devices that are in different racks, separated by significant physical space, and from different manufacturers. This can be more robust than RAID in a single box or single rack. Consider a fire scenario, or exploding power supply or similar physical disaster. Consider that storage devices and controllers from the same manufacturer may have the same bugs, defects, failures. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Apr 19 23:41:19 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 19 Apr 2017 18:41:19 -0400 Subject: [gpfsug-discuss] RAID config for SSD's - potential pitfalls In-Reply-To: References: Message-ID: Kums is our performance guru, so weigh that appropriately and relative to my own remarks... Nevertheless, I still think RAID-5or6 is a poor choice for GPFS metadata. The write cache will NOT mitigate the read-modify-write problem of a workload that has a random or hop-scotch access pattern of small writes. In the end you've still got to read and write several times more disk blocks than you actually set out to modify. 
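As a rough worked example of that penalty (generic RAID-6 behaviour, not specific to any particular controller): rewriting a single 4 KiB block that sits inside a wide parity stripe typically means reading the old data block plus the old P and Q parity blocks, then writing the new data block plus the new P and Q, i.e. roughly six disk I/Os for one small logical write, whereas a full-stripe write needs no reads at all, just the data strips plus one P and one Q per stripe.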
Same goes for any large amount of data that will be written in a pattern of non-sequential small writes. (Define a small write as less than a full RAID stripe). For sure, non-volatile write caches are a good thing - but not a be all end all solution. Relying on RAID-1 to protect your metadata may well be easier to administer, but still GPFS replication can be more robust. Doing both - belt and suspenders is fine -- if you can afford it. Either is buying 2x storage, both is 4x. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Apr 20 00:16:08 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 19 Apr 2017 23:16:08 +0000 Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Message-ID: <3F3E9259-1601-4473-A827-7CD5418B8C58@nuance.com> I assume the counter has wrapped on some of these - would a PMR fix this? (4.2.1) [root at cnt-r01r07u15 ~]# mmfsadm vfsstats vfs statistics currently enabled started at: Fri Jan 27 16:22:02.702 2017 duration: 7091405.800 sec name calls time per call total time -------------------- -------- -------------- -------------- access 8472691 0.006672 56529.863993 close 1460175509 0.000034 49854.695358 create 2101110 0.073797 155055.263775 fsync 20 0.001214 0.024288 getattr 859449161 0.000118 101183.699413 link 2175473 0.000287 625.343799 lockctl 17326 0.000302 5.229828 lookup 200369809 0.005999 1201980.046683 map_lloff 220850355 0.000039 8561.791963 mkdir 817894 0.265793 217390.095681 mknod 3 0.000474 0.001422 open 1460169409 0.000092 134811.724068 read -412143552 0.001023 3971403.879911 write 164739329 0.000829 136616.948900 mmapRead 17108252 0.000623 10665.877349 readdir 142261835 0.000049 6999.159121 readlink 485335656 0.000004 2111.627292 readpage -648839570 0.000004 14346.195128 remove 4239806 0.022000 93277.124289 rename 350671 0.055135 19334.226490 rmdir 342019 0.008000 2736.037074 setattr 3709237 0.004573 16963.899331 symlink 160610 0.053061 8522.185175 unmap -365476297 0.000000 1735.669373 setxattr 119 0.000009 0.001042 getxattr -218316996 0.000154 628416.355002 removexattr 15 0.000003 0.000042 statfs 2624067 0.000082 214.306646 fastOpen 1456944934 0.000000 0.000000 fastClose 1515612004 0.000000 0.000000 fastLookup 77981387 0.000000 0.000000 fastRead -922882405 0.000000 0.000000 fastWrite 102606402 0.000000 0.000000 revalidate 899677 0.000000 0.000000 aio write sync 21331080 0.000061 1309.773528 Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Apr 20 01:10:51 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 19 Apr 2017 20:10:51 -0400 Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values In-Reply-To: <3F3E9259-1601-4473-A827-7CD5418B8C58@nuance.com> References: <3F3E9259-1601-4473-A827-7CD5418B8C58@nuance.com> Message-ID: Bob, I also noticed this recently. I think it may be a simple matter of a printf()-like statement in the code that handles "mmfsadm vfsstats" using an incorrect conversion specifier --- one that treats the counter as signed instead of unsigned and treats the counter as being smaller than it really is. 
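A quick arithmetic check is consistent with that: a read counter displayed as -412143552 corresponds to the unsigned 32-bit value 4294967296 - 412143552 = 3882823744, which lines up with the _read_ count of 3883163461 shown by the mmpmon-based output further down in the thread, the small difference being calls made between the two snapshots.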
To help confirm that hypothesis, could you please run the following commands on the node, at the same time, so the output can be compared: # mmfsadm vfsstats # mmfsadm eventsExporter mmpmon vfss I believe the code that handles "mmfsadm eventsExporter mmpmon vfss" uses the correct printf()-like conversion specifier. So, it should so good numbers where "mmfsadm vfsstats" shows negative numbers. Regards, The Spectrum Scale (GPFS) team Eric Agar ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 04/19/2017 07:16 PM Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Sent by: gpfsug-discuss-bounces at spectrumscale.org I assume the counter has wrapped on some of these - would a PMR fix this? (4.2.1) [root at cnt-r01r07u15 ~]# mmfsadm vfsstats vfs statistics currently enabled started at: Fri Jan 27 16:22:02.702 2017 duration: 7091405.800 sec name calls time per call total time -------------------- -------- -------------- -------------- access 8472691 0.006672 56529.863993 close 1460175509 0.000034 49854.695358 create 2101110 0.073797 155055.263775 fsync 20 0.001214 0.024288 getattr 859449161 0.000118 101183.699413 link 2175473 0.000287 625.343799 lockctl 17326 0.000302 5.229828 lookup 200369809 0.005999 1201980.046683 map_lloff 220850355 0.000039 8561.791963 mkdir 817894 0.265793 217390.095681 mknod 3 0.000474 0.001422 open 1460169409 0.000092 134811.724068 read -412143552 0.001023 3971403.879911 write 164739329 0.000829 136616.948900 mmapRead 17108252 0.000623 10665.877349 readdir 142261835 0.000049 6999.159121 readlink 485335656 0.000004 2111.627292 readpage -648839570 0.000004 14346.195128 remove 4239806 0.022000 93277.124289 rename 350671 0.055135 19334.226490 rmdir 342019 0.008000 2736.037074 setattr 3709237 0.004573 16963.899331 symlink 160610 0.053061 8522.185175 unmap -365476297 0.000000 1735.669373 setxattr 119 0.000009 0.001042 getxattr -218316996 0.000154 628416.355002 removexattr 15 0.000003 0.000042 statfs 2624067 0.000082 214.306646 fastOpen 1456944934 0.000000 0.000000 fastClose 1515612004 0.000000 0.000000 fastLookup 77981387 0.000000 0.000000 fastRead -922882405 0.000000 0.000000 fastWrite 102606402 0.000000 0.000000 revalidate 899677 0.000000 0.000000 aio write sync 21331080 0.000061 1309.773528 Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Robert.Oesterlin at nuance.com Thu Apr 20 01:21:04 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 20 Apr 2017 00:21:04 +0000 Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Message-ID: Hi Eric Looks like your assumption is correct - no negative values from ?mmfsadm eventsExporter mmpmon vfss?. I don?t normally view these via ?mmfsadm?, I use the zimon stats. But, It?s a bug that should be fixed. What?s the best way to get this fixed? root at cnt-r01r07u15 ~]# mmfsadm eventsExporter mmpmon vfss _response_ begin mmpmon vfss _mmpmon::vfss_ _n_ 10.30.100.193 _nn_ cnt-r01r07u15 _rc_ 0 _t_ 1492647309 _tu_ 311964 _access_ 8472897 56529.874886 _close_ 1460223848 49854.938090 _create_ 2101927 155055.515041 _fclear_ 0 0.000000 _fsync_ 20 0.024288 _fsync_range_ 0 0.000000 _ftrunc_ 0 0.000000 _getattr_ 859626332 101183.720281 _link_ 2175473 625.343799 _lockctl_ 17326 5.229828 _lookup_ 200378610 1201985.264220 _map_lloff_ 220854519 8561.860515 _mkdir_ 817943 217390.170859 _mknod_ 3 0.001422 _open_ 1460217712 134812.649162 _read_ 3883163461 3971457.463527 _write_ 186078410 137927.496812 _mmapRead_ 17108947 10665.929860 _mmapWrite_ 0 0.000000 _aioRead_ 0 0.000000 _aioWrite_ 0 0.000000 _readdir_ 142262897 6999.189450 _readlink_ 485337171 2111.634286 _readpage_ 3646233600 14346.331414 _remove_ 4241324 93277.463798 _rename_ 350679 19334.235924 _rmdir_ 342042 2736.048976 _setacl_ 0 0.000000 _setattr_ 3709289 16963.901179 _symlink_ 161336 8522.670079 _unmap_ 3929805828 1735.740690 _writepage_ 0 0.000000 _tsfattr_ 0 0.000000 _tsfsattr_ 0 0.000000 _flock_ 0 0.000000 _setxattr_ 119 0.001042 _getxattr_ 4077218348 628418.213008 _listxattr_ 0 0.000000 _removexattr_ 15 0.000042 _encode_fh_ 0 0.000000 _decode_fh_ 0 0.000000 _get_dentry_ 0 0.000000 _get_parent_ 0 0.000000 _mount_ 0 0.000000 _statfs_ 2625497 214.309671 _sync_ 0 0.000000 _vget_ 0 0.000000 _response_ end Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Wednesday, April 19, 2017 at 7:10 PM To: gpfsug main discussion list Cc: "gpfsug-discuss-bounces at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Bob, I also noticed this recently. I think it may be a simple matter of a printf()-like statement in the code that handles "mmfsadm vfsstats" using an incorrect conversion specifier --- one that treats the counter as signed instead of unsigned and treats the counter as being smaller than it really is. To help confirm that hypothesis, could you please run the following commands on the node, at the same time, so the output can be compared: # mmfsadm vfsstats # mmfsadm eventsExporter mmpmon vfss I believe the code that handles "mmfsadm eventsExporter mmpmon vfss" uses the correct printf()-like conversion specifier. So, it should so good numbers where "mmfsadm vfsstats" shows negative numbers. Regards, The Spectrum Scale (GPFS) team Eric Agar ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 04/19/2017 07:16 PM Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I assume the counter has wrapped on some of these - would a PMR fix this? (4.2.1) [root at cnt-r01r07u15 ~]# mmfsadm vfsstats vfs statistics currently enabled started at: Fri Jan 27 16:22:02.702 2017 duration: 7091405.800 sec name calls time per call total time -------------------- -------- -------------- -------------- access 8472691 0.006672 56529.863993 close 1460175509 0.000034 49854.695358 create 2101110 0.073797 155055.263775 fsync 20 0.001214 0.024288 getattr 859449161 0.000118 101183.699413 link 2175473 0.000287 625.343799 lockctl 17326 0.000302 5.229828 lookup 200369809 0.005999 1201980.046683 map_lloff 220850355 0.000039 8561.791963 mkdir 817894 0.265793 217390.095681 mknod 3 0.000474 0.001422 open 1460169409 0.000092 134811.724068 read -412143552 0.001023 3971403.879911 write 164739329 0.000829 136616.948900 mmapRead 17108252 0.000623 10665.877349 readdir 142261835 0.000049 6999.159121 readlink 485335656 0.000004 2111.627292 readpage -648839570 0.000004 14346.195128 remove 4239806 0.022000 93277.124289 rename 350671 0.055135 19334.226490 rmdir 342019 0.008000 2736.037074 setattr 3709237 0.004573 16963.899331 symlink 160610 0.053061 8522.185175 unmap -365476297 0.000000 1735.669373 setxattr 119 0.000009 0.001042 getxattr -218316996 0.000154 628416.355002 removexattr 15 0.000003 0.000042 statfs 2624067 0.000082 214.306646 fastOpen 1456944934 0.000000 0.000000 fastClose 1515612004 0.000000 0.000000 fastLookup 77981387 0.000000 0.000000 fastRead -922882405 0.000000 0.000000 fastWrite 102606402 0.000000 0.000000 revalidate 899677 0.000000 0.000000 aio write sync 21331080 0.000061 1309.773528 Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Apr 20 02:03:16 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 19 Apr 2017 21:03:16 -0400 Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values In-Reply-To: References: Message-ID: Thanks Bob. Yes, it looks good for the hypothesis. ZIMon gets its VFSS stats from the mmpmon code that we just exercised with "mmfsadm eventsExporter mmpmon vfss"; so the ZIMon stats are also probably correct. Having said that, I agree with you that the "mmfsadm vfsstats" problem is a bug that should be fixed. If you would like to open a PMR so an APAR gets generated, it might help speed the routing of the PMR if you include in the PMR text our email exchange, and highlight Eric Agar is the GPFS developer with whom you've already discussed this issue. You could also mention that I believe I have no need for a gpfs snap. Having an APAR will help ensure the fix makes it into a PTF for the release you are using. 
If you do not want to open a PMR, I still intend to fix the problem in the development stream. Thanks again. Regards, The Spectrum Scale (GPFS) team Eric Agar ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Cc: IBM Spectrum Scale/Poughkeepsie/IBM at IBMUS Date: 04/19/2017 08:21 PM Subject: Re: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Hi Eric Looks like your assumption is correct - no negative values from ?mmfsadm eventsExporter mmpmon vfss?. I don?t normally view these via ?mmfsadm?, I use the zimon stats. But, It?s a bug that should be fixed. What?s the best way to get this fixed? root at cnt-r01r07u15 ~]# mmfsadm eventsExporter mmpmon vfss _response_ begin mmpmon vfss _mmpmon::vfss_ _n_ 10.30.100.193 _nn_ cnt-r01r07u15 _rc_ 0 _t_ 1492647309 _tu_ 311964 _access_ 8472897 56529.874886 _close_ 1460223848 49854.938090 _create_ 2101927 155055.515041 _fclear_ 0 0.000000 _fsync_ 20 0.024288 _fsync_range_ 0 0.000000 _ftrunc_ 0 0.000000 _getattr_ 859626332 101183.720281 _link_ 2175473 625.343799 _lockctl_ 17326 5.229828 _lookup_ 200378610 1201985.264220 _map_lloff_ 220854519 8561.860515 _mkdir_ 817943 217390.170859 _mknod_ 3 0.001422 _open_ 1460217712 134812.649162 _read_ 3883163461 3971457.463527 _write_ 186078410 137927.496812 _mmapRead_ 17108947 10665.929860 _mmapWrite_ 0 0.000000 _aioRead_ 0 0.000000 _aioWrite_ 0 0.000000 _readdir_ 142262897 6999.189450 _readlink_ 485337171 2111.634286 _readpage_ 3646233600 14346.331414 _remove_ 4241324 93277.463798 _rename_ 350679 19334.235924 _rmdir_ 342042 2736.048976 _setacl_ 0 0.000000 _setattr_ 3709289 16963.901179 _symlink_ 161336 8522.670079 _unmap_ 3929805828 1735.740690 _writepage_ 0 0.000000 _tsfattr_ 0 0.000000 _tsfsattr_ 0 0.000000 _flock_ 0 0.000000 _setxattr_ 119 0.001042 _getxattr_ 4077218348 628418.213008 _listxattr_ 0 0.000000 _removexattr_ 15 0.000042 _encode_fh_ 0 0.000000 _decode_fh_ 0 0.000000 _get_dentry_ 0 0.000000 _get_parent_ 0 0.000000 _mount_ 0 0.000000 _statfs_ 2625497 214.309671 _sync_ 0 0.000000 _vget_ 0 0.000000 _response_ end Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Wednesday, April 19, 2017 at 7:10 PM To: gpfsug main discussion list Cc: "gpfsug-discuss-bounces at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Bob, I also noticed this recently. I think it may be a simple matter of a printf()-like statement in the code that handles "mmfsadm vfsstats" using an incorrect conversion specifier --- one that treats the counter as signed instead of unsigned and treats the counter as being smaller than it really is. 
To help confirm that hypothesis, could you please run the following commands on the node, at the same time, so the output can be compared: # mmfsadm vfsstats # mmfsadm eventsExporter mmpmon vfss I believe the code that handles "mmfsadm eventsExporter mmpmon vfss" uses the correct printf()-like conversion specifier. So, it should so good numbers where "mmfsadm vfsstats" shows negative numbers. Regards, The Spectrum Scale (GPFS) team Eric Agar ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 04/19/2017 07:16 PM Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Sent by: gpfsug-discuss-bounces at spectrumscale.org I assume the counter has wrapped on some of these - would a PMR fix this? (4.2.1) [root at cnt-r01r07u15 ~]# mmfsadm vfsstats vfs statistics currently enabled started at: Fri Jan 27 16:22:02.702 2017 duration: 7091405.800 sec name calls time per call total time -------------------- -------- -------------- -------------- access 8472691 0.006672 56529.863993 close 1460175509 0.000034 49854.695358 create 2101110 0.073797 155055.263775 fsync 20 0.001214 0.024288 getattr 859449161 0.000118 101183.699413 link 2175473 0.000287 625.343799 lockctl 17326 0.000302 5.229828 lookup 200369809 0.005999 1201980.046683 map_lloff 220850355 0.000039 8561.791963 mkdir 817894 0.265793 217390.095681 mknod 3 0.000474 0.001422 open 1460169409 0.000092 134811.724068 read -412143552 0.001023 3971403.879911 write 164739329 0.000829 136616.948900 mmapRead 17108252 0.000623 10665.877349 readdir 142261835 0.000049 6999.159121 readlink 485335656 0.000004 2111.627292 readpage -648839570 0.000004 14346.195128 remove 4239806 0.022000 93277.124289 rename 350671 0.055135 19334.226490 rmdir 342019 0.008000 2736.037074 setattr 3709237 0.004573 16963.899331 symlink 160610 0.053061 8522.185175 unmap -365476297 0.000000 1735.669373 setxattr 119 0.000009 0.001042 getxattr -218316996 0.000154 628416.355002 removexattr 15 0.000003 0.000042 statfs 2624067 0.000082 214.306646 fastOpen 1456944934 0.000000 0.000000 fastClose 1515612004 0.000000 0.000000 fastLookup 77981387 0.000000 0.000000 fastRead -922882405 0.000000 0.000000 fastWrite 102606402 0.000000 0.000000 revalidate 899677 0.000000 0.000000 aio write sync 21331080 0.000061 1309.773528 Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From UWEFALKE at de.ibm.com Thu Apr 20 09:11:15 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 20 Apr 2017 10:11:15 +0200 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: References: Message-ID: Some thoughts: you give typical cumulative usage values. However, a fast pool might matter most for spikes of the traffic. Do you have spikes driving your current system to the edge? Then: using the SSD pool for writes is straightforward (placement), using it for reads will only pay off if data are either pre-fetched to the pool somehow, or read more than once before getting migrated back to the HDD pool(s). Write traffic is less than read as you wrote. RAID1 vs RAID6: RMW penalty of parity-based RAIDs was mentioned, which strikes at writes smaller than the full stripe width of your RAID - what type of write I/O do you have (or expect)? (This may also be important for choosing the quality of SSDs, with RMW in mind you will have a comparably huge amount of data written on the SSD devices if your I/O traffic consists of myriads of small IOs and you organized the SSDs in a RAID5 or RAID6) I suppose your current system is well set to provide the required aggregate throughput. Now, what kind of improvement do you expect? How are the clients connected? Would they have sufficient network bandwidth to see improvements at all? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 gpfsug-discuss-bounces at spectrumscale.org wrote on 04/19/2017 09:53:42 PM: > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/19/2017 09:54 PM > Subject: [gpfsug-discuss] RAID config for SSD's used for data > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Hi All, > > We currently have what I believe is a fairly typical setup ? > metadata for our GPFS filesystems is the only thing in the system > pool and it?s on SSD, while data is on spinning disk (RAID 6 LUNs). > Everything connected via 8 Gb FC SAN. 8 NSD servers. Roughly 1 PB > usable space. > > Now lets just say that you have a little bit of money to spend. > Your I/O demands aren?t great - in fact, they?re way on the low end > ? typical (cumulative) usage is 200 - 600 MB/sec read, less than > that for writes. But while GPFS has always been great and therefore > you don?t need to Make GPFS Great Again, you do want to provide your > users with the best possible environment. > > So you?re considering the purchase of a dual-controller FC storage > array with 12 or so 1.8 TB SSD?s in it, with the idea being that > that storage would be in its? own storage pool and that pool would > be the default location for I/O for your main filesystem ? at least > for smaller files. You intend to use mmapplypolicy nightly to move > data to / from this pool and the spinning disk pools. > > Given all that ? 
would you configure those disks as 6 RAID 1 mirrors > and have 6 different primary NSD servers, or would it be feasible to > configure one big RAID 6 LUN? I'm thinking the latter is not a good > idea as there could only be one primary NSD server for that one LUN, > but given that: 1) I have no experience with this, and 2) I have > been wrong once or twice before (), I'm looking for advice. Thanks! > > - > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Thu Apr 20 10:25:40 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 20 Apr 2017 10:25:40 +0100 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: <4f16617c-0ae9-18ef-bfb5-206507762fd9@stanford.edu> References: <4f16617c-0ae9-18ef-bfb5-206507762fd9@stanford.edu> Message-ID: <1492680340.4102.120.camel@buzzard.me.uk> On Wed, 2017-04-19 at 14:23 -0700, Alex Chekholko wrote: > On 04/19/2017 12:53 PM, Buterbaugh, Kevin L wrote: > > > > So you're considering the purchase of a dual-controller FC storage array > > with 12 or so 1.8 TB SSDs in it, with the idea being that that storage > > would be in its own storage pool and that pool would be the default > > location for I/O for your main filesystem - at least for smaller files. > > You intend to use mmapplypolicy nightly to move data to / from this > > pool and the spinning disk pools. > > We did this and failed in interesting (but in retrospect obvious) ways. > You will want to ensure that your users cannot fill your write target > pool within a day. The faster the storage, the more likely that is to > happen. Or else your users will get ENOSPC. Eh? Seriously, you should have a failover rule so that when your "fast" pool fills up, allocation spills over into the "slow" pool (nice, descriptive pool names that are under 8 characters including the terminating character). There are issues when a pool gets close to completely full, so you need to set the failover point a sizeable bit below the full size; 95% is a good starting point. The length of the pool names matters because if the fast pool name is under eight characters and the slow one is longer, say because you called it "nearline" (which is 9 including the terminating character), then once the files get moved they get backed up again by TSM, yeah!!! The 95% figure comes about from this. Imagine you had 12KB left in the fast pool and you go to write a file. You open the file at 0B in size and then start writing. At 12KB you run out of space in the fast pool, and as a file can only be in one pool you get ENOSPC and the file gets canned. This then keeps repeating on a regular basis. If instead you stop allocating at significantly less than 100%, say 95%, where that 5% is larger than the largest file you expect, that file still works, but all subsequent files get allocated in the slow pool until you flush the fast pool. Something like this as the last two rules in your policy should do the trick. /* by default new files to the fast disk unless full, then to slow */ RULE 'new' SET POOL 'fast' LIMIT(95) RULE 'spillover' SET POOL 'slow' However in general your fast pool needs to have sufficient capacity to take your daily churn and then some. JAB. -- Jonathan A.
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From jonathan at buzzard.me.uk Thu Apr 20 10:32:20 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 20 Apr 2017 10:32:20 +0100 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: References: Message-ID: <1492680740.4102.126.camel@buzzard.me.uk> On Wed, 2017-04-19 at 20:05 +0000, Simon Thompson (IT Research Support) wrote: > By having many LUNs, you get many IO queues for Linux to play with. Also the raid6 overhead can be quite significant, so it might be better to go with raid1 anyway depending on the controller... > > And if only gpfs had some sort of auto tier back up the pools for hot or data caching :-) > If you have sized the "fast" pool correctly then the "slow" pool will be spending most of its time doing diddly squat, i.e. under 10 IOPS, unless you are flushing the pool of old files to make space. I have graphs that show this. Then one of two things happens. If you are just reading the file, fine: it is probably coming from the cache, or the disks are not very busy anyway, so you won't notice. If you happen to *change* the file and start doing things actively with it again, the changed version ends up on the fast disk anyway, because most programs handle this by creating an entirely new file with a temporary name and then doing a rename and delete shuffle (so that a crash will leave you with a valid file somewhere), and the new file gets placed on the fast pool by virtue of being a new file. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From p.childs at qmul.ac.uk Thu Apr 20 12:38:09 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 20 Apr 2017 11:38:09 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories In-Reply-To: References: Message-ID: Simon, We've managed to resolve this issue by switching off quotas, switching them back on again, and rebuilding the quota file. Can I check whether you run quotas on your cluster? See you in 2 weeks in Manchester. Thanks in advance. Peter Childs Research Storage Expert ITS Research Infrastructure Queen Mary, University of London Phone: 020 7882 8393 ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (IT Research Support) Sent: Tuesday, April 11, 2017 4:55:35 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Slow to create directories We actually saw this for a while on one of our clusters which was new. But by the time I'd got round to looking deeper, it had gone, maybe we were using the NSDs more heavily, or possibly we'd upgraded. We are at 4.2.2-2, so might be worth trying to bump the version and see if it goes away. We saw it on the NSD servers directly as well, so not some client trying to talk to it, so maybe there was some buggy code? Simon On 11/04/2017, 16:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: >There are so many things to look at and many tools for doing so (iostat, >htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). I would
I would >recommend a review of the presentation that Yuri gave at the most recent >GPFS User Group: >https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs > >Cheers, >-Bryan > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter >Childs >Sent: Tuesday, April 11, 2017 3:58 AM >To: gpfsug main discussion list >Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories > >This is a curious issue which I'm trying to get to the bottom of. > >We currently have two Spectrum Scale file systems, both are running GPFS >4.2.1-1 some of the servers have been upgraded to 4.2.1-2. > >The older one which was upgraded from GPFS 3.5 works find create a >directory is always fast and no issue. > >The new one, which has nice new SSD for metadata and hence should be >faster. can take up to 30 seconds to create a directory but usually takes >less than a second, The longer directory creates usually happen on busy >nodes that have not used the new storage in a while. (Its new so we've >not moved much of the data over yet) But it can also happen randomly >anywhere, including from the NSD servers them selves. (times of 3-4 >seconds from the NSD servers have been seen, on a single directory create) > >We've been pointed at the network and suggested we check all network >settings, and its been suggested to build an admin network, but I'm not >sure I entirely understand why and how this would help. Its a mixed >1G/10G network with the NSD servers connected at 40G with an MTU of 9000. > >However as I say, the older filesystem is fine, and it does not matter if >the nodes are connected to the old GPFS cluster or the new one, (although >the delay is worst on the old gpfs cluster), So I'm really playing spot >the difference. and the network is not really an obvious difference. > >Its been suggested to look at a trace when it occurs but as its difficult >to recreate collecting one is difficult. > >Any ideas would be most helpful. > >Thanks > > > >Peter Childs >ITS Research Infrastructure >Queen Mary, University of London >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >________________________________ > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. 
>_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kenneth.waegeman at ugent.be Thu Apr 20 15:53:29 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Thu, 20 Apr 2017 16:53:29 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> Message-ID: <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: > I just had a similar experience from a sandisk infiniflash system > SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for > writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on > the order of 2 Gbyte/s. > > After a bit head scratching snd fumbling around I found out that > reducing maxMBpS from 10000 to 100 fixed the problem! Digging further > I found that reducing prefetchThreads from default=72 to 32 also fixed > it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. > > Could something like this be the problem on your box as well? > > > > -jf > fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister > >: > > Well, I'm somewhat scrounging for hardware. This is in our test > environment :) And yep, it's got the 2U gpu-tray in it although even > without the riser it has 2 PCIe slots onboard (excluding the on-board > dual-port mezz card) so I think it would make a fine NSD server even > without the riser. > > -Aaron > > On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > Maybe its related to interrupt handlers somehow? You drive the > load up on one socket, you push all the interrupt handling to the > other socket where the fabric card is attached? > > > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD > servers, I assume its some 2U gpu-tray riser one or something !) > > > > Simon > > ________________________________________ > > From: gpfsug-discuss-bounces at spectrumscale.org > > [gpfsug-discuss-bounces at spectrumscale.org > ] on behalf of > Aaron Knister [aaron.s.knister at nasa.gov > ] > > Sent: 17 February 2017 15:52 > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] bizarre performance behavior > > > > This is a good one. I've got an NSD server with 4x 16GB fibre > > connections coming in and 1x FDR10 and 1x QDR connection going > out to > > the clients. I was having a really hard time getting anything > resembling > > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > > reads). The back-end is a DDN SFA12K and I *know* it can do > better than > > that. 
> > > > I don't remember quite how I figured this out but simply by running > > "openssl speed -multi 16" on the nsd server to drive up the load > I saw > > an almost 4x performance jump which is pretty much goes against > every > > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated > crap to > > quadruple your i/o performance"). > > > > This feels like some type of C-states frequency scaling > shenanigans that > > I haven't quite ironed down yet. I booted the box with the following > > kernel parameters "intel_idle.max_cstate=0 > processor.max_cstate=0" which > > didn't seem to make much of a difference. I also tried setting the > > frequency governer to userspace and setting the minimum frequency to > > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I > still have > > to run something to drive up the CPU load and then performance > improves. > > > > I'm wondering if this could be an issue with the C1E state? I'm > curious > > if anyone has seen anything like this. The node is a dx360 M4 > > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Thu Apr 20 16:04:20 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Thu, 20 Apr 2017 15:04:20 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> , <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID: <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). 
We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister >: Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
> > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Apr 20 16:07:32 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 20 Apr 2017 17:07:32 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID: Hi Kennmeth, is prefetching off or on at your storage backend? Raw sequential is very different from GPFS sequential at the storage device ! GPFS does its own prefetching, the storage would never know what sectors sequential read at GPFS level maps to at storage level! Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Kenneth Waegeman To: gpfsug main discussion list Date: 04/20/2017 04:53 PM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. 
After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister : Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
> > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
From marcusk at nz1.ibm.com Fri Apr 21 02:21:51 2017 From: marcusk at nz1.ibm.com (Marcus Koenig1) Date: Fri, 21 Apr 2017 14:21:51 +1300 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID:
Hi Kennmeth,
we also had similar performance numbers in our tests. Native was far quicker than through GPFS. When we learned though that the client tested the performance on the FS at a big blocksize (512k) with small files - we were able to speed it up significantly using a smaller FS blocksize (obviously we had to recreate the FS).
So really depends on how you do your tests.
Cheers,
Marcus Koenig
Lab Services Storage & Power Specialist
IBM Australia & New Zealand Advanced Technical Skills
IBM Systems-Hardware
Mobile: +64 21 67 34 27 | E-mail: marcusk at nz1.ibm.com | 82 Wyndham Street, Auckland, AUK 1010, New Zealand
From: "Uwe Falke" To: gpfsug main discussion list Date:
04/21/2017 03:07 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Kennmeth, is prefetching off or on at your storage backend? Raw sequential is very different from GPFS sequential at the storage device ! GPFS does its own prefetching, the storage would never know what sectors sequential read at GPFS level maps to at storage level! Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Kenneth Waegeman To: gpfsug main discussion list Date: 04/20/2017 04:53 PM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister : Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) 
> > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17773863.gif Type: image/gif Size: 3720 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 17405449.jpg Type: image/jpeg Size: 2741 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17997200.gif Type: image/gif Size: 13421 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From olaf.weiser at de.ibm.com Fri Apr 21 08:25:22 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 21 Apr 2017 09:25:22 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 3720 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 2741 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 13421 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 105 bytes Desc: not available URL: From kenneth.waegeman at ugent.be Fri Apr 21 10:43:25 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Fri, 21 Apr 2017 11:43:25 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> Message-ID: <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! 
K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > Interesting. Could you share a little more about your architecture? Is > it possible to mount the fs on an NSD server and do some dd's from the > fs on the NSD server? If that gives you decent performance perhaps try > NSDPERF next > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf > > > -Aaron > > > > > On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman > wrote: >> >> Hi, >> >> >> Having an issue that looks the same as this one: >> >> We can do sequential writes to the filesystem at 7,8 GB/s total , >> which is the expected speed for our current storage >> backend. While we have even better performance with sequential reads >> on raw storage LUNS, using GPFS we can only reach 1GB/s in total >> (each nsd server seems limited by 0,5GB/s) independent of the number >> of clients >> (1,2,4,..) or ways we tested (fio,dd). We played with blockdev >> params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as >> discussed in this thread, but nothing seems to impact this read >> performance. >> >> Any ideas? >> >> Thanks! >> >> Kenneth >> >> On 17/02/17 19:29, Jan-Frode Myklebust wrote: >>> I just had a similar experience from a sandisk infiniflash system >>> SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for >>> writes. and 250-300 Mbyte/s on sequential reads!! Random reads were >>> on the order of 2 Gbyte/s. >>> >>> After a bit head scratching snd fumbling around I found out that >>> reducing maxMBpS from 10000 to 100 fixed the problem! Digging >>> further I found that reducing prefetchThreads from default=72 to 32 >>> also fixed it, while leaving maxMBpS at 10000. Can now also read at >>> 3,2 GByte/s. >>> >>> Could something like this be the problem on your box as well? >>> >>> >>> >>> -jf >>> fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister >>> >: >>> >>> Well, I'm somewhat scrounging for hardware. This is in our test >>> environment :) And yep, it's got the 2U gpu-tray in it although even >>> without the riser it has 2 PCIe slots onboard (excluding the >>> on-board >>> dual-port mezz card) so I think it would make a fine NSD server even >>> without the riser. >>> >>> -Aaron >>> >>> On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT >>> Services) >>> wrote: >>> > Maybe its related to interrupt handlers somehow? You drive the >>> load up on one socket, you push all the interrupt handling to >>> the other socket where the fabric card is attached? >>> > >>> > Dunno ... (Though I am intrigued you use idataplex nodes as >>> NSD servers, I assume its some 2U gpu-tray riser one or something !) >>> > >>> > Simon >>> > ________________________________________ >>> > From: gpfsug-discuss-bounces at spectrumscale.org >>> >>> [gpfsug-discuss-bounces at spectrumscale.org >>> ] on behalf of >>> Aaron Knister [aaron.s.knister at nasa.gov >>> ] >>> > Sent: 17 February 2017 15:52 >>> > To: gpfsug main discussion list >>> > Subject: [gpfsug-discuss] bizarre performance behavior >>> > >>> > This is a good one. I've got an NSD server with 4x 16GB fibre >>> > connections coming in and 1x FDR10 and 1x QDR connection going >>> out to >>> > the clients. I was having a really hard time getting anything >>> resembling >>> > sensible performance out of it (4-5Gb/s writes but maybe >>> 1.2Gb/s for >>> > reads). The back-end is a DDN SFA12K and I *know* it can do >>> better than >>> > that. 
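Since nsdperf keeps coming up, the workflow is roughly the one sketched below, from memory, so treat the exact build line and command names as approximate. It ships as source under /usr/lpp/mmfs/samples/net, it measures only the network path (it never touches the filesystem), and client01/client02 are placeholder names for whatever test clients are used.

    # build it once on each node involved in the test
    cd /usr/lpp/mmfs/samples/net
    g++ -O2 -o nsdperf nsdperf.C -lpthread -lrt   # extra defines/libs are needed for verbs RDMA support

    # run it in server mode on the nodes whose network you want to measure
    ./nsdperf -s

    # from a control node, drive it interactively, e.g.:
    ./nsdperf
      server nsd00 nsd02
      client client01 client02
      connect
      test        # runs the write and read throughput tests
      quit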
>>> > >>> > I don't remember quite how I figured this out but simply by >>> running >>> > "openssl speed -multi 16" on the nsd server to drive up the >>> load I saw >>> > an almost 4x performance jump which is pretty much goes >>> against every >>> > sysadmin fiber in me (i.e. "drive up the cpu load with >>> unrelated crap to >>> > quadruple your i/o performance"). >>> > >>> > This feels like some type of C-states frequency scaling >>> shenanigans that >>> > I haven't quite ironed down yet. I booted the box with the >>> following >>> > kernel parameters "intel_idle.max_cstate=0 >>> processor.max_cstate=0" which >>> > didn't seem to make much of a difference. I also tried setting the >>> > frequency governer to userspace and setting the minimum >>> frequency to >>> > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I >>> still have >>> > to run something to drive up the CPU load and then performance >>> improves. >>> > >>> > I'm wondering if this could be an issue with the C1E state? >>> I'm curious >>> > if anyone has seen anything like this. The node is a dx360 M4 >>> > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. >>> > >>> > -Aaron >>> > >>> > -- >>> > Aaron Knister >>> > NASA Center for Climate Simulation (Code 606.2) >>> > Goddard Space Flight Center >>> > (301) 286-2776 >>> > _______________________________________________ >>> > gpfsug-discuss mailing list >>> > gpfsug-discuss at spectrumscale.org >>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> > _______________________________________________ >>> > gpfsug-discuss mailing list >>> > gpfsug-discuss at spectrumscale.org >>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> > >>> >>> -- >>> Aaron Knister >>> NASA Center for Climate Simulation (Code 606.2) >>> Goddard Space Flight Center >>> (301) 286-2776 >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.waegeman at ugent.be Fri Apr 21 10:50:55 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Fri, 21 Apr 2017 11:50:55 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID: <2b0824a1-e1a2-8dd8-4a55-a57d7b00e09f@ugent.be> Hi, prefetching was already disabled at our storage backend, but a good thing to recheck :) thanks! On 20/04/17 17:07, Uwe Falke wrote: > Hi Kennmeth, > > is prefetching off or on at your storage backend? > Raw sequential is very different from GPFS sequential at the storage > device ! > GPFS does its own prefetching, the storage would never know what sectors > sequential read at GPFS level maps to at storage level! > > > Mit freundlichen Gr??en / Kind regards > > > Dr. 
Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Andreas Hasse, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: Kenneth Waegeman > To: gpfsug main discussion list > Date: 04/20/2017 04:53 PM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi, > > Having an issue that looks the same as this one: > We can do sequential writes to the filesystem at 7,8 GB/s total , which is > the expected speed for our current storage > backend. While we have even better performance with sequential reads on > raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd > server seems limited by 0,5GB/s) independent of the number of clients > (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, > MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in > this thread, but nothing seems to impact this read performance. > Any ideas? > Thanks! > > Kenneth > > On 17/02/17 19:29, Jan-Frode Myklebust wrote: > I just had a similar experience from a sandisk infiniflash system > SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. > and 250-300 Mbyte/s on sequential reads!! Random reads were on the order > of 2 Gbyte/s. > > After a bit head scratching snd fumbling around I found out that reducing > maxMBpS from 10000 to 100 fixed the problem! Digging further I found that > reducing prefetchThreads from default=72 to 32 also fixed it, while > leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. > > Could something like this be the problem on your box as well? > > > > -jf > fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister > : > Well, I'm somewhat scrounging for hardware. This is in our test > environment :) And yep, it's got the 2U gpu-tray in it although even > without the riser it has 2 PCIe slots onboard (excluding the on-board > dual-port mezz card) so I think it would make a fine NSD server even > without the riser. > > -Aaron > > On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: >> Maybe its related to interrupt handlers somehow? You drive the load up > on one socket, you push all the interrupt handling to the other socket > where the fabric card is attached? >> Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, > I assume its some 2U gpu-tray riser one or something !) >> Simon >> ________________________________________ >> From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ > aaron.s.knister at nasa.gov] >> Sent: 17 February 2017 15:52 >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] bizarre performance behavior >> >> This is a good one. I've got an NSD server with 4x 16GB fibre >> connections coming in and 1x FDR10 and 1x QDR connection going out to >> the clients. 
I was having a really hard time getting anything resembling >> sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for >> reads). The back-end is a DDN SFA12K and I *know* it can do better than >> that. >> >> I don't remember quite how I figured this out but simply by running >> "openssl speed -multi 16" on the nsd server to drive up the load I saw >> an almost 4x performance jump which is pretty much goes against every >> sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to >> quadruple your i/o performance"). >> >> This feels like some type of C-states frequency scaling shenanigans that >> I haven't quite ironed down yet. I booted the box with the following >> kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which >> didn't seem to make much of a difference. I also tried setting the >> frequency governer to userspace and setting the minimum frequency to >> 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have >> to run something to drive up the CPU load and then performance improves. >> >> I'm wondering if this could be an issue with the C1E state? I'm curious >> if anyone has seen anything like this. The node is a dx360 M4 >> (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. >> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kenneth.waegeman at ugent.be Fri Apr 21 10:52:58 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Fri, 21 Apr 2017 11:52:58 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID: <94f2ca6e-cf6b-ef6a-1b27-45d7a449a379@ugent.be> Hi, Tried these settings, but sadly I'm not seeing any changes. Thanks, Kenneth On 21/04/17 09:25, Olaf Weiser wrote: > pls check > workerThreads (assuming you 're > 4.2.2) start with 128 .. increase > iteratively > pagepool at least 8 G > ignorePrefetchLunCount=yes (1) > > then you won't see a difference and GPFS is as fast or even faster .. 
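For reference, Olaf's three suggestions map onto mmchconfig like the sketch below. The values are just his starting points and the -N scope (the built-in nsdnodes class) is an assumption; plan to recycle the daemon on the targeted nodes so workerThreads and the larger pagepool fully take effect.

    mmchconfig workerThreads=128 -N nsdnodes        # 4.2.2 and later; raise iteratively as suggested
    mmchconfig pagepool=8G -N nsdnodes
    mmchconfig ignorePrefetchLunCount=yes -N nsdnodes
    # recycle the daemon on the changed nodes, staggered if the filesystem must stay mounted
    mmshutdown -N nsdnodes && mmstartup -N nsdnodes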
> > > > From: "Marcus Koenig1" > To: gpfsug main discussion list > Date: 04/21/2017 03:24 AM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Hi Kennmeth, > > we also had similar performance numbers in our tests. Native was far > quicker than through GPFS. When we learned though that the client > tested the performance on the FS at a big blocksize (512k) with small > files - we were able to speed it up significantly using a smaller FS > blocksize (obviously we had to recreate the FS). > > So really depends on how you do your tests. > > *Cheers,* > * > Marcus Koenig* > Lab Services Storage & Power Specialist/ > IBM Australia & New Zealand Advanced Technical Skills/ > IBM Systems-Hardware > ------------------------------------------------------------------------ > > *Mobile:*+64 21 67 34 27* > E-mail:*_marcusk at nz1.ibm.com_ > > 82 Wyndham Street > Auckland, AUK 1010 > New Zealand > > > > > > > > > > Inactive hide details for "Uwe Falke" ---04/21/2017 03:07:48 AM---Hi > Kennmeth, is prefetching off or on at your storage backe"Uwe Falke" > ---04/21/2017 03:07:48 AM---Hi Kennmeth, is prefetching off or on at > your storage backend? > > From: "Uwe Falke" > To: gpfsug main discussion list > Date: 04/21/2017 03:07 AM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > Hi Kennmeth, > > is prefetching off or on at your storage backend? > Raw sequential is very different from GPFS sequential at the storage > device ! > GPFS does its own prefetching, the storage would never know what sectors > sequential read at GPFS level maps to at storage level! > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Andreas Hasse, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: Kenneth Waegeman > To: gpfsug main discussion list > Date: 04/20/2017 04:53 PM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi, > > Having an issue that looks the same as this one: > We can do sequential writes to the filesystem at 7,8 GB/s total , > which is > the expected speed for our current storage > backend. While we have even better performance with sequential reads on > raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd > server seems limited by 0,5GB/s) independent of the number of clients > (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, > MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in > this thread, but nothing seems to impact this read performance. > Any ideas? > Thanks! 
> > Kenneth > > On 17/02/17 19:29, Jan-Frode Myklebust wrote: > I just had a similar experience from a sandisk infiniflash system > SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. > and 250-300 Mbyte/s on sequential reads!! Random reads were on the order > of 2 Gbyte/s. > > After a bit head scratching snd fumbling around I found out that reducing > maxMBpS from 10000 to 100 fixed the problem! Digging further I found that > reducing prefetchThreads from default=72 to 32 also fixed it, while > leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. > > Could something like this be the problem on your box as well? > > > > -jf > fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister >: > Well, I'm somewhat scrounging for hardware. This is in our test > environment :) And yep, it's got the 2U gpu-tray in it although even > without the riser it has 2 PCIe slots onboard (excluding the on-board > dual-port mezz card) so I think it would make a fine NSD server even > without the riser. > > -Aaron > > On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > Maybe its related to interrupt handlers somehow? You drive the load up > on one socket, you push all the interrupt handling to the other socket > where the fabric card is attached? > > > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD > servers, > I assume its some 2U gpu-tray riser one or something !) > > > > Simon > > ________________________________________ > > From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ > aaron.s.knister at nasa.gov] > > Sent: 17 February 2017 15:52 > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] bizarre performance behavior > > > > This is a good one. I've got an NSD server with 4x 16GB fibre > > connections coming in and 1x FDR10 and 1x QDR connection going out to > > the clients. I was having a really hard time getting anything resembling > > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > > reads). The back-end is a DDN SFA12K and I *know* it can do better than > > that. > > > > I don't remember quite how I figured this out but simply by running > > "openssl speed -multi 16" on the nsd server to drive up the load I saw > > an almost 4x performance jump which is pretty much goes against every > > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > > quadruple your i/o performance"). > > > > This feels like some type of C-states frequency scaling shenanigans that > > I haven't quite ironed down yet. I booted the box with the following > > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > > didn't seem to make much of a difference. I also tried setting the > > frequency governer to userspace and setting the minimum frequency to > > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > > to run something to drive up the CPU load and then performance improves. > > > > I'm wondering if this could be an issue with the C1E state? I'm curious > > if anyone has seen anything like this. The node is a dx360 M4 > > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
> > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 3720 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 2741 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 13421 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 105 bytes Desc: not available URL:
From makaplan at us.ibm.com Fri Apr 21 13:58:26 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 21 Apr 2017 08:58:26 -0400 Subject: [gpfsug-discuss] bizarre performance behavior - prefetchThreads In-Reply-To: <94f2ca6e-cf6b-ef6a-1b27-45d7a449a379@ugent.be> References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <94f2ca6e-cf6b-ef6a-1b27-45d7a449a379@ugent.be> Message-ID:
Seems counter-logical, but we have testimony that you may need to reduce the prefetchThreads parameter. Of all the parameters, that's the one that directly affects prefetching, so worth trying.
Jan-Frode Myklebust wrote: ...Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s....
I can speculate that having prefetchThreads too high may create a contention situation where more threads causes overall degradation in system performance.
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL:
From aaron.s.knister at nasa.gov Fri Apr 21 14:10:49 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Fri, 21 Apr 2017 13:10:49 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov>, <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID:
Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU frequency governor to up the frequency (which can affect throughput). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing.
-Aaron
On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman wrote: Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server?
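A local read test of that sort can be as simple as the sketch below; /gpfs/fs0 and the sizes are placeholders. The main point is to read back a file comfortably larger than the pagepool (or written from a different node) so the reads actually hit the NSDs instead of being served from cache.

    # on the NSD server, with the filesystem mounted locally
    dd if=/dev/zero of=/gpfs/fs0/ddtest.bin bs=16M count=4096    # ~64 GiB test file
    dd if=/gpfs/fs0/ddtest.bin of=/dev/null bs=16M               # sequential read back
    rm /gpfs/fs0/ddtest.bin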
If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister <aaron.s.knister at nasa.gov>: Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. 
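One quick way to confirm or rule out the frequency-scaling theory is to watch what the cores are actually doing while a read test runs, and to pin the governor for the duration of a test. A sketch, assuming the stock cpupower and turbostat tools are installed on the NSD servers:

    # watch core frequencies and C-state residency while dd/gpfsperf runs on the clients
    cpupower monitor -i 5
    turbostat

    # check which governor is active and force it to performance for a test
    cpupower frequency-info
    cpupower frequency-set -g performance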
I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Apr 21 14:18:34 2017 From: david_johnson at brown.edu (David D Johnson) Date: Fri, 21 Apr 2017 09:18:34 -0400 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: <02C0BD31-E743-4F1C-91E7-20555099CBF5@brown.edu> We had some luck making the client and server IB performance consistently decent by configuring tuned with the profile "latency-performance". The key is the line /usr/libexec/tuned/pmqos-static.py cpu_dma_latency=1 which prevents cpu from going to sleep just when the next burst of IB traffic is about to arrive. -- ddj Dave Johnson Brown University CCV On Apr 21, 2017, at 9:10 AM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > > Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. > > A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. > > -Aaron > > > > > On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman wrote: >> Hi, >> We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. 
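Applying that tuned suggestion on a RHEL/CentOS 7 style system looks roughly like the sketch below; as far as I can tell, the stock latency-performance profile on that generation keeps /dev/cpu_dma_latency pinned low, which is the same effect as the pmqos-static.py line quoted above.

    yum install -y tuned
    systemctl enable tuned && systemctl start tuned
    tuned-adm profile latency-performance
    tuned-adm active
    # fuser should show the tuned process holding /dev/cpu_dma_latency open
    fuser -v /dev/cpu_dma_latency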
>> We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. >> When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. >> >> I'll use the nsdperf tool to see if we can find anything, >> >> thanks! >> >> K >> >> On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: >>> Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf >>> >>> -Aaron >>> >>> >>> >>> >>> On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: >>>> Hi, >>>> >>>> >>>> Having an issue that looks the same as this one: >>>> >>>> We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage >>>> backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients >>>> (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. >>>> Any ideas? >>>> >>>> Thanks! >>>> >>>> Kenneth >>>> >>>> On 17/02/17 19:29, Jan-Frode Myklebust wrote: >>>>> I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. >>>>> >>>>> After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. >>>>> >>>>> Could something like this be the problem on your box as well? >>>>> >>>>> >>>>> >>>>> -jf >>>>> fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister < aaron.s.knister at nasa.gov >: >>>>> Well, I'm somewhat scrounging for hardware. This is in our test >>>>> environment :) And yep, it's got the 2U gpu-tray in it although even >>>>> without the riser it has 2 PCIe slots onboard (excluding the on-board >>>>> dual-port mezz card) so I think it would make a fine NSD server even >>>>> without the riser. >>>>> >>>>> -Aaron >>>>> >>>>> On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) >>>>> wrote: >>>>> > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? >>>>> > >>>>> > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) 
>>>>> > >>>>> > Simon >>>>> > ________________________________________ >>>>> > From: gpfsug-discuss-bounces at spectrumscale.org [ gpfsug-discuss-bounces at spectrumscale.org ] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov ] >>>>> > Sent: 17 February 2017 15:52 >>>>> > To: gpfsug main discussion list >>>>> > Subject: [gpfsug-discuss] bizarre performance behavior >>>>> > >>>>> > This is a good one. I've got an NSD server with 4x 16GB fibre >>>>> > connections coming in and 1x FDR10 and 1x QDR connection going out to >>>>> > the clients. I was having a really hard time getting anything resembling >>>>> > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for >>>>> > reads). The back-end is a DDN SFA12K and I *know* it can do better than >>>>> > that. >>>>> > >>>>> > I don't remember quite how I figured this out but simply by running >>>>> > "openssl speed -multi 16" on the nsd server to drive up the load I saw >>>>> > an almost 4x performance jump which is pretty much goes against every >>>>> > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to >>>>> > quadruple your i/o performance"). >>>>> > >>>>> > This feels like some type of C-states frequency scaling shenanigans that >>>>> > I haven't quite ironed down yet. I booted the box with the following >>>>> > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which >>>>> > didn't seem to make much of a difference. I also tried setting the >>>>> > frequency governer to userspace and setting the minimum frequency to >>>>> > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have >>>>> > to run something to drive up the CPU load and then performance improves. >>>>> > >>>>> > I'm wondering if this could be an issue with the C1E state? I'm curious >>>>> > if anyone has seen anything like this. The node is a dx360 M4 >>>>> > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. >>>>> > >>>>> > -Aaron >>>>> > >>>>> > -- >>>>> > Aaron Knister >>>>> > NASA Center for Climate Simulation (Code 606.2) >>>>> > Goddard Space Flight Center >>>>> > (301) 286-2776 >>>>> > _______________________________________________ >>>>> > gpfsug-discuss mailing list >>>>> > gpfsug-discuss at spectrumscale.org >>>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> > _______________________________________________ >>>>> > gpfsug-discuss mailing list >>>>> > gpfsug-discuss at spectrumscale.org >>>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> > >>>>> >>>>> -- >>>>> Aaron Knister >>>>> NASA Center for Climate Simulation (Code 606.2) >>>>> Goddard Space Flight Center >>>>> (301) 286-2776 >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kums at us.ibm.com Fri Apr 21 15:01:33 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Fri, 21 Apr 2017 14:01:33 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be><67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov>, <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: Hi, Try enabling the following in the BIOS of the NSD servers (screen shots below) Turbo Mode - Enable QPI Link Frequency - Max Performance Operating Mode - Maximum Performance >>>>While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients >>We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. Also, It will be good to verify that all the GPFS nodes have Verbs RDMA started using "mmfsadm test verbs status" and that the NSD client-server communication from client to server during "dd" is actually using Verbs RDMA using "mmfsadm test verbs conn" command (on NSD client doing dd). If not, then GPFS might be using TCP/IP network over which the cluster is configured impacting performance (If this is the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and resolve). Regards, -Kums From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" To: gpfsug main discussion list Date: 04/21/2017 09:11 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. -Aaron On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman wrote: Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? 
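As a concrete way to run the verbs checks suggested above, something along these lines should do (the mmdsh fan-out is a sketch, and the output wording differs between releases, so read the output rather than grepping for fixed strings):

# confirm RDMA is started on every node in the cluster
mmdsh -N all '/usr/lpp/mmfs/bin/mmfsadm test verbs status'

# while the dd is running, check on the client that its NSD traffic uses verbs connections
/usr/lpp/mmfs/bin/mmfsadm test verbs conn

# and look for RDMA-related errors in the daemon log on both clients and servers
grep -i verbs /var/adm/ras/mmfs.log.latest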
If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister : Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. 
I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 61023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 85131 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 84819 bytes Desc: not available URL: From bbanister at jumptrading.com Fri Apr 21 16:01:54 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 21 Apr 2017 15:01:54 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be><67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov>, <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: <7dcbac92e19043faa7968702d852668f@jumptrading.com> I think we have a new topic and new speaker for the next UG meeting at SC! Kums presenting "Performance considerations for Spectrum Scale"!! Kums, I have to say you do have a lot to offer here... 
;o) -Bryan Disclaimer: There are some selfish reasons of me wanting to hang out with you again involved in this suggestion From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kumaran Rajaram Sent: Friday, April 21, 2017 9:02 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] bizarre performance behavior Hi, Try enabling the following in the BIOS of the NSD servers (screen shots below) * Turbo Mode - Enable * QPI Link Frequency - Max Performance * Operating Mode - Maximum Performance * >>>>While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients >>We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. Also, It will be good to verify that all the GPFS nodes have Verbs RDMA started using "mmfsadm test verbs status" and that the NSD client-server communication from client to server during "dd" is actually using Verbs RDMA using "mmfsadm test verbs conn" command (on NSD client doing dd). If not, then GPFS might be using TCP/IP network over which the cluster is configured impacting performance (If this is the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and resolve). * [cid:image001.gif at 01D2BA86.4D4B4C10] [cid:image002.gif at 01D2BA86.4D4B4C10] [cid:image003.gif at 01D2BA86.4D4B4C10] Regards, -Kums From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > To: gpfsug main discussion list > Date: 04/21/2017 09:11 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. -Aaron On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman > wrote: Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? 
If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister >: Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. 
I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 61023 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.gif Type: image/gif Size: 85131 bytes Desc: image002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 84819 bytes Desc: image003.gif URL: From g.mangeot at gmail.com Fri Apr 21 16:04:58 2017 From: g.mangeot at gmail.com (Guillaume Mangeot) Date: Fri, 21 Apr 2017 17:04:58 +0200 Subject: [gpfsug-discuss] HA on snapshot scheduling in GPFS GUI Message-ID: Hi, I'm looking for a way to get the GUI working in HA to schedule snapshots. I have 2 servers with gpfs.gui service running on them. I checked a bit with lssnaprule in /usr/lpp/mmfs/gui/cli and the file /var/lib/mmfs/gui/snapshots.json But it doesn't look to be shared between all the GUI servers. 
Is there a way to get GPFS GUI working in HA to schedule snapshots? (keeping the coherency: avoiding to trigger snapshots on both servers in the same time) Regards, Guillaume Mangeot DDN Storage -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.waegeman at ugent.be Fri Apr 21 16:33:16 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Fri, 21 Apr 2017 17:33:16 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: <41475044-c195-5561-c94a-b54ee30c7e68@ugent.be> On 21/04/17 15:10, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > Fantastic news! It might also be worth running "cpupower monitor" or > "turbostat" on your NSD servers while you're running dd tests from the > clients to see what CPU frequency your cores are actually running at. Thanks! I verified with turbostat and cpuinfo, our cpus are running in high performance mode and frequency is always at highest level. > > A typical NSD server workload (especially with IB verbs and for reads) > can be pretty light on CPU which might not prompt your CPU crew > governor to up the frequency (which can affect throughout). If your > frequency scaling governor isn't kicking up the frequency of your CPUs > I've seen that cause this behavior in my testing. > > -Aaron > > > > > On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman > wrote: >> >> Hi, >> >> We are running a test setup with 2 NSD Servers backed by 4 Dell >> Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of >> the 4 powervaults, nsd02 is primary serving LUNS of controller B. >> >> We are testing from 2 testing machines connected to the nsds with >> infiniband, verbs enabled. >> >> When we do dd from the NSD servers, we see indeed performance going >> to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is >> able to get the data at a decent speed. Since we can write from the >> clients at a good speed, I didn't suspect the communication between >> clients and nsds being the issue, especially since total performance >> stays the same using 1 or multiple clients. >> >> I'll use the nsdperf tool to see if we can find anything, >> >> thanks! >> >> K >> >> On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE >> CORP] wrote: >>> Interesting. Could you share a little more about your architecture? >>> Is it possible to mount the fs on an NSD server and do some dd's >>> from the fs on the NSD server? If that gives you decent performance >>> perhaps try NSDPERF next >>> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf >>> >>> >>> >>> -Aaron >>> >>> >>> >>> >>> On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman >>> wrote: >>>> >>>> Hi, >>>> >>>> >>>> Having an issue that looks the same as this one: >>>> >>>> We can do sequential writes to the filesystem at 7,8 GB/s total , >>>> which is the expected speed for our current storage >>>> backend. While we have even better performance with sequential >>>> reads on raw storage LUNS, using GPFS we can only reach 1GB/s in >>>> total (each nsd server seems limited by 0,5GB/s) independent of the >>>> number of clients >>>> (1,2,4,..) or ways we tested (fio,dd). 
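For reference, the frequency check Aaron suggests can be done on an NSD server while the dd is running with something like the following (flag spellings vary between turbostat/cpupower versions):

# sample the actual core frequencies during the read test
turbostat --interval 5        # older versions use: turbostat -i 5
cpupower monitor
cpupower frequency-info       # shows the active governor and frequency limits

# a cruder check that works everywhere
grep "^cpu MHz" /proc/cpuinfo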
We played with blockdev >>>> params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. >>>> as discussed in this thread, but nothing seems to impact this read >>>> performance. >>>> >>>> Any ideas? >>>> >>>> Thanks! >>>> >>>> Kenneth >>>> >>>> On 17/02/17 19:29, Jan-Frode Myklebust wrote: >>>>> I just had a similar experience from a sandisk infiniflash system >>>>> SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for >>>>> writes. and 250-300 Mbyte/s on sequential reads!! Random reads >>>>> were on the order of 2 Gbyte/s. >>>>> >>>>> After a bit head scratching snd fumbling around I found out that >>>>> reducing maxMBpS from 10000 to 100 fixed the problem! Digging >>>>> further I found that reducing prefetchThreads from default=72 to >>>>> 32 also fixed it, while leaving maxMBpS at 10000. Can now also >>>>> read at 3,2 GByte/s. >>>>> >>>>> Could something like this be the problem on your box as well? >>>>> >>>>> >>>>> >>>>> -jf >>>>> fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister >>>>> >: >>>>> >>>>> Well, I'm somewhat scrounging for hardware. This is in our test >>>>> environment :) And yep, it's got the 2U gpu-tray in it >>>>> although even >>>>> without the riser it has 2 PCIe slots onboard (excluding the >>>>> on-board >>>>> dual-port mezz card) so I think it would make a fine NSD >>>>> server even >>>>> without the riser. >>>>> >>>>> -Aaron >>>>> >>>>> On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT >>>>> Services) >>>>> wrote: >>>>> > Maybe its related to interrupt handlers somehow? You drive >>>>> the load up on one socket, you push all the interrupt handling >>>>> to the other socket where the fabric card is attached? >>>>> > >>>>> > Dunno ... (Though I am intrigued you use idataplex nodes as >>>>> NSD servers, I assume its some 2U gpu-tray riser one or >>>>> something !) >>>>> > >>>>> > Simon >>>>> > ________________________________________ >>>>> > From: gpfsug-discuss-bounces at spectrumscale.org >>>>> >>>>> [gpfsug-discuss-bounces at spectrumscale.org >>>>> ] on behalf >>>>> of Aaron Knister [aaron.s.knister at nasa.gov >>>>> ] >>>>> > Sent: 17 February 2017 15:52 >>>>> > To: gpfsug main discussion list >>>>> > Subject: [gpfsug-discuss] bizarre performance behavior >>>>> > >>>>> > This is a good one. I've got an NSD server with 4x 16GB fibre >>>>> > connections coming in and 1x FDR10 and 1x QDR connection >>>>> going out to >>>>> > the clients. I was having a really hard time getting >>>>> anything resembling >>>>> > sensible performance out of it (4-5Gb/s writes but maybe >>>>> 1.2Gb/s for >>>>> > reads). The back-end is a DDN SFA12K and I *know* it can do >>>>> better than >>>>> > that. >>>>> > >>>>> > I don't remember quite how I figured this out but simply by >>>>> running >>>>> > "openssl speed -multi 16" on the nsd server to drive up the >>>>> load I saw >>>>> > an almost 4x performance jump which is pretty much goes >>>>> against every >>>>> > sysadmin fiber in me (i.e. "drive up the cpu load with >>>>> unrelated crap to >>>>> > quadruple your i/o performance"). >>>>> > >>>>> > This feels like some type of C-states frequency scaling >>>>> shenanigans that >>>>> > I haven't quite ironed down yet. I booted the box with the >>>>> following >>>>> > kernel parameters "intel_idle.max_cstate=0 >>>>> processor.max_cstate=0" which >>>>> > didn't seem to make much of a difference. I also tried >>>>> setting the >>>>> > frequency governer to userspace and setting the minimum >>>>> frequency to >>>>> > 2.6ghz (it's a 2.6ghz cpu). 
None of that really matters-- I >>>>> still have >>>>> > to run something to drive up the CPU load and then >>>>> performance improves. >>>>> > >>>>> > I'm wondering if this could be an issue with the C1E state? >>>>> I'm curious >>>>> > if anyone has seen anything like this. The node is a dx360 M4 >>>>> > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. >>>>> > >>>>> > -Aaron >>>>> > >>>>> > -- >>>>> > Aaron Knister >>>>> > NASA Center for Climate Simulation (Code 606.2) >>>>> > Goddard Space Flight Center >>>>> > (301) 286-2776 >>>>> > _______________________________________________ >>>>> > gpfsug-discuss mailing list >>>>> > gpfsug-discuss at spectrumscale.org >>>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> > _______________________________________________ >>>>> > gpfsug-discuss mailing list >>>>> > gpfsug-discuss at spectrumscale.org >>>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> > >>>>> >>>>> -- >>>>> Aaron Knister >>>>> NASA Center for Climate Simulation (Code 606.2) >>>>> Goddard Space Flight Center >>>>> (301) 286-2776 >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.waegeman at ugent.be Fri Apr 21 16:42:34 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Fri, 21 Apr 2017 17:42:34 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: <7f7349c9-bdd3-5847-1cca-d98d221489fe@ugent.be> Hi, We already verified this on our nsds: [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --QpiSpeed QpiSpeed=maxdatarate [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --turbomode turbomode=enable [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg ?-SysProfile SysProfile=perfoptimized so sadly this is not the issue. Also the output of the verbs commands look ok, there are connections from the client to the nsds are there is data being read and writen. Thanks again! Kenneth On 21/04/17 16:01, Kumaran Rajaram wrote: > Hi, > > Try enabling the following in the BIOS of the NSD servers (screen > shots below) > > * Turbo Mode - Enable > * QPI Link Frequency - Max Performance > * Operating Mode - Maximum Performance > * > > >>>>While we have even better performance with sequential reads on > raw storage LUNS, using GPFS we can only reach 1GB/s in total > (each nsd server seems limited by 0,5GB/s) independent of the > number of clients > > >>We are testing from 2 testing machines connected to the nsds > with infiniband, verbs enabled. 
> > > Also, It will be good to verify that all the GPFS nodes have Verbs > RDMA started using "mmfsadm test verbs status" and that the NSD > client-server communication from client to server during "dd" is > actually using Verbs RDMA using "mmfsadm test verbs conn" command (on > NSD client doing dd). If not, then GPFS might be using TCP/IP network > over which the cluster is configured impacting performance (If this is > the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and > resolve). > > * > > > > > > > Regards, > -Kums > > > > > > > From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > > To: gpfsug main discussion list > Date: 04/21/2017 09:11 AM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Fantastic news! It might also be worth running "cpupower monitor" or > "turbostat" on your NSD servers while you're running dd tests from the > clients to see what CPU frequency your cores are actually running at. > > A typical NSD server workload (especially with IB verbs and for reads) > can be pretty light on CPU which might not prompt your CPU crew > governor to up the frequency (which can affect throughout). If your > frequency scaling governor isn't kicking up the frequency of your CPUs > I've seen that cause this behavior in my testing. > > -Aaron > > > > > On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman > wrote: > > Hi, > > We are running a test setup with 2 NSD Servers backed by 4 Dell > Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of > the 4 powervaults, nsd02 is primary serving LUNS of controller B. > > We are testing from 2 testing machines connected to the nsds with > infiniband, verbs enabled. > > When we do dd from the NSD servers, we see indeed performance going to > 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is > able to get the data at a decent speed. Since we can write from the > clients at a good speed, I didn't suspect the communication between > clients and nsds being the issue, especially since total performance > stays the same using 1 or multiple clients. > > I'll use the nsdperf tool to see if we can find anything, > > thanks! > > K > > On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE > CORP] wrote: > Interesting. Could you share a little more about your architecture? Is > it possible to mount the fs on an NSD server and do some dd's from the > fs on the NSD server? If that gives you decent performance perhaps try > NSDPERF next > _https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf_ > > > -Aaron > > > > > On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman > __ wrote: > > Hi, > > Having an issue that looks the same as this one: > > We can do sequential writes to the filesystem at 7,8 GB/s total , > which is the expected speed for our current storage > backend. While we have even better performance with sequential reads > on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each > nsd server seems limited by 0,5GB/s) independent of the number of clients > (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, > MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed > in this thread, but nothing seems to impact this read performance. > > Any ideas? > > Thanks! 
> > Kenneth > > On 17/02/17 19:29, Jan-Frode Myklebust wrote: > I just had a similar experience from a sandisk infiniflash system > SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for > writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on > the order of 2 Gbyte/s. > > After a bit head scratching snd fumbling around I found out that > reducing maxMBpS from 10000 to 100 fixed the problem! Digging further > I found that reducing prefetchThreads from default=72 to 32 also fixed > it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. > > Could something like this be the problem on your box as well? > > > > -jf > fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister > <_aaron.s.knister at nasa.gov_ >: > Well, I'm somewhat scrounging for hardware. This is in our test > environment :) And yep, it's got the 2U gpu-tray in it although even > without the riser it has 2 PCIe slots onboard (excluding the on-board > dual-port mezz card) so I think it would make a fine NSD server even > without the riser. > > -Aaron > > On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > Maybe its related to interrupt handlers somehow? You drive the load > up on one socket, you push all the interrupt handling to the other > socket where the fabric card is attached? > > > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD > servers, I assume its some 2U gpu-tray riser one or something !) > > > > Simon > > ________________________________________ > > From: _gpfsug-discuss-bounces at spectrumscale.org_ > [_gpfsug-discuss-bounces at spectrumscale.org_ > ] on behalf of Aaron > Knister [_aaron.s.knister at nasa.gov_ ] > > Sent: 17 February 2017 15:52 > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] bizarre performance behavior > > > > This is a good one. I've got an NSD server with 4x 16GB fibre > > connections coming in and 1x FDR10 and 1x QDR connection going out to > > the clients. I was having a really hard time getting anything resembling > > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > > reads). The back-end is a DDN SFA12K and I *know* it can do better than > > that. > > > > I don't remember quite how I figured this out but simply by running > > "openssl speed -multi 16" on the nsd server to drive up the load I saw > > an almost 4x performance jump which is pretty much goes against every > > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > > quadruple your i/o performance"). > > > > This feels like some type of C-states frequency scaling shenanigans that > > I haven't quite ironed down yet. I booted the box with the following > > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > > didn't seem to make much of a difference. I also tried setting the > > frequency governer to userspace and setting the minimum frequency to > > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > > to run something to drive up the CPU load and then performance improves. > > > > I'm wondering if this could be an issue with the C1E state? I'm curious > > if anyone has seen anything like this. The node is a dx360 M4 > > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
> > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at _spectrumscale.org_ > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at _spectrumscale.org_ > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at _spectrumscale.org_ _ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 61023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 85131 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 84819 bytes Desc: not available URL: From kums at us.ibm.com Fri Apr 21 21:27:49 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Fri, 21 Apr 2017 20:27:49 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: <7f7349c9-bdd3-5847-1cca-d98d221489fe@ugent.be> References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be><67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov><9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> <7f7349c9-bdd3-5847-1cca-d98d221489fe@ugent.be> Message-ID: Hi Kenneth, As it was mentioned earlier, it will be good to first verify the raw network performance between the NSD client and NSD server using the nsdperf tool that is built with RDMA support. g++ -O2 -DRDMA -o nsdperf -lpthread -lrt -libverbs -lrdmacm nsdperf.C In addition, since you have 2 x NSD servers it will be good to perform NSD client file-system performance test with just single NSD server (mmshutdown the other server, assuming all the NSDs have primary, server NSD server configured + Quorum will be intact when a NSD server is brought down) to see if it helps to improve the read performance + if there are variations in the file-system read bandwidth results between NSD_server#1 'active' vs. NSD_server #2 'active' (with other NSD server in GPFS "down" state). If there is significant variation, it can help to isolate the issue to particular NSD server (HW or IB issue?). 
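To complement the nsdperf build line above, here is a rough outline of how the tool is usually driven (the interactive sub-commands are written from memory and may differ slightly by version, so check 'help' at the prompt):

# start the server process on every node taking part in the test
./nsdperf -s &        # run on both NSD servers and on the test clients

# then, from a control node, describe and run the test; the indented lines
# are typed at the nsdperf prompt
./nsdperf
  server nsd00 nsd02
  client client01 client02
  rdma on
  connect
  test          # runs the write and read network tests
  killall
  quit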
You can issue "mmdiag --waiters" on NSD client as well as NSD servers during your dd test, to verify if there are unsual long GPFS waiters. In addition, you may issue Linux "perf top -z" command on the GPFS node to see if there is high CPU usage by any particular call/event (for e.g., If GPFS config parameter verbsRdmaMaxSendBytes has been set to low value from the default 16M, then it can cause RDMA completion threads to go CPU bound ). Please verify some performance scenarios detailed in Chapter 22 in Spectrum Scale Problem Determination Guide (link below). https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/pdf/scale_pdg.pdf?view=kc Thanks, -Kums From: Kenneth Waegeman To: gpfsug main discussion list Date: 04/21/2017 11:43 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We already verified this on our nsds: [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --QpiSpeed QpiSpeed=maxdatarate [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --turbomode turbomode=enable [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg ?-SysProfile SysProfile=perfoptimized so sadly this is not the issue. Also the output of the verbs commands look ok, there are connections from the client to the nsds are there is data being read and writen. Thanks again! Kenneth On 21/04/17 16:01, Kumaran Rajaram wrote: Hi, Try enabling the following in the BIOS of the NSD servers (screen shots below) Turbo Mode - Enable QPI Link Frequency - Max Performance Operating Mode - Maximum Performance >>>>While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients >>We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. Also, It will be good to verify that all the GPFS nodes have Verbs RDMA started using "mmfsadm test verbs status" and that the NSD client-server communication from client to server during "dd" is actually using Verbs RDMA using "mmfsadm test verbs conn" command (on NSD client doing dd). If not, then GPFS might be using TCP/IP network over which the cluster is configured impacting performance (If this is the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and resolve). Regards, -Kums From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" To: gpfsug main discussion list Date: 04/21/2017 09:11 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. -Aaron On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman wrote: Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. 
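A compact way to gather the data points listed above while a dd is running from the client (nsdNodes is the stock node class; adjust names to your cluster):

# long waiters on the NSD servers, and on the client doing the dd
mmdsh -N nsdNodes '/usr/lpp/mmfs/bin/mmdiag --waiters'
mmdiag --waiters                # on the client itself

# current in-memory value of the RDMA send size, and where CPU time is going
mmdiag --config | grep -i verbsRdmaMaxSendBytes
perf top -z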
When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister : Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org[ gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. 
I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 61023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 85131 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 84819 bytes Desc: not available URL: From frank.tower at outlook.com Thu Apr 20 13:27:13 2017 From: frank.tower at outlook.com (Frank Tower) Date: Thu, 20 Apr 2017 12:27:13 +0000 Subject: [gpfsug-discuss] Protocol node recommendations Message-ID: Hi, We have here around 2PB GPFS where users access oney through GPFS client (used by an HPC cluster), but we will have to setup protocols nodes. We will have to share GPFS data to ~ 1000 users, where each users will have different access usage, meaning: - some will do large I/O (e.g: store 1TB files) - some will read/write more than 10k files in a raw - other will do only sequential read I already read the following wiki page: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node IBM Spectrum Scale Wiki - Sizing Guidance for Protocol Node www.ibm.com developerWorks wikis allow groups of people to jointly create and maintain content through contribution and collaboration. Wikis apply the wisdom of crowds to ... But I wondering if some people have recommendations regarding hardware sizing and software tuning for such situation ? Or better, if someone already such setup ? Thank you by advance, Frank. -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Sat Apr 22 05:30:29 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Sat, 22 Apr 2017 00:30:29 -0400 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: Message-ID: <52354.1492835429@turing-police.cc.vt.edu> On Thu, 20 Apr 2017 12:27:13 -0000, Frank Tower said: > - some will do large I/O (e.g: store 1TB files) > - some will read/write more than 10k files in a raw > - other will do only sequential read > But I wondering if some people have recommendations regarding hardware sizing > and software tuning for such situation ? The three most critical pieces of info are missing here: 1) Do you mean 1,000 human users, or 1,000 machines doing NFS/CIFS mounts? 2) How many of the users are likely to be active at the same time? 1,000 users, each of whom are active an hour a week is entirely different from 200 users that are each active 140 hours a week. 3) What SLA/performance target are they expecting? If they want large 1TB I/O and 100MB/sec is acceptable, that's different than if they have a business need to go at 1.2GB/sec.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From frank.tower at outlook.com Sat Apr 22 07:34:44 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sat, 22 Apr 2017 06:34:44 +0000 Subject: [gpfsug-discuss] Protocol node recommendations Message-ID: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. 
Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Sat Apr 22 09:50:11 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sat, 22 Apr 2017 08:50:11 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: Message-ID: That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower : > Hi, > > We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with > GPFS client on each node. > > We will have to open GPFS to all our users over CIFS and kerberized NFS > with ACL support for both protocol for around +1000 users > > All users have different use case and needs: > - some will do random I/O through a large set of opened files (~5k files) > - some will do large write with 500GB-1TB files > - other will arrange sequential I/O with ~10k opened files > > NFS and CIFS will share the same server, so I through to use SSD drive, at > least 128GB memory with 2 sockets. > > Regarding tuning parameters, I thought at: > > maxFilesToCache 10000 > syncIntervalStrict yes > workerThreads (8*core) > prefetchPct 40 (for now and update if needed) > > I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering > if someone could share his experience/best practice regarding hardware > sizing and/or tuning parameters. > > Thank by advance, > Frank > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank.tower at outlook.com Sat Apr 22 19:47:59 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sat, 22 Apr 2017 18:47:59 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: <52354.1492835429@turing-police.cc.vt.edu> References: , <52354.1492835429@turing-police.cc.vt.edu> Message-ID: Hi, Thank for your answer. > 1) Do you mean 1,000 human users, or 1,000 machines doing NFS/CIFS mounts? True, here the list: - 800 users that have 1 workstation through 1Gb/s ethernet and will use NFS/CIFS - 200 users that have 2 workstation through 1Gb/s ethernet, few have 10Gb/s ethernet and will use NFS/CIFS > 2) How many of the users are likely to be active at the same time? 1,000 > users, each of whom are active an hour a week is entirely different from > 200 users that are each active 140 hours a week. 
True again, around 200 users will actively use GPFS through NFS/CIFS during night and day, but we cannot control if people will use 2 workstations or more :( We will have peak during day with an average of 700 'workstations' > 3) What SLA/performance target are they expecting? If they want > large 1TB I/O and 100MB/sec is acceptable, that's different than if they > have a business need to go at 1.2GB/sec.... We just want to provide at normal throughput through an 1GB/s network. Users are aware of such situation and will mainly use HPC cluster for high speed and heavy computation. But they would like to do 'light' computation on their desktop. The main topic here is to sustain 'normal' throughput for all users during peak. Thank for your help. ________________________________ From: valdis.kletnieks at vt.edu Sent: Saturday, April 22, 2017 6:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Protocol node recommendations On Thu, 20 Apr 2017 12:27:13 -0000, Frank Tower said: > - some will do large I/O (e.g: store 1TB files) > - some will read/write more than 10k files in a raw > - other will do only sequential read > But I wondering if some people have recommendations regarding hardware sizing > and software tuning for such situation ? The three most critical pieces of info are missing here: 1) Do you mean 1,000 human users, or 1,000 machines doing NFS/CIFS mounts? 2) How many of the users are likely to be active at the same time? 1,000 users, each of whom are active an hour a week is entirely different from 200 users that are each active 140 hours a week. 3) What SLA/performance target are they expecting? If they want large 1TB I/O and 100MB/sec is acceptable, that's different than if they have a business need to go at 1.2GB/sec.... -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank.tower at outlook.com Sat Apr 22 20:22:23 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sat, 22 Apr 2017 19:22:23 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: , Message-ID: Hi, Thank for the recommendations. Now we deal with the situation of: - take 3 nodes with round robin DNS that handle both protocols - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and NFS services. Regarding your recommendations, 256GB memory node could be a plus if we mix both protocols for such case. Is the spreadsheet publicly available or do we need to ask IBM ? Thank for your help, Frank. ________________________________ From: Jan-Frode Myklebust Sent: Saturday, April 22, 2017 10:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower >: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. 
We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Sun Apr 23 11:07:38 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sun, 23 Apr 2017 10:07:38 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: Message-ID: The protocol sizing tool should be available from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node/version/70a4c7c0-a5c6-4dde-b391-8f91c542dd7d , but I'm getting 404 now. I think 128GB should be enough for both protocols on same nodes, and I think your 3 node suggestion is best. Better load sharing with not dedicating subset of nodes to each protocol. -jf l?r. 22. apr. 2017 kl. 21.22 skrev Frank Tower : > Hi, > > > Thank for the recommendations. > > Now we deal with the situation of: > > > - take 3 nodes with round robin DNS that handle both protocols > > - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and > NFS services. > > > Regarding your recommendations, 256GB memory node could be a plus if we > mix both protocols for such case. > > > Is the spreadsheet publicly available or do we need to ask IBM ? > > > Thank for your help, > > Frank. > > > ------------------------------ > *From:* Jan-Frode Myklebust > *Sent:* Saturday, April 22, 2017 10:50 AM > *To:* gpfsug-discuss at spectrumscale.org > *Subject:* Re: [gpfsug-discuss] Protocol node recommendations > > That's a tiny maxFilesToCache... > > I would start by implementing the settings from > /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your > protocoll nodes, and leave further tuning to when you see you have issues. > > Regarding sizing, we have a spreadsheet somewhere where you can input some > workload parameters and get an idea for how many nodes you'll need. Your > node config seems fine, but one node seems too few to serve 1000+ users. We > support max 3000 SMB connections/node, and I believe the recommendation is > 4000 NFS connections/node. > > > -jf > l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower : > >> Hi, >> >> We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with >> GPFS client on each node. 
>> >> We will have to open GPFS to all our users over CIFS and kerberized NFS >> with ACL support for both protocol for around +1000 users >> >> All users have different use case and needs: >> - some will do random I/O through a large set of opened files (~5k files) >> - some will do large write with 500GB-1TB files >> - other will arrange sequential I/O with ~10k opened files >> >> NFS and CIFS will share the same server, so I through to use SSD >> drive, at least 128GB memory with 2 sockets. >> >> Regarding tuning parameters, I thought at: >> >> maxFilesToCache 10000 >> syncIntervalStrict yes >> workerThreads (8*core) >> prefetchPct 40 (for now and update if needed) >> >> I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering >> if someone could share his experience/best practice regarding hardware >> sizing and/or tuning parameters. >> >> Thank by advance, >> Frank >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rreuscher at verizon.net Sun Apr 23 17:43:44 2017 From: rreuscher at verizon.net (Robert Reuscher) Date: Sun, 23 Apr 2017 11:43:44 -0500 Subject: [gpfsug-discuss] LUN expansion Message-ID: <4CBF459B-4008-4CA2-904F-1A48882F021E@verizon.net> We run GPFS on z/Linux and have been using ECKD devices for disks. We are looking at implementing some new filesystems on FCP LUNS. One of the features of a LUN is we can expand a LUN instead of adding new LUNS, where as with ECKD devices. From what I?ve found searching to see if GPFS filesystem can be expanding to see the expanded LUN, it doesn?t seem that this will work, you have to add new LUNS (or new disks) and then add them to the filesystem. Everything I?ve found is at least 2-3 old (most of it much older), and just want to check that this is still is true before we make finalize our LUN/GPFS procedures. Robert Reuscher NR5AR -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank.tower at outlook.com Sun Apr 23 22:27:50 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sun, 23 Apr 2017 21:27:50 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: , Message-ID: Hi, Nice ! didn't pay attention at the revision and the spreadsheet. If someone still have a copy somewhere it could be useful, Google didn't help :( We will follow your advise and start with 3 protocol nodes equipped with 128GB memory, 2 x 12 cores (maybe E5-2680 or E5-2670). >From what I read, NFS-Ganesha mainly depend of the hardware, Linux on a SSD should be a big plus in our case. Best, Frank ________________________________ From: Jan-Frode Myklebust Sent: Sunday, April 23, 2017 12:07:38 PM To: Frank Tower; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations The protocol sizing tool should be available from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node/version/70a4c7c0-a5c6-4dde-b391-8f91c542dd7d , but I'm getting 404 now. I think 128GB should be enough for both protocols on same nodes, and I think your 3 node suggestion is best. Better load sharing with not dedicating subset of nodes to each protocol. -jf l?r. 22. apr. 2017 kl. 21.22 skrev Frank Tower >: Hi, Thank for the recommendations. 
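On the separate LUN-expansion question from Robert earlier in this digest: the conventional route he describes (presenting additional LUNs and growing the filesystem with new NSDs, rather than resizing an existing NSD) would look roughly like the sketch below; the device path, NSD name, server list and filesystem name are invented for illustration:

  # nsd_new.stanza
  %nsd: device=/dev/mapper/newlun01
    nsd=fs1_nsd09
    servers=nsdserver1,nsdserver2
    usage=dataAndMetadata
    failureGroup=1
    pool=system

  mmcrnsd -F nsd_new.stanza
  mmadddisk fs1 -F nsd_new.stanza
  mmdf fs1    # confirm the new capacity is visible

Rebalancing existing data onto the new disk (mmrestripefs fs1 -b) is optional and I/O-intensive, so it is often left for a quiet period.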
Now we deal with the situation of: - take 3 nodes with round robin DNS that handle both protocols - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and NFS services. Regarding your recommendations, 256GB memory node could be a plus if we mix both protocols for such case. Is the spreadsheet publicly available or do we need to ask IBM ? Thank for your help, Frank. ________________________________ From: Jan-Frode Myklebust > Sent: Saturday, April 22, 2017 10:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower >: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfadden at us.ibm.com Sun Apr 23 23:44:56 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Sun, 23 Apr 2017 22:44:56 +0000 Subject: [gpfsug-discuss] LUN expansion In-Reply-To: <4CBF459B-4008-4CA2-904F-1A48882F021E@verizon.net> References: <4CBF459B-4008-4CA2-904F-1A48882F021E@verizon.net> Message-ID: An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Apr 24 10:11:25 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 24 Apr 2017 09:11:25 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: , Message-ID: What's your SSD going to help with... will you implement it as a LROC device? Otherwise I can't see the benefit to using it to boot off. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Frank Tower Sent: 23 April 2017 22:28 To: Jan-Frode Myklebust ; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations Hi, Nice ! didn't pay attention at the revision and the spreadsheet. 
If someone still have a copy somewhere it could be useful, Google didn't help :( We will follow your advise and start with 3 protocol nodes equipped with 128GB memory, 2 x 12 cores (maybe E5-2680 or E5-2670). >From what I read, NFS-Ganesha mainly depend of the hardware, Linux on a SSD should be a big plus in our case. Best, Frank ________________________________ From: Jan-Frode Myklebust > Sent: Sunday, April 23, 2017 12:07:38 PM To: Frank Tower; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations The protocol sizing tool should be available from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node/version/70a4c7c0-a5c6-4dde-b391-8f91c542dd7d , but I'm getting 404 now. I think 128GB should be enough for both protocols on same nodes, and I think your 3 node suggestion is best. Better load sharing with not dedicating subset of nodes to each protocol. -jf l?r. 22. apr. 2017 kl. 21.22 skrev Frank Tower >: Hi, Thank for the recommendations. Now we deal with the situation of: - take 3 nodes with round robin DNS that handle both protocols - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and NFS services. Regarding your recommendations, 256GB memory node could be a plus if we mix both protocols for such case. Is the spreadsheet publicly available or do we need to ask IBM ? Thank for your help, Frank. ________________________________ From: Jan-Frode Myklebust > Sent: Saturday, April 22, 2017 10:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower >: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From service at metamodul.com Mon Apr 24 11:28:08 2017 From: service at metamodul.com (Hans-Joachim Ehlers) Date: Mon, 24 Apr 2017 12:28:08 +0200 (CEST) Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale Message-ID: <416417651.114582.1493029688959@email.1und1.de> An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Mon Apr 24 12:14:17 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Apr 2017 12:14:17 +0100 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: <416417651.114582.1493029688959@email.1und1.de> References: <416417651.114582.1493029688959@email.1und1.de> Message-ID: <1493032457.11896.20.camel@buzzard.me.uk> On Mon, 2017-04-24 at 12:28 +0200, Hans-Joachim Ehlers wrote: > @All > > > does anybody uses virtualization technologies for GPFS Server ? If yes > what kind and why have you selected your soulution. > > I think currently about using Linux on Power using 40G SR-IOV for > Network and NPIV/Dedidcated FC Adater for storage. As a plus i can > also assign only a certain amount of CPUs to GPFS. ( Lower license > cost / You pay for what you use) > > > I must admit that i am not familar how "good" KVM/ESX in respect to > direct assignment of hardware is. Thus the question to the group > For the most part GPFS is used at scale and in general all the components are redundant. As such why you would want to allocate less than a whole server into a production GPFS system in somewhat beyond me. That is you will have a bunch of NSD servers in the system and if one crashes, well the other NSD's take over. Similar for protocol nodes, and in general the total file system size is going to hundreds of TB otherwise why bother with GPFS. I guess there is currently potential value at sticking the GUI into a virtual machine to get redundancy. On the other hand if you want a test rig, then virtualization works wonders. I have put GPFS on a single Linux box, using LV's for the disks and mapping them into virtual machines under KVM. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From service at metamodul.com Mon Apr 24 13:21:09 2017 From: service at metamodul.com (service at metamodul.com) Date: Mon, 24 Apr 2017 14:21:09 +0200 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale Message-ID: Hi Jonathan todays hardware is so powerful that imho it might make sense to split a CEC into more "piece". For example the IBM S822L has up to 2x12 cores, 9 PCI3 slots ( 4?16 lans & 5?8 lan ). I think that such a server is a little bit to big ?just to be a single NSD server. Note that i use for each GPFS service a dedicated node. So if i would go for 4 NSD server, 6 protocol nodes and 2 tsm backup nodes and at least 3 test server a total of 11 server is needed. Inhm 4xS822L could handle this and a little bit more quite well. Of course blade technology could be used or 1U server. With kind regards Hajo --? Unix Systems Engineer MetaModul GmbH +49 177 4393994
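A concrete version of the KVM test-rig approach Jonathan describes above might look like the following sketch; the volume group (vg0), guest names and sizes are made up, and each guest simply gets its own logical volume to serve as an NSD:

  # on the KVM host: one LV per test NSD
  lvcreate -L 20G -n gpfs_nsd1 vg0
  lvcreate -L 20G -n gpfs_nsd2 vg0
  # hand the LVs to the guests as extra virtio disks
  virsh attach-disk gpfs-node1 /dev/vg0/gpfs_nsd1 vdb --sourcetype block --persistent
  virsh attach-disk gpfs-node2 /dev/vg0/gpfs_nsd2 vdb --sourcetype block --persistent

Inside the guests the devices appear as /dev/vdb and can be fed to mmcrnsd exactly as a physical LUN would be, which is plenty for functional testing even though it says nothing about production performance.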
-------- Original Message --------
From: Jonathan Buzzard
Date: 2017.04.24 13:14 (GMT+01:00)
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale
On Mon, 2017-04-24 at 12:28 +0200, Hans-Joachim Ehlers wrote: > @All > > > does anybody uses virtualization technologies for GPFS Server ? If yes > what kind and why have you selected your soulution. > > I think currently about using Linux on Power using 40G SR-IOV for > Network and NPIV/Dedidcated FC Adater for storage. As a plus i can > also assign only a certain amount of CPUs to GPFS. ( Lower license > cost / You pay for what you use) > > > I must admit that i am not familar how "good" KVM/ESX in respect to > direct assignment of hardware is. Thus the question to the group > For the most part GPFS is used at scale and in general all the components are redundant. As such why you would want to allocate less than a whole server into a production GPFS system in somewhat beyond me. That is you will have a bunch of NSD servers in the system and if one crashes, well the other NSD's take over. Similar for protocol nodes, and in general the total file system size is going to hundreds of TB otherwise why bother with GPFS. I guess there is currently potential value at sticking the GUI into a virtual machine to get redundancy. On the other hand if you want a test rig, then virtualization works wonders. I have put GPFS on a single Linux box, using LV's for the disks and mapping them into virtual machines under KVM. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Mon Apr 24 13:42:51 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 24 Apr 2017 15:42:51 +0300 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: References: Message-ID: Hi As tastes vary, I would not partition it so much for the backend. Assuming there is little to nothing overhead on the CPU at PHYP level, which it depends. On the protocols nodes, due the CTDB keeping locks together across all nodes (SMB), you would get more performance on bigger & less number of CES nodes than more and smaller. Certainly a 822 is quite a server if we go back to previous generations but I would still keep a simple backend (NSd servers), simple CES (less number of nodes the merrier) & then on the client part go as micro partitions as you like/can as the effect on the cluster is less relevant in the case of resources starvation. But, it depends on workloads, SLA and money so I say try, establish a baseline and it fills the requirements, go for it. If not change till does. Have fun From: "service at metamodul.com" To: gpfsug main discussion list Date: 24/04/2017 15:21 Subject: Re: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Jonathan todays hardware is so powerful that imho it might make sense to split a CEC into more "piece". For example the IBM S822L has up to 2x12 cores, 9 PCI3 slots ( 4?16 lans & 5?8 lan ). I think that such a server is a little bit to big just to be a single NSD server. Note that i use for each GPFS service a dedicated node. So if i would go for 4 NSD server, 6 protocol nodes and 2 tsm backup nodes and at least 3 test server a total of 11 server is needed. Inhm 4xS822L could handle this and a little bit more quite well. Of course blade technology could be used or 1U server. 
With kind regards Hajo -- Unix Systems Engineer MetaModul GmbH +49 177 4393994 -------- Urspr?ngliche Nachricht -------- Von: Jonathan Buzzard Datum:2017.04.24 13:14 (GMT+01:00) An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale On Mon, 2017-04-24 at 12:28 +0200, Hans-Joachim Ehlers wrote: > @All > > > does anybody uses virtualization technologies for GPFS Server ? If yes > what kind and why have you selected your soulution. > > I think currently about using Linux on Power using 40G SR-IOV for > Network and NPIV/Dedidcated FC Adater for storage. As a plus i can > also assign only a certain amount of CPUs to GPFS. ( Lower license > cost / You pay for what you use) > > > I must admit that i am not familar how "good" KVM/ESX in respect to > direct assignment of hardware is. Thus the question to the group > For the most part GPFS is used at scale and in general all the components are redundant. As such why you would want to allocate less than a whole server into a production GPFS system in somewhat beyond me. That is you will have a bunch of NSD servers in the system and if one crashes, well the other NSD's take over. Similar for protocol nodes, and in general the total file system size is going to hundreds of TB otherwise why bother with GPFS. I guess there is currently potential value at sticking the GUI into a virtual machine to get redundancy. On the other hand if you want a test rig, then virtualization works wonders. I have put GPFS on a single Linux box, using LV's for the disks and mapping them into virtual machines under KVM. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Mon Apr 24 14:04:26 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Apr 2017 14:04:26 +0100 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: References: Message-ID: <1493039066.11896.30.camel@buzzard.me.uk> On Mon, 2017-04-24 at 14:21 +0200, service at metamodul.com wrote: > Hi Jonathan > todays hardware is so powerful that imho it might make sense to split > a CEC into more "piece". For example the IBM S822L has up to 2x12 > cores, 9 PCI3 slots ( 4?16 lans & 5?8 lan ). > I think that such a server is a little bit to big just to be a single > NSD server. So don't buy it for an NSD server then :-) > Note that i use for each GPFS service a dedicated node. > So if i would go for 4 NSD server, 6 protocol nodes and 2 tsm backup > nodes and at least 3 test server a total of 11 server is needed. > Inhm 4xS822L could handle this and a little bit more quite well. > I think you are missing the point somewhat. Well by several country miles and quite possibly an ocean or two to be honest. Spectrum scale is supposed to be a "scale out" solution. More storage required add more arrays. More bandwidth add more servers etc. 
If you are just going to scale it all up on a *single* server then you might as well forget GPFS and do an old school standard scale up solution. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From janfrode at tanso.net Mon Apr 24 14:14:20 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 24 Apr 2017 15:14:20 +0200 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: References: Message-ID: I agree with Luis -- why so many nodes? """ So if i would go for 4 NSD server, 6 protocol nodes and 2 tsm backup nodes and at least 3 test server a total of 11 server is needed. """ If this is your whole cluster, why not just 3x P822L/P812L running single partition per node, hosting a cluster of 3x protocol-nodes that does both direct FC for disk access, and also run backups on same nodes ? No complications, full hw performance. Then separate node for test, or separate partition on same nodes with dedicated adapters. But back to your original question. My experience is that LPAR/NPIV works great, but it's a bit annoying having to also have VIOs. Hope we'll get FC SR-IOV eventually.. Also LPAR/Dedicated-adapters naturally works fine. VMWare/RDM can be a challenge in some failure situations. It likes to pause VMs in APD or PDL situations, which will affect all VMs with access to it :-o VMs without direct disk access is trivial. -jf On Mon, Apr 24, 2017 at 2:42 PM, Luis Bolinches wrote: > Hi > > As tastes vary, I would not partition it so much for the backend. Assuming > there is little to nothing overhead on the CPU at PHYP level, which it > depends. On the protocols nodes, due the CTDB keeping locks together across > all nodes (SMB), you would get more performance on bigger & less number of > CES nodes than more and smaller. > > Certainly a 822 is quite a server if we go back to previous generations > but I would still keep a simple backend (NSd servers), simple CES (less > number of nodes the merrier) & then on the client part go as micro > partitions as you like/can as the effect on the cluster is less relevant in > the case of resources starvation. > > But, it depends on workloads, SLA and money so I say try, establish a > baseline and it fills the requirements, go for it. If not change till does. > Have fun > > > > From: "service at metamodul.com" > To: gpfsug main discussion list > Date: 24/04/2017 15:21 > Subject: Re: [gpfsug-discuss] Used virtualization technologies for > GPFS/Spectrum Scale > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Jonathan > todays hardware is so powerful that imho it might make sense to split a > CEC into more "piece". For example the IBM S822L has up to 2x12 cores, 9 > PCI3 slots ( 4?16 lans & 5?8 lan ). > I think that such a server is a little bit to big just to be a single NSD > server. > Note that i use for each GPFS service a dedicated node. > So if i would go for 4 NSD server, 6 protocol nodes and 2 tsm backup nodes > and at least 3 test server a total of 11 server is needed. > Inhm 4xS822L could handle this and a little bit more quite well. > > Of course blade technology could be used or 1U server. 
> > With kind regards > Hajo > > -- > Unix Systems Engineer > MetaModul GmbH > +49 177 4393994 <+49%20177%204393994> > > > -------- Urspr?ngliche Nachricht -------- > Von: Jonathan Buzzard > Datum:2017.04.24 13:14 (GMT+01:00) > An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Used virtualization technologies for > GPFS/Spectrum Scale > > On Mon, 2017-04-24 at 12:28 +0200, Hans-Joachim Ehlers wrote: > > @All > > > > > > does anybody uses virtualization technologies for GPFS Server ? If yes > > what kind and why have you selected your soulution. > > > > I think currently about using Linux on Power using 40G SR-IOV for > > Network and NPIV/Dedidcated FC Adater for storage. As a plus i can > > also assign only a certain amount of CPUs to GPFS. ( Lower license > > cost / You pay for what you use) > > > > > > I must admit that i am not familar how "good" KVM/ESX in respect to > > direct assignment of hardware is. Thus the question to the group > > > > For the most part GPFS is used at scale and in general all the > components are redundant. As such why you would want to allocate less > than a whole server into a production GPFS system in somewhat beyond me. > > That is you will have a bunch of NSD servers in the system and if one > crashes, well the other NSD's take over. Similar for protocol nodes, and > in general the total file system size is going to hundreds of TB > otherwise why bother with GPFS. > > I guess there is currently potential value at sticking the GUI into a > virtual machine to get redundancy. > > On the other hand if you want a test rig, then virtualization works > wonders. I have put GPFS on a single Linux box, using LV's for the disks > and mapping them into virtual machines under KVM. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______ > ________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Mon Apr 24 16:29:56 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Apr 2017 11:29:56 -0400 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: References: Message-ID: <131241.1493047796@turing-police.cc.vt.edu> On Mon, 24 Apr 2017 14:21:09 +0200, "service at metamodul.com" said: > todays hardware is so powerful that imho it might make sense to split a CEC > into more "piece". For example the IBM S822L has up to 2x12 cores, 9 PCI3 slots > ( 4?16 lans & 5?8 lan ). We look at it the other way around: Today's hardware is so powerful that you can build a cluster out of a stack of fairly low-end 1U servers (we have one cluster that's built out of Dell r630s). 
And it's more robust against hardware failures than a VM based solution - although the 822 seems to allow hot-swap of PCI cards, a dead socket or DIMM will still kill all the VMs when you go to replace it. If one 1U out of 4 goes down due to a bad DIMM (which has happened to us more often than a bad PCI card) you can just power it down and replace it.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From service at metamodul.com Mon Apr 24 17:11:25 2017 From: service at metamodul.com (Hans-Joachim Ehlers) Date: Mon, 24 Apr 2017 18:11:25 +0200 (CEST) Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: References: Message-ID: <1961501377.286669.1493050285874@email.1und1.de> > Jan-Frode Myklebust hat am 24. April 2017 um 15:14 geschrieben: > I agree with Luis -- why so many nodes? Many ? IMHO it is not that much. I do not like to have one server doing more than one task. Thus a NSD Server does only serves GPFS. A Protocol server serves either NFS or SMB but not both except IBM says it would be better to run NFS/SMB on the same node. A backup server runs also on its "own" hardware. So i would need at least 4 NSD Server since if 1 fails i am losing only 25% of my "performance" and still having a 4/5 quorum. Nice in case an Update of a NSD failed. Each protocol service requires at least 2 nodes and the backup service as well. I can only say that with that approach i never had problems. I have be running into problems each time i did not followed that apporach. But of course YMMV But keep in mind that each service might requires different GPFS configuration or even slightly different hardware. Saying so i am a fan of having many GPFS Server ( NSD, Protocol , Backup a.s.o ) and i do not understand why not to use many nodes ^_^ Cheers Hajo From jonathan at buzzard.me.uk Mon Apr 24 17:24:29 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Apr 2017 17:24:29 +0100 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: <131241.1493047796@turing-police.cc.vt.edu> References: <131241.1493047796@turing-police.cc.vt.edu> Message-ID: <1493051069.11896.39.camel@buzzard.me.uk> On Mon, 2017-04-24 at 11:29 -0400, valdis.kletnieks at vt.edu wrote: > On Mon, 24 Apr 2017 14:21:09 +0200, "service at metamodul.com" said: > > > todays hardware is so powerful that imho it might make sense to split a CEC > > into more "piece". For example the IBM S822L has up to 2x12 cores, 9 PCI3 slots > > ( 4?16 lans & 5?8 lan ). > > We look at it the other way around: Today's hardware is so powerful that > you can build a cluster out of a stack of fairly low-end 1U servers (we > have one cluster that's built out of Dell r630s). And it's more robust > against hardware failures than a VM based solution - although the 822 seems > to allow hot-swap of PCI cards, a dead socket or DIMM will still kill all > the VMs when you go to replace it. If one 1U out of 4 goes down due to > a bad DIMM (which has happened to us more often than a bad PCI card) you > can just power it down and replace it.... Hate to say but the 822 will happily keep trucking when the CPU (assuming it has more than one) fails and similar with the DIMM's. In fact mirrored DIMM's is reasonably common on x86 machines these days, though very few people ever use it. That said CPU failures are incredibly rare in my experience. 
The only time I have ever come across a failed CPU was on a pSeries machine and then it was only because the backup was running really slow (it was running TSM) that prompted us to look closer and see what had happened. Monitoring (Zenoss) was not setup to register the event because like when does a CPU fail and the machine keep running! I am not 100% sure on the 822 put I suspect that the DIMM's and any socketed CPU's can be hot swapped in addition to the PCI card's which I have personally done on pSeries machines. However it is a stupidly over priced solution to run GPFS, because there are better or at the very least vastly cheaper ways to get the same level of reliability. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From valdis.kletnieks at vt.edu Mon Apr 24 18:58:17 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Apr 2017 13:58:17 -0400 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: <1493051069.11896.39.camel@buzzard.me.uk> References: <131241.1493047796@turing-police.cc.vt.edu> <1493051069.11896.39.camel@buzzard.me.uk> Message-ID: <7337.1493056697@turing-police.cc.vt.edu> On Mon, 24 Apr 2017 17:24:29 +0100, Jonathan Buzzard said: > Hate to say but the 822 will happily keep trucking when the CPU > (assuming it has more than one) fails and similar with the DIMM's. In How about when you go to replace the DIMM? You able to hot-swap the memory without anything losing its mind? (I know this is possible in the Z/series world, but those usually have at least 2-3 more zeros in the price tag). -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From luis.bolinches at fi.ibm.com Mon Apr 24 19:08:32 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 24 Apr 2017 21:08:32 +0300 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: <7337.1493056697@turing-police.cc.vt.edu> References: <131241.1493047796@turing-police.cc.vt.edu> <1493051069.11896.39.camel@buzzard.me.uk> <7337.1493056697@turing-police.cc.vt.edu> Message-ID: Hi 822 is an entry scale out Power machine, it has limited RAS compared with the high end ones (870/880). The 822 needs to be down for CPU / DIMM replacement: https://www.ibm.com/support/knowledgecenter/5148-21L/p8eg3/p8eg3_83x_8rx_kickoff.htm . And it is not a end user task. You can argue that, I owuld but it is the current statement and you pay for support for these kind of stuff. From: valdis.kletnieks at vt.edu To: gpfsug main discussion list Date: 24/04/2017 20:58 Subject: Re: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale Sent by: gpfsug-discuss-bounces at spectrumscale.org On Mon, 24 Apr 2017 17:24:29 +0100, Jonathan Buzzard said: > Hate to say but the 822 will happily keep trucking when the CPU > (assuming it has more than one) fails and similar with the DIMM's. In How about when you go to replace the DIMM? You able to hot-swap the memory without anything losing its mind? (I know this is possible in the Z/series world, but those usually have at least 2-3 more zeros in the price tag). [attachment "attqolcz.dat" deleted by Luis Bolinches/Finland/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank.tower at outlook.com Mon Apr 24 22:12:14 2017 From: frank.tower at outlook.com (Frank Tower) Date: Mon, 24 Apr 2017 21:12:14 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: , , Message-ID: >From what I've read from the Wiki: 'The NFS protocol performance is largely dependent on the base system performance of the protocol node hardware and network. This includes multiple factors the type and number of CPUs, the size of the main memory in the nodes, the type of disk drives used (HDD, SSD, etc.) and the disk configuration (RAID-level, replication etc.). In addition, NFS protocol performance can be impacted by the overall load of the node (such as number of clients accessing, snapshot creation/deletion and more) and administrative tasks (for example filesystem checks or online re-striping of disk arrays).' Nowadays, SSD is worst to invest. LROC could be an option in the future, but we need to quantify NFS/CIFS workload first. Are you using LROC with your GPFS installation ? Best, Frank. ________________________________ From: Sobey, Richard A Sent: Monday, April 24, 2017 11:11 AM To: gpfsug main discussion list; Jan-Frode Myklebust Subject: Re: [gpfsug-discuss] Protocol node recommendations What?s your SSD going to help with? will you implement it as a LROC device? Otherwise I can?t see the benefit to using it to boot off. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Frank Tower Sent: 23 April 2017 22:28 To: Jan-Frode Myklebust ; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations Hi, Nice ! didn't pay attention at the revision and the spreadsheet. If someone still have a copy somewhere it could be useful, Google didn't help :( We will follow your advise and start with 3 protocol nodes equipped with 128GB memory, 2 x 12 cores (maybe E5-2680 or E5-2670). >From what I read, NFS-Ganesha mainly depend of the hardware, Linux on a SSD should be a big plus in our case. Best, Frank ________________________________ From: Jan-Frode Myklebust > Sent: Sunday, April 23, 2017 12:07:38 PM To: Frank Tower; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations The protocol sizing tool should be available from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node/version/70a4c7c0-a5c6-4dde-b391-8f91c542dd7d , but I'm getting 404 now. I think 128GB should be enough for both protocols on same nodes, and I think your 3 node suggestion is best. Better load sharing with not dedicating subset of nodes to each protocol. -jf l?r. 22. apr. 2017 kl. 21.22 skrev Frank Tower >: Hi, Thank for the recommendations. Now we deal with the situation of: - take 3 nodes with round robin DNS that handle both protocols - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and NFS services. Regarding your recommendations, 256GB memory node could be a plus if we mix both protocols for such case. Is the spreadsheet publicly available or do we need to ask IBM ? Thank for your help, Frank. 
________________________________ From: Jan-Frode Myklebust > Sent: Saturday, April 22, 2017 10:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower >: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Apr 25 09:19:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 25 Apr 2017 08:19:10 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: , , Message-ID: I tried it on one node but investing in what could be up to ?5000 in SSDs when we don't know the gains isn't something I can argue. Not that LROC will hurt the environment but my users may not see any benefit. My cluster is the complete opposite of busy (relative to people saying they're seeing sustained 800MB/sec throughput), I just need it stable. Richard From: Frank Tower [mailto:frank.tower at outlook.com] Sent: 24 April 2017 22:12 To: Sobey, Richard A ; gpfsug main discussion list ; Jan-Frode Myklebust Subject: Re: [gpfsug-discuss] Protocol node recommendations >From what I've read from the Wiki: 'The NFS protocol performance is largely dependent on the base system performance of the protocol node hardware and network. This includes multiple factors the type and number of CPUs, the size of the main memory in the nodes, the type of disk drives used (HDD, SSD, etc.) and the disk configuration (RAID-level, replication etc.). In addition, NFS protocol performance can be impacted by the overall load of the node (such as number of clients accessing, snapshot creation/deletion and more) and administrative tasks (for example filesystem checks or online re-striping of disk arrays).' Nowadays, SSD is worst to invest. LROC could be an option in the future, but we need to quantify NFS/CIFS workload first. 
Are you using LROC with your GPFS installation ? Best, Frank. ________________________________ From: Sobey, Richard A > Sent: Monday, April 24, 2017 11:11 AM To: gpfsug main discussion list; Jan-Frode Myklebust Subject: Re: [gpfsug-discuss] Protocol node recommendations What's your SSD going to help with... will you implement it as a LROC device? Otherwise I can't see the benefit to using it to boot off. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Frank Tower Sent: 23 April 2017 22:28 To: Jan-Frode Myklebust >; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations Hi, Nice ! didn't pay attention at the revision and the spreadsheet. If someone still have a copy somewhere it could be useful, Google didn't help :( We will follow your advise and start with 3 protocol nodes equipped with 128GB memory, 2 x 12 cores (maybe E5-2680 or E5-2670). >From what I read, NFS-Ganesha mainly depend of the hardware, Linux on a SSD should be a big plus in our case. Best, Frank ________________________________ From: Jan-Frode Myklebust > Sent: Sunday, April 23, 2017 12:07:38 PM To: Frank Tower; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations The protocol sizing tool should be available from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node/version/70a4c7c0-a5c6-4dde-b391-8f91c542dd7d , but I'm getting 404 now. I think 128GB should be enough for both protocols on same nodes, and I think your 3 node suggestion is best. Better load sharing with not dedicating subset of nodes to each protocol. -jf l?r. 22. apr. 2017 kl. 21.22 skrev Frank Tower >: Hi, Thank for the recommendations. Now we deal with the situation of: - take 3 nodes with round robin DNS that handle both protocols - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and NFS services. Regarding your recommendations, 256GB memory node could be a plus if we mix both protocols for such case. Is the spreadsheet publicly available or do we need to ask IBM ? Thank for your help, Frank. ________________________________ From: Jan-Frode Myklebust > Sent: Saturday, April 22, 2017 10:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower >: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. 
We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Tue Apr 25 09:23:32 2017 From: chair at spectrumscale.org (Spectrum Scale UG Chair (Simon Thompson)) Date: Tue, 25 Apr 2017 09:23:32 +0100 Subject: [gpfsug-discuss] User group meeting May 9th/10th 2017 Message-ID: The UK user group is now just 2 weeks away! Its time to register ... https://www.eventbrite.com/e/spectrum-scalegpfs-user-group-spring-2017-regi stration-32113696932 (or https://goo.gl/tRptru) Remember user group meetings are free to attend, and this year's 2 day meeting is packed full of sessions and several of the breakout sessions are cloud-focussed looking at how Spectrum Scale can be used with cloud deployments. And as usual, we have the ever popular Sven speaking with his views from the Research topics. Thanks to our sponsors Arcastream, DDN, Ellexus, Lenovo, IBM, Mellanox, OCF and Seagate for helping make this happen! We need to finalise numbers for the evening event soon, so make sure you book your place now! Simon From S.J.Thompson at bham.ac.uk Tue Apr 25 12:20:39 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 25 Apr 2017 11:20:39 +0000 Subject: [gpfsug-discuss] NFS issues Message-ID: Hi, We have recently started deploying NFS in addition our existing SMB exports on our protocol nodes. We use a RR DNS name that points to 4 VIPs for SMB services and failover seems to work fine with SMB clients. We figured we could use the same name and IPs and run Ganesha on the protocol servers, however we are seeing issues with NFS clients when IP failover occurs. In normal operation on a client, we might see several mounts from different IPs obviously due to the way the DNS RR is working, but it all works fine. In a failover situation, the IP will move to another node and some clients will carry on, others will hang IO to the mount points referred to by the IP which has moved. We can *sometimes* trigger this by manually suspending a CES node, but not always and some clients mounting from the IP moving will be fine, others won't. If we resume a node an it fails back, the clients that are hanging will usually recover fine. We can reboot a client prior to failback and it will be fine, stopping and starting the ganesha service on a protocol node will also sometimes resolve the issues. So, has anyone seen this sort of issue and any suggestions for how we could either debug more or workaround? 
We are currently running the packages nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). At one point we were seeing it a lot, and could track it back to an underlying GPFS network issue that was causing protocol nodes to be expelled occasionally, we resolved that and the issues became less apparent, but maybe we just fixed one failure mode so see it less often. On the clients, we use -o sync,hard BTW as in the IBM docs. On a client showing the issues, we'll see in dmesg, NFS related messages like: [Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not responding, timed out Which explains the client hang on certain mount points. The symptoms feel very much like those logged in this Gluster/ganesha bug: https://bugzilla.redhat.com/show_bug.cgi?id=1354439 Thanks Simon From Mark.Bush at siriuscom.com Tue Apr 25 14:27:38 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Tue, 25 Apr 2017 13:27:38 +0000 Subject: [gpfsug-discuss] Perfmon and GUI Message-ID: <321F04D4-5F3A-443F-A598-0616642C9F96@siriuscom.com> Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. What am I missing? Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = 
"Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Apr 25 14:44:59 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 25 Apr 2017 13:44:59 +0000 Subject: [gpfsug-discuss] Perfmon and GUI In-Reply-To: <321F04D4-5F3A-443F-A598-0616642C9F96@siriuscom.com> References: <321F04D4-5F3A-443F-A598-0616642C9F96@siriuscom.com> Message-ID: I would have thought this would be fixed by now as this happened to me in 4.2.1-(0?) ? here?s what support said. Can you try? I think you?ve already got the relevant bits in your .cfg files so it should just be a case of copying the files across and restarting pmsensors and pmcollector. Again bear in mind this affected me on 4.2.1 and you?re using 4.2.3 so ymmv.. ? I spoke with development and normally these files would be copied over to /opt/IBM/zimon when using the automatic installer but since this case doesn't use the installer we have to copy them over manually. We acknowledge this should be in the docs, and the reason it is not included in pmsensors rpm is due to the fact these do not come from the zimon team. The following files can be copied over to /opt/IBM/zimon [root at node1 default]# pwd /usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default [root at node1 default]# ls CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg SMBSensors.cfg SMBStats.cfg ZIMonCollector.cfg ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 14:28 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Perfmon and GUI Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. What am I missing? 
Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = "Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From j.ouwehand at vumc.nl Tue Apr 25 14:51:22 2017 From: j.ouwehand at vumc.nl (Ouwehand, JJ) Date: Tue, 25 Apr 2017 13:51:22 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: Message-ID: <5594921EA5B3674AB44AD9276126AAF40170DD3159@sp-mx-mbx42> Hello, At first a short introduction. My name is Jaap Jan Ouwehand, I work at a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical (office, research and clinical data) business process. 
We have three large GPFS filesystems for different purposes. We also had such a situation with cNFS. A failover (IPtakeover) was technically good, only clients experienced "stale filehandles". We opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few months later, the solution appeared to be in the fsid option. An NFS filehandle is built by a combination of fsid and a hash function on the inode. After a failover, the fsid value can be different and the client has a "stale filehandle". To avoid this, the fsid value can be statically specified. See: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_nfslin.htm Maybe there is also a value in Ganesha that changes after a failover. Certainly since most sessions will be re-established after a failback. Maybe you see more debug information with tcpdump. Kind regards, ? Jaap Jan Ouwehand ICT Specialist (Storage & Linux) VUmc - ICT E: jj.ouwehand at vumc.nl W: www.vumc.com -----Oorspronkelijk bericht----- Van: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Namens Simon Thompson (IT Research Support) Verzonden: dinsdag 25 april 2017 13:21 Aan: gpfsug-discuss at spectrumscale.org Onderwerp: [gpfsug-discuss] NFS issues Hi, We have recently started deploying NFS in addition our existing SMB exports on our protocol nodes. We use a RR DNS name that points to 4 VIPs for SMB services and failover seems to work fine with SMB clients. We figured we could use the same name and IPs and run Ganesha on the protocol servers, however we are seeing issues with NFS clients when IP failover occurs. In normal operation on a client, we might see several mounts from different IPs obviously due to the way the DNS RR is working, but it all works fine. In a failover situation, the IP will move to another node and some clients will carry on, others will hang IO to the mount points referred to by the IP which has moved. We can *sometimes* trigger this by manually suspending a CES node, but not always and some clients mounting from the IP moving will be fine, others won't. If we resume a node an it fails back, the clients that are hanging will usually recover fine. We can reboot a client prior to failback and it will be fine, stopping and starting the ganesha service on a protocol node will also sometimes resolve the issues. So, has anyone seen this sort of issue and any suggestions for how we could either debug more or workaround? We are currently running the packages nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). At one point we were seeing it a lot, and could track it back to an underlying GPFS network issue that was causing protocol nodes to be expelled occasionally, we resolved that and the issues became less apparent, but maybe we just fixed one failure mode so see it less often. On the clients, we use -o sync,hard BTW as in the IBM docs. On a client showing the issues, we'll see in dmesg, NFS related messages like: [Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not responding, timed out Which explains the client hang on certain mount points. 
The symptoms feel very much like those logged in this Gluster/ganesha bug: https://bugzilla.redhat.com/show_bug.cgi?id=1354439 Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Tue Apr 25 15:06:04 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 25 Apr 2017 14:06:04 +0000 Subject: [gpfsug-discuss] NFS issues Message-ID: Hi, >From what I can see, Ganesha uses the Export_Id option in the config file (which is managed by CES) for this. I did find some reference in the Ganesha devs list that if its not set, then it would read the FSID from the GPFS file-system, either way they should surely be consistent across all the nodes. The posts I found were from someone with an IBM email address, so I guess someone in the IBM teams. I checked a couple of my protocol nodes and they use the same Export_Id consistently, though I guess that might not be the same as the FSID value. Perhaps someone from IBM could comment on if FSID is likely to the cause of my problems? Thanks Simon On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ouwehand, JJ" wrote: >Hello, > >At first a short introduction. My name is Jaap Jan Ouwehand, I work at a >Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM >Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical >(office, research and clinical data) business process. We have three >large GPFS filesystems for different purposes. > >We also had such a situation with cNFS. A failover (IPtakeover) was >technically good, only clients experienced "stale filehandles". We opened >a PMR at IBM and after testing, deliver logs, tcpdumps and a few months >later, the solution appeared to be in the fsid option. > >An NFS filehandle is built by a combination of fsid and a hash function >on the inode. After a failover, the fsid value can be different and the >client has a "stale filehandle". To avoid this, the fsid value can be >statically specified. See: > >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum. >scale.v4r22.doc/bl1adm_nfslin.htm > >Maybe there is also a value in Ganesha that changes after a failover. >Certainly since most sessions will be re-established after a failback. >Maybe you see more debug information with tcpdump. > > >Kind regards, > >Jaap Jan Ouwehand >ICT Specialist (Storage & Linux) >VUmc - ICT >E: jj.ouwehand at vumc.nl >W: www.vumc.com > > > >-----Oorspronkelijk bericht----- >Van: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] Namens Simon Thompson >(IT Research Support) >Verzonden: dinsdag 25 april 2017 13:21 >Aan: gpfsug-discuss at spectrumscale.org >Onderwerp: [gpfsug-discuss] NFS issues > >Hi, > >We have recently started deploying NFS in addition our existing SMB >exports on our protocol nodes. > >We use a RR DNS name that points to 4 VIPs for SMB services and failover >seems to work fine with SMB clients. We figured we could use the same >name and IPs and run Ganesha on the protocol servers, however we are >seeing issues with NFS clients when IP failover occurs. > >In normal operation on a client, we might see several mounts from >different IPs obviously due to the way the DNS RR is working, but it all >works fine. 
> >In a failover situation, the IP will move to another node and some >clients will carry on, others will hang IO to the mount points referred >to by the IP which has moved. We can *sometimes* trigger this by manually >suspending a CES node, but not always and some clients mounting from the >IP moving will be fine, others won't. > >If we resume a node an it fails back, the clients that are hanging will >usually recover fine. We can reboot a client prior to failback and it >will be fine, stopping and starting the ganesha service on a protocol >node will also sometimes resolve the issues. > >So, has anyone seen this sort of issue and any suggestions for how we >could either debug more or workaround? > >We are currently running the packages >nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). > >At one point we were seeing it a lot, and could track it back to an >underlying GPFS network issue that was causing protocol nodes to be >expelled occasionally, we resolved that and the issues became less >apparent, but maybe we just fixed one failure mode so see it less often. > >On the clients, we use -o sync,hard BTW as in the IBM docs. > >On a client showing the issues, we'll see in dmesg, NFS related messages >like: >[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not >responding, timed out > >Which explains the client hang on certain mount points. > >The symptoms feel very much like those logged in this Gluster/ganesha bug: >https://bugzilla.redhat.com/show_bug.cgi?id=1354439 > > >Thanks > >Simon > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Tue Apr 25 15:13:58 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Tue, 25 Apr 2017 14:13:58 +0000 Subject: [gpfsug-discuss] Perfmon and GUI Message-ID: <2A0DC44A-D9FF-428B-8B02-FC6EC504BD34@siriuscom.com> Interesting. Some files were indeed already there but it was missing a few NFSIO.cfg being the most notable to me. I?ve gone ahead and copied those to all my nodes (just three in this cluster) and restarted services. Still no luck. I?m going to restart the GUI service next to see if that makes a difference. Interestingly I can do things like mmperfmon query smb2 and that tends to work and give me real data so not sure where the breakdown is in the GUI. Mark From: "Sobey, Richard A" Reply-To: gpfsug main discussion list Date: Tuesday, April 25, 2017 at 8:44 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI I would have thought this would be fixed by now as this happened to me in 4.2.1-(0?) ? here?s what support said. Can you try? I think you?ve already got the relevant bits in your .cfg files so it should just be a case of copying the files across and restarting pmsensors and pmcollector. Again bear in mind this affected me on 4.2.1 and you?re using 4.2.3 so ymmv.. ? I spoke with development and normally these files would be copied over to /opt/IBM/zimon when using the automatic installer but since this case doesn't use the installer we have to copy them over manually. We acknowledge this should be in the docs, and the reason it is not included in pmsensors rpm is due to the fact these do not come from the zimon team. 
The following files can be copied over to /opt/IBM/zimon [root at node1 default]# pwd /usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default [root at node1 default]# ls CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg SMBSensors.cfg SMBStats.cfg ZIMonCollector.cfg ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 14:28 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Perfmon and GUI Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. What am I missing? Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = "Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. 
This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Tue Apr 25 15:29:07 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Tue, 25 Apr 2017 14:29:07 +0000 Subject: [gpfsug-discuss] Perfmon and GUI In-Reply-To: <2A0DC44A-D9FF-428B-8B02-FC6EC504BD34@siriuscom.com> References: <2A0DC44A-D9FF-428B-8B02-FC6EC504BD34@siriuscom.com> Message-ID: Update: So SMB monitoring is now working after copying all files per Richard?s recommendation (thank you sir) and restarting pmsensors, pmcollector, and gpfsfui. Sadly, NFS monitoring isn?t. It doesn?t work from the cli either though. So clearly, something is up with that part. I continue to troubleshoot. From: Mark Bush Reply-To: gpfsug main discussion list Date: Tuesday, April 25, 2017 at 9:13 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI Interesting. Some files were indeed already there but it was missing a few NFSIO.cfg being the most notable to me. I?ve gone ahead and copied those to all my nodes (just three in this cluster) and restarted services. Still no luck. I?m going to restart the GUI service next to see if that makes a difference. Interestingly I can do things like mmperfmon query smb2 and that tends to work and give me real data so not sure where the breakdown is in the GUI. Mark From: "Sobey, Richard A" Reply-To: gpfsug main discussion list Date: Tuesday, April 25, 2017 at 8:44 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI I would have thought this would be fixed by now as this happened to me in 4.2.1-(0?) ? here?s what support said. Can you try? I think you?ve already got the relevant bits in your .cfg files so it should just be a case of copying the files across and restarting pmsensors and pmcollector. Again bear in mind this affected me on 4.2.1 and you?re using 4.2.3 so ymmv.. ? I spoke with development and normally these files would be copied over to /opt/IBM/zimon when using the automatic installer but since this case doesn't use the installer we have to copy them over manually. We acknowledge this should be in the docs, and the reason it is not included in pmsensors rpm is due to the fact these do not come from the zimon team. The following files can be copied over to /opt/IBM/zimon [root at node1 default]# pwd /usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default [root at node1 default]# ls CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg SMBSensors.cfg SMBStats.cfg ZIMonCollector.cfg ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 14:28 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Perfmon and GUI Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? 
I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. What am I missing? Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = "Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Tue Apr 25 15:31:13 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 25 Apr 2017 14:31:13 +0000 Subject: [gpfsug-discuss] Perfmon and GUI In-Reply-To: References: <2A0DC44A-D9FF-428B-8B02-FC6EC504BD34@siriuscom.com> Message-ID: No worries Mark. We don?t use NFS here (yet) so I can?t help there. Glad I could help. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 15:29 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI Update: So SMB monitoring is now working after copying all files per Richard?s recommendation (thank you sir) and restarting pmsensors, pmcollector, and gpfsfui. Sadly, NFS monitoring isn?t. It doesn?t work from the cli either though. So clearly, something is up with that part. I continue to troubleshoot. From: Mark Bush > Reply-To: gpfsug main discussion list > Date: Tuesday, April 25, 2017 at 9:13 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Perfmon and GUI Interesting. Some files were indeed already there but it was missing a few NFSIO.cfg being the most notable to me. I?ve gone ahead and copied those to all my nodes (just three in this cluster) and restarted services. Still no luck. I?m going to restart the GUI service next to see if that makes a difference. Interestingly I can do things like mmperfmon query smb2 and that tends to work and give me real data so not sure where the breakdown is in the GUI. Mark From: "Sobey, Richard A" > Reply-To: gpfsug main discussion list > Date: Tuesday, April 25, 2017 at 8:44 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Perfmon and GUI I would have thought this would be fixed by now as this happened to me in 4.2.1-(0?) ? here?s what support said. Can you try? I think you?ve already got the relevant bits in your .cfg files so it should just be a case of copying the files across and restarting pmsensors and pmcollector. Again bear in mind this affected me on 4.2.1 and you?re using 4.2.3 so ymmv.. ? I spoke with development and normally these files would be copied over to /opt/IBM/zimon when using the automatic installer but since this case doesn't use the installer we have to copy them over manually. We acknowledge this should be in the docs, and the reason it is not included in pmsensors rpm is due to the fact these do not come from the zimon team. The following files can be copied over to /opt/IBM/zimon [root at node1 default]# pwd /usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default [root at node1 default]# ls CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg SMBSensors.cfg SMBStats.cfg ZIMonCollector.cfg ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 14:28 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Perfmon and GUI Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. 
What am I missing? Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = "Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Apr 25 18:04:41 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 25 Apr 2017 17:04:41 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: Message-ID: I *think* I've seen this, and that we then had open TCP connection from client to NFS server according to netstat, but these connections were not visible from netstat on NFS-server side. Unfortunately I don't remember what the fix was.. -jf tir. 25. apr. 2017 kl. 
16.06 skrev Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk>: > Hi, > > From what I can see, Ganesha uses the Export_Id option in the config file > (which is managed by CES) for this. I did find some reference in the > Ganesha devs list that if its not set, then it would read the FSID from > the GPFS file-system, either way they should surely be consistent across > all the nodes. The posts I found were from someone with an IBM email > address, so I guess someone in the IBM teams. > > I checked a couple of my protocol nodes and they use the same Export_Id > consistently, though I guess that might not be the same as the FSID value. > > Perhaps someone from IBM could comment on if FSID is likely to the cause > of my problems? > > Thanks > > Simon > > On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Ouwehand, JJ" j.ouwehand at vumc.nl> wrote: > > >Hello, > > > >At first a short introduction. My name is Jaap Jan Ouwehand, I work at a > >Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM > >Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical > >(office, research and clinical data) business process. We have three > >large GPFS filesystems for different purposes. > > > >We also had such a situation with cNFS. A failover (IPtakeover) was > >technically good, only clients experienced "stale filehandles". We opened > >a PMR at IBM and after testing, deliver logs, tcpdumps and a few months > >later, the solution appeared to be in the fsid option. > > > >An NFS filehandle is built by a combination of fsid and a hash function > >on the inode. After a failover, the fsid value can be different and the > >client has a "stale filehandle". To avoid this, the fsid value can be > >statically specified. See: > > > >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum > . > >scale.v4r22.doc/bl1adm_nfslin.htm > > > >Maybe there is also a value in Ganesha that changes after a failover. > >Certainly since most sessions will be re-established after a failback. > >Maybe you see more debug information with tcpdump. > > > > > >Kind regards, > > > >Jaap Jan Ouwehand > >ICT Specialist (Storage & Linux) > >VUmc - ICT > >E: jj.ouwehand at vumc.nl > >W: www.vumc.com > > > > > > > >-----Oorspronkelijk bericht----- > >Van: gpfsug-discuss-bounces at spectrumscale.org > >[mailto:gpfsug-discuss-bounces at spectrumscale.org] Namens Simon Thompson > >(IT Research Support) > >Verzonden: dinsdag 25 april 2017 13:21 > >Aan: gpfsug-discuss at spectrumscale.org > >Onderwerp: [gpfsug-discuss] NFS issues > > > >Hi, > > > >We have recently started deploying NFS in addition our existing SMB > >exports on our protocol nodes. > > > >We use a RR DNS name that points to 4 VIPs for SMB services and failover > >seems to work fine with SMB clients. We figured we could use the same > >name and IPs and run Ganesha on the protocol servers, however we are > >seeing issues with NFS clients when IP failover occurs. > > > >In normal operation on a client, we might see several mounts from > >different IPs obviously due to the way the DNS RR is working, but it all > >works fine. > > > >In a failover situation, the IP will move to another node and some > >clients will carry on, others will hang IO to the mount points referred > >to by the IP which has moved. We can *sometimes* trigger this by manually > >suspending a CES node, but not always and some clients mounting from the > >IP moving will be fine, others won't. 
> > > >If we resume a node an it fails back, the clients that are hanging will > >usually recover fine. We can reboot a client prior to failback and it > >will be fine, stopping and starting the ganesha service on a protocol > >node will also sometimes resolve the issues. > > > >So, has anyone seen this sort of issue and any suggestions for how we > >could either debug more or workaround? > > > >We are currently running the packages > >nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). > > > >At one point we were seeing it a lot, and could track it back to an > >underlying GPFS network issue that was causing protocol nodes to be > >expelled occasionally, we resolved that and the issues became less > >apparent, but maybe we just fixed one failure mode so see it less often. > > > >On the clients, we use -o sync,hard BTW as in the IBM docs. > > > >On a client showing the issues, we'll see in dmesg, NFS related messages > >like: > >[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not > >responding, timed out > > > >Which explains the client hang on certain mount points. > > > >The symptoms feel very much like those logged in this Gluster/ganesha bug: > >https://bugzilla.redhat.com/show_bug.cgi?id=1354439 > > > > > >Thanks > > > >Simon > > > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoang.nguyen at seagate.com Tue Apr 25 18:12:19 2017 From: hoang.nguyen at seagate.com (Hoang Nguyen) Date: Tue, 25 Apr 2017 10:12:19 -0700 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: Message-ID: I have a customer with a slightly different issue but sounds somewhat related. If you stop and stop the NFS service on a CES node or update an existing export which will restart Ganesha. Some of their NFS clients do not reconnect in a very similar fashion as you described. I haven't been able to reproduce it on my test system repeatedly but using soft NFS mounts seems to help. Seems like it happens more often to clients currently running NFS IO during the outage. But I'm interested to see what you guys uncover. Thanks, Hoang On Tue, Apr 25, 2017 at 7:06 AM, Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk> wrote: > Hi, > > From what I can see, Ganesha uses the Export_Id option in the config file > (which is managed by CES) for this. I did find some reference in the > Ganesha devs list that if its not set, then it would read the FSID from > the GPFS file-system, either way they should surely be consistent across > all the nodes. The posts I found were from someone with an IBM email > address, so I guess someone in the IBM teams. > > I checked a couple of my protocol nodes and they use the same Export_Id > consistently, though I guess that might not be the same as the FSID value. > > Perhaps someone from IBM could comment on if FSID is likely to the cause > of my problems? 
> > Thanks > > Simon > > On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Ouwehand, JJ" j.ouwehand at vumc.nl> wrote: > > >Hello, > > > >At first a short introduction. My name is Jaap Jan Ouwehand, I work at a > >Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM > >Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical > >(office, research and clinical data) business process. We have three > >large GPFS filesystems for different purposes. > > > >We also had such a situation with cNFS. A failover (IPtakeover) was > >technically good, only clients experienced "stale filehandles". We opened > >a PMR at IBM and after testing, deliver logs, tcpdumps and a few months > >later, the solution appeared to be in the fsid option. > > > >An NFS filehandle is built by a combination of fsid and a hash function > >on the inode. After a failover, the fsid value can be different and the > >client has a "stale filehandle". To avoid this, the fsid value can be > >statically specified. See: > > > >https://urldefense.proofpoint.com/v2/url?u=https-3A__www.ibm.com_support_ > knowledgecenter_STXKQY-5F4.2.2_com.ibm.spectrum&d=DwICAg&c= > IGDlg0lD0b-nebmJJ0Kp8A&r=erT0ET1g1dsvTDYndRRTAAZ6Dneebt > G6F47PIUMDXFw&m=K3iXrW2N_HcdrGDuKmRWFjypuPLPJDIm9VosFIIsFoI&s= > PIXnA0UQbneTHMRxvUcmsvZK6z5V2XU4jR_GIVaZP5Q&e= . > >scale.v4r22.doc/bl1adm_nfslin.htm > > > >Maybe there is also a value in Ganesha that changes after a failover. > >Certainly since most sessions will be re-established after a failback. > >Maybe you see more debug information with tcpdump. > > > > > >Kind regards, > > > >Jaap Jan Ouwehand > >ICT Specialist (Storage & Linux) > >VUmc - ICT > >E: jj.ouwehand at vumc.nl > >W: www.vumc.com > > > > > > > >-----Oorspronkelijk bericht----- > >Van: gpfsug-discuss-bounces at spectrumscale.org > >[mailto:gpfsug-discuss-bounces at spectrumscale.org] Namens Simon Thompson > >(IT Research Support) > >Verzonden: dinsdag 25 april 2017 13:21 > >Aan: gpfsug-discuss at spectrumscale.org > >Onderwerp: [gpfsug-discuss] NFS issues > > > >Hi, > > > >We have recently started deploying NFS in addition our existing SMB > >exports on our protocol nodes. > > > >We use a RR DNS name that points to 4 VIPs for SMB services and failover > >seems to work fine with SMB clients. We figured we could use the same > >name and IPs and run Ganesha on the protocol servers, however we are > >seeing issues with NFS clients when IP failover occurs. > > > >In normal operation on a client, we might see several mounts from > >different IPs obviously due to the way the DNS RR is working, but it all > >works fine. > > > >In a failover situation, the IP will move to another node and some > >clients will carry on, others will hang IO to the mount points referred > >to by the IP which has moved. We can *sometimes* trigger this by manually > >suspending a CES node, but not always and some clients mounting from the > >IP moving will be fine, others won't. > > > >If we resume a node an it fails back, the clients that are hanging will > >usually recover fine. We can reboot a client prior to failback and it > >will be fine, stopping and starting the ganesha service on a protocol > >node will also sometimes resolve the issues. > > > >So, has anyone seen this sort of issue and any suggestions for how we > >could either debug more or workaround? > > > >We are currently running the packages > >nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). 
> > > >At one point we were seeing it a lot, and could track it back to an > >underlying GPFS network issue that was causing protocol nodes to be > >expelled occasionally, we resolved that and the issues became less > >apparent, but maybe we just fixed one failure mode so see it less often. > > > >On the clients, we use -o sync,hard BTW as in the IBM docs. > > > >On a client showing the issues, we'll see in dmesg, NFS related messages > >like: > >[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not > >responding, timed out > > > >Which explains the client hang on certain mount points. > > > >The symptoms feel very much like those logged in this Gluster/ganesha bug: > >https://urldefense.proofpoint.com/v2/url?u=https- > 3A__bugzilla.redhat.com_show-5Fbug.cgi-3Fid-3D1354439&d= > DwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r=erT0ET1g1dsvTDYndRRTAAZ6Dneebt > G6F47PIUMDXFw&m=K3iXrW2N_HcdrGDuKmRWFjypuPLPJDIm9VosFII > sFoI&s=KN5WKk1vLEt0Y_17nVQeDi1lK5mSQUZQ7lPtQK3FBG4&e= > > > > > >Thanks > > > >Simon > > > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_ > listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r= > erT0ET1g1dsvTDYndRRTAAZ6DneebtG6F47PIUMDXFw&m=K3iXrW2N_ > HcdrGDuKmRWFjypuPLPJDIm9VosFIIsFoI&s=rvZX6mp5gZr7h3QuwTM2EVZaG- > d1VXwSDKDhKVyQurw&e= > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_ > listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r= > erT0ET1g1dsvTDYndRRTAAZ6DneebtG6F47PIUMDXFw&m=K3iXrW2N_ > HcdrGDuKmRWFjypuPLPJDIm9VosFIIsFoI&s=rvZX6mp5gZr7h3QuwTM2EVZaG- > d1VXwSDKDhKVyQurw&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug. > org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r= > erT0ET1g1dsvTDYndRRTAAZ6DneebtG6F47PIUMDXFw&m=K3iXrW2N_ > HcdrGDuKmRWFjypuPLPJDIm9VosFIIsFoI&s=rvZX6mp5gZr7h3QuwTM2EVZaG- > d1VXwSDKDhKVyQurw&e= > -- Hoang Nguyen *? *Sr Staff Engineer Seagate Technology office: +1 (858) 751-4487 mobile: +1 (858) 284-7846 www.seagate.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Apr 25 18:30:40 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 25 Apr 2017 17:30:40 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: , Message-ID: I did some digging in the mmcesfuncs to see what happens server side on fail over. Basically the server losing the IP is supposed to terminate all sessions and the receiver server sends ACK tickles. My current supposition is that for whatever reason, the losing server isn't releasing something and the client still has hold of a connection which is mostly dead. The tickle then fails to the client from the new server. This would explain why failing the IP back to the original server usually brings the client back to life. This is only my working theory at the moment as we can't reliably reproduce this. Next time it happens we plan to grab some netstat from each side. Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the server that received the IP and see if that fixes it (i.e. the receiver server didn't tickle properly). 
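Spelling that out as a rough sketch (the addresses and client port below are made-up examples, and 2049 is just the usual NFSd port):

  # on the client and on the CES node that received the IP, compare TCP state
  netstat -tn | grep 2049

  # if the client still shows an ESTABLISHED connection that the new server
  # has no record of, re-send the ACK tickle from that server, e.g.
  mmcmi tcpack 10.10.0.50:2049 10.20.1.23:876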
(Usage extracted from mmcesfuncs which is ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd) for anyone interested. Then try and kill he sessions on the losing server to check if there is stuff still open and re-tickle the client. If we can get steps to workaround, I'll log a PMR. I suppose I could do that now, but given its non deterministic and we want to be 100% sure it's not us doing something wrong, I'm inclined to wait until we do some more testing. I agree with the suggestion that it's probably IO pending nodes that are affected, but don't have any data to back that up yet. We did try with a read workload on a client, but may we need either long IO blocked reads or writes (from the GPFS end). We also originally had soft as the default option, but saw issues then and the docs suggested hard, so we switched and also enabled sync (we figured maybe it was NFS client with uncommited writes), but neither have resolved the issues entirely. Difficult for me to say if they improved the issue though given its sporadic. Appreciate people's suggestions! Thanks Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode Myklebust [janfrode at tanso.net] Sent: 25 April 2017 18:04 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NFS issues I *think* I've seen this, and that we then had open TCP connection from client to NFS server according to netstat, but these connections were not visible from netstat on NFS-server side. Unfortunately I don't remember what the fix was.. -jf tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) >: Hi, >From what I can see, Ganesha uses the Export_Id option in the config file (which is managed by CES) for this. I did find some reference in the Ganesha devs list that if its not set, then it would read the FSID from the GPFS file-system, either way they should surely be consistent across all the nodes. The posts I found were from someone with an IBM email address, so I guess someone in the IBM teams. I checked a couple of my protocol nodes and they use the same Export_Id consistently, though I guess that might not be the same as the FSID value. Perhaps someone from IBM could comment on if FSID is likely to the cause of my problems? Thanks Simon On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ouwehand, JJ" on behalf of j.ouwehand at vumc.nl> wrote: >Hello, > >At first a short introduction. My name is Jaap Jan Ouwehand, I work at a >Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM >Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical >(office, research and clinical data) business process. We have three >large GPFS filesystems for different purposes. > >We also had such a situation with cNFS. A failover (IPtakeover) was >technically good, only clients experienced "stale filehandles". We opened >a PMR at IBM and after testing, deliver logs, tcpdumps and a few months >later, the solution appeared to be in the fsid option. > >An NFS filehandle is built by a combination of fsid and a hash function >on the inode. After a failover, the fsid value can be different and the >client has a "stale filehandle". To avoid this, the fsid value can be >statically specified. See: > >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum. 
>scale.v4r22.doc/bl1adm_nfslin.htm > >Maybe there is also a value in Ganesha that changes after a failover. >Certainly since most sessions will be re-established after a failback. >Maybe you see more debug information with tcpdump. > > >Kind regards, > >Jaap Jan Ouwehand >ICT Specialist (Storage & Linux) >VUmc - ICT >E: jj.ouwehand at vumc.nl >W: www.vumc.com > > > >-----Oorspronkelijk bericht----- >Van: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] Namens Simon Thompson >(IT Research Support) >Verzonden: dinsdag 25 april 2017 13:21 >Aan: gpfsug-discuss at spectrumscale.org >Onderwerp: [gpfsug-discuss] NFS issues > >Hi, > >We have recently started deploying NFS in addition our existing SMB >exports on our protocol nodes. > >We use a RR DNS name that points to 4 VIPs for SMB services and failover >seems to work fine with SMB clients. We figured we could use the same >name and IPs and run Ganesha on the protocol servers, however we are >seeing issues with NFS clients when IP failover occurs. > >In normal operation on a client, we might see several mounts from >different IPs obviously due to the way the DNS RR is working, but it all >works fine. > >In a failover situation, the IP will move to another node and some >clients will carry on, others will hang IO to the mount points referred >to by the IP which has moved. We can *sometimes* trigger this by manually >suspending a CES node, but not always and some clients mounting from the >IP moving will be fine, others won't. > >If we resume a node an it fails back, the clients that are hanging will >usually recover fine. We can reboot a client prior to failback and it >will be fine, stopping and starting the ganesha service on a protocol >node will also sometimes resolve the issues. > >So, has anyone seen this sort of issue and any suggestions for how we >could either debug more or workaround? > >We are currently running the packages >nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). > >At one point we were seeing it a lot, and could track it back to an >underlying GPFS network issue that was causing protocol nodes to be >expelled occasionally, we resolved that and the issues became less >apparent, but maybe we just fixed one failure mode so see it less often. > >On the clients, we use -o sync,hard BTW as in the IBM docs. > >On a client showing the issues, we'll see in dmesg, NFS related messages >like: >[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not >responding, timed out > >Which explains the client hang on certain mount points. 
> >The symptoms feel very much like those logged in this Gluster/ganesha bug: >https://bugzilla.redhat.com/show_bug.cgi?id=1354439 > > >Thanks > >Simon > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Greg.Lehmann at csiro.au Wed Apr 26 00:46:35 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Tue, 25 Apr 2017 23:46:35 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: , Message-ID: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> Are you using infiniband or Ethernet? I'm wondering if IBM have solved the gratuitous arp issue which we see with our non-protocols NFS implementation. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Wednesday, 26 April 2017 3:31 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NFS issues I did some digging in the mmcesfuncs to see what happens server side on fail over. Basically the server losing the IP is supposed to terminate all sessions and the receiver server sends ACK tickles. My current supposition is that for whatever reason, the losing server isn't releasing something and the client still has hold of a connection which is mostly dead. The tickle then fails to the client from the new server. This would explain why failing the IP back to the original server usually brings the client back to life. This is only my working theory at the moment as we can't reliably reproduce this. Next time it happens we plan to grab some netstat from each side. Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the server that received the IP and see if that fixes it (i.e. the receiver server didn't tickle properly). (Usage extracted from mmcesfuncs which is ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd) for anyone interested. Then try and kill he sessions on the losing server to check if there is stuff still open and re-tickle the client. If we can get steps to workaround, I'll log a PMR. I suppose I could do that now, but given its non deterministic and we want to be 100% sure it's not us doing something wrong, I'm inclined to wait until we do some more testing. I agree with the suggestion that it's probably IO pending nodes that are affected, but don't have any data to back that up yet. We did try with a read workload on a client, but may we need either long IO blocked reads or writes (from the GPFS end). We also originally had soft as the default option, but saw issues then and the docs suggested hard, so we switched and also enabled sync (we figured maybe it was NFS client with uncommited writes), but neither have resolved the issues entirely. Difficult for me to say if they improved the issue though given its sporadic. Appreciate people's suggestions! 
Thanks Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode Myklebust [janfrode at tanso.net] Sent: 25 April 2017 18:04 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NFS issues I *think* I've seen this, and that we then had open TCP connection from client to NFS server according to netstat, but these connections were not visible from netstat on NFS-server side. Unfortunately I don't remember what the fix was.. -jf tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) >: Hi, >From what I can see, Ganesha uses the Export_Id option in the config file (which is managed by CES) for this. I did find some reference in the Ganesha devs list that if its not set, then it would read the FSID from the GPFS file-system, either way they should surely be consistent across all the nodes. The posts I found were from someone with an IBM email address, so I guess someone in the IBM teams. I checked a couple of my protocol nodes and they use the same Export_Id consistently, though I guess that might not be the same as the FSID value. Perhaps someone from IBM could comment on if FSID is likely to the cause of my problems? Thanks Simon On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ouwehand, JJ" on behalf of j.ouwehand at vumc.nl> wrote: >Hello, > >At first a short introduction. My name is Jaap Jan Ouwehand, I work at >a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of >IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our >critical (office, research and clinical data) business process. We have >three large GPFS filesystems for different purposes. > >We also had such a situation with cNFS. A failover (IPtakeover) was >technically good, only clients experienced "stale filehandles". We >opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few >months later, the solution appeared to be in the fsid option. > >An NFS filehandle is built by a combination of fsid and a hash function >on the inode. After a failover, the fsid value can be different and the >client has a "stale filehandle". To avoid this, the fsid value can be >statically specified. See: > >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum. >scale.v4r22.doc/bl1adm_nfslin.htm > >Maybe there is also a value in Ganesha that changes after a failover. >Certainly since most sessions will be re-established after a failback. >Maybe you see more debug information with tcpdump. > > >Kind regards, > >Jaap Jan Ouwehand >ICT Specialist (Storage & Linux) >VUmc - ICT >E: jj.ouwehand at vumc.nl >W: www.vumc.com > > > >-----Oorspronkelijk bericht----- >Van: >gpfsug-discuss-bounces at spectrumscale.orgspectrumscale.org> >[mailto:gpfsug-discuss-bounces at spectrumscale.orgbounces at spectrumscale.org>] Namens Simon Thompson (IT Research Support) >Verzonden: dinsdag 25 april 2017 13:21 >Aan: >gpfsug-discuss at spectrumscale.orgg> >Onderwerp: [gpfsug-discuss] NFS issues > >Hi, > >We have recently started deploying NFS in addition our existing SMB >exports on our protocol nodes. > >We use a RR DNS name that points to 4 VIPs for SMB services and >failover seems to work fine with SMB clients. We figured we could use >the same name and IPs and run Ganesha on the protocol servers, however >we are seeing issues with NFS clients when IP failover occurs. 
> >In normal operation on a client, we might see several mounts from >different IPs obviously due to the way the DNS RR is working, but it >all works fine. > >In a failover situation, the IP will move to another node and some >clients will carry on, others will hang IO to the mount points referred >to by the IP which has moved. We can *sometimes* trigger this by >manually suspending a CES node, but not always and some clients >mounting from the IP moving will be fine, others won't. > >If we resume a node an it fails back, the clients that are hanging will >usually recover fine. We can reboot a client prior to failback and it >will be fine, stopping and starting the ganesha service on a protocol >node will also sometimes resolve the issues. > >So, has anyone seen this sort of issue and any suggestions for how we >could either debug more or workaround? > >We are currently running the packages >nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). > >At one point we were seeing it a lot, and could track it back to an >underlying GPFS network issue that was causing protocol nodes to be >expelled occasionally, we resolved that and the issues became less >apparent, but maybe we just fixed one failure mode so see it less often. > >On the clients, we use -o sync,hard BTW as in the IBM docs. > >On a client showing the issues, we'll see in dmesg, NFS related >messages >like: >[Wed Apr 12 16:59:53 2017] nfs: server >MYNFSSERVER.bham.ac.uk not responding, >timed out > >Which explains the client hang on certain mount points. > >The symptoms feel very much like those logged in this Gluster/ganesha bug: >https://bugzilla.redhat.com/show_bug.cgi?id=1354439 > > >Thanks > >Simon > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Wed Apr 26 14:26:08 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Wed, 26 Apr 2017 13:26:08 +0000 Subject: [gpfsug-discuss] Perfmon and GUI In-Reply-To: References: <2A0DC44A-D9FF-428B-8B02-FC6EC504BD34@siriuscom.com> Message-ID: My saga has come to an end. Turns out to get perf stats for NFS you need the gpfs.pm-ganesha package - duh. I typically do manual installs of scale so I just missed this one as it was buried in /usr/lpp/mmfs/4.2.3.0/zimon_rpms/rhel7. Anyway, package installed and now I get NFS stats in the gui and from cli. From: "Sobey, Richard A" Reply-To: gpfsug main discussion list Date: Tuesday, April 25, 2017 at 9:31 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI No worries Mark. We don?t use NFS here (yet) so I can?t help there. Glad I could help. 
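For anyone else doing manual installs, the two fixes that surface in this thread (the gpfs.pm-ganesha package Mark mentions above, and the manual copy of the protocol sensor definitions from the support note quoted further down) come down to roughly the following on each protocol node. This is only a sketch: the rpm and source paths are the 4.2.3/4.2.1 trees quoted in the thread, and the nfsIOrate query name is an assumption.

   # NFS (Ganesha) perfmon sensor - easy to miss on a manual install
   yum install /usr/lpp/mmfs/4.2.3.0/zimon_rpms/rhel7/gpfs.pm-ganesha*.rpm

   # Protocol sensor definitions the installer would normally copy into place
   cd /usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default
   cp CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg \
      SMBSensors.cfg SMBStats.cfg /opt/IBM/zimon/
   # (ZIMonCollector.cfg in the same directory is the collector's own config)

   # Restart the sensors, and the collector/GUI on the collector node
   systemctl restart pmsensors
   systemctl restart pmcollector
   systemctl restart gpfsgui

   # Sanity check from the CLI before looking at the GUI
   mmperfmon query smb2
   mmperfmon query nfsIOrate   # predefined query names vary by release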
Richard

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush
Sent: 25 April 2017 15:29
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Perfmon and GUI

Update: So SMB monitoring is now working after copying all files per Richard's recommendation (thank you sir) and restarting pmsensors, pmcollector, and gpfsgui. Sadly, NFS monitoring isn't. It doesn't work from the CLI either, though. So clearly, something is up with that part. I continue to troubleshoot.

From: Mark Bush
Reply-To: gpfsug main discussion list
Date: Tuesday, April 25, 2017 at 9:13 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Perfmon and GUI

Interesting. Some files were indeed already there, but it was missing a few, NFSIO.cfg being the most notable to me. I've gone ahead and copied those to all my nodes (just three in this cluster) and restarted services. Still no luck. I'm going to restart the GUI service next to see if that makes a difference. Interestingly, I can do things like mmperfmon query smb2 and that tends to work and give me real data, so I'm not sure where the breakdown is in the GUI.

Mark

From: "Sobey, Richard A"
Reply-To: gpfsug main discussion list
Date: Tuesday, April 25, 2017 at 8:44 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Perfmon and GUI

I would have thought this would be fixed by now as this happened to me in 4.2.1-(0?), here's what support said. Can you try? I think you've already got the relevant bits in your .cfg files, so it should just be a case of copying the files across and restarting pmsensors and pmcollector. Again, bear in mind this affected me on 4.2.1 and you're using 4.2.3, so ymmv:

"I spoke with development and normally these files would be copied over to /opt/IBM/zimon when using the automatic installer, but since this case doesn't use the installer we have to copy them over manually. We acknowledge this should be in the docs, and the reason it is not included in the pmsensors rpm is due to the fact these do not come from the zimon team. The following files can be copied over to /opt/IBM/zimon:

[root at node1 default]# pwd
/usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default
[root at node1 default]# ls
CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg SMBSensors.cfg SMBStats.cfg ZIMonCollector.cfg"

Richard

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush
Sent: 25 April 2017 14:28
To: gpfsug-discuss at spectrumscale.org
Subject: [gpfsug-discuss] Perfmon and GUI

Anyone know why, in the GUI, when I go to look at things like nodes, select a protocol node and then pick NFS or SMB, the boxes where a graph is supposed to be show a red circled X saying "Performance collector did not return any data"? I've added the things from the link below into my protocol nodes' /opt/IBM/zimon/ZIMonSensors.cfg file:

https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm

Also restarted both pmsensors and pmcollector on the nodes. What am I missing?
Here's my ZIMonSensors.cfg file:

[root at n3 zimon]# cat ZIMonSensors.cfg
cephMon = "/opt/IBM/zimon/CephMonProxy"
cephRados = "/opt/IBM/zimon/CephRadosProxy"
colCandidates = "n1"
colRedundancy = 1
collectors = { host = "n1" port = "4739" }
config = "/opt/IBM/zimon/ZIMonSensors.cfg"
ctdbstat = ""
daemonize = T
hostname = ""
ipfixinterface = "0.0.0.0"
logfile = "/var/log/zimon/ZIMonSensors.log"
loglevel = "info"
mmcmd = "/opt/IBM/zimon/MMCmdProxy"
mmdfcmd = "/opt/IBM/zimon/MMDFProxy"
mmpmon = "/opt/IBM/zimon/MmpmonSockProxy"
piddir = "/var/run"
release = "4.2.3-0"
sensors = { name = "CPU" period = 1 },
{ name = "Load" period = 1 },
{ name = "Memory" period = 1 },
{ name = "Network" period = 1 },
{ name = "Netstat" period = 10 },
{ name = "Diskstat" period = 0 },
{ name = "DiskFree" period = 600 },
{ name = "GPFSDisk" period = 0 },
{ name = "GPFSFilesystem" period = 1 },
{ name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" },
{ name = "GPFSPoolIO" period = 0 },
{ name = "GPFSVFS" period = 1 },
{ name = "GPFSIOC" period = 0 },
{ name = "GPFSVIO" period = 0 },
{ name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" },
{ name = "GPFSvFLUSH" period = 0 },
{ name = "GPFSNode" period = 1 },
{ name = "GPFSNodeAPI" period = 1 },
{ name = "GPFSFilesystemAPI" period = 1 },
{ name = "GPFSLROC" period = 0 },
{ name = "GPFSCHMS" period = 0 },
{ name = "GPFSAFM" period = 0 },
{ name = "GPFSAFMFS" period = 0 },
{ name = "GPFSAFMFSET" period = 0 },
{ name = "GPFSRPCS" period = 10 },
{ name = "GPFSWaiters" period = 10 },
{ name = "GPFSFilesetQuota" period = 3600 },
{ name = "GPFSDiskCap" period = 0 },
{ name = "GPFSFileset" period = 0 restrict = "n1" },
{ name = "GPFSPool" period = 0 restrict = "n1" },
{ name = "Infiniband" period = 0 },
{ name = "CTDBDBStats" period = 1 type = "Generic" },
{ name = "CTDBStats" period = 1 type = "Generic" },
{ name = "NFSIO" period = 1 type = "Generic" },
{ name = "SMBGlobalStats" period = 1 type = "Generic" },
{ name = "SMBStats" period = 1 type = "Generic" }
smbstat = ""

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From S.J.Thompson at bham.ac.uk Wed Apr 26 15:20:30 2017
From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support))
Date: Wed, 26 Apr 2017 14:20:30 +0000
Subject: [gpfsug-discuss] NFS issues
In-Reply-To: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au>
References: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au>
Message-ID:

Nope, the clients are all L3 connected, so not an arp issue.

Two things we have observed:

1. It triggers when one of the CES IPs moves and quickly moves back again.
The move occurs because the NFS server goes into grace: 2017-04-25 20:36:49 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60 2017-04-25 20:36:49 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server recovery event 2 nodeid -1 ip 2017-04-25 20:36:49 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4 recovery release ip 2017-04-25 20:36:49 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE 2017-04-25 20:37:42 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60 2017-04-25 20:37:44 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60 2017-04-25 20:37:44 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server recovery event 4 nodeid 2 ip We can't see in any of the logs WHY ganesha is going into grace. Any suggestions on how to debug this further? (I.e. If we can stop the grace issues, we can solve the problem mostly). 2. Our clients are using LDAP which is bound to the CES IPs. If we shutdown nslcd on the client we can get the client to recover once all the TIME_WAIT connections have gone. Maybe this was a bad choice on our side to bind to the CES IPs - we figured it would handily move the IPs for us, but I guess the mmcesfuncs isn't aware of this and so doesn't kill the connections to the IP as it goes away. So two approaches we are going to try. Reconfigure the nslcd on a couple of clients and see if they still show up the issues when fail-over occurs. Second is to work out why the NFS servers are going into grace in the first place. Simon On 26/04/2017, 00:46, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Greg.Lehmann at csiro.au" wrote: >Are you using infiniband or Ethernet? I'm wondering if IBM have solved >the gratuitous arp issue which we see with our non-protocols NFS >implementation. > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >Thompson (IT Research Support) >Sent: Wednesday, 26 April 2017 3:31 AM >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] NFS issues > >I did some digging in the mmcesfuncs to see what happens server side on >fail over. > >Basically the server losing the IP is supposed to terminate all sessions >and the receiver server sends ACK tickles. > >My current supposition is that for whatever reason, the losing server >isn't releasing something and the client still has hold of a connection >which is mostly dead. The tickle then fails to the client from the new >server. > >This would explain why failing the IP back to the original server usually >brings the client back to life. > >This is only my working theory at the moment as we can't reliably >reproduce this. Next time it happens we plan to grab some netstat from >each side. > >Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the >server that received the IP and see if that fixes it (i.e. the receiver >server didn't tickle properly). (Usage extracted from mmcesfuncs which is >ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd) >for anyone interested. > >Then try and kill he sessions on the losing server to check if there is >stuff still open and re-tickle the client. 
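On the open question above of why Ganesha keeps entering grace, one avenue is to raise the NFS log level around a failover and watch the CES state alongside the ganesha log. A rough sketch, assuming LOG_LEVEL is tunable through mmnfs on this release and the default log location; note that mmnfs configuration changes may themselves restart NFS on the CES nodes:

   # CES node state and address assignments around the time of the move
   mmces state show -a
   mmces address list

   # Temporarily raise the Ganesha log level (LOG_LEVEL attribute is an assumption)
   mmnfs configuration change LOG_LEVEL=DEBUG

   # Watch grace/recovery events on a protocol node
   tail -f /var/log/ganesha.log | grep -iE 'grace|recovery'

   # Drop it back afterwards (EVENT is the usual default)
   mmnfs configuration change LOG_LEVEL=EVENT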
> >If we can get steps to workaround, I'll log a PMR. I suppose I could do >that now, but given its non deterministic and we want to be 100% sure >it's not us doing something wrong, I'm inclined to wait until we do some >more testing. > >I agree with the suggestion that it's probably IO pending nodes that are >affected, but don't have any data to back that up yet. We did try with a >read workload on a client, but may we need either long IO blocked reads >or writes (from the GPFS end). > >We also originally had soft as the default option, but saw issues then >and the docs suggested hard, so we switched and also enabled sync (we >figured maybe it was NFS client with uncommited writes), but neither have >resolved the issues entirely. Difficult for me to say if they improved >the issue though given its sporadic. > >Appreciate people's suggestions! > >Thanks > >Simon >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org >[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode >Myklebust [janfrode at tanso.net] >Sent: 25 April 2017 18:04 >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] NFS issues > >I *think* I've seen this, and that we then had open TCP connection from >client to NFS server according to netstat, but these connections were not >visible from netstat on NFS-server side. > >Unfortunately I don't remember what the fix was.. > > > > -jf > >tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) >>: >Hi, > >From what I can see, Ganesha uses the Export_Id option in the config file >(which is managed by CES) for this. I did find some reference in the >Ganesha devs list that if its not set, then it would read the FSID from >the GPFS file-system, either way they should surely be consistent across >all the nodes. The posts I found were from someone with an IBM email >address, so I guess someone in the IBM teams. > >I checked a couple of my protocol nodes and they use the same Export_Id >consistently, though I guess that might not be the same as the FSID value. > >Perhaps someone from IBM could comment on if FSID is likely to the cause >of my problems? > >Thanks > >Simon > >On 25/04/2017, 14:51, >"gpfsug-discuss-bounces at spectrumscale.orgectrumscale.org> on behalf of Ouwehand, JJ" >ectrumscale.org> on behalf of >j.ouwehand at vumc.nl> wrote: > >>Hello, >> >>At first a short introduction. My name is Jaap Jan Ouwehand, I work at >>a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of >>IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our >>critical (office, research and clinical data) business process. We have >>three large GPFS filesystems for different purposes. >> >>We also had such a situation with cNFS. A failover (IPtakeover) was >>technically good, only clients experienced "stale filehandles". We >>opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few >>months later, the solution appeared to be in the fsid option. >> >>An NFS filehandle is built by a combination of fsid and a hash function >>on the inode. After a failover, the fsid value can be different and the >>client has a "stale filehandle". To avoid this, the fsid value can be >>statically specified. See: >> >>https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum >>. >>scale.v4r22.doc/bl1adm_nfslin.htm >> >>Maybe there is also a value in Ganesha that changes after a failover. >>Certainly since most sessions will be re-established after a failback. 
>>Maybe you see more debug information with tcpdump. >> >> >>Kind regards, >> >>Jaap Jan Ouwehand >>ICT Specialist (Storage & Linux) >>VUmc - ICT >>E: jj.ouwehand at vumc.nl >>W: www.vumc.com >> >> >> >>-----Oorspronkelijk bericht----- >>Van: >>gpfsug-discuss-bounces at spectrumscale.org>spectrumscale.org> >>[mailto:gpfsug-discuss-bounces at spectrumscale.org>bounces at spectrumscale.org>] Namens Simon Thompson (IT Research Support) >>Verzonden: dinsdag 25 april 2017 13:21 >>Aan: >>gpfsug-discuss at spectrumscale.org>g> >>Onderwerp: [gpfsug-discuss] NFS issues >> >>Hi, >> >>We have recently started deploying NFS in addition our existing SMB >>exports on our protocol nodes. >> >>We use a RR DNS name that points to 4 VIPs for SMB services and >>failover seems to work fine with SMB clients. We figured we could use >>the same name and IPs and run Ganesha on the protocol servers, however >>we are seeing issues with NFS clients when IP failover occurs. >> >>In normal operation on a client, we might see several mounts from >>different IPs obviously due to the way the DNS RR is working, but it >>all works fine. >> >>In a failover situation, the IP will move to another node and some >>clients will carry on, others will hang IO to the mount points referred >>to by the IP which has moved. We can *sometimes* trigger this by >>manually suspending a CES node, but not always and some clients >>mounting from the IP moving will be fine, others won't. >> >>If we resume a node an it fails back, the clients that are hanging will >>usually recover fine. We can reboot a client prior to failback and it >>will be fine, stopping and starting the ganesha service on a protocol >>node will also sometimes resolve the issues. >> >>So, has anyone seen this sort of issue and any suggestions for how we >>could either debug more or workaround? >> >>We are currently running the packages >>nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). >> >>At one point we were seeing it a lot, and could track it back to an >>underlying GPFS network issue that was causing protocol nodes to be >>expelled occasionally, we resolved that and the issues became less >>apparent, but maybe we just fixed one failure mode so see it less often. >> >>On the clients, we use -o sync,hard BTW as in the IBM docs. >> >>On a client showing the issues, we'll see in dmesg, NFS related >>messages >>like: >>[Wed Apr 12 16:59:53 2017] nfs: server >>MYNFSSERVER.bham.ac.uk not responding, >>timed out >> >>Which explains the client hang on certain mount points. 
>> >>The symptoms feel very much like those logged in this Gluster/ganesha >>bug: >>https://bugzilla.redhat.com/show_bug.cgi?id=1354439 >> >> >>Thanks >> >>Simon >> >>_______________________________________________ >>gpfsug-discuss mailing list >>gpfsug-discuss at spectrumscale.org >>http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>_______________________________________________ >>gpfsug-discuss mailing list >>gpfsug-discuss at spectrumscale.org >>http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From janfrode at tanso.net Wed Apr 26 15:27:03 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 26 Apr 2017 14:27:03 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> Message-ID: Would it help to lower the grace time? mmnfs configuration change LEASE_LIFETIME=10 mmnfs configuration change GRACE_PERIOD=10 -jf ons. 26. apr. 2017 kl. 16.20 skrev Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk>: > Nope, the clients are all L3 connected, so not an arp issue. > > Two things we have observed: > > 1. It triggers when one of the CES IPs moves and quickly moves back again. > The move occurs because the NFS server goes into grace: > > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 2 nodeid -1 ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4 > recovery release ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE > 2017-04-25 20:37:42 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 4 nodeid 2 ip > > > > We can't see in any of the logs WHY ganesha is going into grace. Any > suggestions on how to debug this further? (I.e. If we can stop the grace > issues, we can solve the problem mostly). > > > 2. Our clients are using LDAP which is bound to the CES IPs. If we > shutdown nslcd on the client we can get the client to recover once all the > TIME_WAIT connections have gone. Maybe this was a bad choice on our side > to bind to the CES IPs - we figured it would handily move the IPs for us, > but I guess the mmcesfuncs isn't aware of this and so doesn't kill the > connections to the IP as it goes away. > > > So two approaches we are going to try. Reconfigure the nslcd on a couple > of clients and see if they still show up the issues when fail-over occurs. 
> Second is to work out why the NFS servers are going into grace in the > first place. > > Simon > > On 26/04/2017, 00:46, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Greg.Lehmann at csiro.au" behalf of Greg.Lehmann at csiro.au> wrote: > > >Are you using infiniband or Ethernet? I'm wondering if IBM have solved > >the gratuitous arp issue which we see with our non-protocols NFS > >implementation. > > > >-----Original Message----- > >From: gpfsug-discuss-bounces at spectrumscale.org > >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon > >Thompson (IT Research Support) > >Sent: Wednesday, 26 April 2017 3:31 AM > >To: gpfsug main discussion list > >Subject: Re: [gpfsug-discuss] NFS issues > > > >I did some digging in the mmcesfuncs to see what happens server side on > >fail over. > > > >Basically the server losing the IP is supposed to terminate all sessions > >and the receiver server sends ACK tickles. > > > >My current supposition is that for whatever reason, the losing server > >isn't releasing something and the client still has hold of a connection > >which is mostly dead. The tickle then fails to the client from the new > >server. > > > >This would explain why failing the IP back to the original server usually > >brings the client back to life. > > > >This is only my working theory at the moment as we can't reliably > >reproduce this. Next time it happens we plan to grab some netstat from > >each side. > > > >Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the > >server that received the IP and see if that fixes it (i.e. the receiver > >server didn't tickle properly). (Usage extracted from mmcesfuncs which is > >ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd) > >for anyone interested. > > > >Then try and kill he sessions on the losing server to check if there is > >stuff still open and re-tickle the client. > > > >If we can get steps to workaround, I'll log a PMR. I suppose I could do > >that now, but given its non deterministic and we want to be 100% sure > >it's not us doing something wrong, I'm inclined to wait until we do some > >more testing. > > > >I agree with the suggestion that it's probably IO pending nodes that are > >affected, but don't have any data to back that up yet. We did try with a > >read workload on a client, but may we need either long IO blocked reads > >or writes (from the GPFS end). > > > >We also originally had soft as the default option, but saw issues then > >and the docs suggested hard, so we switched and also enabled sync (we > >figured maybe it was NFS client with uncommited writes), but neither have > >resolved the issues entirely. Difficult for me to say if they improved > >the issue though given its sporadic. > > > >Appreciate people's suggestions! > > > >Thanks > > > >Simon > >________________________________________ > >From: gpfsug-discuss-bounces at spectrumscale.org > >[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode > >Myklebust [janfrode at tanso.net] > >Sent: 25 April 2017 18:04 > >To: gpfsug main discussion list > >Subject: Re: [gpfsug-discuss] NFS issues > > > >I *think* I've seen this, and that we then had open TCP connection from > >client to NFS server according to netstat, but these connections were not > >visible from netstat on NFS-server side. > > > >Unfortunately I don't remember what the fix was.. > > > > > > > > -jf > > > >tir. 25. apr. 2017 kl. 
16.06 skrev Simon Thompson (IT Research Support) > >>: > >Hi, > > > >From what I can see, Ganesha uses the Export_Id option in the config file > >(which is managed by CES) for this. I did find some reference in the > >Ganesha devs list that if its not set, then it would read the FSID from > >the GPFS file-system, either way they should surely be consistent across > >all the nodes. The posts I found were from someone with an IBM email > >address, so I guess someone in the IBM teams. > > > >I checked a couple of my protocol nodes and they use the same Export_Id > >consistently, though I guess that might not be the same as the FSID value. > > > >Perhaps someone from IBM could comment on if FSID is likely to the cause > >of my problems? > > > >Thanks > > > >Simon > > > >On 25/04/2017, 14:51, > >"gpfsug-discuss-bounces at spectrumscale.org gpfsug-discuss-bounces at sp > >ectrumscale.org> on behalf of Ouwehand, JJ" > > gpfsug-discuss-bounces at sp > >ectrumscale.org> on behalf of > >j.ouwehand at vumc.nl> wrote: > > > >>Hello, > >> > >>At first a short introduction. My name is Jaap Jan Ouwehand, I work at > >>a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of > >>IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our > >>critical (office, research and clinical data) business process. We have > >>three large GPFS filesystems for different purposes. > >> > >>We also had such a situation with cNFS. A failover (IPtakeover) was > >>technically good, only clients experienced "stale filehandles". We > >>opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few > >>months later, the solution appeared to be in the fsid option. > >> > >>An NFS filehandle is built by a combination of fsid and a hash function > >>on the inode. After a failover, the fsid value can be different and the > >>client has a "stale filehandle". To avoid this, the fsid value can be > >>statically specified. See: > >> > >> > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum > >>. > >>scale.v4r22.doc/bl1adm_nfslin.htm > >> > >>Maybe there is also a value in Ganesha that changes after a failover. > >>Certainly since most sessions will be re-established after a failback. > >>Maybe you see more debug information with tcpdump. > >> > >> > >>Kind regards, > >> > >>Jaap Jan Ouwehand > >>ICT Specialist (Storage & Linux) > >>VUmc - ICT > >>E: jj.ouwehand at vumc.nl > >>W: www.vumc.com > >> > >> > >> > >>-----Oorspronkelijk bericht----- > >>Van: > >>gpfsug-discuss-bounces at spectrumscale.org >>spectrumscale.org> > >>[mailto:gpfsug-discuss-bounces at spectrumscale.org >>bounces at spectrumscale.org>] Namens Simon Thompson (IT Research Support) > >>Verzonden: dinsdag 25 april 2017 13:21 > >>Aan: > >>gpfsug-discuss at spectrumscale.org >>g> > >>Onderwerp: [gpfsug-discuss] NFS issues > >> > >>Hi, > >> > >>We have recently started deploying NFS in addition our existing SMB > >>exports on our protocol nodes. > >> > >>We use a RR DNS name that points to 4 VIPs for SMB services and > >>failover seems to work fine with SMB clients. We figured we could use > >>the same name and IPs and run Ganesha on the protocol servers, however > >>we are seeing issues with NFS clients when IP failover occurs. > >> > >>In normal operation on a client, we might see several mounts from > >>different IPs obviously due to the way the DNS RR is working, but it > >>all works fine. 
> >> > >>In a failover situation, the IP will move to another node and some > >>clients will carry on, others will hang IO to the mount points referred > >>to by the IP which has moved. We can *sometimes* trigger this by > >>manually suspending a CES node, but not always and some clients > >>mounting from the IP moving will be fine, others won't. > >> > >>If we resume a node an it fails back, the clients that are hanging will > >>usually recover fine. We can reboot a client prior to failback and it > >>will be fine, stopping and starting the ganesha service on a protocol > >>node will also sometimes resolve the issues. > >> > >>So, has anyone seen this sort of issue and any suggestions for how we > >>could either debug more or workaround? > >> > >>We are currently running the packages > >>nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). > >> > >>At one point we were seeing it a lot, and could track it back to an > >>underlying GPFS network issue that was causing protocol nodes to be > >>expelled occasionally, we resolved that and the issues became less > >>apparent, but maybe we just fixed one failure mode so see it less often. > >> > >>On the clients, we use -o sync,hard BTW as in the IBM docs. > >> > >>On a client showing the issues, we'll see in dmesg, NFS related > >>messages > >>like: > >>[Wed Apr 12 16:59:53 2017] nfs: server > >>MYNFSSERVER.bham.ac.uk not responding, > >>timed out > >> > >>Which explains the client hang on certain mount points. > >> > >>The symptoms feel very much like those logged in this Gluster/ganesha > >>bug: > >>https://bugzilla.redhat.com/show_bug.cgi?id=1354439 > >> > >> > >>Thanks > >> > >>Simon > >> > >>_______________________________________________ > >>gpfsug-discuss mailing list > >>gpfsug-discuss at spectrumscale.org > >>http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >>_______________________________________________ > >>gpfsug-discuss mailing list > >>gpfsug-discuss at spectrumscale.org > >>http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From peserocka at gmail.com Wed Apr 26 18:53:51 2017 From: peserocka at gmail.com (Peter Serocka) Date: Wed, 26 Apr 2017 19:53:51 +0200 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> Message-ID: > On 2017 Apr 26 Wed, at 16:20, Simon Thompson (IT Research Support) wrote: > > Nope, the clients are all L3 connected, so not an arp issue. ...not on the client, but the server-facing L3 switch still need to manage its ARP table, and might miss the IP moving to a new MAC. Cisco switches have a default ARP cache timeout of 4 hours, fwiw. Can your network team provide you the ARP status from the switch when you see a fail-over being stuck? ? 
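A quick way to test the ARP theory during the next failover, assuming a Cisco-style upstream switch and iputils arping on the CES node (the interface name is a placeholder):

   ! On the upstream L3 switch: does the MAC behind the CES IP follow the move?
   show ip arp <CES_IP>

   # On the CES node that just received the IP: push a gratuitous ARP by hand
   arping -c 3 -A -I bond0 <CES_IP>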
Peter > > Two things we have observed: > > 1. It triggers when one of the CES IPs moves and quickly moves back again. > The move occurs because the NFS server goes into grace: > > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 2 nodeid -1 ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4 > recovery release ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE > 2017-04-25 20:37:42 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 4 nodeid 2 ip > > > > We can't see in any of the logs WHY ganesha is going into grace. Any > suggestions on how to debug this further? (I.e. If we can stop the grace > issues, we can solve the problem mostly). > > > 2. Our clients are using LDAP which is bound to the CES IPs. If we > shutdown nslcd on the client we can get the client to recover once all the > TIME_WAIT connections have gone. Maybe this was a bad choice on our side > to bind to the CES IPs - we figured it would handily move the IPs for us, > but I guess the mmcesfuncs isn't aware of this and so doesn't kill the > connections to the IP as it goes away. > > > So two approaches we are going to try. Reconfigure the nslcd on a couple > of clients and see if they still show up the issues when fail-over occurs. > Second is to work out why the NFS servers are going into grace in the > first place. > > Simon > > On 26/04/2017, 00:46, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Greg.Lehmann at csiro.au" behalf of Greg.Lehmann at csiro.au> wrote: > >> Are you using infiniband or Ethernet? I'm wondering if IBM have solved >> the gratuitous arp issue which we see with our non-protocols NFS >> implementation. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >> Thompson (IT Research Support) >> Sent: Wednesday, 26 April 2017 3:31 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] NFS issues >> >> I did some digging in the mmcesfuncs to see what happens server side on >> fail over. >> >> Basically the server losing the IP is supposed to terminate all sessions >> and the receiver server sends ACK tickles. >> >> My current supposition is that for whatever reason, the losing server >> isn't releasing something and the client still has hold of a connection >> which is mostly dead. The tickle then fails to the client from the new >> server. >> >> This would explain why failing the IP back to the original server usually >> brings the client back to life. >> >> This is only my working theory at the moment as we can't reliably >> reproduce this. Next time it happens we plan to grab some netstat from >> each side. >> >> Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the >> server that received the IP and see if that fixes it (i.e. 
the receiver >> server didn't tickle properly). (Usage extracted from mmcesfuncs which is >> ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd) >> for anyone interested. >> >> Then try and kill he sessions on the losing server to check if there is >> stuff still open and re-tickle the client. >> >> If we can get steps to workaround, I'll log a PMR. I suppose I could do >> that now, but given its non deterministic and we want to be 100% sure >> it's not us doing something wrong, I'm inclined to wait until we do some >> more testing. >> >> I agree with the suggestion that it's probably IO pending nodes that are >> affected, but don't have any data to back that up yet. We did try with a >> read workload on a client, but may we need either long IO blocked reads >> or writes (from the GPFS end). >> >> We also originally had soft as the default option, but saw issues then >> and the docs suggested hard, so we switched and also enabled sync (we >> figured maybe it was NFS client with uncommited writes), but neither have >> resolved the issues entirely. Difficult for me to say if they improved >> the issue though given its sporadic. >> >> Appreciate people's suggestions! >> >> Thanks >> >> Simon >> ________________________________________ >> From: gpfsug-discuss-bounces at spectrumscale.org >> [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode >> Myklebust [janfrode at tanso.net] >> Sent: 25 April 2017 18:04 >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] NFS issues >> >> I *think* I've seen this, and that we then had open TCP connection from >> client to NFS server according to netstat, but these connections were not >> visible from netstat on NFS-server side. >> >> Unfortunately I don't remember what the fix was.. >> >> >> >> -jf >> >> tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) >> >: >> Hi, >> >> From what I can see, Ganesha uses the Export_Id option in the config file >> (which is managed by CES) for this. I did find some reference in the >> Ganesha devs list that if its not set, then it would read the FSID from >> the GPFS file-system, either way they should surely be consistent across >> all the nodes. The posts I found were from someone with an IBM email >> address, so I guess someone in the IBM teams. >> >> I checked a couple of my protocol nodes and they use the same Export_Id >> consistently, though I guess that might not be the same as the FSID value. >> >> Perhaps someone from IBM could comment on if FSID is likely to the cause >> of my problems? >> >> Thanks >> >> Simon >> >> On 25/04/2017, 14:51, >> "gpfsug-discuss-bounces at spectrumscale.org> ectrumscale.org> on behalf of Ouwehand, JJ" >> > ectrumscale.org> on behalf of >> j.ouwehand at vumc.nl> wrote: >> >>> Hello, >>> >>> At first a short introduction. My name is Jaap Jan Ouwehand, I work at >>> a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of >>> IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our >>> critical (office, research and clinical data) business process. We have >>> three large GPFS filesystems for different purposes. >>> >>> We also had such a situation with cNFS. A failover (IPtakeover) was >>> technically good, only clients experienced "stale filehandles". We >>> opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few >>> months later, the solution appeared to be in the fsid option. >>> >>> An NFS filehandle is built by a combination of fsid and a hash function >>> on the inode. 
After a failover, the fsid value can be different and the >>> client has a "stale filehandle". To avoid this, the fsid value can be >>> statically specified. See: >>> >>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum >>> . >>> scale.v4r22.doc/bl1adm_nfslin.htm >>> >>> Maybe there is also a value in Ganesha that changes after a failover. >>> Certainly since most sessions will be re-established after a failback. >>> Maybe you see more debug information with tcpdump. >>> >>> >>> Kind regards, >>> >>> Jaap Jan Ouwehand >>> ICT Specialist (Storage & Linux) >>> VUmc - ICT >>> E: jj.ouwehand at vumc.nl >>> W: www.vumc.com >>> >>> >>> >>> -----Oorspronkelijk bericht----- >>> Van: >>> gpfsug-discuss-bounces at spectrumscale.org>> spectrumscale.org> >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org>> bounces at spectrumscale.org>] Namens Simon Thompson (IT Research Support) >>> Verzonden: dinsdag 25 april 2017 13:21 >>> Aan: >>> gpfsug-discuss at spectrumscale.org>> g> >>> Onderwerp: [gpfsug-discuss] NFS issues >>> >>> Hi, >>> >>> We have recently started deploying NFS in addition our existing SMB >>> exports on our protocol nodes. >>> >>> We use a RR DNS name that points to 4 VIPs for SMB services and >>> failover seems to work fine with SMB clients. We figured we could use >>> the same name and IPs and run Ganesha on the protocol servers, however >>> we are seeing issues with NFS clients when IP failover occurs. >>> >>> In normal operation on a client, we might see several mounts from >>> different IPs obviously due to the way the DNS RR is working, but it >>> all works fine. >>> >>> In a failover situation, the IP will move to another node and some >>> clients will carry on, others will hang IO to the mount points referred >>> to by the IP which has moved. We can *sometimes* trigger this by >>> manually suspending a CES node, but not always and some clients >>> mounting from the IP moving will be fine, others won't. >>> >>> If we resume a node an it fails back, the clients that are hanging will >>> usually recover fine. We can reboot a client prior to failback and it >>> will be fine, stopping and starting the ganesha service on a protocol >>> node will also sometimes resolve the issues. >>> >>> So, has anyone seen this sort of issue and any suggestions for how we >>> could either debug more or workaround? >>> >>> We are currently running the packages >>> nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). >>> >>> At one point we were seeing it a lot, and could track it back to an >>> underlying GPFS network issue that was causing protocol nodes to be >>> expelled occasionally, we resolved that and the issues became less >>> apparent, but maybe we just fixed one failure mode so see it less often. >>> >>> On the clients, we use -o sync,hard BTW as in the IBM docs. >>> >>> On a client showing the issues, we'll see in dmesg, NFS related >>> messages >>> like: >>> [Wed Apr 12 16:59:53 2017] nfs: server >>> MYNFSSERVER.bham.ac.uk not responding, >>> timed out >>> >>> Which explains the client hang on certain mount points. 
>>> >>> The symptoms feel very much like those logged in this Gluster/ganesha >>> bug: >>> https://bugzilla.redhat.com/show_bug.cgi?id=1354439 >>> >>> >>> Thanks >>> >>> Simon >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Wed Apr 26 19:00:06 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 26 Apr 2017 18:00:06 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> , Message-ID: We have no issues with L3 SMB accessing clients, so I'm pretty sure it's not arp. And some of the boxes on the other side of the L3 gateway don't see the issues. We don't use Cisco kit. I posted in a different update that we think it's related to connections to other ports on the same IP which get left open when the IP quickly gets moved away and back again. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Peter Serocka [peserocka at gmail.com] Sent: 26 April 2017 18:53 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NFS issues > On 2017 Apr 26 Wed, at 16:20, Simon Thompson (IT Research Support) wrote: > > Nope, the clients are all L3 connected, so not an arp issue. ...not on the client, but the server-facing L3 switch still need to manage its ARP table, and might miss the IP moving to a new MAC. Cisco switches have a default ARP cache timeout of 4 hours, fwiw. Can your network team provide you the ARP status from the switch when you see a fail-over being stuck? ? Peter > > Two things we have observed: > > 1. It triggers when one of the CES IPs moves and quickly moves back again. 
> The move occurs because the NFS server goes into grace: > > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 2 nodeid -1 ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4 > recovery release ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE > 2017-04-25 20:37:42 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 4 nodeid 2 ip > > > > We can't see in any of the logs WHY ganesha is going into grace. Any > suggestions on how to debug this further? (I.e. If we can stop the grace > issues, we can solve the problem mostly). > > > 2. Our clients are using LDAP which is bound to the CES IPs. If we > shutdown nslcd on the client we can get the client to recover once all the > TIME_WAIT connections have gone. Maybe this was a bad choice on our side > to bind to the CES IPs - we figured it would handily move the IPs for us, > but I guess the mmcesfuncs isn't aware of this and so doesn't kill the > connections to the IP as it goes away. > > > So two approaches we are going to try. Reconfigure the nslcd on a couple > of clients and see if they still show up the issues when fail-over occurs. > Second is to work out why the NFS servers are going into grace in the > first place. > > Simon > > On 26/04/2017, 00:46, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Greg.Lehmann at csiro.au" behalf of Greg.Lehmann at csiro.au> wrote: > >> Are you using infiniband or Ethernet? I'm wondering if IBM have solved >> the gratuitous arp issue which we see with our non-protocols NFS >> implementation. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >> Thompson (IT Research Support) >> Sent: Wednesday, 26 April 2017 3:31 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] NFS issues >> >> I did some digging in the mmcesfuncs to see what happens server side on >> fail over. >> >> Basically the server losing the IP is supposed to terminate all sessions >> and the receiver server sends ACK tickles. >> >> My current supposition is that for whatever reason, the losing server >> isn't releasing something and the client still has hold of a connection >> which is mostly dead. The tickle then fails to the client from the new >> server. >> >> This would explain why failing the IP back to the original server usually >> brings the client back to life. >> >> This is only my working theory at the moment as we can't reliably >> reproduce this. Next time it happens we plan to grab some netstat from >> each side. >> >> Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the >> server that received the IP and see if that fixes it (i.e. the receiver >> server didn't tickle properly). (Usage extracted from mmcesfuncs which is >> ksh of course). ... 
CesIPPort is colon separated IP:portnumber (of NFSd) >> for anyone interested. >> >> Then try and kill he sessions on the losing server to check if there is >> stuff still open and re-tickle the client. >> >> If we can get steps to workaround, I'll log a PMR. I suppose I could do >> that now, but given its non deterministic and we want to be 100% sure >> it's not us doing something wrong, I'm inclined to wait until we do some >> more testing. >> >> I agree with the suggestion that it's probably IO pending nodes that are >> affected, but don't have any data to back that up yet. We did try with a >> read workload on a client, but may we need either long IO blocked reads >> or writes (from the GPFS end). >> >> We also originally had soft as the default option, but saw issues then >> and the docs suggested hard, so we switched and also enabled sync (we >> figured maybe it was NFS client with uncommited writes), but neither have >> resolved the issues entirely. Difficult for me to say if they improved >> the issue though given its sporadic. >> >> Appreciate people's suggestions! >> >> Thanks >> >> Simon >> ________________________________________ >> From: gpfsug-discuss-bounces at spectrumscale.org >> [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode >> Myklebust [janfrode at tanso.net] >> Sent: 25 April 2017 18:04 >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] NFS issues >> >> I *think* I've seen this, and that we then had open TCP connection from >> client to NFS server according to netstat, but these connections were not >> visible from netstat on NFS-server side. >> >> Unfortunately I don't remember what the fix was.. >> >> >> >> -jf >> >> tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) >> >: >> Hi, >> >> From what I can see, Ganesha uses the Export_Id option in the config file >> (which is managed by CES) for this. I did find some reference in the >> Ganesha devs list that if its not set, then it would read the FSID from >> the GPFS file-system, either way they should surely be consistent across >> all the nodes. The posts I found were from someone with an IBM email >> address, so I guess someone in the IBM teams. >> >> I checked a couple of my protocol nodes and they use the same Export_Id >> consistently, though I guess that might not be the same as the FSID value. >> >> Perhaps someone from IBM could comment on if FSID is likely to the cause >> of my problems? >> >> Thanks >> >> Simon >> >> On 25/04/2017, 14:51, >> "gpfsug-discuss-bounces at spectrumscale.org> ectrumscale.org> on behalf of Ouwehand, JJ" >> > ectrumscale.org> on behalf of >> j.ouwehand at vumc.nl> wrote: >> >>> Hello, >>> >>> At first a short introduction. My name is Jaap Jan Ouwehand, I work at >>> a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of >>> IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our >>> critical (office, research and clinical data) business process. We have >>> three large GPFS filesystems for different purposes. >>> >>> We also had such a situation with cNFS. A failover (IPtakeover) was >>> technically good, only clients experienced "stale filehandles". We >>> opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few >>> months later, the solution appeared to be in the fsid option. >>> >>> An NFS filehandle is built by a combination of fsid and a hash function >>> on the inode. After a failover, the fsid value can be different and the >>> client has a "stale filehandle". 
To avoid this, the fsid value can be >>> statically specified. See: >>> >>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum >>> . >>> scale.v4r22.doc/bl1adm_nfslin.htm >>> >>> Maybe there is also a value in Ganesha that changes after a failover. >>> Certainly since most sessions will be re-established after a failback. >>> Maybe you see more debug information with tcpdump. >>> >>> >>> Kind regards, >>> >>> Jaap Jan Ouwehand >>> ICT Specialist (Storage & Linux) >>> VUmc - ICT >>> E: jj.ouwehand at vumc.nl >>> W: www.vumc.com >>> >>> >>> >>> -----Oorspronkelijk bericht----- >>> Van: >>> gpfsug-discuss-bounces at spectrumscale.org>> spectrumscale.org> >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org>> bounces at spectrumscale.org>] Namens Simon Thompson (IT Research Support) >>> Verzonden: dinsdag 25 april 2017 13:21 >>> Aan: >>> gpfsug-discuss at spectrumscale.org>> g> >>> Onderwerp: [gpfsug-discuss] NFS issues >>> >>> Hi, >>> >>> We have recently started deploying NFS in addition our existing SMB >>> exports on our protocol nodes. >>> >>> We use a RR DNS name that points to 4 VIPs for SMB services and >>> failover seems to work fine with SMB clients. We figured we could use >>> the same name and IPs and run Ganesha on the protocol servers, however >>> we are seeing issues with NFS clients when IP failover occurs. >>> >>> In normal operation on a client, we might see several mounts from >>> different IPs obviously due to the way the DNS RR is working, but it >>> all works fine. >>> >>> In a failover situation, the IP will move to another node and some >>> clients will carry on, others will hang IO to the mount points referred >>> to by the IP which has moved. We can *sometimes* trigger this by >>> manually suspending a CES node, but not always and some clients >>> mounting from the IP moving will be fine, others won't. >>> >>> If we resume a node an it fails back, the clients that are hanging will >>> usually recover fine. We can reboot a client prior to failback and it >>> will be fine, stopping and starting the ganesha service on a protocol >>> node will also sometimes resolve the issues. >>> >>> So, has anyone seen this sort of issue and any suggestions for how we >>> could either debug more or workaround? >>> >>> We are currently running the packages >>> nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). >>> >>> At one point we were seeing it a lot, and could track it back to an >>> underlying GPFS network issue that was causing protocol nodes to be >>> expelled occasionally, we resolved that and the issues became less >>> apparent, but maybe we just fixed one failure mode so see it less often. >>> >>> On the clients, we use -o sync,hard BTW as in the IBM docs. >>> >>> On a client showing the issues, we'll see in dmesg, NFS related >>> messages >>> like: >>> [Wed Apr 12 16:59:53 2017] nfs: server >>> MYNFSSERVER.bham.ac.uk not responding, >>> timed out >>> >>> Which explains the client hang on certain mount points. 
>>> >>> The symptoms feel very much like those logged in this Gluster/ganesha >>> bug: >>> https://bugzilla.redhat.com/show_bug.cgi?id=1354439 >>> >>> >>> Thanks >>> >>> Simon >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From valdis.kletnieks at vt.edu Thu Apr 27 00:44:44 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 26 Apr 2017 19:44:44 -0400 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> Message-ID: <52226.1493250284@turing-police.cc.vt.edu> On Wed, 26 Apr 2017 14:20:30 -0000, "Simon Thompson (IT Research Support)" said: > We can't see in any of the logs WHY ganesha is going into grace. Any > suggestions on how to debug this further? (I.e. If we can stop the grace > issues, we can solve the problem mostly). After over 3 decades of experience with 'exportfs' being totally safe to run in real time with both userspace and kernel NFSD implementations, it came as quite a surprise when we did 'mmnfs eport change --nfsadd='... and it bounced the NFS server on all 4 protocol nodes. At the same time. Fortunately for us, the set of client nodes only changes once every 2-3 months. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From secretary at gpfsug.org Thu Apr 27 09:29:41 2017 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Thu, 27 Apr 2017 09:29:41 +0100 Subject: [gpfsug-discuss] Meet other spectrum scale users in May Message-ID: <1f483faa9cb61dcdc80afb187e908745@webmail.gpfsug.org> Dear Members, Please join us and other spectrum scale users for 2 days of great talks and networking! WHEN: 9-10th May 2017 WHERE: Macdonald Manchester Hotel & Spa, Manchester, UK (right by Manchester Piccadilly train station) WHO? The event is free to attend, is open to members from all industries and welcomes users with a little and a lot of experience using Spectrum Scale. The SSUG brings together the Spectrum Scale User Community including Spectrum Scale developers and architects to share knowledge, experiences and future plans. Topics include transparent cloud tiering, AFM, automation and security best practices, Docker and HDFS support, problem determination, and an update on Elastic Storage Server (ESS). 
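For anyone about to make a similar change, a hedged sketch of the sequence (the export path, client address and attribute names below are illustrative only and should be checked against the mmnfs man page for your release):

    # Review the existing exports and their client lists first
    mmnfs export list

    # Adding a client regenerates the Ganesha configuration; in our case this
    # restarted nfs-ganesha on all four protocol nodes at once, so treat it as
    # a scheduled maintenance step rather than a live exportfs-style tweak.
    mmnfs export change /gpfs/fs0 --nfsadd "10.0.0.99(Access_Type=RO,Squash=root_squash)"

The commands themselves are standard CES tooling; only the path, client address and attribute values are made up for the example.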
Our popular forum includes interactive problem solving, a best practices discussion and networking. We're very excited to welcome back Doris Conti the Director for Spectrum Scale (GPFS) and HPC SW Product Development at IBM. The May meeting is sponsored by IBM, DDN, Lenovo, Mellanox, Seagate, Arcastream, Ellexus, and OCF. It is an excellent opportunity to learn more and get your questions answered. Register your place today at the Eventbrite page https://goo.gl/tRptru [1] We hope to see you there! -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org Links: ------ [1] https://goo.gl/tRptru -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert at strubi.ox.ac.uk Thu Apr 27 12:46:09 2017 From: robert at strubi.ox.ac.uk (Robert Esnouf) Date: Thu, 27 Apr 2017 12:46:09 +0100 (BST) Subject: [gpfsug-discuss] Two high-performance research computing posts in Oxford University Medical Sciences Message-ID: <201704271146.061978@mail.strubi.ox.ac.uk> Dear All, I hope that it is allowed to put job postings on this discussion list... sorry if I've broken a rule but it does mention SpectrumScale! I'd like to advertise the availability two exciting and challenging new opportunities to work in research computing/high-performance computing at Oxford University within the Nuffield Department of Medicine. The first is a Grade 8 position to expand the current Research Computing Core team at the Wellcome Trust Centre for Human Genetics. The Core now runs a cluster of about ~3800 high-memory compute cores, a further ~700 cores outside the cluster, a (growing) smattering of GPU-enabled and KNL nodes, 4PB high-performance SpectrumScale (GPFS) storage and about 4PB of lower grade (mostly XFS) storage. The facility has an FDR InfiniBand fabric providing for access to storage at up to 20GB/s and supporting MPI workloads. We mainly support the statistical genetics work of the Centre and other departments around Oxford, the work of the sequencing and bioinformatics cores and electron microscopy, but the workload is varied and interesting! Further significant update and expansion of this facility will occur during 2017 and beyond and means that we are expanding the team. http://www.well.ox.ac.uk/home http://www.well.ox.ac.uk/research-8 https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.display_form?p_company=10&p_internal_external=E&p_display_in_irish=N&p_process_type=&p_applicant_no=&p_form_profile_detail=&p_display_apply_ind=Y&p_refresh_search=Y&p_recruitment_id=126748 The second is a Grade 9 post at the newly opened Big Data Institute next door to the WTCHG - to work with me to establish a brand new Research Computing facility. The Big Data Institute Building has 32 shiny new racks ready to be filled with up to 320kW of IT load - and we won't stop there! The current plans envisage a virtualized infrastructure for secure access, a high-performance cluster supporting traditional workloads and containers, high-performance filesystem storage, a hyperconverged infrastructure supporting (OpenStack, project VMs, containers and distributed computing plaforms such as Apache Spark), a significant GPU-based artificial intelligence/deep learning platform and a large, multisite object store for managing research data in the long term. 
https://www.bdi.ox.ac.uk/ https://www.ndm.ox.ac.uk/current-job-vacancies/vacancy/128486-BDI-Research-Computing-Manager https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.display_form?p_company=10&p_internal_external=E&p_display_in_irish=N&p_process_type=&p_applicant_no=&p_form_profile_detail=&p_display_apply_ind=Y&p_refresh_search=Y&p_recruitment_id=128486 It is expected that the Wellcome Trust Centre and Big Data Institute facilities will develop independently for now, but in a complementary and supportive fashion given the overlap in science and technology that is likely to exist. The Research Computing support teams will therefore work extremely closely together to address the challenges facing computing in the medical sciences. If either (or both) of these vacancies seem interesting then please feel free to contact the Head of the Research Computing Core at the WTCHG (me) or the Director of Research Computing at the BDI (me). Deadline for the WTCHG post is 31st May and for the BDI post is 24th May. Please feel free to circulate this email to anyone who might be interested and apologies for any cross postings! Regards, Robert -- Dr Robert Esnouf University Research Lecturer, Director of Research Computing BDI, Head of Research Computing Core WTCHG, NDM Research Computing Strategy Officer Main office: Room 10/028, Wellcome Trust Centre for Human Genetics, Old Road Campus, Roosevelt Drive, Oxford OX3 7BN, UK Emails: robert at strubi.ox.ac.uk / robert at well.ox.ac.uk / robert.esnouf at bdi.ox.ac.uk Tel: (+44) - 1865 - 287783
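A minimal sketch for the "Can't delete filesystem" discussion that resumes below (the device name gpfs0 and the node names are hypothetical): confirm which nodes the cluster still believes have the file system mounted before forcing anything.

    # List every node -- including NSD servers holding only an internal mount --
    # that still has the file system mounted, as seen by the cluster
    mmlsmount gpfs0 -L

    # Force the unmount on the stragglers, then re-check before running mmdelfs
    mmumount gpfs0 -f -N client01,client02
    mmlsmount gpfs0 -L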
From janfrode at tanso.net Wed Apr 5 22:51:15 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 05 Apr 2017 21:51:15 +0000 Subject: [gpfsug-discuss] Can't delete filesystem In-Reply-To: <0F877E25-6C58-4790-86CD-7E2108EC8EB5@vanderbilt.edu> References: <20E4B082-2BBB-478B-B1E1-2BC8125FE50F@vanderbilt.edu> <0F877E25-6C58-4790-86CD-7E2108EC8EB5@vanderbilt.edu> Message-ID: Maybe try mmumount -f on the remaining 4 nodes? -jf ons. 5. apr. 2017 kl. 18.54 skrev Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu>: > Hi Simon, > > No, I do not. > > Let me also add that this is a filesystem that I migrated users off of and > to another GPFS filesystem. I moved the last users this morning and then > ran an 'mmunmount' across the whole cluster via mmdsh. Therefore, if the > simple solution is to use the '-p' option to mmdelfs I'm fine with that. > I'm just not sure what the right course of action is at this point. > > Thanks again... > > Kevin > > > On Apr 5, 2017, at 11:47 AM, Simon Thompson (Research Computing - IT > Services) wrote: > > > > Do you have ILM (dsmrecalld and friends) running? > > > > They can also stop the filesystem being released (e.g. mmshutdown fails > if they are up). > > > > Simon > > ________________________________________ > > From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin > L [Kevin.Buterbaugh at Vanderbilt.Edu] > > Sent: 05 April 2017 17:40 > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] Can't delete filesystem > > > > Hi All, > > > > First off, I can open a PMR on this if I need to... > > > > I am trying to delete a GPFS filesystem but mmdelfs is telling me that > the filesystem is still mounted on 14 nodes and therefore can't be > deleted. 10 of those nodes are my 10 GPFS servers and they have an > 'internal mount' still mounted.
IIRC, it?s the other 4 (client) nodes I > need to concentrate on ? i.e. once those other 4 clients no longer have it > mounted the internal mounts will resolve themselves. Correct me if I?m > wrong on that, please. > > > > So, I have gone to all of the 4 clients and none of them say they have > it mounted according to either ?df? or ?mount?. I?ve gone ahead and run > both ?mmunmount? and ?umount -l? on the filesystem anyway, but the mmdelfs > still fails saying that they have it mounted. > > > > What do I need to do to resolve this issue on those 4 clients? Thanks? > > > > Kevin > > > > ? > > Kevin Buterbaugh - Senior System Administrator > > Vanderbilt University - Advanced Computing Center for Research and > Education > > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Apr 6 02:54:07 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Thu, 6 Apr 2017 01:54:07 +0000 Subject: [gpfsug-discuss] AFM misunderstanding Message-ID: When I setup a AFM relationship (let?s just say I?m doing RO), does prefetch bring bits of the actual file over to the cache or is it only ever metadata? I know there is a ?metadata-only switch but it appears that if I try a mmafmctl prefetch operation and then I do a ls ?ltrs on the cache it?s still 0 bytes. I do see the queue increasing when I do a mmafmctl getstate. I realize that the data truly only flows once the file is requested (I just do a dd if=mycachedfile of=/dev/null). But this is just my test env. How to I get the bits to flow before I request them assuming that I will at some point need them? Or do I just misunderstand AFM altogether? I?m more used to mirroring so maybe that?s my frame of reference and it?s not the AFM architecture. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Apr 6 09:20:31 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 6 Apr 2017 08:20:31 +0000 Subject: [gpfsug-discuss] Spectrum Scale Encryption Message-ID: We are currently looking at adding encryption to our deployment for some of our data sets and for some of our nodes. 
Apologies in advance if some of this is a bit vague, we're not yet at the point where we can test this stuff out, so maybe some of it will become clear when we try it out. For a node that we don't want to have access to any encrypted data, what do we need to set up? According to the docs: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_encryption_prep.htm "After the file system is configured with encryption policy rules, the file system is considered encrypted. From that point on, each node that has access to that file system must have an RKM.conf file present. Otherwise, the file system might not be mounted or might become unmounted." So on a node which I don't want to have access to any encrypted files, do I just need to have an empty RKM.conf file? (If this is the case, would be good to have this added to the docs) Secondly ... (and maybe I'm misunderstanding the docs here) For the Policy https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectru m.scale.v4r22.doc/bl1adv_encryptionpolicyrules.htm KEYS ('Keyname'[, 'Keyname', ... ]) KeyId:RkmId RkmId should match the stanza name in RKM.conf? If so, it would be useful if the docs used the same names in the examples (RKMKMIP3 vs rkmname3) And KeyId should match a "Key UUID" in SKLM? Third. My understanding from talking to various IBM people is that we need ISKLM entitlements for NSD Servers, Protocol nodes and AFM gateways (probably), do we have to do any kind of node registration in ISKLM? Or is this purely based on the certificates being distributed to clients and keys are mapped in ISKLM to the client cert to determine if the node is able to request the key? Thanks Simon From vpuvvada at in.ibm.com Thu Apr 6 11:45:37 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Thu, 6 Apr 2017 16:15:37 +0530 Subject: [gpfsug-discuss] AFM misunderstanding In-Reply-To: References: Message-ID: Could you explain "bits of actual file" mentioned below ? Prefetch with ?metadata-only pulls everything (xattrs, ACLs etc..) except data. Doing " ls ?ltrs" shows file allocation size as zero if data prefetch not yet completed on them. ~Venkat (vpuvvada at in.ibm.com) From: Mark Bush To: gpfsug main discussion list Date: 04/06/2017 07:24 AM Subject: [gpfsug-discuss] AFM misunderstanding Sent by: gpfsug-discuss-bounces at spectrumscale.org When I setup a AFM relationship (let?s just say I?m doing RO), does prefetch bring bits of the actual file over to the cache or is it only ever metadata? I know there is a ?metadata-only switch but it appears that if I try a mmafmctl prefetch operation and then I do a ls ?ltrs on the cache it?s still 0 bytes. I do see the queue increasing when I do a mmafmctl getstate. I realize that the data truly only flows once the file is requested (I just do a dd if=mycachedfile of=/dev/null). But this is just my test env. How to I get the bits to flow before I request them assuming that I will at some point need them? Or do I just misunderstand AFM altogether? I?m more used to mirroring so maybe that?s my frame of reference and it?s not the AFM architecture. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. 
If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Apr 6 13:28:40 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Thu, 6 Apr 2017 12:28:40 +0000 Subject: [gpfsug-discuss] AFM misunderstanding In-Reply-To: References: Message-ID: <425C32E7-B752-4B61-BDF5-83C219D89ADB@siriuscom.com> I think I was missing a key piece in that I thought that just doing a mmafmctl fs1 prefetch ?j cache would start grabbing everything (data and metadata) but it appears that the ?list-file myfiles.txt is the trigger for the prefetch to work properly. I mistakenly assumed that omitting the ?list-file switch would prefetch all the data in the fileset. From: on behalf of Venkateswara R Puvvada Reply-To: gpfsug main discussion list Date: Thursday, April 6, 2017 at 5:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM misunderstanding Could you explain "bits of actual file" mentioned below ? Prefetch with ?metadata-onlypulls everything (xattrs, ACLs etc..) except data. Doing "ls ?ltrs" shows file allocation size as zero if data prefetch not yet completed on them. ~Venkat (vpuvvada at in.ibm.com) From: Mark Bush To: gpfsug main discussion list Date: 04/06/2017 07:24 AM Subject: [gpfsug-discuss] AFM misunderstanding Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ When I setup a AFM relationship (let?s just say I?m doing RO), does prefetch bring bits of the actual file over to the cache or is it only ever metadata? I know there is a ?metadata-only switch but it appears that if I try a mmafmctl prefetch operation and then I do a ls ?ltrs on the cache it?s still 0 bytes. I do see the queue increasing when I do a mmafmctl getstate. I realize that the data truly only flows once the file is requested (I just do a dd if=mycachedfile of=/dev/null). But this is just my test env. How to I get the bits to flow before I request them assuming that I will at some point need them? Or do I just misunderstand AFM altogether? I?m more used to mirroring so maybe that?s my frame of reference and it?s not the AFM architecture. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. 
If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Apr 6 15:33:18 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 6 Apr 2017 14:33:18 +0000 Subject: [gpfsug-discuss] Can't delete filesystem In-Reply-To: References: <20E4B082-2BBB-478B-B1E1-2BC8125FE50F@vanderbilt.edu> <0F877E25-6C58-4790-86CD-7E2108EC8EB5@vanderbilt.edu> Message-ID: Hi JF, I actually tried that - to no effect. Yesterday evening I rebooted the 4 clients and, as expected, the 10 servers released their internal mounts as well ? and then I was able to delete the filesystem successfully. Thanks for the suggestions, all? Kevin On Apr 5, 2017, at 4:51 PM, Jan-Frode Myklebust > wrote: Maybe try mmumount -f on the remaining 4 nodes? -jf ons. 5. apr. 2017 kl. 18.54 skrev Buterbaugh, Kevin L >: Hi Simon, No, I do not. Let me also add that this is a filesystem that I migrated users off of and to another GPFS filesystem. I moved the last users this morning and then ran an ?mmunmount? across the whole cluster via mmdsh. Therefore, if the simple solution is to use the ?-p? option to mmdelfs I?m fine with that. I?m just not sure what the right course of action is at this point. Thanks again? Kevin > On Apr 5, 2017, at 11:47 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > Do you have ILM (dsmrecalld and friends) running? > > They can also stop the filesystem being released (e.g. mmshutdown fails if they are up). > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] > Sent: 05 April 2017 17:40 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Can't delete filesystem > > Hi All, > > First off, I can open a PMR on this if I need to? > > I am trying to delete a GPFS filesystem but mmdelfs is telling me that the filesystem is still mounted on 14 nodes and therefore can?t be deleted. 10 of those nodes are my 10 GPFS servers and they have an ?internal mount? still mounted. IIRC, it?s the other 4 (client) nodes I need to concentrate on ? i.e. once those other 4 clients no longer have it mounted the internal mounts will resolve themselves. Correct me if I?m wrong on that, please. > > So, I have gone to all of the 4 clients and none of them say they have it mounted according to either ?df? or ?mount?. I?ve gone ahead and run both ?mmunmount? and ?umount -l? on the filesystem anyway, but the mmdelfs still fails saying that they have it mounted. > > What do I need to do to resolve this issue on those 4 clients? Thanks? > > Kevin > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Thu Apr 6 15:54:42 2017 From: ewahl at osc.edu (Wahl, Edward) Date: Thu, 6 Apr 2017 14:54:42 +0000 Subject: [gpfsug-discuss] Spectrum Scale Encryption In-Reply-To: References: Message-ID: <9DA9EC7A281AC7428A9618AFDC490499591F4BDB@CIO-KRC-D1MBX02.osuad.osu.edu> This is rather dependant on SS version. So what used to happen before 4.2.2.* is that a client would be unable to mount the filesystem in question and would give an error in the mmfs.log.latest for an SGPanic, In 4.2.2.* It appears it will now mount the file system and then give errors on file access instead. (just tested this on 4.2.2.3) I'll have to read through the changelogs looking for this one. Depending on your policy for encryption then, this might be exactly what you want, but I REALLY REALLY dislike this behaviour. To me this means clients can now mount an encrypted FS now and then fail during operation. If I get a client node that comes up improperly, user work will start, and it will fail with "Operation not permitted" errors on file access. I imagine my batch system could run through a massive amount of jobs on a bad client without anyone noticing immeadiately. Yet another thing we now have to monitor now I guess. *shrug* A couple other gotcha's we've seen with Encryption: Encrypted file systems do not store data in large MD blocks. Makes sense. This means large MD blocks aren't as useful as they are in unencrypted FS, if you are using this. Having at least one backup SKLM server is a good idea. "kmipServerUri[N+1]" in the conf. While the documentation claims the FS can continue operation once it caches the MEK if an SKLM server goes away, in operation this does NOT work as you may expect. Your users still need access to the FEKs for the files your clients work on. Logs will fill with Key could not be fetched. errors. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Simon Thompson (Research Computing - IT Services) [S.J.Thompson at bham.ac.uk] Sent: Thursday, April 06, 2017 4:20 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Spectrum Scale Encryption We are currently looking at adding encryption to our deployment for some of our data sets and for some of our nodes. Apologies in advance if some of this is a bit vague, we're not yet at the point where we can test this stuff out, so maybe some of it will become clear when we try it out. For a node that we don't want to have access to any encrypted data, what do we need to set up? 
According to the docs: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_encryption_prep.htm "After the file system is configured with encryption policy rules, the file system is considered encrypted. From that point on, each node that has access to that file system must have an RKM.conf file present. Otherwise, the file system might not be mounted or might become unmounted." So on a node which I don't want to have access to any encrypted files, do I just need to have an empty RKM.conf file? (If this is the case, would be good to have this added to the docs) Secondly ... (and maybe I'm misunderstanding the docs here) For the Policy https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectru m.scale.v4r22.doc/bl1adv_encryptionpolicyrules.htm KEYS ('Keyname'[, 'Keyname', ... ]) KeyId:RkmId RkmId should match the stanza name in RKM.conf? If so, it would be useful if the docs used the same names in the examples (RKMKMIP3 vs rkmname3) And KeyId should match a "Key UUID" in SKLM? Third. My understanding from talking to various IBM people is that we need ISKLM entitlements for NSD Servers, Protocol nodes and AFM gateways (probably), do we have to do any kind of node registration in ISKLM? Or is this purely based on the certificates being distributed to clients and keys are mapped in ISKLM to the client cert to determine if the node is able to request the key? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Thu Apr 6 16:11:38 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 6 Apr 2017 15:11:38 +0000 Subject: [gpfsug-discuss] Spectrum Scale Encryption Message-ID: Hi Ed, Thanks. We already have several SKLM servers (tape backups). For me, we plan to encrypt specific parts of the FS (probably by file-set), so as long as all that is needed is an empty RKM.conf file, sounds like it will work. I suppose I could have an MEK that is granted to all clients, but then never actually use it for encryption if RKM.conf needs at least one key (hack hack hack). (We are at 4.2.2-2 (mostly) or higher (a few nodes)). I *thought* the FEK was wrapped in the metadata with the MEK (possibly multiple times with different MEKs), so what the docs say about operation continuing with no SKLM server sounds sensible, but of course that might not be what actually happens I guess... Simon On 06/04/2017, 15:54, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Wahl, Edward" wrote: >This is rather dependant on SS version. > >So what used to happen before 4.2.2.* is that a client would be unable to >mount the filesystem in question and would give an error in the >mmfs.log.latest for an SGPanic, In 4.2.2.* It appears it will now mount >the file system and then give errors on file access instead. (just >tested this on 4.2.2.3) I'll have to read through the changelogs looking >for this one. > >Depending on your policy for encryption then, this might be exactly what >you want, but I REALLY REALLY dislike this behaviour. > >To me this means clients can now mount an encrypted FS now and then fail >during operation. If I get a client node that comes up improperly, user >work will start, and it will fail with "Operation not permitted" errors >on file access. 
I imagine my batch system could run through a massive >amount of jobs on a bad client without anyone noticing immeadiately. Yet >another thing we now have to monitor now I guess. *shrug* > >A couple other gotcha's we've seen with Encryption: > >Encrypted file systems do not store data in large MD blocks. Makes >sense. This means large MD blocks aren't as useful as they are in >unencrypted FS, if you are using this. > >Having at least one backup SKLM server is a good idea. >"kmipServerUri[N+1]" in the conf. > >While the documentation claims the FS can continue operation once it >caches the MEK if an SKLM server goes away, in operation this does NOT >work as you may expect. Your users still need access to the FEKs for the >files your clients work on. Logs will fill with Key could not be >fetched. errors. > >Ed Wahl >OSC > >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org >[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Simon Thompson >(Research Computing - IT Services) [S.J.Thompson at bham.ac.uk] >Sent: Thursday, April 06, 2017 4:20 AM >To: gpfsug-discuss at spectrumscale.org >Subject: [gpfsug-discuss] Spectrum Scale Encryption > >We are currently looking at adding encryption to our deployment for some >of our data sets and for some of our nodes. Apologies in advance if some >of this is a bit vague, we're not yet at the point where we can test this >stuff out, so maybe some of it will become clear when we try it out. > > >For a node that we don't want to have access to any encrypted data, what >do we need to set up? > >According to the docs: >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum. >s >cale.v4r22.doc/bl1adv_encryption_prep.htm > > >"After the file system is configured with encryption policy rules, the >file system is considered encrypted. From that point on, each node that >has access to that file system must have an RKM.conf file present. >Otherwise, the file system might not be mounted or might become >unmounted." > >So on a node which I don't want to have access to any encrypted files, do >I just need to have an empty RKM.conf file? > >(If this is the case, would be good to have this added to the docs) > > >Secondly ... (and maybe I'm misunderstanding the docs here) > >For the Policy >https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectr >u >m.scale.v4r22.doc/bl1adv_encryptionpolicyrules.htm > > >KEYS ('Keyname'[, 'Keyname', ... ]) > > >KeyId:RkmId > > >RkmId should match the stanza name in RKM.conf? > >If so, it would be useful if the docs used the same names in the examples >(RKMKMIP3 vs rkmname3) > >And KeyId should match a "Key UUID" in SKLM? > > >Third. My understanding from talking to various IBM people is that we need >ISKLM entitlements for NSD Servers, Protocol nodes and AFM gateways >(probably), do we have to do any kind of node registration in ISKLM? Or is >this purely based on the certificates being distributed to clients and >keys are mapped in ISKLM to the client cert to determine if the node is >able to request the key? 
> >Thanks > >Simon > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Jon.Edwards at newbase.com.au Fri Apr 7 05:56:33 2017 From: Jon.Edwards at newbase.com.au (Jon Edwards) Date: Fri, 7 Apr 2017 04:56:33 +0000 Subject: [gpfsug-discuss] Spectrum scale sending cluster traffic across the management network Message-ID: <7929c064d6df4d7b88065b4d882daa98@newbase.com.au> Hi All, Just getting started with spectrum scale, Just wondering if anyone has come across the issue where when doing a mmcrfs or mmdelfs you get the error Failed to connect to file system daemon: Connection timed out mmdelfs: tsdelfs failed. mmdelfs: Command failed. Examine previous error messages to determine cause. When viewing the logs in /var/mmfs/gen/mmfslog on a node other than the one I am running the command on i get: 2017-04-07_14:03:13.354+1000: [N] Filtered log entry: 'connect to node 192.168.0.1:1191' occurred 10 times between 2017-04-07_11:38:19.058+1000 and 2017-04-07_11:54:58.649+1000 192.168.0.0/24 In this case is the management network configured on eth0 of all the nodes. It is failing because port 1191 is not allowed on this interface. The dns and hostname for each node resolves to a dedicated cluster network, lets say 10.0.0.0/24 (ETH1) For some reason when I run the mmcrfs or mmdelfs it tries to talk back over the management network instead of the cluster network which fails to connect due to firewall blocking cluster traffic over management. Anyone seen this before? Kind Regards, Jon Edwards Senior Systems Engineer NewBase Email: jon.edwards at newbase.com.au Ph: + 61 7 3216 0776 Fax: + 61 7 3216 0779 http://www.newbase.com.au Opinions contained in this e-mail do not necessarily reflect the opinions of NewBase Computer Services Pty Ltd. This e-mail is for the exclusive use of the addressee and should not be disseminated further or copied without permission of the sender. If you have received this message in error, please immediately notify the sender and delete the message from your computer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jon.Edwards at newbase.com.au Fri Apr 7 06:26:56 2017 From: Jon.Edwards at newbase.com.au (Jon Edwards) Date: Fri, 7 Apr 2017 05:26:56 +0000 Subject: [gpfsug-discuss] Spectrum scale sending cluster traffic across the management network Message-ID: <6e02ed91cb404d46b7b5cd3515ad8fe9@newbase.com.au> Please disregard, found the solution. Found the subnets= parameter for the cluster config mmchconfig subnets="192.168.0.0/24 192.168.1.0/24" Which forces it to use this subnet. Kind Regards, Jon Edwards | Senior Systems Engineer NewBase Ph: + 61 7 3216 0776 | Email: jon.edwards at newbase.com.au http://www.newbase.com.au From: Jon Edwards Sent: Friday, 7 April 2017 2:56 PM To: 'gpfsug-discuss at spectrumscale.org' Cc: 'Andrew Beattie' Subject: Spectrum scale sending cluster traffic across the management network Hi All, Just getting started with spectrum scale, Just wondering if anyone has come across the issue where when doing a mmcrfs or mmdelfs you get the error Failed to connect to file system daemon: Connection timed out mmdelfs: tsdelfs failed. mmdelfs: Command failed. Examine previous error messages to determine cause. 
When viewing the logs in /var/mmfs/gen/mmfslog on a node other than the one I am running the command on i get: 2017-04-07_14:03:13.354+1000: [N] Filtered log entry: 'connect to node 192.168.0.1:1191' occurred 10 times between 2017-04-07_11:38:19.058+1000 and 2017-04-07_11:54:58.649+1000 192.168.0.0/24 In this case is the management network configured on eth0 of all the nodes. It is failing because port 1191 is not allowed on this interface. The dns and hostname for each node resolves to a dedicated cluster network, lets say 10.0.0.0/24 (ETH1) For some reason when I run the mmcrfs or mmdelfs it tries to talk back over the management network instead of the cluster network which fails to connect due to firewall blocking cluster traffic over management. Anyone seen this before? Kind Regards, Jon Edwards Senior Systems Engineer NewBase Email: jon.edwards at newbase.com.au Ph: + 61 7 3216 0776 Fax: + 61 7 3216 0779 http://www.newbase.com.au Opinions contained in this e-mail do not necessarily reflect the opinions of NewBase Computer Services Pty Ltd. This e-mail is for the exclusive use of the addressee and should not be disseminated further or copied without permission of the sender. If you have received this message in error, please immediately notify the sender and delete the message from your computer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Apr 7 15:00:09 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 7 Apr 2017 10:00:09 -0400 Subject: [gpfsug-discuss] Spectrum Scale Encryption In-Reply-To: <9DA9EC7A281AC7428A9618AFDC490499591F4BDB@CIO-KRC-D1MBX02.osuad.osu.edu> References: <9DA9EC7A281AC7428A9618AFDC490499591F4BDB@CIO-KRC-D1MBX02.osuad.osu.edu> Message-ID: All, A few comments on the topics raised below. 1) All nodes that mount an encrypted file system, and also the nodes with management roles on the file system will need access to the keys have the proper setup (RKM.conf, etc). Edward is correct that there was some change in behavior, introduced in 4.2.1 . Before the change, a mount would fail unless RKM.conf is present on the node. In addition, once a policy with encryption rules was applied, nodes without the proper encryption setup would unmount the file system. With the change, the error gets delayed to when encrypted files are accessed. The change in behavior was introduced based on feedback that unmounting the file system in that case was too drastic in that scenario. >> So on a node which I don't want to have access to any encrypted files, do I just need to have an empty RKM.conf file? All nodes which mount an encrypted file system should have proper setup for encryption, even for a node from where only unencrypted files are being accessed. 2) >> Encrypted file systems do not store data in large MD blocks. Makes sense. This means large MD blocks aren't as useful as they are in unencrypted FS, if you are using this. Correct. Data is not stored in the inode for encrypted files. On the other hand, since encryption metadata is stored as an extended attribute in the inode, 4K inodes are still recommended -- especially in cases where a more complicated encryption policy is used. 3) >> Having at least one backup SKLM server is a good idea. "kmipServerUri[N+1]" in the conf. While the documentation claims the FS can continue operation once it caches the MEK if an SKLM server goes away, in operation this does NOT work as you may expect. Your users still need access to the FEKs for the files your clients work on. 
Logs will fill with Key could not be fetched. errors. Using a backup key server is strongly recommended. While it's true that the files may still be accessed for a while if the key server becomes unreachable, this was not something to be counted on. First because keys (MEK) may expire at any time, requiring the key to be retrieved from the key server again. Second because a file may require a key may be needed that has not been cached before. 4) >> Secondly ... (and maybe I'm misunderstanding the docs here) For the Policy https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectru m.scale.v4r22.doc/bl1adv_encryptionpolicyrules.htm KEYS ('Keyname'[, 'Keyname', ... ]) KeyId:RkmId RkmId should match the stanza name in RKM.conf? Correct. >> If so, it would be useful if the docs used the same names in the examples (RKMKMIP3 vs rkmname3) And KeyId should match a "Key UUID" in SKLM? Correct. We'll review the documentation to ensure that the meaning of the RkmId in the examples is clear. 5) >> Third. My understanding from talking to various IBM people is that we need ISKLM entitlements for NSD Servers, Protocol nodes and AFM gateways (probably), do we have to do any kind of node registration in ISKLM? Or is this purely based on the certificates being distributed to clients and keys are mapped in ISKLM to the client cert to determine if the node is able to request the key? I'll work on getting clarifications from the ISKLM folks on this aspect. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Wahl, Edward" To: gpfsug main discussion list Date: 04/06/2017 10:55 AM Subject: Re: [gpfsug-discuss] Spectrum Scale Encryption Sent by: gpfsug-discuss-bounces at spectrumscale.org This is rather dependant on SS version. So what used to happen before 4.2.2.* is that a client would be unable to mount the filesystem in question and would give an error in the mmfs.log.latest for an SGPanic, In 4.2.2.* It appears it will now mount the file system and then give errors on file access instead. (just tested this on 4.2.2.3) I'll have to read through the changelogs looking for this one. Depending on your policy for encryption then, this might be exactly what you want, but I REALLY REALLY dislike this behaviour. To me this means clients can now mount an encrypted FS now and then fail during operation. If I get a client node that comes up improperly, user work will start, and it will fail with "Operation not permitted" errors on file access. I imagine my batch system could run through a massive amount of jobs on a bad client without anyone noticing immeadiately. Yet another thing we now have to monitor now I guess. *shrug* A couple other gotcha's we've seen with Encryption: Encrypted file systems do not store data in large MD blocks. Makes sense. This means large MD blocks aren't as useful as they are in unencrypted FS, if you are using this. Having at least one backup SKLM server is a good idea. "kmipServerUri[N+1]" in the conf. While the documentation claims the FS can continue operation once it caches the MEK if an SKLM server goes away, in operation this does NOT work as you may expect. Your users still need access to the FEKs for the files your clients work on. Logs will fill with Key could not be fetched. errors. 
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Simon Thompson (Research Computing - IT Services) [S.J.Thompson at bham.ac.uk] Sent: Thursday, April 06, 2017 4:20 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Spectrum Scale Encryption We are currently looking at adding encryption to our deployment for some of our data sets and for some of our nodes. Apologies in advance if some of this is a bit vague, we're not yet at the point where we can test this stuff out, so maybe some of it will become clear when we try it out. For a node that we don't want to have access to any encrypted data, what do we need to set up? According to the docs: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_encryption_prep.htm "After the file system is configured with encryption policy rules, the file system is considered encrypted. From that point on, each node that has access to that file system must have an RKM.conf file present. Otherwise, the file system might not be mounted or might become unmounted." So on a node which I don't want to have access to any encrypted files, do I just need to have an empty RKM.conf file? (If this is the case, would be good to have this added to the docs) Secondly ... (and maybe I'm misunderstanding the docs here) For the Policy https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectru m.scale.v4r22.doc/bl1adv_encryptionpolicyrules.htm KEYS ('Keyname'[, 'Keyname', ... ]) KeyId:RkmId RkmId should match the stanza name in RKM.conf? If so, it would be useful if the docs used the same names in the examples (RKMKMIP3 vs rkmname3) And KeyId should match a "Key UUID" in SKLM? Third. My understanding from talking to various IBM people is that we need ISKLM entitlements for NSD Servers, Protocol nodes and AFM gateways (probably), do we have to do any kind of node registration in ISKLM? Or is this purely based on the certificates being distributed to clients and keys are mapped in ISKLM to the client cert to determine if the node is able to request the key? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Fri Apr 7 15:58:29 2017 From: mweil at wustl.edu (Matt Weil) Date: Fri, 7 Apr 2017 09:58:29 -0500 Subject: [gpfsug-discuss] AFM gateways Message-ID: Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. 
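As background for the question above, the gateway role is a per-node designation rather than something inherited from the NSD server role; a short sketch (node names are hypothetical):

    # Designate two dedicated nodes as AFM gateways
    mmchnode --gateway -N afmgw01,afmgw02

    # Drop the gateway role again, e.g. from an NSD server
    mmchnode --nogateway -N nsd01

    # Node designations, which should include the gateway flag, appear in the
    # cluster listing
    mmlscluster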
From vpuvvada at in.ibm.com Mon Apr 10 11:56:16 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 10 Apr 2017 16:26:16 +0530 Subject: [gpfsug-discuss] AFM gateways In-Reply-To: References: Message-ID: It is not recommended to make NSD servers as gateway nodes for native GPFS protocol. Unresponsive remote cluster mount might cause gateway node to hang on synchronous operations (ex. Lookup, Read, Open etc..), this will affect NSD server functionality. More information is documented @ https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_NFSvsGPFSAFM.htm ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil To: gpfsug main discussion list Date: 04/07/2017 08:28 PM Subject: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sandra.McLaughlin at astrazeneca.com Mon Apr 10 12:20:53 2017 From: Sandra.McLaughlin at astrazeneca.com (McLaughlin, Sandra M) Date: Mon, 10 Apr 2017 11:20:53 +0000 Subject: [gpfsug-discuss] AFM gateways In-Reply-To: References: Message-ID: Hi, I agree with Venkat. I did exactly what you said below, enabled my NSD servers as gateways to get additional throughput (with both native gpfs protocol and NFS protocol), which worked well; we definitely got the increased traffic. However, I wouldn't do it again through choice. As Venkat says, if there is a problem with the remote cluster, that can affect any of the gateway nodes (if using gpfs protocol), but also, we had a problem with one of the gateway nodes, where it kept crashing (which is now resolved) and then all filesets for which that node was the gateway had to failover to other gateway servers and this really messes everything up while the failover is taking place. I am also, stupidly, serving NFS and samba from the NSD servers (via ctdb) which I also, would not do again ! It would be nice if there was a way to specify which gateway server is the primary gateway for a specific fileset. Regards, Sandra From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: 10 April 2017 11:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM gateways It is not recommended to make NSD servers as gateway nodes for native GPFS protocol. Unresponsive remote cluster mount might cause gateway node to hang on synchronous operations (ex. Lookup, Read, Open etc..), this will affect NSD server functionality. 
More information is documented @ https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_NFSvsGPFSAFM.htm ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil > To: gpfsug main discussion list > Date: 04/07/2017 08:28 PM Subject: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From Christian.Fey at sva.de Mon Apr 10 17:04:31 2017 From: Christian.Fey at sva.de (Fey, Christian) Date: Mon, 10 Apr 2017 16:04:31 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: Message-ID: <455e54150cd04cd8808619acbf7d8d2b@sva.de> Hi, I'm just dealing with a maybe similar issue that also seems to be related to the output of "tsctl shownodes up" (before CES i actually never had to do with this command). In my case the output of a "mmlscluster" for example shows the nodes like "node1.acme.local" but in " tsctl shownodes up" they are displayed as "node1.acme.local.acme.local" for example. This maybe causes a fresh CES implementation in a existing GPFS cluster to also not spread ip-adresses. It instead loops in the same way as it did in your case @jonathon. I think it tries to search for "node1.acme.local" but doesn't find it since tsctl shows it with doubled suffix. Can anyone explain, from where the "tsctl shownodes up" reads the data? Additionally does anyone have an idea why the dns suffix is doubled? Kind regards Christian -----Urspr?ngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Jonathon A Anderson Gesendet: Donnerstag, 23. M?rz 2017 16:02 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Achtung! Die Absender-Adresse ist m?glicherweise gef?lscht. 
Bitte ?berpr?fen Sie die Plausibilit?t der Email und lassen bei enthaltenen Anh?ngen und Links besondere Vorsicht walten. Wenden Sie sich im Zweifelsfall an das CIT unter cit at sva.de oder 06122 536 350. (Stichwort: DKIM Test Fehlgeschlagen) ---------------------------------------------------------------------------------------------------------------- Thanks! I?m looking forward to upgrading our CES nodes and resuming work on the project. ~jonathon On 3/23/17, 8:24 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser" wrote: the issue is fixed, an APAR will be released soon - IV93100 From: Olaf Weiser/Germany/IBM at IBMDE To: "gpfsug main discussion list" Cc: "gpfsug main discussion list" Date: 01/31/2017 11:47 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________________ Yeah... depending on the #nodes you 're affected or not. ..... So if your remote ces cluster is small enough in terms of the #nodes ... you'll neuer hit into this issue Gesendet von IBM Verse Simon Thompson (Research Computing - IT Services) --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von:"Simon Thompson (Research Computing - IT Services)" An:"gpfsug main discussion list" Datum:Di. 31.01.2017 21:07Betreff:Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________________ We use multicluster for our environment, storage systems in a separate cluster to hpc nodes on a separate cluster from protocol nodes. According to the docs, this isn't supported, but we haven't seen any issues. Note unsupported as opposed to broken. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 31 January 2017 17:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Yeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it?s only in CES. I suspect there just haven?t been that many people exporting CES out of an HPC cluster environment. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 10:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes I ll open a pmr here for my env ... the issue may hurt you in a ces env. only... but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von: "Jonathon A Anderson" An: "gpfsug main discussion list" Datum: Di. 31.01.2017 17:32 Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________ No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. 
~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? 
Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] &&set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5467 bytes Desc: not available URL: From service at metamodul.com Mon Apr 10 17:47:41 2017 From: service at metamodul.com (Hans-Joachim Ehlers) Date: Mon, 10 Apr 2017 18:47:41 +0200 (CEST) Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network Message-ID: <788130355.197989.1491842861235@email.1und1.de> An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Mon Apr 10 17:58:36 2017 From: eric.wonderley at vt.edu (J. 
Eric Wonderley) Date: Mon, 10 Apr 2017 12:58:36 -0400 Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network In-Reply-To: <788130355.197989.1491842861235@email.1und1.de> References: <788130355.197989.1491842861235@email.1und1.de> Message-ID: 1) You want more that one quorum node on your server cluster. The non-quorum node does need a daemon network interface exposed to the client cluster as does the quorum nodes. 2) No. Admin network is for intra cluster communications...not inter cluster(between clusters). Daemon interface(port 1191) is used for communications between clusters. I think there is little benefit gained by having designated an admin network...maybe someone can point out benefits of an admin network. Eric Wonderley On Mon, Apr 10, 2017 at 12:47 PM, Hans-Joachim Ehlers wrote: > My understanding of the GPFS networks is not quite clear. > > For an GPFS setup i would like to use 2 Networks > > 1 Daemon (data) network using port 1191 using for example. 10.1.1.0/24 > > 2 Admin Network using for example: 192.168.1.0/24 network > > Questions > > 1) Thus in a 2+1 Cluster ( 2 GPFS Server + 1 Quorum Server ) Config - > Does the Tiebreaker Node needs to have access to the daemon(data) 10.1.1. > network or is it sufficient for the tiebreaker node to be configured as > part of the admin 192.168.1 network ? > > 2) Does a remote cluster needs access to the GPFS Admin 192.168.1 > network or is it sufficient for the remote cluster to access the 10.1.1 > network ? If so i assume that remotecluster commands and ping to/from > remote cluster are going via the Daemon network ? > > Note: > > I am aware and read https://www.ibm.com/developerworks/community/ > wikis/home?lang=en#!/wiki/General%20Parallel%20File% > 20System%20(GPFS)/page/GPFS%20Network%20Communication%20Overview > > -- > Unix Systems Engineer > -------------------------------------------------- > MetaModul GmbH > S?derstr. 12 > 25336 Elmshorn > HRB: 11873 PI > UstID: DE213701983 > Mobil: + 49 177 4393994 <+49%20177%204393994> > Mail: service at metamodul.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From laurence at qsplace.co.uk Mon Apr 10 18:13:08 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Mon, 10 Apr 2017 18:13:08 +0100 Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network In-Reply-To: References: <788130355.197989.1491842861235@email.1und1.de> Message-ID: <3a8f72c6-407a-0f4d-cf3c-f4698ca7b8e5@qsplace.co.uk> All nodes in a GPFS cluster need to be able to communicate over the data and admin network with the exception of remote clusters which can have their own separate admin network (for their own cluster that they are a member of) but still require communications over the daemon network. The networks can be routed and on different subnets, however the each member of the cluster will need to be able to communicate with every other member. With this in mind: 1) The quorum node will need to be accessible on both the 10.1.1.0/24 and 192.168.1.0/24 however again the network that the quorum node is on could be routed. 2) Remote clusters don't need access to the home clusters admin network, as they will use their own clusters admin network. 
As Eric has mentioned I would double check your 2+1 cluster suggestion, do you mean 2 x Servers with NSD's (with a quorum role) and 1 quorum node without NSD's? which gives you 3 quorum, or are you only going to have 1 quorum? If the latter that I would suggest using all 3 servers for quorum as they should be licensed as GPFS servers anyway due to their roles. -- Lauz On 10/04/2017 17:58, J. Eric Wonderley wrote: > 1) You want more that one quorum node on your server cluster. The > non-quorum node does need a daemon network interface exposed to the > client cluster as does the quorum nodes. > > 2) No. Admin network is for intra cluster communications...not inter > cluster(between clusters). Daemon interface(port 1191) is used for > communications between clusters. I think there is little benefit > gained by having designated an admin network...maybe someone can point > out benefits of an admin network. > > > > Eric Wonderley > > On Mon, Apr 10, 2017 at 12:47 PM, Hans-Joachim Ehlers > > wrote: > > My understanding of the GPFS networks is not quite clear. > > For an GPFS setup i would like to use 2 Networks > > 1 Daemon (data) network using port 1191 using for example. > 10.1.1.0/24 > > 2 Admin Network using for example: 192.168.1.0/24 > network > > Questions > > 1) Thus in a 2+1 Cluster ( 2 GPFS Server + 1 Quorum Server ) > Config - Does the Tiebreaker Node needs to have access to the > daemon(data) 10.1.1. network or is it sufficient for the > tiebreaker node to be configured as part of the admin 192.168.1 > network ? > > 2) Does a remote cluster needs access to the GPFS Admin 192.168.1 > network or is it sufficient for the remote cluster to access the > 10.1.1 network ? If so i assume that remotecluster commands and > ping to/from remote cluster are going via the Daemon network ? > > Note: > > I am aware and read > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/GPFS%20Network%20Communication%20Overview > > > -- > Unix Systems Engineer > -------------------------------------------------- > MetaModul GmbH > S?derstr. 12 > 25336 Elmshorn > HRB: 11873 PI > UstID: DE213701983 > Mobil: + 49 177 4393994 > Mail: service at metamodul.com > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Apr 10 18:26:42 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 10 Apr 2017 17:26:42 +0000 Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network In-Reply-To: References: <788130355.197989.1491842861235@email.1und1.de>, Message-ID: If you have network congestion, then a separate admin network is of benefit. Maybe less important if you have 10GbE networks, but if (for example), you normally rely on IB to talk data, and gpfs fails back to the Ethernet (which may be only 1GbE), then you may have cluster issues, for example missing gpfs pings. Having a separate physical admin network can protect you from this. Having been bitten by this several years back, it's a good idea IMHO to have a separate admin network. 
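For anyone wanting to go that route, a minimal sketch of what splitting the two networks looks like on an existing cluster (the host names below are placeholders; the daemon interface carries the port 1191 traffic while the admin interface is used for the remote-shell admin commands -- check mmchnode in your release for the exact option names):

   # give each node a separate admin interface; daemon traffic stays on the existing name
   mmchnode --admin-interface=nsd1-adm.example.com -N nsd1.example.com
   mmchnode --admin-interface=nsd2-adm.example.com -N nsd2.example.com

   # confirm which daemon and admin node names are now in use
   mmlscluster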
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of J. Eric Wonderley [eric.wonderley at vt.edu] Sent: 10 April 2017 17:58 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network 1) You want more that one quorum node on your server cluster. The non-quorum node does need a daemon network interface exposed to the client cluster as does the quorum nodes. 2) No. Admin network is for intra cluster communications...not inter cluster(between clusters). Daemon interface(port 1191) is used for communications between clusters. I think there is little benefit gained by having designated an admin network...maybe someone can point out benefits of an admin network. Eric Wonderley On Mon, Apr 10, 2017 at 12:47 PM, Hans-Joachim Ehlers > wrote: My understanding of the GPFS networks is not quite clear. For an GPFS setup i would like to use 2 Networks 1 Daemon (data) network using port 1191 using for example. 10.1.1.0/24 2 Admin Network using for example: 192.168.1.0/24 network Questions 1) Thus in a 2+1 Cluster ( 2 GPFS Server + 1 Quorum Server ) Config - Does the Tiebreaker Node needs to have access to the daemon(data) 10.1.1. network or is it sufficient for the tiebreaker node to be configured as part of the admin 192.168.1 network ? 2) Does a remote cluster needs access to the GPFS Admin 192.168.1 network or is it sufficient for the remote cluster to access the 10.1.1 network ? If so i assume that remotecluster commands and ping to/from remote cluster are going via the Daemon network ? Note: I am aware and read https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/GPFS%20Network%20Communication%20Overview -- Unix Systems Engineer -------------------------------------------------- MetaModul GmbH S?derstr. 12 25336 Elmshorn HRB: 11873 PI UstID: DE213701983 Mobil: + 49 177 4393994 Mail: service at metamodul.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From service at metamodul.com Mon Apr 10 18:44:47 2017 From: service at metamodul.com (Hans-Joachim Ehlers) Date: Mon, 10 Apr 2017 19:44:47 +0200 (CEST) Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network In-Reply-To: References: <788130355.197989.1491842861235@email.1und1.de>, Message-ID: <795203366.199195.1491846287405@email.1und1.de> An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Mon Apr 10 19:02:30 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 10 Apr 2017 21:02:30 +0300 Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network In-Reply-To: <795203366.199195.1491846287405@email.1und1.de> References: <788130355.197989.1491842861235@email.1und1.de>, <795203366.199195.1491846287405@email.1und1.de> Message-ID: Hi Out of curiosity. Are you using Failure groups and doing replication of data/metadata too? If you you do need to deal with the file system descriptors as well on the 3rd node. Thanks From: Hans-Joachim Ehlers To: gpfsug main discussion list Date: 10/04/2017 20:44 Subject: Re: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network Sent by: gpfsug-discuss-bounces at spectrumscale.org Sorry for not being clear. 
The setup is of course a 3 Node Cluster where each node is a quorum node - 2 NSD Server and 1 TieBreaker/Quorum Buster node. For me it was not clear if the Tiebreaker/Quorum Buster node - which does nothing in terms of data serving - must be part of the daemon/data network or not. So i get the understanding that a Tiebreaker Node must be also part of the Daemon network. Thx a lot to all Hajo "Simon Thompson (IT Research Support)" hat am 10. April 2017 um 19:26 geschrieben: If you have network congestion, then a separate admin network is of benefit. Maybe less important if you have 10GbE networks, but if (for example), you normally rely on IB to talk data, and gpfs fails back to the Ethernet (which may be only 1GbE), then you may have cluster issues, for example missing gpfs pings. Having a separate physical admin network can protect you from this. Having been bitten by this several years back, it's a good idea IMHO to have a separate admin network. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of J. Eric Wonderley [eric.wonderley at vt.edu] Sent: 10 April 2017 17:58 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network 1) You want more that one quorum node on your server cluster. The non-quorum node does need a daemon network interface exposed to the client cluster as does the quorum nodes. 2) No. Admin network is for intra cluster communications...not inter cluster(between clusters). Daemon interface(port 1191) is used for communications between clusters. I think there is little benefit gained by having designated an admin network...maybe someone can point out benefits of an admin network. Eric Wonderley On Mon, Apr 10, 2017 at 12:47 PM, Hans-Joachim Ehlers > wrote: My understanding of the GPFS networks is not quite clear. For an GPFS setup i would like to use 2 Networks 1 Daemon (data) network using port 1191 using for example. 10.1.1.0/24< http://10.1.1.0/24> 2 Admin Network using for example: 192.168.1.0/24 network Questions 1) Thus in a 2+1 Cluster ( 2 GPFS Server + 1 Quorum Server ) Config - Does the Tiebreaker Node needs to have access to the daemon(data) 10.1.1. network or is it sufficient for the tiebreaker node to be configured as part of the admin 192.168.1 network ? 2) Does a remote cluster needs access to the GPFS Admin 192.168.1 network or is it sufficient for the remote cluster to access the 10.1.1 network ? If so i assume that remotecluster commands and ping to/from remote cluster are going via the Daemon network ? Note: I am aware and read https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/GPFS%20Network%20Communication%20Overview -- Unix Systems Engineer -------------------------------------------------- MetaModul GmbH S?derstr. 12 25336 Elmshorn HRB: 11873 PI UstID: DE213701983 Mobil: + 49 177 4393994 Mail: service at metamodul.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Mon Apr 10 21:15:38 2017 From: mweil at wustl.edu (Matt Weil) Date: Mon, 10 Apr 2017 15:15:38 -0500 Subject: [gpfsug-discuss] AFM gateways In-Reply-To: References: Message-ID: <524d253e-b825-4e6a-7cbf-884af394ddc5@wustl.edu> Thanks for the answers.. For fail over I believe we will want to keep it separate then. Next question. Is it licensed as a client or a server? On 4/10/17 6:20 AM, McLaughlin, Sandra M wrote: Hi, I agree with Venkat. I did exactly what you said below, enabled my NSD servers as gateways to get additional throughput (with both native gpfs protocol and NFS protocol), which worked well; we definitely got the increased traffic. However, I wouldn?t do it again through choice. As Venkat says, if there is a problem with the remote cluster, that can affect any of the gateway nodes (if using gpfs protocol), but also, we had a problem with one of the gateway nodes, where it kept crashing (which is now resolved) and then all filesets for which that node was the gateway had to failover to other gateway servers and this really messes everything up while the failover is taking place. I am also, stupidly, serving NFS and samba from the NSD servers (via ctdb) which I also, would not do again ! It would be nice if there was a way to specify which gateway server is the primary gateway for a specific fileset. Regards, Sandra From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: 10 April 2017 11:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM gateways It is not recommended to make NSD servers as gateway nodes for native GPFS protocol. Unresponsive remote cluster mount might cause gateway node to hang on synchronous operations (ex. Lookup, Read, Open etc..), this will affect NSD server functionality. More information is documented @ https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_NFSvsGPFSAFM.htm ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil > To: gpfsug main discussion list > Date: 04/07/2017 08:28 PM Subject: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. 
This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mitsugi at linux.vnet.ibm.com Tue Apr 11 05:29:16 2017 From: mitsugi at linux.vnet.ibm.com (Masanori Mitsugi) Date: Tue, 11 Apr 2017 13:29:16 +0900 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM Message-ID: Hello, Does anyone have experience to do mmapplypolicy against billion files for ILM/HSM? Currently I'm planning/designing * 1 Scale filesystem (5-10 PB) * 10-20 filesets which includes 1 billion files each And our biggest concern is "How log does it take for mmapplypolicy policy scan against billion files?" I know it depends on how to write the policy, but I don't have no billion files policy scan experience, so I'd like to know the order of time (min/hour/day...). It would be helpful if anyone has experience of such large number of files scan and let me know any considerations or points for policy design. -- Masanori Mitsugi mitsugi at linux.vnet.ibm.com From zgiles at gmail.com Tue Apr 11 05:49:10 2017 From: zgiles at gmail.com (Zachary Giles) Date: Tue, 11 Apr 2017 00:49:10 -0400 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: It's definitely doable, and these days not too hard. Flash for metadata is the key. The basics of it are: * Latest GPFS for performance benefits. * A few 10's of TBs of flash ( or more ! ) setup in a good design.. lots of SAS, well balanced RAID that can consume the flash fully, tuned for IOPs, and available in parallel from multiple servers. * Tune up mmapplypolicy with -g somewhere-on-gpfs; --choice-algorithm fast; -a, -m and -n to reasonable values ( number of cores on the servers ); -A to ~1000 * Test first on a smaller fileset to confirm you like it. -I test should work well and be around the same speed minus the migration phase. * Then throw ~8 well tuned Infiniband attached nodes at it using -N, If they're the same as the NSD servers serving the flash, even better. Should be able to do 1B in 5-30m depending on the idiosyncrasies of above choices. Even 60m isn't bad and quite respectable if less gear is used or if they system is busy while the policy is running. Parallel metadata, it's a beautiful thing. 
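To make that concrete, a rough sketch of an invocation along those lines (the file system path, work directory, node list and thread counts are placeholders and need sizing to the actual servers):

   mmapplypolicy /gpfs/fs1 -P policy.rules \
       -N nsd1,nsd2,nsd3,nsd4,nsd5,nsd6,nsd7,nsd8 \
       -g /gpfs/fs1/.policytmp \
       --choice-algorithm fast \
       -a 8 -m 24 -n 24 -A 1000 \
       -I test -L 1

   # '-I test' evaluates the rules without moving or deleting anything; drop it for the real run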
On Tue, Apr 11, 2017 at 12:29 AM, Masanori Mitsugi wrote: > Hello, > > Does anyone have experience to do mmapplypolicy against billion files for > ILM/HSM? > > Currently I'm planning/designing > > * 1 Scale filesystem (5-10 PB) > * 10-20 filesets which includes 1 billion files each > > And our biggest concern is "How log does it take for mmapplypolicy policy > scan against billion files?" > > I know it depends on how to write the policy, > but I don't have no billion files policy scan experience, > so I'd like to know the order of time (min/hour/day...). > > It would be helpful if anyone has experience of such large number of files > scan and let me know any considerations or points for policy design. > > -- > Masanori Mitsugi > mitsugi at linux.vnet.ibm.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com From olaf.weiser at de.ibm.com Tue Apr 11 07:51:48 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 11 Apr 2017 08:51:48 +0200 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: <455e54150cd04cd8808619acbf7d8d2b@sva.de> References: <455e54150cd04cd8808619acbf7d8d2b@sva.de> Message-ID: An HTML attachment was scrubbed... URL: From ckrafft at de.ibm.com Tue Apr 11 09:24:35 2017 From: ckrafft at de.ibm.com (Christoph Krafft) Date: Tue, 11 Apr 2017 10:24:35 +0200 Subject: [gpfsug-discuss] Does SVC / Spectrum Virtualize support IBM Spectrum Scale with SCSI-3 Persistent Reservations? Message-ID: Hi folks, there is a list of storage devices that support SCSI-3 PR in the GPFS FAQ Doc (see Answer 4.5). https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html#scsi3 Since this list contains IBM V-model storage subsystems that include Storage Virtualization - I was wondering if SVC / Spectrum Virtualize can also support SCSI-3 PR (although not explicitly on the list)? Any hints and help is warmla welcome - thank you in advance. Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Am Weiher 24 Email: ckrafft at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Nicole Reimer, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Stefan Lutz Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A788784.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From p.childs at qmul.ac.uk Tue Apr 11 09:57:44 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 11 Apr 2017 08:57:44 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories Message-ID: This is a curious issue which I'm trying to get to the bottom of. We currently have two Spectrum Scale file systems, both are running GPFS 4.2.1-1 some of the servers have been upgraded to 4.2.1-2. 
The older one which was upgraded from GPFS 3.5 works fine: creating a directory is always fast and no issue. The new one, which has nice new SSD for metadata and hence should be faster, can take up to 30 seconds to create a directory but usually takes less than a second. The longer directory creates usually happen on busy nodes that have not used the new storage in a while. (It's new, so we've not moved much of the data over yet.) But it can also happen randomly anywhere, including from the NSD servers themselves. (Times of 3-4 seconds from the NSD servers have been seen, on a single directory create.) We've been pointed at the network and suggested we check all network settings, and it's been suggested to build an admin network, but I'm not sure I entirely understand why and how this would help. It's a mixed 1G/10G network with the NSD servers connected at 40G with an MTU of 9000. However, as I say, the older filesystem is fine, and it does not matter if the nodes are connected to the old GPFS cluster or the new one (although the delay is worst on the old gpfs cluster), so I'm really playing spot the difference, and the network is not really an obvious difference. It's been suggested to look at a trace when it occurs, but as it's difficult to recreate, collecting one is difficult. Any ideas would be most helpful. Thanks Peter Childs ITS Research Infrastructure Queen Mary, University of London 
From jonathan at buzzard.me.uk Tue Apr 11 11:21:05 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 11 Apr 2017 11:21:05 +0100 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: <1491906065.4102.87.camel@buzzard.me.uk> On Tue, 2017-04-11 at 00:49 -0400, Zachary Giles wrote: [SNIP] > * Then throw ~8 well tuned Infiniband attached nodes at it using -N, > If they're the same as the NSD servers serving the flash, even better. > Exactly how much are you going to gain from Infiniband over 40Gbps or even 100Gbps Ethernet? Not a lot I would have thought. Even with flash all your latency is going to be in the flash, not the Ethernet. Unless you have a compute cluster and need Infiniband for the MPI traffic, it is surely better to stick to Ethernet. Infiniband is rather esoteric, what I call a minority sport best avoided if at all possible. Even if you have an Infiniband fabric, I would argue that given current core counts and price points for 10Gbps Ethernet, you are actually better off keeping your storage traffic on the Ethernet and reserving the Infiniband for MPI duties. That is 10Gbps Ethernet to the compute nodes and 40/100Gbps Ethernet on the storage nodes. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. 
From zgiles at gmail.com Tue Apr 11 12:50:26 2017 From: zgiles at gmail.com (Zachary Giles) Date: Tue, 11 Apr 2017 07:50:26 -0400 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: <1491906065.4102.87.camel@buzzard.me.uk> References: <1491906065.4102.87.camel@buzzard.me.uk> Message-ID: Yeah, that can be true. I was just trying to show the size/shape that can achieve this. There's a good chance 10G or 40G ethernet would yield similar results, especially if you're running the policy on the NSD servers. 
On Tue, Apr 11, 2017 at 6:21 AM, Jonathan Buzzard wrote: > On Tue, 2017-04-11 at 00:49 -0400, Zachary Giles wrote: > > [SNIP] > >> * Then throw ~8 well tuned Infiniband attached nodes at it using -N, >> If they're the same as the NSD servers serving the flash, even better. >> > > Exactly how much are you going to gain from Infiniband over 40Gbps or > even 100Gbps Ethernet? Not a lot I would have thought. Even with flash > all your latency is going to be in the flash not the Ethernet. > > Unless you have a compute cluster and need Infiniband for the MPI > traffic, it is surely better to stick to Ethernet. Infiniband is rather > esoteric, what I call a minority sport best avoided if at all possible. > > Even if you have an Infiniband fabric, I would argue that give current > core counts and price points for 10Gbps Ethernet, that actually you are > better off keeping your storage traffic on the Ethernet, and reserving > the Infiniband for MPI duties. That is 10Gbps Ethernet to the compute > nodes and 40/100Gbps Ethernet on the storage nodes. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com From stockf at us.ibm.com Tue Apr 11 12:53:33 2017 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 11 Apr 2017 07:53:33 -0400 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: As Zachary noted the location of your metadata is the key and for the scanning you have planned flash is necessary. If you have the resources you may consider setting up your flash in a mirrored RAID configuration (RAID1/RAID10) and have GPFS only keep one copy of metadata since the underlying storage is replicating it via the RAID. This should improve metadata write performance but likely has little impact on your scanning, assuming you are just reading through the metadata. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Zachary Giles To: gpfsug main discussion list Date: 04/11/2017 12:49 AM Subject: Re: [gpfsug-discuss] Policy scan against billion files for ILM/HSM Sent by: gpfsug-discuss-bounces at spectrumscale.org It's definitely doable, and these days not too hard. Flash for metadata is the key. The basics of it are: * Latest GPFS for performance benefits. * A few 10's of TBs of flash ( or more ! ) setup in a good design.. lots of SAS, well balanced RAID that can consume the flash fully, tuned for IOPs, and available in parallel from multiple servers. * Tune up mmapplypolicy with -g somewhere-on-gpfs; --choice-algorithm fast; -a, -m and -n to reasonable values ( number of cores on the servers ); -A to ~1000 * Test first on a smaller fileset to confirm you like it. -I test should work well and be around the same speed minus the migration phase. * Then throw ~8 well tuned Infiniband attached nodes at it using -N, If they're the same as the NSD servers serving the flash, even better. Should be able to do 1B in 5-30m depending on the idiosyncrasies of above choices. Even 60m isn't bad and quite respectable if less gear is used or if they system is busy while the policy is running. Parallel metadata, it's a beautiful thing. 
On Tue, Apr 11, 2017 at 12:29 AM, Masanori Mitsugi wrote: > Hello, > > Does anyone have experience to do mmapplypolicy against billion files for > ILM/HSM? > > Currently I'm planning/designing > > * 1 Scale filesystem (5-10 PB) > * 10-20 filesets which includes 1 billion files each > > And our biggest concern is "How log does it take for mmapplypolicy policy > scan against billion files?" > > I know it depends on how to write the policy, > but I don't have no billion files policy scan experience, > so I'd like to know the order of time (min/hour/day...). > > It would be helpful if anyone has experience of such large number of files > scan and let me know any considerations or points for policy design. > > -- > Masanori Mitsugi > mitsugi at linux.vnet.ibm.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Tue Apr 11 16:18:01 2017 From: chair at spectrumscale.org (Spectrum Scale UG Chair (Simon Thompson)) Date: Tue, 11 Apr 2017 16:18:01 +0100 Subject: [gpfsug-discuss] May Meeting Registration Message-ID: Hi all, Just a reminder that the next UK user group meeting is taking place on 9th/10th May. If you are planning on attending, please do register at: https://www.eventbrite.com/e/spectrum-scalegpfs-user-group-spring-2017-regi stration-32113696932 (or try https://goo.gl/tRptru ) As last year, this is a 2 day event and we're planning a fun evening event on the Tuesday night at Manchester Museum of Science. Thanks to our sponsors Arcastream, DDN, Ellexus, Lenovo, IBM, Mellanox, OCF and Seagate for helping make this happen! We also still have some customer talk slots to fill, so please let me know if you are interested in speaking. Thanks Simon From bbanister at jumptrading.com Tue Apr 11 16:29:25 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Apr 2017 15:29:25 +0000 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: <1e86aa0c2e4344f19cb5eedf8f03efa9@jumptrading.com> A word of caution, be careful about where you run this kind of policy scan as the sort process can consume all memory on your hosts and that could lead to issues with the OS deciding to kill off GPFS or other similar bad things can occur. I recommend restricting the ILM policy scan to a subset of servers, no quorum nodes, and ensuring at least one NSD server is available for all NSDs in the file system(s). Watch the memory consumption on your nodes during the sort operations to see if you need to tune that down in the mmapplypolicy options. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Frederick Stock Sent: Tuesday, April 11, 2017 6:54 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Policy scan against billion files for ILM/HSM As Zachary noted the location of your metadata is the key and for the scanning you have planned flash is necessary. 
If you have the resources you may consider setting up your flash in a mirrored RAID configuration (RAID1/RAID10) and have GPFS only keep one copy of metadata since the underlying storage is replicating it via the RAID. This should improve metadata write performance but likely has little impact on your scanning, assuming you are just reading through the metadata. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Zachary Giles > To: gpfsug main discussion list > Date: 04/11/2017 12:49 AM Subject: Re: [gpfsug-discuss] Policy scan against billion files for ILM/HSM Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ It's definitely doable, and these days not too hard. Flash for metadata is the key. The basics of it are: * Latest GPFS for performance benefits. * A few 10's of TBs of flash ( or more ! ) setup in a good design.. lots of SAS, well balanced RAID that can consume the flash fully, tuned for IOPs, and available in parallel from multiple servers. * Tune up mmapplypolicy with -g somewhere-on-gpfs; --choice-algorithm fast; -a, -m and -n to reasonable values ( number of cores on the servers ); -A to ~1000 * Test first on a smaller fileset to confirm you like it. -I test should work well and be around the same speed minus the migration phase. * Then throw ~8 well tuned Infiniband attached nodes at it using -N, If they're the same as the NSD servers serving the flash, even better. Should be able to do 1B in 5-30m depending on the idiosyncrasies of above choices. Even 60m isn't bad and quite respectable if less gear is used or if they system is busy while the policy is running. Parallel metadata, it's a beautiful thing. On Tue, Apr 11, 2017 at 12:29 AM, Masanori Mitsugi > wrote: > Hello, > > Does anyone have experience to do mmapplypolicy against billion files for > ILM/HSM? > > Currently I'm planning/designing > > * 1 Scale filesystem (5-10 PB) > * 10-20 filesets which includes 1 billion files each > > And our biggest concern is "How log does it take for mmapplypolicy policy > scan against billion files?" > > I know it depends on how to write the policy, > but I don't have no billion files policy scan experience, > so I'd like to know the order of time (min/hour/day...). > > It would be helpful if anyone has experience of such large number of files > scan and let me know any considerations or points for policy design. > > -- > Masanori Mitsugi > mitsugi at linux.vnet.ibm.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From k.leach at ed.ac.uk Tue Apr 11 16:32:41 2017 From: k.leach at ed.ac.uk (Kieran Leach) Date: Tue, 11 Apr 2017 16:32:41 +0100 Subject: [gpfsug-discuss] May Meeting Registration In-Reply-To: References: Message-ID: <275b54d9-6779-774e-69bb-d26fead278a2@ed.ac.uk> Hi Simon, would you be interested in a customer talk about the RDF (http://rdf.ac.uk/). We manage the RDF at EPCC, providing a 23PB filestore to complement ARCHER (the national research HPC service) and other UK Research HPC services. This is of course a GPFS system. If you've any questions or want more info please let me know but I thought I'd get an email off to you while I remember. Cheers Kieran On 11/04/17 16:18, Spectrum Scale UG Chair (Simon Thompson) wrote: > Hi all, > > Just a reminder that the next UK user group meeting is taking place on > 9th/10th May. If you are planning on attending, please do register at: > > https://www.eventbrite.com/e/spectrum-scalegpfs-user-group-spring-2017-regi > stration-32113696932 > > > (or try https://goo.gl/tRptru ) > > As last year, this is a 2 day event and we're planning a fun evening event > on the Tuesday night at Manchester Museum of Science. > > Thanks to our sponsors Arcastream, DDN, Ellexus, Lenovo, IBM, Mellanox, > OCF and Seagate for helping make this happen! > > We also still have some customer talk slots to fill, so please let me know > if you are interested in speaking. > > Thanks > > Simon > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From k.leach at ed.ac.uk Tue Apr 11 16:33:29 2017 From: k.leach at ed.ac.uk (Kieran Leach) Date: Tue, 11 Apr 2017 16:33:29 +0100 Subject: [gpfsug-discuss] May Meeting Registration In-Reply-To: <275b54d9-6779-774e-69bb-d26fead278a2@ed.ac.uk> References: <275b54d9-6779-774e-69bb-d26fead278a2@ed.ac.uk> Message-ID: Apologies all, wrong reply button. Cheers Kieran On 11/04/17 16:32, Kieran Leach wrote: > Hi Simon, > would you be interested in a customer talk about the RDF > (http://rdf.ac.uk/). We manage the RDF at EPCC, providing a 23PB > filestore to complement ARCHER (the national research HPC service) and > other UK Research HPC services. This is of course a GPFS system. If > you've any questions or want more info please let me know but I > thought I'd get an email off to you while I remember. > > Cheers > > Kieran > > On 11/04/17 16:18, Spectrum Scale UG Chair (Simon Thompson) wrote: >> Hi all, >> >> Just a reminder that the next UK user group meeting is taking place on >> 9th/10th May. If you are planning on attending, please do register at: >> >> https://www.eventbrite.com/e/spectrum-scalegpfs-user-group-spring-2017-regi >> >> stration-32113696932 >> >> >> (or try https://goo.gl/tRptru ) >> >> As last year, this is a 2 day event and we're planning a fun evening >> event >> on the Tuesday night at Manchester Museum of Science. >> >> Thanks to our sponsors Arcastream, DDN, Ellexus, Lenovo, IBM, Mellanox, >> OCF and Seagate for helping make this happen! 
>> >> We also still have some customer talk slots to fill, so please let me >> know >> if you are interested in speaking. >> >> Thanks >> >> Simon >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From makaplan at us.ibm.com Tue Apr 11 16:36:47 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 11 Apr 2017 11:36:47 -0400 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: As primary developer of mmapplypolicy, please allow me to comment: 1) Fast access to metadata in system pool is most important, as several have commented on. These days SSD is the favorite, but you can still go with "spinning" media. If you do go with disks, it's extremely important to spread your metadata over independent disk "arms" -- so you can have many concurrent seeks in progress at the same time. IOW, if there is a virtualization/mapping layer, watchout that your logical disks don't get mapped to the same physical disk. 2) Crucial to use both -g and -N :: -g /gpfs-not-necessarily-the-same-fs-as-Im-scanning/tempdir and -N several-nodes-that-will-be-accessing-the-system-pool 3a) If at all possible, encourage your data and application designers to "pack" their directories with lots of files. Keep in mind that, mmapplypolicy will read every directory. The more directories, the more seeks, more time spent waiting for IO. OTOH, in more typical Unix/Linux usage, we tend to low average number of files per directory. 3b) As admin, you may not be able to change your data design to pack hundreds of files per directory, BUT you can make sure you are running a sufficiently modern release of Spectrum Scale that supports "data in inode" -- "Data in inode" also means "directory entries in inode" -- which means practically any small directory, up to a few hundred files, will fit in an an inode -- which means mmapplypolicy can read small directories with one seek, instead of two. (Someone will please remind us of the release number that first supported "directories in inode".) 4) Sorry, Fred, but the recommendation to use RAID mirroring of metadata on SSD, is not necessarily, important for metadata scanning. In fact it may work against you. If you use GPFS replication of metadata - that can work for you -- since then GPFS can direct read operations to either copy, preferring a locally attached copy, depending on how storage is attached to node, etc, etc. Choice of how to replicate metadata - either using GPFS replication or the RAID controller - is probably best made based on reliability and recoverability requirements. 5) YMMV - We'd love to hear/see your performance results for mmapplypolicy, especially if they're good. Even if they're bad, come back here for more tuning tips! -- marc of Spectrum Scale (ne GPFS) -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Tue Apr 11 16:51:56 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Apr 2017 15:51:56 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories In-Reply-To: References: Message-ID: There are so many things to look at and many tools for doing so (iostat, htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). 
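For example, a quick first pass with those tools while the slowness is actually happening might look like the following (all read-only queries, run on an affected client and on the NSD servers; mmhealth needs 4.2.1 or later):

   mmdiag --waiters      # long waiters usually name the node or resource being waited on
   mmdiag --network      # per-connection RPC state, pending messages, broken sockets
   mmhealth node show    # component health summary, if available at your code level
   mmlsconfig            # diff the tuning between the old and the new cluster
   mmlsfs all            # compare block size, inode size and replication between the file systems
   iostat -x 2 10        # confirm the new metadata SSDs are not saturated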
I would recommend a review of the presentation that Yuri gave at the most recent GPFS User Group: https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: Tuesday, April 11, 2017 3:58 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories This is a curious issue which I'm trying to get to the bottom of. We currently have two Spectrum Scale file systems, both are running GPFS 4.2.1-1 some of the servers have been upgraded to 4.2.1-2. The older one which was upgraded from GPFS 3.5 works find create a directory is always fast and no issue. The new one, which has nice new SSD for metadata and hence should be faster. can take up to 30 seconds to create a directory but usually takes less than a second, The longer directory creates usually happen on busy nodes that have not used the new storage in a while. (Its new so we've not moved much of the data over yet) But it can also happen randomly anywhere, including from the NSD servers them selves. (times of 3-4 seconds from the NSD servers have been seen, on a single directory create) We've been pointed at the network and suggested we check all network settings, and its been suggested to build an admin network, but I'm not sure I entirely understand why and how this would help. Its a mixed 1G/10G network with the NSD servers connected at 40G with an MTU of 9000. However as I say, the older filesystem is fine, and it does not matter if the nodes are connected to the old GPFS cluster or the new one, (although the delay is worst on the old gpfs cluster), So I'm really playing spot the difference. and the network is not really an obvious difference. Its been suggested to look at a trace when it occurs but as its difficult to recreate collecting one is difficult. Any ideas would be most helpful. Thanks Peter Childs ITS Research Infrastructure Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From S.J.Thompson at bham.ac.uk Tue Apr 11 16:55:35 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 11 Apr 2017 15:55:35 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories Message-ID: We actually saw this for a while on one of our clusters which was new. But by the time I'd got round to looking deeper, it had gone, maybe we were using the NSDs more heavily, or possibly we'd upgraded. 
We are at 4.2.2-2, so might be worth trying to bump the version and see if it goes away. We saw it on the NSD servers directly as well, so not some client trying to talk to it, so maybe there was some buggy code? Simon On 11/04/2017, 16:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: >There are so many things to look at and many tools for doing so (iostat, >htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). I would >recommend a review of the presentation that Yuri gave at the most recent >GPFS User Group: >https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs > >Cheers, >-Bryan > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter >Childs >Sent: Tuesday, April 11, 2017 3:58 AM >To: gpfsug main discussion list >Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories > >This is a curious issue which I'm trying to get to the bottom of. > >We currently have two Spectrum Scale file systems, both are running GPFS >4.2.1-1 some of the servers have been upgraded to 4.2.1-2. > >The older one which was upgraded from GPFS 3.5 works find create a >directory is always fast and no issue. > >The new one, which has nice new SSD for metadata and hence should be >faster. can take up to 30 seconds to create a directory but usually takes >less than a second, The longer directory creates usually happen on busy >nodes that have not used the new storage in a while. (Its new so we've >not moved much of the data over yet) But it can also happen randomly >anywhere, including from the NSD servers them selves. (times of 3-4 >seconds from the NSD servers have been seen, on a single directory create) > >We've been pointed at the network and suggested we check all network >settings, and its been suggested to build an admin network, but I'm not >sure I entirely understand why and how this would help. Its a mixed >1G/10G network with the NSD servers connected at 40G with an MTU of 9000. > >However as I say, the older filesystem is fine, and it does not matter if >the nodes are connected to the old GPFS cluster or the new one, (although >the delay is worst on the old gpfs cluster), So I'm really playing spot >the difference. and the network is not really an obvious difference. > >Its been suggested to look at a trace when it occurs but as its difficult >to recreate collecting one is difficult. > >Any ideas would be most helpful. > >Thanks > > > >Peter Childs >ITS Research Infrastructure >Queen Mary, University of London >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >________________________________ > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. 
This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Tue Apr 11 16:56:56 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 11 Apr 2017 15:56:56 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories Message-ID: Bryan, That looks like a really useful set of presentation slides! Thanks for sharing! Which one in particular is the one Yuri gave that you?re referring to? ~jonathon On 4/11/17, 9:51 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: There are so many things to look at and many tools for doing so (iostat, htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). I would recommend a review of the presentation that Yuri gave at the most recent GPFS User Group: https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: Tuesday, April 11, 2017 3:58 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories This is a curious issue which I'm trying to get to the bottom of. We currently have two Spectrum Scale file systems, both are running GPFS 4.2.1-1 some of the servers have been upgraded to 4.2.1-2. The older one which was upgraded from GPFS 3.5 works find create a directory is always fast and no issue. The new one, which has nice new SSD for metadata and hence should be faster. can take up to 30 seconds to create a directory but usually takes less than a second, The longer directory creates usually happen on busy nodes that have not used the new storage in a while. (Its new so we've not moved much of the data over yet) But it can also happen randomly anywhere, including from the NSD servers them selves. (times of 3-4 seconds from the NSD servers have been seen, on a single directory create) We've been pointed at the network and suggested we check all network settings, and its been suggested to build an admin network, but I'm not sure I entirely understand why and how this would help. Its a mixed 1G/10G network with the NSD servers connected at 40G with an MTU of 9000. However as I say, the older filesystem is fine, and it does not matter if the nodes are connected to the old GPFS cluster or the new one, (although the delay is worst on the old gpfs cluster), So I'm really playing spot the difference. and the network is not really an obvious difference. Its been suggested to look at a trace when it occurs but as its difficult to recreate collecting one is difficult. Any ideas would be most helpful. Thanks Peter Childs ITS Research Infrastructure Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Tue Apr 11 16:59:51 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Apr 2017 15:59:51 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories In-Reply-To: References: Message-ID: Problem Determination and GPFS Internals. My security group won't let me go to the google docs site from my work compute... I'm sure there is malicious malware on that site!! j/k, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: Tuesday, April 11, 2017 10:57 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Slow to create directories Bryan, That looks like a really useful set of presentation slides! Thanks for sharing! Which one in particular is the one Yuri gave that you?re referring to? ~jonathon On 4/11/17, 9:51 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: There are so many things to look at and many tools for doing so (iostat, htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). I would recommend a review of the presentation that Yuri gave at the most recent GPFS User Group: https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: Tuesday, April 11, 2017 3:58 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories This is a curious issue which I'm trying to get to the bottom of. We currently have two Spectrum Scale file systems, both are running GPFS 4.2.1-1 some of the servers have been upgraded to 4.2.1-2. The older one which was upgraded from GPFS 3.5 works find create a directory is always fast and no issue. The new one, which has nice new SSD for metadata and hence should be faster. can take up to 30 seconds to create a directory but usually takes less than a second, The longer directory creates usually happen on busy nodes that have not used the new storage in a while. (Its new so we've not moved much of the data over yet) But it can also happen randomly anywhere, including from the NSD servers them selves. (times of 3-4 seconds from the NSD servers have been seen, on a single directory create) We've been pointed at the network and suggested we check all network settings, and its been suggested to build an admin network, but I'm not sure I entirely understand why and how this would help. Its a mixed 1G/10G network with the NSD servers connected at 40G with an MTU of 9000. 
However as I say, the older filesystem is fine, and it does not matter if the nodes are connected to the old GPFS cluster or the new one, (although the delay is worst on the old gpfs cluster), So I'm really playing spot the difference. and the network is not really an obvious difference. Its been suggested to look at a trace when it occurs but as its difficult to recreate collecting one is difficult. Any ideas would be most helpful. Thanks Peter Childs ITS Research Infrastructure Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From p.childs at qmul.ac.uk Tue Apr 11 20:35:40 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 11 Apr 2017 19:35:40 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories In-Reply-To: References: Message-ID: Can you remember what version you were running? Don't worry if you can't remember. It looks like ibm may have withdrawn 4.2.1 from fix central and wish to forget its existences. Never a good sign, 4.2.0, 4.2.2, 4.2.3 and even 3.5, so maybe upgrading is worth a try. I've looked at all the standard trouble shouting guides and got nowhere hence why I asked. But another set of slides always helps. Thank-you for the help, still head scratching.... Which only makes the issue more random. 
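If it would help to catch one of the slow creates in the act, a rough loop like the sketch below could be left running on a busy client; the test path and the one-second threshold are arbitrary placeholders, and the waiter snapshot it saves is usually enough to attach to a PMR:

   #!/bin/bash
   # time directory creates and capture the GPFS waiter list whenever one is slow
   testdir=/gpfs/newfs/mkdirtest        # placeholder path on the new file system
   mkdir -p "$testdir"
   while true; do
       d="$testdir/$(date +%s%N)"
       t0=$(date +%s.%N)
       mkdir "$d"
       t1=$(date +%s.%N)
       elapsed=$(echo "$t1 - $t0" | bc)
       if [ "$(echo "$elapsed > 1.0" | bc)" -eq 1 ]; then
           { echo "$(date): mkdir took ${elapsed}s"
             /usr/lpp/mmfs/bin/mmdiag --waiters; } >> /tmp/slow-mkdir.log 2>&1
       fi
       rmdir "$d"
       sleep 5
   done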
Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Simon Thompson (IT Research Support) wrote ---- We actually saw this for a while on one of our clusters which was new. But by the time I'd got round to looking deeper, it had gone, maybe we were using the NSDs more heavily, or possibly we'd upgraded. We are at 4.2.2-2, so might be worth trying to bump the version and see if it goes away. We saw it on the NSD servers directly as well, so not some client trying to talk to it, so maybe there was some buggy code? Simon On 11/04/2017, 16:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: >There are so many things to look at and many tools for doing so (iostat, >htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). I would >recommend a review of the presentation that Yuri gave at the most recent >GPFS User Group: >https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs > >Cheers, >-Bryan > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter >Childs >Sent: Tuesday, April 11, 2017 3:58 AM >To: gpfsug main discussion list >Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories > >This is a curious issue which I'm trying to get to the bottom of. > >We currently have two Spectrum Scale file systems, both are running GPFS >4.2.1-1 some of the servers have been upgraded to 4.2.1-2. > >The older one which was upgraded from GPFS 3.5 works find create a >directory is always fast and no issue. > >The new one, which has nice new SSD for metadata and hence should be >faster. can take up to 30 seconds to create a directory but usually takes >less than a second, The longer directory creates usually happen on busy >nodes that have not used the new storage in a while. (Its new so we've >not moved much of the data over yet) But it can also happen randomly >anywhere, including from the NSD servers them selves. (times of 3-4 >seconds from the NSD servers have been seen, on a single directory create) > >We've been pointed at the network and suggested we check all network >settings, and its been suggested to build an admin network, but I'm not >sure I entirely understand why and how this would help. Its a mixed >1G/10G network with the NSD servers connected at 40G with an MTU of 9000. > >However as I say, the older filesystem is fine, and it does not matter if >the nodes are connected to the old GPFS cluster or the new one, (although >the delay is worst on the old gpfs cluster), So I'm really playing spot >the difference. and the network is not really an obvious difference. > >Its been suggested to look at a trace when it occurs but as its difficult >to recreate collecting one is difficult. > >Any ideas would be most helpful. > >Thanks > > > >Peter Childs >ITS Research Infrastructure >Queen Mary, University of London >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >________________________________ > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. 
Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mitsugi at linux.vnet.ibm.com Wed Apr 12 02:51:03 2017 From: mitsugi at linux.vnet.ibm.com (Masanori Mitsugi) Date: Wed, 12 Apr 2017 10:51:03 +0900 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: <0851d194-088e-d93a-303d-ceb0de3dbaa8@linux.vnet.ibm.com> Marc, Zachary, Fred, Bryan, Thank you for providing great advice! It's pretty useful for me to tune our policy with best performance. As for "directories in inode", we plan to use latest version, so I believe we can leverage this function. -- Masanori Mitsugi mitsugi at linux.vnet.ibm.com From vpuvvada at in.ibm.com Wed Apr 12 10:53:25 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Wed, 12 Apr 2017 15:23:25 +0530 Subject: [gpfsug-discuss] AFM gateways In-Reply-To: <524d253e-b825-4e6a-7cbf-884af394ddc5@wustl.edu> References: <524d253e-b825-4e6a-7cbf-884af394ddc5@wustl.edu> Message-ID: Gateway node requires server license. ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil To: Date: 04/11/2017 01:46 AM Subject: Re: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks for the answers.. For fail over I believe we will want to keep it separate then. Next question. Is it licensed as a client or a server? On 4/10/17 6:20 AM, McLaughlin, Sandra M wrote: Hi, I agree with Venkat. I did exactly what you said below, enabled my NSD servers as gateways to get additional throughput (with both native gpfs protocol and NFS protocol), which worked well; we definitely got the increased traffic. However, I wouldn?t do it again through choice. As Venkat says, if there is a problem with the remote cluster, that can affect any of the gateway nodes (if using gpfs protocol), but also, we had a problem with one of the gateway nodes, where it kept crashing (which is now resolved) and then all filesets for which that node was the gateway had to failover to other gateway servers and this really messes everything up while the failover is taking place. I am also, stupidly, serving NFS and samba from the NSD servers (via ctdb) which I also, would not do again ! It would be nice if there was a way to specify which gateway server is the primary gateway for a specific fileset. Regards, Sandra From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: 10 April 2017 11:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM gateways It is not recommended to make NSD servers as gateway nodes for native GPFS protocol. Unresponsive remote cluster mount might cause gateway node to hang on synchronous operations (ex. 
Lookup, Read, Open etc..), this will affect NSD server functionality. More information is documented @ https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_NFSvsGPFSAFM.htm ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil To: gpfsug main discussion list Date: 04/07/2017 08:28 PM Subject: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Apr 12 15:52:48 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 12 Apr 2017 09:52:48 -0500 Subject: [gpfsug-discuss] AFM gateways In-Reply-To: References: <524d253e-b825-4e6a-7cbf-884af394ddc5@wustl.edu> Message-ID: yes it tells you that when you attempt to make the node a gateway and is does not have a server license designation. On 4/12/17 4:53 AM, Venkateswara R Puvvada wrote: Gateway node requires server license. ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil To: Date: 04/11/2017 01:46 AM Subject: Re: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Thanks for the answers.. For fail over I believe we will want to keep it separate then. Next question. Is it licensed as a client or a server? 
On 4/10/17 6:20 AM, McLaughlin, Sandra M wrote: Hi, I agree with Venkat. I did exactly what you said below, enabled my NSD servers as gateways to get additional throughput (with both native gpfs protocol and NFS protocol), which worked well; we definitely got the increased traffic. However, I wouldn?t do it again through choice. As Venkat says, if there is a problem with the remote cluster, that can affect any of the gateway nodes (if using gpfs protocol), but also, we had a problem with one of the gateway nodes, where it kept crashing (which is now resolved) and then all filesets for which that node was the gateway had to failover to other gateway servers and this really messes everything up while the failover is taking place. I am also, stupidly, serving NFS and samba from the NSD servers (via ctdb) which I also, would not do again ! It would be nice if there was a way to specify which gateway server is the primary gateway for a specific fileset. Regards, Sandra From: gpfsug-discuss-bounces at spectrumscale.org[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: 10 April 2017 11:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM gateways It is not recommended to make NSD servers as gateway nodes for native GPFS protocol. Unresponsive remote cluster mount might cause gateway node to hang on synchronous operations (ex. Lookup, Read, Open etc..), this will affect NSD server functionality. More information is documented @ https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_NFSvsGPFSAFM.htm ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil > To: gpfsug main discussion list > Date: 04/07/2017 08:28 PM Subject: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. 
For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chekh at stanford.edu Wed Apr 12 22:01:45 2017 From: chekh at stanford.edu (Alex Chekholko) Date: Wed, 12 Apr 2017 14:01:45 -0700 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: <284246a2-b14b-0a73-6dad-4c73caef58c9@stanford.edu> On 4/11/17 8:36 AM, Marc A Kaplan wrote: > > 5) YMMV - We'd love to hear/see your performance results for > mmapplypolicy, especially if they're good. Even if they're bad, come > back here for more tuning tips! I have a filesystem that currently has 267919775 (roughly quarter billion, 250 million) used inodes. The metadata is on SSD behind a DDN 12K. We do use 4K inodes, and files smaller than 4K fit into the inodes. Here is the command I use to apply a policy: mmapplypolicy gsfs0 -P policy.txt -N scg-gs0,scg-gs1,scg-gs2,scg-gs3,scg-gs4,scg-gs5,scg-gs6,scg-gs7 -g /srv/gsfs0/admin_stuff/ -I test -B 500 -A 61 -a 4 That takes approximately 10 minutes to do the whole scan. The "-B 500 -A 61 -a 4" numbers we determined just by trying different values with the same policy file and seeing the resulting scan duration. 10mins is short enough to do almost "interactive" type of file list policies and look at the results. E.g. list all files over 1TB in size. This was a couple of years ago, probably on a different GPFS version, but on same storage and NSD hardware, so now I just copy those parameters. You should probably not just copy them but try some other values yourself. 
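The "interactive" style of query mentioned above can be as small as a two-line policy file; this is only a sketch, with the 1 TiB threshold and the list name chosen arbitrarily:

   /* listbig.pol -- list every file with more than 1 TiB allocated */
   RULE EXTERNAL LIST 'bigfiles' EXEC ''
   RULE 'find1TB' LIST 'bigfiles' WHERE KB_ALLOCATED > 1073741824

Something like mmapplypolicy gsfs0 -P listbig.pol -I defer -f /srv/gsfs0/admin_stuff/bigfiles, reusing the same -N, -g, -B, -A and -a values as above, should then leave the matches in a file list named after the 'bigfiles' rule rather than executing anything.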
Regards, Alex From makaplan at us.ibm.com Wed Apr 12 23:43:20 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 12 Apr 2017 18:43:20 -0400 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: <284246a2-b14b-0a73-6dad-4c73caef58c9@stanford.edu> References: <284246a2-b14b-0a73-6dad-4c73caef58c9@stanford.edu> Message-ID: >>>Here is the command I use to apply a policy: mmapplypolicy gsfs0 -P policy.txt -N scg-gs0,scg-gs1,scg-gs2,scg-gs3,scg-gs4,scg-gs5,scg-gs6,scg-gs7 -g /srv/gsfs0/admin_stuff/ -I test -B 500 -A 61 -a 4 That takes approximately 10 minutes to do the whole scan. The "-B 500 -A 61 -a 4" numbers we determined just by trying different values with the same policy file and seeing the resulting scan duration. <<< That's pretty good. BUT, FYI, the -A number-of-buckets parameter should be scaled with the total number of files you expect to find in the argument filesystem or directory. If you don't set it the command will default to number-of-inodes-allocated / million, but capped at a minimum of 7 and a maximum of 4096. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Thu Apr 13 11:35:19 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 13 Apr 2017 10:35:19 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories In-Reply-To: References: , Message-ID: After a load more debugging, and switching off the quota's the issue looks to be quota related. in that the issue has gone away since I switched quota's off. I will need to switch them back on, but at least we know the issue is not the network and is likely to be fixed by upgrading..... Peter Childs ITS Research Infrastructure Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Peter Childs Sent: Tuesday, April 11, 2017 8:35:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Slow to create directories Can you remember what version you were running? Don't worry if you can't remember. It looks like ibm may have withdrawn 4.2.1 from fix central and wish to forget its existences. Never a good sign, 4.2.0, 4.2.2, 4.2.3 and even 3.5, so maybe upgrading is worth a try. I've looked at all the standard trouble shouting guides and got nowhere hence why I asked. But another set of slides always helps. Thank-you for the help, still head scratching.... Which only makes the issue more random. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Simon Thompson (IT Research Support) wrote ---- We actually saw this for a while on one of our clusters which was new. But by the time I'd got round to looking deeper, it had gone, maybe we were using the NSDs more heavily, or possibly we'd upgraded. We are at 4.2.2-2, so might be worth trying to bump the version and see if it goes away. We saw it on the NSD servers directly as well, so not some client trying to talk to it, so maybe there was some buggy code? Simon On 11/04/2017, 16:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: >There are so many things to look at and many tools for doing so (iostat, >htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). 
I would >recommend a review of the presentation that Yuri gave at the most recent >GPFS User Group: >https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs > >Cheers, >-Bryan > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter >Childs >Sent: Tuesday, April 11, 2017 3:58 AM >To: gpfsug main discussion list >Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories > >This is a curious issue which I'm trying to get to the bottom of. > >We currently have two Spectrum Scale file systems, both are running GPFS >4.2.1-1 some of the servers have been upgraded to 4.2.1-2. > >The older one which was upgraded from GPFS 3.5 works find create a >directory is always fast and no issue. > >The new one, which has nice new SSD for metadata and hence should be >faster. can take up to 30 seconds to create a directory but usually takes >less than a second, The longer directory creates usually happen on busy >nodes that have not used the new storage in a while. (Its new so we've >not moved much of the data over yet) But it can also happen randomly >anywhere, including from the NSD servers them selves. (times of 3-4 >seconds from the NSD servers have been seen, on a single directory create) > >We've been pointed at the network and suggested we check all network >settings, and its been suggested to build an admin network, but I'm not >sure I entirely understand why and how this would help. Its a mixed >1G/10G network with the NSD servers connected at 40G with an MTU of 9000. > >However as I say, the older filesystem is fine, and it does not matter if >the nodes are connected to the old GPFS cluster or the new one, (although >the delay is worst on the old gpfs cluster), So I'm really playing spot >the difference. and the network is not really an obvious difference. > >Its been suggested to look at a trace when it occurs but as its difficult >to recreate collecting one is difficult. > >Any ideas would be most helpful. > >Thanks > > > >Peter Childs >ITS Research Infrastructure >Queen Mary, University of London >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >________________________________ > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. 
>_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From scale at us.ibm.com Fri Apr 14 08:34:06 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 14 Apr 2017 15:34:06 +0800 Subject: [gpfsug-discuss] Does SVC / Spectrum Virtualize support IBM Spectrum Scale with SCSI-3 Persistent Reservations? In-Reply-To: References: Message-ID: If you can use " mmchconfig usePersistentReserve=yes" successfully, then it is supported, we will check the compatibility during the command, and you can also use "tsprinquiry device(no /dev prefix)" check the vendor output. Thanks. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Christoph Krafft" To: "gpfsug main discussion list" Cc: Achim Christ , Petra Christ Date: 04/11/2017 04:25 PM Subject: [gpfsug-discuss] Does SVC / Spectrum Virtualize support IBM Spectrum Scale with SCSI-3 Persistent Reservations? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi folks, there is a list of storage devices that support SCSI-3 PR in the GPFS FAQ Doc (see Answer 4.5). https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html#scsi3 Since this list contains IBM V-model storage subsystems that include Storage Virtualization - I was wondering if SVC / Spectrum Virtualize can also support SCSI-3 PR (although not explicitly on the list)? Any hints and help is warmla welcome - thank you in advance. Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Am Weiher 24 Email: ckrafft at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Nicole Reimer, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Stefan Lutz Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 1A223532.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Sun Apr 16 14:47:20 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Sun, 16 Apr 2017 13:47:20 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Message-ID: Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Sun Apr 16 17:20:15 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Sun, 16 Apr 2017 16:20:15 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Message-ID: <252ABBB2-7E94-41F6-AD76-B6D836E5C916@nuance.com> I think the first thing I would do is turn up the ?-L? level to a large value (like ?6?) and see what it tells you about files that are being chosen and which ones aren?t being migrated and why. You could run it in test mode, write the output to a file and see what it says. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Sunday, April 16, 2017 at 8:47 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Sun Apr 16 20:15:40 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 16 Apr 2017 15:15:40 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! 
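In other words, the predicted 98% falls straight out of the candidate totals; checking the reckoning with the numbers shown above:

      55,365,193,728 KB   currently in gpfs23capacity
   +  67,355,430,720 KB   chosen by 'OldStuff' (into the pool)
   -     236,745,504 KB   chosen by 'INeedThatAfterAll' (out of the pool)
   = 122,483,878,944 KB   predicted occupancy
                          122,483,878,944 / 124,983,549,952 = 0.98, the LIMIT(98) ceiling

A dry run at a higher -L level, as Bob suggests, shows the per-file choices behind those totals without moving anything; for example (node list, work directory and output path are placeholders):

   mmapplypolicy gpfs23 -P /path/to/gpfs23_migration.policy -I test -L 4 \
       -N nsdserver1,nsdserver2 -g /gpfs23/policytmp \
       > /tmp/gpfs23_policy_test.out 2>&1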
From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From makaplan at us.ibm.com Sun Apr 16 20:39:21 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 16 Apr 2017 15:39:21 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: Correction: So that's why it chooses to migrate "only" 67TB.... (67000 GB) -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Apr 17 16:24:02 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 17 Apr 2017 15:24:02 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: Hi Marc, I do understand what you?re saying about mmapplypolicy deciding it only needed to move ~1.8 million files to fill the capacity pool to ~98% full. However, it is now more than 24 hours since the mmapplypolicy finished ?successfully? and: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.66T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.66T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.33T ( 51%) 128.8G ( 0%) And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the partially redacted command line: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And here?s that policy file: define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE (access_age < 14) The one thing that has changed is that formerly I only ran the migration in one direction at a time ? i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that? I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it?s done the other filesystem migration kicks off. I don?t like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-( And yes, there is something else going on ? 
well, was going on - the network switch crash killed this too ? I have been running an rsync on one particular ~80TB directory tree from the old filesystem to gpfs23. I understand that the migration wouldn?t know about those files and that?s fine ? I just don?t understand why mmapplypolicy said it was going to fill the capacity pool to 98% but didn?t do it ? wait, mmapplypolicy hasn?t gone into politics, has it?!? ;-) Thanks - and again, if I should open a PMR for this please let me know... Kevin On Apr 16, 2017, at 2:15 PM, Marc A Kaplan > wrote: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. 
RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chekh at stanford.edu Mon Apr 17 19:49:12 2017 From: chekh at stanford.edu (Alex Chekholko) Date: Mon, 17 Apr 2017 11:49:12 -0700 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: <09e154ef-15ed-3217-db65-51e693e28faa@stanford.edu> Hi Kevin, IMHO, safe to just run it again. You can also run it with '-I test -L 6' again and look through the output. But I don't think you can "break" anything by having it scan and/or move data. Can you post the full command line that you use to run it? The behavior you describe is odd; you say it prints out the "files migrated successfully" message, but the files didn't actually get migrated? Turn up the debug param and have it print every file as it is moving it or something. 
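Spelled out, that kind of dry run might look something like the following sketch (only an illustration: the policy file path is the one shown in the log excerpt above, the output file name is made up, and any node-list or work-directory options normally used on this filesystem would stay on the command line):

    # evaluate the rules without moving anything (-I test) and log each decision in detail (-L 6)
    /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -P /root/gpfs/gpfs23_migration.policy \
        -I test -L 6 > /tmp/gpfs23_policy_test.out 2>&1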
Regards, Alex On 4/17/17 8:24 AM, Buterbaugh, Kevin L wrote: > Hi Marc, > > I do understand what you?re saying about mmapplypolicy deciding it only > needed to move ~1.8 million files to fill the capacity pool to ~98% > full. However, it is now more than 24 hours since the mmapplypolicy > finished ?successfully? and: > > Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) > eon35Ansd 58.2T 35 No Yes 29.66T ( > 51%) 64.16G ( 0%) > eon35Dnsd 58.2T 35 No Yes 29.66T ( > 51%) 64.61G ( 0%) > ------------- > -------------------- ------------------- > (pool total) 116.4T 59.33T ( > 51%) 128.8G ( 0%) > > And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the > partially redacted command line: > > /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g another gpfs filesystem> -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy > -N some,list,of,NSD,server,nodes > > And here?s that policy file: > > define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) > define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) > > RULE 'OldStuff' > MIGRATE FROM POOL 'gpfs23data' > TO POOL 'gpfs23capacity' > LIMIT(98) > WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) > > RULE 'INeedThatAfterAll' > MIGRATE FROM POOL 'gpfs23capacity' > TO POOL 'gpfs23data' > LIMIT(75) > WHERE (access_age < 14) > > The one thing that has changed is that formerly I only ran the migration > in one direction at a time ? i.e. I used to have those two rules in two > separate files and would run an mmapplypolicy using the OldStuff rule > the 1st weekend of the month and run the other rule the other weekends > of the month. This is the 1st weekend that I attempted to run an > mmapplypolicy that did both at the same time. Did I mess something up > with that? > > I have not run it again yet because we also run migrations on the other > filesystem that we are still in the process of migrating off of. So > gpfs23 goes 1st and as soon as it?s done the other filesystem migration > kicks off. I don?t like to run two migrations simultaneously if at all > possible. The 2nd migration ran until this morning, when it was > unfortunately terminated by a network switch crash that has also had me > tied up all morning until now. :-( > > And yes, there is something else going on ? well, was going on - the > network switch crash killed this too ? I have been running an rsync on > one particular ~80TB directory tree from the old filesystem to gpfs23. > I understand that the migration wouldn?t know about those files and > that?s fine ? I just don?t understand why mmapplypolicy said it was > going to fill the capacity pool to 98% but didn?t do it ? wait, > mmapplypolicy hasn?t gone into politics, has it?!? ;-) > > Thanks - and again, if I should open a PMR for this please let me know... > > Kevin > >> On Apr 16, 2017, at 2:15 PM, Marc A Kaplan > > wrote: >> >> Let's look at how mmapplypolicy does the reckoning. >> Before it starts, it see your pools as: >> >> [I] GPFS Current Data Pool Utilization in KB and % >> Pool_Name KB_Occupied KB_Total Percent_Occupied >> gpfs23capacity 55365193728 124983549952 44.297984614% >> gpfs23data 166747037696 343753326592 48.507759721% >> system 0 0 >> 0.000000000% (no user data) >> [I] 75142046 of 209715200 inodes used: 35.830520%. >> >> Your rule says you want to migrate data to gpfs23capacity, up to 98% full: >> >> RULE 'OldStuff' >> MIGRATE FROM POOL 'gpfs23data' >> TO POOL 'gpfs23capacity' >> LIMIT(98) WHERE ... >> >> We scan your files and find and reckon... 
>> [I] Summary of Rule Applicability and File Choices: >> Rule# Hit_Cnt KB_Hit Chosen KB_Chosen >> KB_Ill Rule >> 0 5255960 237675081344 1868858 67355430720 >> 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO >> POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) >> >> So yes, 5.25Million files match the rule, but the utility chooses >> 1.868Million files that add up to 67,355GB and figures that if it >> migrates those to gpfs23capacity, >> (and also figuring the other migrations by your second rule)then >> gpfs23 will end up 97.9999% full. >> We show you that with our "predictions" message. >> >> Predicted Data Pool Utilization in KB and %: >> Pool_Name KB_Occupied KB_Total Percent_Occupied >> gpfs23capacity 122483878944 124983549952 97.999999993% >> gpfs23data 104742360032 343753326592 30.470209865% >> >> So that's why it chooses to migrate "only" 67GB.... >> >> See? Makes sense to me. >> >> Questions: >> Did you run with -I yes or -I defer ? >> >> Were some of the files illreplicated or illplaced? >> >> Did you give the cluster-wide space reckoning protocols time to see >> the changes? mmdf is usually "behind" by some non-neglible amount of >> time. >> >> What else is going on? >> If you're moving or deleting or creating data by other means while >> mmapplypolicy is running -- it doesn't "know" about that! >> >> Run it again! >> >> >> >> >> >> From: "Buterbaugh, Kevin L" > > >> To: gpfsug main discussion list >> > > >> Date: 04/16/2017 09:47 AM >> Subject: [gpfsug-discuss] mmapplypolicy didn't migrate >> everything it should have - why not? >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> ------------------------------------------------------------------------ >> >> >> >> Hi All, >> >> First off, I can open a PMR for this if I need to. Second, I am far >> from an mmapplypolicy guru. With that out of the way ? I have an >> mmapplypolicy job that didn?t migrate anywhere close to what it could >> / should have. From the log file I have it create, here is the part >> where it shows the policies I told it to invoke: >> >> [I] Qos 'maintenance' configured as inf >> [I] GPFS Current Data Pool Utilization in KB and % >> Pool_Name KB_Occupied KB_Total Percent_Occupied >> gpfs23capacity 55365193728 124983549952 44.297984614% >> gpfs23data 166747037696 343753326592 48.507759721% >> system 0 0 >> 0.000000000% (no user data) >> [I] 75142046 of 209715200 inodes used: 35.830520%. >> [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. >> Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC >> Parsed 2 policy rules. >> >> RULE 'OldStuff' >> MIGRATE FROM POOL 'gpfs23data' >> TO POOL 'gpfs23capacity' >> LIMIT(98) >> WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND >> (KB_ALLOCATED > 3584)) >> >> RULE 'INeedThatAfterAll' >> MIGRATE FROM POOL 'gpfs23capacity' >> TO POOL 'gpfs23data' >> LIMIT(75) >> WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) >> >> And then the log shows it scanning all the directories and then says, >> "OK, here?s what I?m going to do": >> >> [I] Summary of Rule Applicability and File Choices: >> Rule# Hit_Cnt KB_Hit Chosen KB_Chosen >> KB_Ill Rule >> 0 5255960 237675081344 1868858 67355430720 >> 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO >> POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) >> 1 611 236745504 611 236745504 >> 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL >> 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) >> >> [I] Filesystem objects with no applicable rules: 414911602. 
>> >> [I] GPFS Policy Decisions and File Choice Totals: >> Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; >> Predicted Data Pool Utilization in KB and %: >> Pool_Name KB_Occupied KB_Total Percent_Occupied >> gpfs23capacity 122483878944 124983549952 97.999999993% >> gpfs23data 104742360032 343753326592 30.470209865% >> system 0 0 >> 0.000000000% (no user data) >> >> Notice that it says it?s only going to migrate less than 2 million of >> the 5.25 million candidate files!! And sure enough, that?s all it did: >> >> [I] A total of 1869469 files have been migrated, deleted or processed >> by an EXTERNAL EXEC/script; >> 0 'skipped' files and/or errors. >> >> And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere >> near 98% full: >> >> Disks in storage pool: gpfs23capacity (Maximum disk size allowed is >> 519 TB) >> eon35Ansd 58.2T 35 No Yes 29.54T ( >> 51%) 63.93G ( 0%) >> eon35Dnsd 58.2T 35 No Yes 29.54T ( >> 51%) 64.39G ( 0%) >> ------------- >> -------------------- ------------------- >> (pool total) 116.4T 59.08T ( >> 51%) 128.3G ( 0%) >> >> I don?t understand why it only migrated a small subset of what it >> could / should have? >> >> We are doing a migration from one filesystem (gpfs21) to gpfs23 and I >> really need to stuff my gpfs23capacity pool as full of data as I can >> to keep the migration going. Any ideas anyone? Thanks in advance? >> >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and >> Education >> _Kevin.Buterbaugh at vanderbilt.edu_ >> - (615)875-9633 From makaplan at us.ibm.com Mon Apr 17 21:11:18 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 17 Apr 2017 16:11:18 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: Kevin, 1. Running with both fairly simple rules so that you migrate "in both directions" is fine. It was designed to do that! 2. Glad you understand the logic of "rules hit" vs "files chosen". 3. To begin to understand "what the hxxx is going on" (as our fearless leader liked to say before he was in charge ;-) ) I suggest: (a) Run mmapplypolicy on directory of just a few files `mmapplypolicy /gpfs23/test-directory -I test ...` and check that the [I] ... Current data pool utilization message is consistent with the output of `mmdf gpfs23`. They should be, but if they're not, that's a weird problem right there since they're supposed to be looking at the same metadata! You can do this anytime, should complete almost instantly... (b) When time and resources permit, re-run mmapplypolicy on the full FS with your desired migration policy. Again, do the "Current", "Chosen" and "Predicted" messages make sense, and "add up"? Do the file counts seem reasonable, considering that you recently did migrations/deletions that should have changed the counts compared to previous runs of mmapplypolicy? If you just want to look and not actually change anything, use `-I test` which will skip the migration steps. If you want to see the list of files chosen (c) If you continue to see significant discrepancies between mmapplypolicy and mmdf, let us know. (d) Also at some point you may consider running mmrestripefs with options to make sure every file has its data blocks where they are supposed to be and is replicated as you have specified. Let's see where those steps take us... -- marc of Spectrum Scale (n? 
GPFS) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 04/17/2017 11:25 AM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, I do understand what you?re saying about mmapplypolicy deciding it only needed to move ~1.8 million files to fill the capacity pool to ~98% full. However, it is now more than 24 hours since the mmapplypolicy finished ?successfully? and: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.66T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.66T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.33T ( 51%) 128.8G ( 0%) And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the partially redacted command line: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And here?s that policy file: define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE (access_age < 14) The one thing that has changed is that formerly I only ran the migration in one direction at a time ? i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that? I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it?s done the other filesystem migration kicks off. I don?t like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-( And yes, there is something else going on ? well, was going on - the network switch crash killed this too ? I have been running an rsync on one particular ~80TB directory tree from the old filesystem to gpfs23. I understand that the migration wouldn?t know about those files and that?s fine ? I just don?t understand why mmapplypolicy said it was going to fill the capacity pool to 98% but didn?t do it ? wait, mmapplypolicy hasn?t gone into politics, has it?!? ;-) Thanks - and again, if I should open a PMR for this please let me know... Kevin On Apr 16, 2017, at 2:15 PM, Marc A Kaplan wrote: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... 
[I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. 
[I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Apr 17 21:18:42 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 17 Apr 2017 16:18:42 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: Oops... If you want to see the list of what would be migrated '-I test -L 2' If you want to migrate and see each file migrated '-I yes -L 2' I don't recommend -L 4 or higher, unless you want to see the files that do not match your rules. -L 3 will show you all the files that match the rules, including those that are NOT chosen for migration. See the command gu -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Apr 17 22:16:57 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 17 Apr 2017 21:16:57 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> Hi Marc, Alex, all, Thank you for the responses. To answer Alex?s questions first ? 
the full command line I used (except for some stuff I?m redacting but you don?t need the exact details anyway) was: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And yes, it printed out the very normal, ?Hey, I migrated all 1.8 million files I said I would successfully, so I?m done here? message: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. Marc - I ran what you suggest in your response below - section 3a. The output of a ?test? mmapplypolicy and mmdf was very consistent. Therefore, I?m moving on to 3b and running against the full filesystem again ? the only difference between the command line above and what I?m doing now is that I?m running with ?-L 2? this time around. I?m not fond of doing this during the week but I need to figure out what?s going on and I *really* need to get some stuff moved from my ?data? pool to my ?capacity? pool. I will respond back on the list again where there?s something to report. Thanks again, all? Kevin On Apr 17, 2017, at 3:11 PM, Marc A Kaplan > wrote: Kevin, 1. Running with both fairly simple rules so that you migrate "in both directions" is fine. It was designed to do that! 2. Glad you understand the logic of "rules hit" vs "files chosen". 3. To begin to understand "what the hxxx is going on" (as our fearless leader liked to say before he was in charge ;-) ) I suggest: (a) Run mmapplypolicy on directory of just a few files `mmapplypolicy /gpfs23/test-directory -I test ...` and check that the [I] ... Current data pool utilization message is consistent with the output of `mmdf gpfs23`. They should be, but if they're not, that's a weird problem right there since they're supposed to be looking at the same metadata! You can do this anytime, should complete almost instantly... (b) When time and resources permit, re-run mmapplypolicy on the full FS with your desired migration policy. Again, do the "Current", "Chosen" and "Predicted" messages make sense, and "add up"? Do the file counts seem reasonable, considering that you recently did migrations/deletions that should have changed the counts compared to previous runs of mmapplypolicy? If you just want to look and not actually change anything, use `-I test` which will skip the migration steps. If you want to see the list of files chosen (c) If you continue to see significant discrepancies between mmapplypolicy and mmdf, let us know. (d) Also at some point you may consider running mmrestripefs with options to make sure every file has its data blocks where they are supposed to be and is replicated as you have specified. Let's see where those steps take us... -- marc of Spectrum Scale (n? GPFS) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/17/2017 11:25 AM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Marc, I do understand what you?re saying about mmapplypolicy deciding it only needed to move ~1.8 million files to fill the capacity pool to ~98% full. However, it is now more than 24 hours since the mmapplypolicy finished ?successfully? 
and: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.66T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.66T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.33T ( 51%) 128.8G ( 0%) And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the partially redacted command line: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And here?s that policy file: define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE (access_age < 14) The one thing that has changed is that formerly I only ran the migration in one direction at a time ? i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that? I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it?s done the other filesystem migration kicks off. I don?t like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-( And yes, there is something else going on ? well, was going on - the network switch crash killed this too ? I have been running an rsync on one particular ~80TB directory tree from the old filesystem to gpfs23. I understand that the migration wouldn?t know about those files and that?s fine ? I just don?t understand why mmapplypolicy said it was going to fill the capacity pool to 98% but didn?t do it ? wait, mmapplypolicy hasn?t gone into politics, has it?!? ;-) Thanks - and again, if I should open a PMR for this please let me know... Kevin On Apr 16, 2017, at 2:15 PM, Marc A Kaplan > wrote: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. 
Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. 
And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Apr 18 14:31:20 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 18 Apr 2017 13:31:20 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> Message-ID: <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> Hi All, but especially Marc, I ran the mmapplypolicy again last night and, unfortunately, it again did not fill the capacity pool like it said it would. From the log file: [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 3632859 181380873184 1620175 61434283936 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 88 99230048 88 99230048 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 442962867. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 61533513984KB: 1620263 of 3632947 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878464 124983549952 97.999999609% gpfs23data 128885076416 343753326592 37.493477574% system 0 0 0.000000000% (no user data) [I] 2017-04-18 at 02:52:48.402 Policy execution. 0 files dispatched. And the tail end of the log file says that it moved those files: [I] 2017-04-18 at 09:06:51.124 Policy execution. 1620263 files dispatched. [I] A total of 1620263 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. 
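One way to spot-check whether those dispatched files really changed pools is to ask GPFS where a few of them now live; a rough sketch (the file path below is made up for illustration, and this assumes mmlsattr -L is used to report a file's storage pool):

    # the "storage pool name:" line should read gpfs23capacity for a file
    # that the policy run claims to have migrated
    /usr/lpp/mmfs/bin/mmlsattr -L /gpfs23/some/old/file.dat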
But mmdf (and how quickly the mmapplypolicy itself ran) say otherwise: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.73T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.73T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.45T ( 51%) 128.8G ( 0%) Ideas? Or is it time for me to open a PMR? Thanks? Kevin On Apr 17, 2017, at 4:16 PM, Buterbaugh, Kevin L > wrote: Hi Marc, Alex, all, Thank you for the responses. To answer Alex?s questions first ? the full command line I used (except for some stuff I?m redacting but you don?t need the exact details anyway) was: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And yes, it printed out the very normal, ?Hey, I migrated all 1.8 million files I said I would successfully, so I?m done here? message: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. Marc - I ran what you suggest in your response below - section 3a. The output of a ?test? mmapplypolicy and mmdf was very consistent. Therefore, I?m moving on to 3b and running against the full filesystem again ? the only difference between the command line above and what I?m doing now is that I?m running with ?-L 2? this time around. I?m not fond of doing this during the week but I need to figure out what?s going on and I *really* need to get some stuff moved from my ?data? pool to my ?capacity? pool. I will respond back on the list again where there?s something to report. Thanks again, all? Kevin On Apr 17, 2017, at 3:11 PM, Marc A Kaplan > wrote: Kevin, 1. Running with both fairly simple rules so that you migrate "in both directions" is fine. It was designed to do that! 2. Glad you understand the logic of "rules hit" vs "files chosen". 3. To begin to understand "what the hxxx is going on" (as our fearless leader liked to say before he was in charge ;-) ) I suggest: (a) Run mmapplypolicy on directory of just a few files `mmapplypolicy /gpfs23/test-directory -I test ...` and check that the [I] ... Current data pool utilization message is consistent with the output of `mmdf gpfs23`. They should be, but if they're not, that's a weird problem right there since they're supposed to be looking at the same metadata! You can do this anytime, should complete almost instantly... (b) When time and resources permit, re-run mmapplypolicy on the full FS with your desired migration policy. Again, do the "Current", "Chosen" and "Predicted" messages make sense, and "add up"? Do the file counts seem reasonable, considering that you recently did migrations/deletions that should have changed the counts compared to previous runs of mmapplypolicy? If you just want to look and not actually change anything, use `-I test` which will skip the migration steps. If you want to see the list of files chosen (c) If you continue to see significant discrepancies between mmapplypolicy and mmdf, let us know. (d) Also at some point you may consider running mmrestripefs with options to make sure every file has its data blocks where they are supposed to be and is replicated as you have specified. Let's see where those steps take us... -- marc of Spectrum Scale (n? GPFS) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/17/2017 11:25 AM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? 
Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Marc, I do understand what you?re saying about mmapplypolicy deciding it only needed to move ~1.8 million files to fill the capacity pool to ~98% full. However, it is now more than 24 hours since the mmapplypolicy finished ?successfully? and: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.66T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.66T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.33T ( 51%) 128.8G ( 0%) And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the partially redacted command line: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And here?s that policy file: define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE (access_age < 14) The one thing that has changed is that formerly I only ran the migration in one direction at a time ? i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that? I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it?s done the other filesystem migration kicks off. I don?t like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-( And yes, there is something else going on ? well, was going on - the network switch crash killed this too ? I have been running an rsync on one particular ~80TB directory tree from the old filesystem to gpfs23. I understand that the migration wouldn?t know about those files and that?s fine ? I just don?t understand why mmapplypolicy said it was going to fill the capacity pool to 98% but didn?t do it ? wait, mmapplypolicy hasn?t gone into politics, has it?!? ;-) Thanks - and again, if I should open a PMR for this please let me know... Kevin On Apr 16, 2017, at 2:15 PM, Marc A Kaplan > wrote: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... 
[I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. 
[I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgiles at gmail.com Tue Apr 18 14:56:43 2017 From: zgiles at gmail.com (Zachary Giles) Date: Tue, 18 Apr 2017 09:56:43 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> Message-ID: Kevin, Here's a silly theory: Have you tried putting a weight value in? I wonder if during migration it hits some large file that would go over the threshold and stops. With a weight flag you could move all small files in first or by lack of heat etc to pack the tier more tightly. Just something else to try before the PMR process. Zach On Apr 18, 2017 9:32 AM, "Buterbaugh, Kevin L" < Kevin.Buterbaugh at vanderbilt.edu> wrote: Hi All, but especially Marc, I ran the mmapplypolicy again last night and, unfortunately, it again did not fill the capacity pool like it said it would. 
From the log file: [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 3632859 181380873184 1620175 61434283936 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 88 99230048 88 99230048 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 442962867. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 61533513984KB: 1620263 of 3632947 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878464 124983549952 97.999999609% gpfs23data 128885076416 343753326592 37.493477574% system 0 0 0.000000000% (no user data) [I] 2017-04-18 at 02:52:48.402 Policy execution. 0 files dispatched. And the tail end of the log file says that it moved those files: [I] 2017-04-18 at 09:06:51.124 Policy execution. 1620263 files dispatched. [I] A total of 1620263 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. But mmdf (and how quickly the mmapplypolicy itself ran) say otherwise: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.73T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.73T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.45T ( 51%) 128.8G ( 0%) Ideas? Or is it time for me to open a PMR? Thanks? Kevin On Apr 17, 2017, at 4:16 PM, Buterbaugh, Kevin L < Kevin.Buterbaugh at Vanderbilt.Edu> wrote: Hi Marc, Alex, all, Thank you for the responses. To answer Alex?s questions first ? the full command line I used (except for some stuff I?m redacting but you don?t need the exact details anyway) was: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And yes, it printed out the very normal, ?Hey, I migrated all 1.8 million files I said I would successfully, so I?m done here? message: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. Marc - I ran what you suggest in your response below - section 3a. The output of a ?test? mmapplypolicy and mmdf was very consistent. Therefore, I?m moving on to 3b and running against the full filesystem again ? the only difference between the command line above and what I?m doing now is that I?m running with ?-L 2? this time around. I?m not fond of doing this during the week but I need to figure out what?s going on and I *really* need to get some stuff moved from my ?data? pool to my ?capacity? pool. I will respond back on the list again where there?s something to report. Thanks again, all? Kevin On Apr 17, 2017, at 3:11 PM, Marc A Kaplan wrote: Kevin, 1. Running with both fairly simple rules so that you migrate "in both directions" is fine. It was designed to do that! 2. Glad you understand the logic of "rules hit" vs "files chosen". 3. To begin to understand "what the hxxx is going on" (as our fearless leader liked to say before he was in charge ;-) ) I suggest: (a) Run mmapplypolicy on directory of just a few files `mmapplypolicy /gpfs23/test-directory -I test ...` and check that the [I] ... Current data pool utilization message is consistent with the output of `mmdf gpfs23`. 
They should be, but if they're not, that's a weird problem right there since they're supposed to be looking at the same metadata! You can do this anytime, should complete almost instantly... (b) When time and resources permit, re-run mmapplypolicy on the full FS with your desired migration policy. Again, do the "Current", "Chosen" and "Predicted" messages make sense, and "add up"? Do the file counts seem reasonable, considering that you recently did migrations/deletions that should have changed the counts compared to previous runs of mmapplypolicy? If you just want to look and not actually change anything, use `-I test` which will skip the migration steps. If you want to see the list of files chosen (c) If you continue to see significant discrepancies between mmapplypolicy and mmdf, let us know. (d) Also at some point you may consider running mmrestripefs with options to make sure every file has its data blocks where they are supposed to be and is replicated as you have specified. Let's see where those steps take us... -- marc of Spectrum Scale (n? GPFS) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 04/17/2017 11:25 AM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org ------------------------------ Hi Marc, I do understand what you?re saying about mmapplypolicy deciding it only needed to move ~1.8 million files to fill the capacity pool to ~98% full. However, it is now more than 24 hours since the mmapplypolicy finished ?successfully? and: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.66T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.66T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.33T ( 51%) 128.8G ( 0%) And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the partially redacted command line: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And here?s that policy file: define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE (access_age < 14) The one thing that has changed is that formerly I only ran the migration in one direction at a time ? i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that? I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it?s done the other filesystem migration kicks off. I don?t like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-( And yes, there is something else going on ? well, was going on - the network switch crash killed this too ? 
I have been running an rsync on one particular ~80TB directory tree from the old filesystem to gpfs23. I understand that the migration wouldn?t know about those files and that?s fine ? I just don?t understand why mmapplypolicy said it was going to fill the capacity pool to 98% but didn?t do it ? wait, mmapplypolicy hasn?t gone into politics, has it?!? ;-) Thanks - and again, if I should open a PMR for this please let me know... Kevin On Apr 16, 2017, at 2:15 PM, Marc A Kaplan <*makaplan at us.ibm.com* > wrote: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! From: "Buterbaugh, Kevin L" <*Kevin.Buterbaugh at Vanderbilt.Edu* > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: *gpfsug-discuss-bounces at spectrumscale.org* ------------------------------ Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. >From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. 
RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education *Kevin.Buterbaugh at vanderbilt.edu* - (615)875-9633 <(615)%20875-9633> _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at *spectrumscale.org* *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at *spectrumscale.org* *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 <(615)%20875-9633> _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Apr 18 16:11:19 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 18 Apr 2017 11:11:19 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> Message-ID: ANYONE else reading this saga? Who uses mmapplypolicy to migrate files within multi-TB file systems? Problems? Or all working as expected? ------ Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose total size is 61 Terabytes and migrating those will bring the occupancy of gpfs23capacity pool to 98% and then we're done. So now I'm wondering where this is going wrong. Is there some bug in the reckoning inside of mmapplypolicy or somewhere else in GPFS? Sure you can put in an PMR, and probably should. I'm guessing whoever picks up the PMR will end up calling or emailing me ... but maybe she can do some of the clerical work for us... While we're waiting for that... Here's what I suggest next. Add a clause ... SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK)) before the WHERE clause to each of your rules. Re-run the command with options '-I test -L 2' and collect the output. We're not actually going to move any data, but we're going to look at the files and file sizes that are "chosen"... You should see 1.6 million lines that look kind of like this: /yy/dat/bigC RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1) Run a script over the output to add up all the SHOW() values in the lines that contain TO POOL 'gpfs23capacity' and verify that they do indeed add up to 61TB... (The show is in KB so the SHOW numbers should add up to 61 billion). That sanity checks the policy arithmetic. Let's assume that's okay. Then the next question is whether the individual numbers are correct... Zach Giles made a suggestion... which I'll interpret as find some of the biggest of those files and check that they really are that big.... At this point, I really don't know, but I'm guessing there's some discrepances in the reported KB_ALLOCATED numbers for many of the files... and/or they are "illplaced" - the data blocks aren't all in the pool FROM POOL ... HMMMM.... I just thought about this some more and added the NLINK statistic. It would be unusual for this to be a big problem, but files that are hard linked are not recognized by mmapplypolicy as sharing storage... This has not come to my attention as a significant problem -- does the file system in question have significant GBs of hard linked files? The truth is that you're the first customer/user/admin in a long time to question/examine how mmapplypolicy does its space reckoning ... Optimistically that means it works fine for most customers... 
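(For the tally step above, a quick pass over the saved '-I test -L 2' output along these lines should do it - a sketch, not tested, which assumes the chosen-file lines end with something like the SHOW( 1024 n=1) example and that the output was saved to /tmp/mmapplypolicy-L2.out; both the exact line format and the file name are assumptions:)

awk '/TO POOL .gpfs23capacity./ && match($0, /SHOW\( *[0-9]+/) {
        kb = substr($0, RSTART, RLENGTH)   # e.g. "SHOW( 1024"
        gsub(/[^0-9]/, "", kb)             # keep just the KB number
        total += kb
     }
     END { printf "chosen for gpfs23capacity: %.2f TB\n", total / (1024 * 1024 * 1024) }' /tmp/mmapplypolicy-L2.out

(The '.' on either side of gpfs23capacity just stands in for the single quotes in the policy output, so the whole awk program can stay inside one pair of shell quotes.)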
So sorry, something unusual about your installation or usage... -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Tue Apr 18 16:31:12 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Tue, 18 Apr 2017 11:31:12 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> Message-ID: <4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> I have an observation, which may merely serve to show my ignorance: Is it significant that the words "EXTERNAL EXEC/script? are seen below? If migrating between storage pools within the cluster, I would expect the PIT engine to do the migration. When doing HSM (off cluster, tape libraries, etc) is where I would expect to need a script to actually do the work. > [I] 2017-04-18 at 09:06:51.124 Policy execution. 1620263 files dispatched. > [I] A total of 1620263 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; > 0 'skipped' files and/or errors. ? ddj Dave Johnson Brown University > On Apr 18, 2017, at 11:11 AM, Marc A Kaplan wrote: > > ANYONE else reading this saga? Who uses mmapplypolicy to migrate files within multi-TB file systems? Problems? Or all working as expected? > > ------ > > Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose total size is 61 Terabytes and migrating those will bring the occupancy of gpfs23capacity pool to 98% and then we're done. > > So now I'm wondering where this is going wrong. Is there some bug in the reckoning inside of mmapplypolicy or somewhere else in GPFS? > > Sure you can put in an PMR, and probably should. I'm guessing whoever picks up the PMR will end up calling or emailing me ... but maybe she can do some of the clerical work for us... > > While we're waiting for that... Here's what I suggest next. > > Add a clause ... > > SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK)) > > before the WHERE clause to each of your rules. > > Re-run the command with options '-I test -L 2' and collect the output. > > We're not actually going to move any data, but we're going to look at the files and file sizes that are "chosen"... > > You should see 1.6 million lines that look kind of like this: > > /yy/dat/bigC RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1) > > Run a script over the output to add up all the SHOW() values in the lines that contain TO POOL 'gpfs23capacity' and verify that they do indeed > add up to 61TB... (The show is in KB so the SHOW numbers should add up to 61 billion). > > That sanity checks the policy arithmetic. Let's assume that's okay. > > Then the next question is whether the individual numbers are correct... Zach Giles made a suggestion... which I'll interpret as > find some of the biggest of those files and check that they really are that big.... > > At this point, I really don't know, but I'm guessing there's some discrepances in the reported KB_ALLOCATED numbers for many of the files... > and/or they are "illplaced" - the data blocks aren't all in the pool FROM POOL ... > > HMMMM.... I just thought about this some more and added the NLINK statistic. It would be unusual for this to be a big problem, but files that are hard linked are > not recognized by mmapplypolicy as sharing storage... 
> This has not come to my attention as a significant problem -- does the file system in question have significant GBs of hard linked files?
>
> The truth is that you're the first customer/user/admin in a long time to question/examine how mmapplypolicy does its space reckoning ...
> Optimistically that means it works fine for most customers...
>
> So sorry, something unusual about your installation or usage...
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From makaplan at us.ibm.com Tue Apr 18 17:06:16 2017
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Tue, 18 Apr 2017 12:06:16 -0400
Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?
In-Reply-To: <4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu>
References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> <4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu>
Message-ID: 

That is a summary message. It says one way or another, the command has dealt with 1.6 million files. For the case under discussion there are no EXTERNAL pools, nor any DELETions, just intra-GPFS MIGRATions.

[I] A total of 1620263 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;
0 'skipped' files and/or errors.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Kevin.Buterbaugh at Vanderbilt.Edu Tue Apr 18 17:32:24 2017
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Tue, 18 Apr 2017 16:32:24 +0000
Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?
In-Reply-To: 
References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu>
Message-ID: <968C356B-8FDD-44F8-9814-F3D2470369B0@Vanderbilt.Edu>

Hi Marc,

Two things:

1. I have a PMR open now.

2. You *may* have identified the problem - I'm still checking - but files with hard links may be our problem.

I wrote a simple Perl script to iterate over the log file I had mmapplypolicy create. Here's the code (don't laugh, I'm a SysAdmin, not a programmer, and I whipped this out in < 5 minutes - and yes, I realize the fact that I used Perl instead of Python shows my age as well):

#!/usr/bin/perl
#
use strict;
use warnings;

my $InputFile = "/tmp/mmapplypolicy.gpfs23.log";
my $TotalFiles = 0;
my $TotalLinks = 0;
my $TotalSize = 0;

open INPUT, $InputFile or die "Couldn\'t open $InputFile for read: $!\n";
while (<INPUT>) {                    # read the mmapplypolicy log line by line
   next unless /MIGRATED/;           # only count lines for files that were migrated
   $TotalFiles++;
   my $FileName = (split / /)[3];
   if ( -f $FileName ) {             # some files may have been deleted since mmapplypolicy ran
      my ($NumLinks, $FileSize) = (stat($FileName))[3,7];
      $TotalLinks += $NumLinks;
      $TotalSize += $FileSize;
   }
}
close INPUT;
print "Number of files / links = $TotalFiles / $TotalLinks, Total size = $TotalSize\n";
exit 0;

And here's what it kicked out:

Number of files / links = 1620263 / 80818483, Total size = 53966202814094

1.6 million files but 80 million hard links!!! I'm doing some checking right now, but it appears that it is one particular group - and therefore one particular fileset - that is responsible for this - they've got thousands of files with 50 or more hard links each - and they're not inconsequential in size. 
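(For that kind of checking, plain GNU find is enough to spot the worst offenders - a sketch with a made-up fileset path; -links +1 matches anything with more than one hard link, %n is the link count, %k the allocated size in 1K blocks:)

find /gpfs23/some_suspect_fileset -type f -links +1 -printf '%n %k %p\n' | sort -rn | head -20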
IIRC (and keep in mind I?m far from a GPFS policy guru), there is a way to say something to the effect of ?and the path does not contain /gpfs23/fileset/path? ? may need a little help getting that right. I?ll post this information to the ticket as well but wanted to update the list. This wouldn?t be the first time we were an ?edge case? for something in GPFS? ;-) Thanks... Kevin On Apr 18, 2017, at 10:11 AM, Marc A Kaplan > wrote: ANYONE else reading this saga? Who uses mmapplypolicy to migrate files within multi-TB file systems? Problems? Or all working as expected? ------ Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose total size is 61 Terabytes and migrating those will bring the occupancy of gpfs23capacity pool to 98% and then we're done. So now I'm wondering where this is going wrong. Is there some bug in the reckoning inside of mmapplypolicy or somewhere else in GPFS? Sure you can put in an PMR, and probably should. I'm guessing whoever picks up the PMR will end up calling or emailing me ... but maybe she can do some of the clerical work for us... While we're waiting for that... Here's what I suggest next. Add a clause ... SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK)) before the WHERE clause to each of your rules. Re-run the command with options '-I test -L 2' and collect the output. We're not actually going to move any data, but we're going to look at the files and file sizes that are "chosen"... You should see 1.6 million lines that look kind of like this: /yy/dat/bigC RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1) Run a script over the output to add up all the SHOW() values in the lines that contain TO POOL 'gpfs23capacity' and verify that they do indeed add up to 61TB... (The show is in KB so the SHOW numbers should add up to 61 billion). That sanity checks the policy arithmetic. Let's assume that's okay. Then the next question is whether the individual numbers are correct... Zach Giles made a suggestion... which I'll interpret as find some of the biggest of those files and check that they really are that big.... At this point, I really don't know, but I'm guessing there's some discrepances in the reported KB_ALLOCATED numbers for many of the files... and/or they are "illplaced" - the data blocks aren't all in the pool FROM POOL ... HMMMM.... I just thought about this some more and added the NLINK statistic. It would be unusual for this to be a big problem, but files that are hard linked are not recognized by mmapplypolicy as sharing storage... This has not come to my attention as a significant problem -- does the file system in question have significant GBs of hard linked files? The truth is that you're the first customer/user/admin in a long time to question/examine how mmapplypolicy does its space reckoning ... Optimistically that means it works fine for most customers... So sorry, something unusual about your installation or usage... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Apr 18 17:56:11 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 18 Apr 2017 12:56:11 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? hard links! 
A workaround In-Reply-To: <968C356B-8FDD-44F8-9814-F3D2470369B0@Vanderbilt.Edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> <968C356B-8FDD-44F8-9814-F3D2470369B0@Vanderbilt.Edu> Message-ID: Kevin, Wow. Never underestimate the power of ... Anyhow try this as a fix. Add the clause SIZE(KB_ALLOCATED/NLINK) to your MIGRATE rules. This spreads the total actual size over each hardlink... From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 04/18/2017 12:33 PM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, Two things: 1. I have a PMR open now. 2. You *may* have identified the problem ? I?m still checking ? but files with hard links may be our problem. I wrote a simple Perl script to interate over the log file I had mmapplypolicy create. Here?s the code (don?t laugh, I?m a SysAdmin, not a programmer, and I whipped this out in < 5 minutes ? and yes, I realize the fact that I used Perl instead of Python shows my age as well ): #!/usr/bin/perl # use strict; use warnings; my $InputFile = "/tmp/mmapplypolicy.gpfs23.log"; my $TotalFiles = 0; my $TotalLinks = 0; my $TotalSize = 0; open INPUT, $InputFile or die "Couldn\'t open $InputFile for read: $!\n"; while () { next unless /MIGRATED/; $TotalFiles++; my $FileName = (split / /)[3]; if ( -f $FileName ) { # some files may have been deleted since mmapplypolicy ran my ($NumLinks, $FileSize) = (stat($FileName))[3,7]; $TotalLinks += $NumLinks; $TotalSize += $FileSize; } } close INPUT; print "Number of files / links = $TotalFiles / $TotalLinks, Total size = $TotalSize\n"; exit 0; And here?s what it kicked out: Number of files / links = 1620263 / 80818483, Total size = 53966202814094 1.6 million files but 80 million hard links!!! I?m doing some checking right now, but it appears that it is one particular group - and therefore one particular fileset - that is responsible for this ? they?ve got thousands of files with 50 or more hard links each ? and they?re not inconsequential in size. IIRC (and keep in mind I?m far from a GPFS policy guru), there is a way to say something to the effect of ?and the path does not contain /gpfs23/fileset/path? ? may need a little help getting that right. I?ll post this information to the ticket as well but wanted to update the list. This wouldn?t be the first time we were an ?edge case? for something in GPFS? ;-) Thanks... Kevin On Apr 18, 2017, at 10:11 AM, Marc A Kaplan wrote: ANYONE else reading this saga? Who uses mmapplypolicy to migrate files within multi-TB file systems? Problems? Or all working as expected? ------ Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose total size is 61 Terabytes and migrating those will bring the occupancy of gpfs23capacity pool to 98% and then we're done. So now I'm wondering where this is going wrong. Is there some bug in the reckoning inside of mmapplypolicy or somewhere else in GPFS? Sure you can put in an PMR, and probably should. I'm guessing whoever picks up the PMR will end up calling or emailing me ... but maybe she can do some of the clerical work for us... While we're waiting for that... Here's what I suggest next. Add a clause ... SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK)) before the WHERE clause to each of your rules. Re-run the command with options '-I test -L 2' and collect the output. 
We're not actually going to move any data, but we're going to look at the files and file sizes that are "chosen"... You should see 1.6 million lines that look kind of like this: /yy/dat/bigC RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1) Run a script over the output to add up all the SHOW() values in the lines that contain TO POOL 'gpfs23capacity' and verify that they do indeed add up to 61TB... (The show is in KB so the SHOW numbers should add up to 61 billion). That sanity checks the policy arithmetic. Let's assume that's okay. Then the next question is whether the individual numbers are correct... Zach Giles made a suggestion... which I'll interpret as find some of the biggest of those files and check that they really are that big.... At this point, I really don't know, but I'm guessing there's some discrepances in the reported KB_ALLOCATED numbers for many of the files... and/or they are "illplaced" - the data blocks aren't all in the pool FROM POOL ... HMMMM.... I just thought about this some more and added the NLINK statistic. It would be unusual for this to be a big problem, but files that are hard linked are not recognized by mmapplypolicy as sharing storage... This has not come to my attention as a significant problem -- does the file system in question have significant GBs of hard linked files? The truth is that you're the first customer/user/admin in a long time to question/examine how mmapplypolicy does its space reckoning ... Optimistically that means it works fine for most customers... So sorry, something unusual about your installation or usage... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Apr 19 14:12:16 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 19 Apr 2017 13:12:16 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: <4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> <4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> Message-ID: <458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> Hi All, I think we *may* be able to wrap this saga up? ;-) Dave - in regards to your question, all I know is that the tail end of the log file is ?normal? for all the successful pool migrations I?ve done in the past few years. It looks like the hard links were the problem. We have one group with a fileset on our filesystem that they use for backing up Linux boxes in their lab. That one fileset has thousands and thousands (I haven?t counted, but based on the output of that Perl script I wrote it could well be millions) of files with anywhere from 50 to 128 hard links each ? those files ranged from a few KB to a few MB in size. From what Marc said, my understanding is that with the way I had my policy rule written mmapplypolicy was seeing each of those as separate files and therefore thinking it was moving 50 to 128 times as much space to the gpfs23capacity pool as it really was for those files. Marc can correct me or clarify further if necessary. 
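(For reference, with the clause Marc suggested the first rule would presumably end up looking something like the sketch below - reconstructed from the policy file quoted earlier, not the exact text of the updated file, and the clause placement should be checked against the mmapplypolicy documentation:)

RULE 'OldStuff'
  MIGRATE FROM POOL 'gpfs23data'
  TO POOL 'gpfs23capacity'
  LIMIT(98)
  SIZE(KB_ALLOCATED/NLINK)
  WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584))

(The same SIZE clause would go on the 'INeedThatAfterAll' rule. If one wanted to skip the hard-link-heavy fileset altogether instead, a FOR FILESET('thatfileset') clause or a preceding EXCLUDE rule would presumably be the way to do it.)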
He directed me to add: SIZE(KB_ALLOCATED/NLINK) to both of my migrate rules in my policy file. I did so and kicked off another mmapplypolicy last night, which is still running. However, the prediction section now says: [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 40050141920KB: 2051495 of 2051495 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 104098980256 124983549952 83.290145220% gpfs23data 168478368352 343753326592 49.011414674% system 0 0 0.000000000% (no user data) So now it?s going to move every file it can that matches my policies because it?s figured out that a lot of those are hard links ? and I don?t have enough files matching the criteria to fill the gpfs23capacity pool to the 98% limit like mmapplypolicy thought I did before. According to the log file, it?s happily chugging along migrating files, and mmdf agrees that my gpfs23capacity pool is gradually getting more full (I have it QOSed, of course): Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 25.33T ( 44%) 68.13G ( 0%) eon35Dnsd 58.2T 35 No Yes 25.33T ( 44%) 68.49G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 50.66T ( 44%) 136.6G ( 0%) My sincere thanks to all who took the time to respond to my questions. Of course, that goes double for Marc. We (Vanderbilt) seem to have a long tradition of finding some edge cases in GPFS going all the way back to when we originally moved off of an NFS server to GPFS (2.2, 2.3?) back in 2005. I was creating individual tarballs of each users? home directory on the NFS server, copying the tarball to one of the NSD servers, and untarring it there (don?t remember why we weren?t rsync?ing, but there was a reason). Everything was working just fine except for one user. Every time I tried to untar her home directory on GPFS it barfed part of the way thru ? turns out that until then IBM hadn?t considered that someone would want to put 6 million files in one directory. Gotta love those users! ;-) Kevin On Apr 18, 2017, at 10:31 AM, David D. Johnson > wrote: I have an observation, which may merely serve to show my ignorance: Is it significant that the words "EXTERNAL EXEC/script? are seen below? If migrating between storage pools within the cluster, I would expect the PIT engine to do the migration. When doing HSM (off cluster, tape libraries, etc) is where I would expect to need a script to actually do the work. [I] 2017-04-18 at 09:06:51.124 Policy execution. 1620263 files dispatched. [I] A total of 1620263 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. ? ddj Dave Johnson Brown University On Apr 18, 2017, at 11:11 AM, Marc A Kaplan > wrote: ANYONE else reading this saga? Who uses mmapplypolicy to migrate files within multi-TB file systems? Problems? Or all working as expected? ------ Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose total size is 61 Terabytes and migrating those will bring the occupancy of gpfs23capacity pool to 98% and then we're done. So now I'm wondering where this is going wrong. Is there some bug in the reckoning inside of mmapplypolicy or somewhere else in GPFS? Sure you can put in an PMR, and probably should. I'm guessing whoever picks up the PMR will end up calling or emailing me ... but maybe she can do some of the clerical work for us... While we're waiting for that... Here's what I suggest next. 
Add a clause ... SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK)) before the WHERE clause to each of your rules. Re-run the command with options '-I test -L 2' and collect the output. We're not actually going to move any data, but we're going to look at the files and file sizes that are "chosen"... You should see 1.6 million lines that look kind of like this: /yy/dat/bigC RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1) Run a script over the output to add up all the SHOW() values in the lines that contain TO POOL 'gpfs23capacity' and verify that they do indeed add up to 61TB... (The show is in KB so the SHOW numbers should add up to 61 billion). That sanity checks the policy arithmetic. Let's assume that's okay. Then the next question is whether the individual numbers are correct... Zach Giles made a suggestion... which I'll interpret as find some of the biggest of those files and check that they really are that big.... At this point, I really don't know, but I'm guessing there's some discrepances in the reported KB_ALLOCATED numbers for many of the files... and/or they are "illplaced" - the data blocks aren't all in the pool FROM POOL ... HMMMM.... I just thought about this some more and added the NLINK statistic. It would be unusual for this to be a big problem, but files that are hard linked are not recognized by mmapplypolicy as sharing storage... This has not come to my attention as a significant problem -- does the file system in question have significant GBs of hard linked files? The truth is that you're the first customer/user/admin in a long time to question/examine how mmapplypolicy does its space reckoning ... Optimistically that means it works fine for most customers... So sorry, something unusual about your installation or usage... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Apr 19 15:37:29 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 19 Apr 2017 10:37:29 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: <458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu><4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> <458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> Message-ID: Well I'm glad we followed Mr. S. Holmes dictum which I'll paraphrase... eliminate the impossible and what remains, even if it seems improbable, must hold. BTW - you may want to look at mmclone. Personally, I find the doc and terminology confusing, but mmclone was designed to efficiently store copies and near-copies of large (virtual machine) images. Uses copy-on-write strategy, similar to GPFS snapshots, but at a file by file granularity. BBTW - we fixed directories - they can now be huge (up to about 2^30 files) and automagically, efficiently grow and shrink in size. Also small directories can be stored efficiently in the inode. The last major improvement was just a few years ago. Before that they could be huge, but would never shrink. 
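(For anyone curious about the mmclone route mentioned above, the basic flow is roughly: freeze a source file into a read-only clone parent, then cut space-efficient copy-on-write copies from it. This is a sketch from memory with made-up paths, so check the mmclone command reference before relying on it:)

# turn an existing image into a read-only clone parent
mmclone snap /gpfs23/images/base.img /gpfs23/images/base.img.parent

# create writable clones that share unchanged blocks with the parent
mmclone copy /gpfs23/images/base.img.parent /gpfs23/images/vm01.img
mmclone copy /gpfs23/images/base.img.parent /gpfs23/images/vm02.img

# show clone attributes (depth, parent) for a file
mmclone show /gpfs23/images/vm01.img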
-------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Apr 19 17:18:50 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 19 Apr 2017 16:18:50 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu><4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> <458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> Message-ID: Hey Marc, I'm having some issues where a simple ILM list policy never completes, but I have yet to open a PMR or enable additional logging. But I was wondering if there are known reasons that this would not complete, such as when there is a symbolic link that creates a loop within the directory structure or something simple like that. Do you know of any cases like this, Marc, that I should try to find in my file systems? Thanks in advance! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, April 19, 2017 9:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Well I'm glad we followed Mr. S. Holmes dictum which I'll paraphrase... eliminate the impossible and what remains, even if it seems improbable, must hold. BTW - you may want to look at mmclone. Personally, I find the doc and terminology confusing, but mmclone was designed to efficiently store copies and near-copies of large (virtual machine) images. Uses copy-on-write strategy, similar to GPFS snapshots, but at a file by file granularity. BBTW - we fixed directories - they can now be huge (up to about 2^30 files) and automagically, efficiently grow and shrink in size. Also small directories can be stored efficiently in the inode. The last major improvement was just a few years ago. Before that they could be huge, but would never shrink. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From YARD at il.ibm.com Wed Apr 19 17:23:12 2017 From: YARD at il.ibm.com (Yaron Daniel) Date: Wed, 19 Apr 2017 19:23:12 +0300 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu><4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu><458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> Message-ID: Hi Maybe the temp list file - fill the FS that they build on. Try to monitor the FS where the temp filelist is created. 
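(If that does turn out to be the culprit, mmapplypolicy can be pointed at roomier scratch space - roughly as in the sketch below, where -s sets the local work directory and -g a global work directory shared by the helper nodes. The device name, paths and node list here are made up:)

# watch the default temp area while the scan runs
df -h /tmp

# or run the list policy with its work files on bigger scratch space
mmapplypolicy yourfs -P /some/list.policy -I test \
    -s /scratch/policy-tmp \
    -g /yourfs/.policy-workdir \
    -N some,list,of,nodes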
Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Bryan Banister To: gpfsug main discussion list Date: 04/19/2017 07:19 PM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hey Marc, I?m having some issues where a simple ILM list policy never completes, but I have yet to open a PMR or enable additional logging. But I was wondering if there are known reasons that this would not complete, such as when there is a symbolic link that creates a loop within the directory structure or something simple like that. Do you know of any cases like this, Marc, that I should try to find in my file systems? Thanks in advance! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, April 19, 2017 9:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Well I'm glad we followed Mr. S. Holmes dictum which I'll paraphrase... eliminate the impossible and what remains, even if it seems improbable, must hold. BTW - you may want to look at mmclone. Personally, I find the doc and terminology confusing, but mmclone was designed to efficiently store copies and near-copies of large (virtual machine) images. Uses copy-on-write strategy, similar to GPFS snapshots, but at a file by file granularity. BBTW - we fixed directories - they can now be huge (up to about 2^30 files) and automagically, efficiently grow and shrink in size. Also small directories can be stored efficiently in the inode. The last major improvement was just a few years ago. Before that they could be huge, but would never shrink. Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From makaplan at us.ibm.com Wed Apr 19 18:10:28 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 19 Apr 2017 13:10:28 -0400 Subject: [gpfsug-discuss] mmapplypolicy not terminating properly? 
In-Reply-To: References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu><4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu><458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> Message-ID: (Bryan B asked...) Open a PMR. The first response from me will be ... Run the mmapplypolicy command again, except with additional option `-d 017` and collect output with something equivalent to `2>&1 | tee /tmp/save-all-command-output-here-to-be-passed-along-to-IBM-service ` If you are convinced that mmapplypolicy is "looping" or "hung" - wait another 2 minutes, terminate, and then pass along the saved-all-command-output. -d 017 will dump a lot of additional diagnostics -- If you want to narrow it by baby steps we could try `-d 03` first and see if there are enough clues in that. To answer two of your questions: 1. mmapplypolicy does not follow symlinks, so no "infinite loop" possible with symlinks. 2a. loops in directory are file system bugs in GPFS, (in fact in any posixish file system), (mm)fsck! 2b. mmapplypolicy does impose a limit on total length of pathnames, so even if there is a loop in the directory, mmapplypolicy will "trim" the directory walk. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Apr 19 20:53:42 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 19 Apr 2017 19:53:42 +0000 Subject: [gpfsug-discuss] RAID config for SSD's used for data Message-ID: Hi All, We currently have what I believe is a fairly typical setup ? metadata for our GPFS filesystems is the only thing in the system pool and it?s on SSD, while data is on spinning disk (RAID 6 LUNs). Everything connected via 8 Gb FC SAN. 8 NSD servers. Roughly 1 PB usable space. Now lets just say that you have a little bit of money to spend. Your I/O demands aren?t great - in fact, they?re way on the low end ? typical (cumulative) usage is 200 - 600 MB/sec read, less than that for writes. But while GPFS has always been great and therefore you don?t need to Make GPFS Great Again, you do want to provide your users with the best possible environment. So you?re considering the purchase of a dual-controller FC storage array with 12 or so 1.8 TB SSD?s in it, with the idea being that that storage would be in its? own storage pool and that pool would be the default location for I/O for your main filesystem ? at least for smaller files. You intend to use mmapplypolicy nightly to move data to / from this pool and the spinning disk pools. Given all that ? would you configure those disks as 6 RAID 1 mirrors and have 6 different primary NSD servers or would it be feasible to configure one big RAID 6 LUN? I?m thinking the latter is not a good idea as there could only be one primary NSD server for that one LUN, but given that: 1) I have no experience with this, and 2) I have been wrong once or twice before (), I?m looking for advice. Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Apr 19 20:59:18 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 19 Apr 2017 19:59:18 +0000 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: Message-ID: Hi I'll give my opinion. Worth what you pay for. 
Do as many as you can, six in this case for the good reason you mentioned. But play with the callbacks so the migration happens on watermarks when it happens. Otherwise you might hit no space till your next policy run. The second is well documented on the redbook AFAIK Cheers -- Cheers > On 19 Apr 2017, at 22.54, Buterbaugh, Kevin L wrote: > > Hi All, > > We currently have what I believe is a fairly typical setup ? metadata for our GPFS filesystems is the only thing in the system pool and it?s on SSD, while data is on spinning disk (RAID 6 LUNs). Everything connected via 8 Gb FC SAN. 8 NSD servers. Roughly 1 PB usable space. > > Now lets just say that you have a little bit of money to spend. Your I/O demands aren?t great - in fact, they?re way on the low end ? typical (cumulative) usage is 200 - 600 MB/sec read, less than that for writes. But while GPFS has always been great and therefore you don?t need to Make GPFS Great Again, you do want to provide your users with the best possible environment. > > So you?re considering the purchase of a dual-controller FC storage array with 12 or so 1.8 TB SSD?s in it, with the idea being that that storage would be in its? own storage pool and that pool would be the default location for I/O for your main filesystem ? at least for smaller files. You intend to use mmapplypolicy nightly to move data to / from this pool and the spinning disk pools. > > Given all that ? would you configure those disks as 6 RAID 1 mirrors and have 6 different primary NSD servers or would it be feasible to configure one big RAID 6 LUN? I?m thinking the latter is not a good idea as there could only be one primary NSD server for that one LUN, but given that: 1) I have no experience with this, and 2) I have been wrong once or twice before (), I?m looking for advice. Thanks! > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Apr 19 21:05:49 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 19 Apr 2017 20:05:49 +0000 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: References: Message-ID: By having many LUNs, you get many IO queues for Linux to play with. Also the raid6 overhead can be quite significant, so it might be better to go with raid1 anyway depending on the controller... And if only gpfs had some sort of auto tier back up the pools for hot or data caching :-) Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 19 April 2017 20:53 To: gpfsug main discussion list Subject: [gpfsug-discuss] RAID config for SSD's used for data Hi All, We currently have what I believe is a fairly typical setup ? metadata for our GPFS filesystems is the only thing in the system pool and it?s on SSD, while data is on spinning disk (RAID 6 LUNs). Everything connected via 8 Gb FC SAN. 8 NSD servers. Roughly 1 PB usable space. Now lets just say that you have a little bit of money to spend. 
Your I/O demands aren?t great - in fact, they?re way on the low end ? typical (cumulative) usage is 200 - 600 MB/sec read, less than that for writes. But while GPFS has always been great and therefore you don?t need to Make GPFS Great Again, you do want to provide your users with the best possible environment. So you?re considering the purchase of a dual-controller FC storage array with 12 or so 1.8 TB SSD?s in it, with the idea being that that storage would be in its? own storage pool and that pool would be the default location for I/O for your main filesystem ? at least for smaller files. You intend to use mmapplypolicy nightly to move data to / from this pool and the spinning disk pools. Given all that ? would you configure those disks as 6 RAID 1 mirrors and have 6 different primary NSD servers or would it be feasible to configure one big RAID 6 LUN? I?m thinking the latter is not a good idea as there could only be one primary NSD server for that one LUN, but given that: 1) I have no experience with this, and 2) I have been wrong once or twice before (), I?m looking for advice. Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 From aaron.s.knister at nasa.gov Wed Apr 19 21:13:14 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 19 Apr 2017 16:13:14 -0400 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: References: Message-ID: On 4/19/17 4:05 PM, Simon Thompson (IT Research Support) wrote: > By having many LUNs, you get many IO queues for Linux to play with. Also the raid6 overhead can be quite significant, so it might be better to go with raid1 anyway depending on the controller... > > And if only gpfs had some sort of auto tier back up the pools for hot or data caching :-) You mean like HAWC but for writes larger than 64K? ;-) Or I guess "HARC" as it might be called for a read cache... -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From luis.bolinches at fi.ibm.com Wed Apr 19 21:20:20 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 19 Apr 2017 20:20:20 +0000 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: Message-ID: I assume you are making the joke of external LROC. But not sure I would use external storage for LROC, as the whole point is to have really fast storage as close to the node (L for local) as possible. Maybe those SSD that will get replaced with the fancy external storage? -- Cheers > On 19 Apr 2017, at 23.13, Aaron Knister wrote: > > > >> On 4/19/17 4:05 PM, Simon Thompson (IT Research Support) wrote: >> By having many LUNs, you get many IO queues for Linux to play with. Also the raid6 overhead can be quite significant, so it might be better to go with raid1 anyway depending on the controller... >> >> And if only gpfs had some sort of auto tier back up the pools for hot or data caching :-) > > You mean like HAWC but for writes larger than 64K? ;-) > > Or I guess "HARC" as it might be called for a read cache... > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Apr 19 21:49:56 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 19 Apr 2017 16:49:56 -0400 Subject: [gpfsug-discuss] RAID config for SSD's - potential pitfalls In-Reply-To: References: Message-ID: As I've mentioned before, RAID choices for GPFS are not so simple. Here are a couple points to consider, I'm sure there's more. And if I'm wrong, someone will please correct me - but I believe the two biggest pitfalls are: Some RAID configurations (classically 5 and 6) work best with large, full block writes. When the file system does a partial block write, RAID may have to read a full "stripe" from several devices, compute the differences and then write back the modified data to several devices. This is certainly true with RAID that is configured over several storage devices, with error correcting codes. SO, you do NOT want to put GPFS metadata (system pool!) on RAID configured with large stripes and error correction. This is the Read-Modify-Write Raid pitfall. GPFS has built-in replication features - consider using those instead of RAID replication (classically Raid-1). GPFS replication can work with storage devices that are in different racks, separated by significant physical space, and from different manufacturers. This can be more robust than RAID in a single box or single rack. Consider a fire scenario, or exploding power supply or similar physical disaster. Consider that storage devices and controllers from the same manufacturer may have the same bugs, defects, failures. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Apr 19 22:12:35 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 19 Apr 2017 21:12:35 +0000 Subject: [gpfsug-discuss] RAID config for SSD's - potential pitfalls In-Reply-To: References: Message-ID: Hi Marc, But the limitation on GPFS replication is that I can set replication separately for metadata and data, but no matter whether I have one data pool or ten data pools they all must have the same replication, correct? And believe me I *love* GPFS replication ? I would hope / imagine that I am one of the few people on this mailing list who has actually gotten to experience a ?fire scenario? ? electrical fire, chemical suppressant did it?s thing, and everything in the data center had a nice layer of soot, ash, and chemical suppressant on and in it and therefore had to be professionally cleaned. Insurance bought us enough disk space that we could (temporarily) turn on GPFS data replication and clean storage arrays one at a time! But in my current hypothetical scenario I?m stretching the budget just to get that one storage array with 12 x 1.8 TB SSD?s in it. Two are out of the question. My current metadata that I?ve got on SSDs is on RAID 1 mirrors and has GPFS replication set to 2. I thought the multiple RAID 1 mirrors approach was the way to go for SSDs for data as well, as opposed to one big RAID 6 LUN, but wanted to get the advice of those more knowledgeable than me. Thanks! Kevin On Apr 19, 2017, at 3:49 PM, Marc A Kaplan > wrote: As I've mentioned before, RAID choices for GPFS are not so simple. Here are a couple points to consider, I'm sure there's more. 
And if I'm wrong, someone will please correct me - but I believe the two biggest pitfalls are: * Some RAID configurations (classically 5 and 6) work best with large, full block writes. When the file system does a partial block write, RAID may have to read a full "stripe" from several devices, compute the differences and then write back the modified data to several devices. This is certainly true with RAID that is configured over several storage devices, with error correcting codes. SO, you do NOT want to put GPFS metadata (system pool!) on RAID configured with large stripes and error correction. This is the Read-Modify-Write Raid pitfall. * GPFS has built-in replication features - consider using those instead of RAID replication (classically Raid-1). GPFS replication can work with storage devices that are in different racks, separated by significant physical space, and from different manufacturers. This can be more robust than RAID in a single box or single rack. Consider a fire scenario, or exploding power supply or similar physical disaster. Consider that storage devices and controllers from the same manufacturer may have the same bugs, defects, failures. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chekh at stanford.edu Wed Apr 19 22:23:15 2017 From: chekh at stanford.edu (Alex Chekholko) Date: Wed, 19 Apr 2017 14:23:15 -0700 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: References: Message-ID: <4f16617c-0ae9-18ef-bfb5-206507762fd9@stanford.edu> On 04/19/2017 12:53 PM, Buterbaugh, Kevin L wrote: > > So you?re considering the purchase of a dual-controller FC storage array > with 12 or so 1.8 TB SSD?s in it, with the idea being that that storage > would be in its? own storage pool and that pool would be the default > location for I/O for your main filesystem ? at least for smaller files. > You intend to use mmapplypolicy nightly to move data to / from this > pool and the spinning disk pools. We did this and failed in interesting (but in retrospect obvious) ways. You will want to ensure that your users cannot fill your write target pool within a day. The faster the storage, the more likely that is to happen. Or else your users will get ENOSPC. You will want to ensure that your pools can handle the additional I/O from the migration in aggregate with all the user I/O. Or else your users will see worse performance from the fast pool than the slow pool while the migration is running. You will want to make sure that the write throughput of your slow pool is faster than the read throughput of your fast pool. In our case, the fast pool was undersized in capacity, and oversized in terms of performance. And overall the filesystem was oversubscribed (~100 10GbE clients, 8 x 10GbE NSD servers) So the fast pool would fill very quickly. Then I would switch the placement policy to the big slow pool and performance would drop dramatically, and then if I ran a migration it would either (depending on parameters) take up all the I/O to the slow pool (leaving none for the users), or else take forever (weeks) because the user I/O was maxing out the slow pool. 
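In hindsight, tying the flush to the low-space callback rather than to a fixed nightly cron run would have helped with the ENOSPC side of it. Something along these lines is the usual pattern (an untested sketch; the pool names, callback name and script path here are made up, so check the ILM chapter of the admin guide / redbook for the authoritative syntax):

/* threshold rule that GPFS monitors to raise the low-space event */
RULE 'flush' MIGRATE FROM POOL 'fast' THRESHOLD(85,70) WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME) TO POOL 'slow'

# install the policy with mmchpolicy, then register a callback for the low-space events
# (names and paths below are illustrative only -- verify the options against the docs)
mmaddcallback FLUSHFAST --command /usr/local/sbin/flush_fast_pool.sh --event lowDiskSpace,noDiskSpace --parms "%eventName %fsName"
# flush_fast_pool.sh would basically run: mmapplypolicy $2 --single-instance

That way the migration kicks in when the fast pool actually crosses 85% and runs it back down to 70%, instead of whenever the nightly job happens to fire.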
Things should be better today with QoS stuff, but your relative pool capacities (in our case it was like 1% fast, 99% slow) and your relative pool performance (in our case, the slow pool had fewer IOPS than the fast pool) are still going to matter a lot. -- Alex Chekholko chekh at stanford.edu From makaplan at us.ibm.com Wed Apr 19 22:58:24 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 19 Apr 2017 17:58:24 -0400 Subject: [gpfsug-discuss] RAID config for SSD's - potential pitfalls In-Reply-To: References: Message-ID: Kevin asked: " ... data pools they all must have the same replication, correct?" Actually no! You can use policy RULE ... SET POOL 'x' REPLICATE(2) to set the replication factor when a file is created. Use mmchattr or mmapplypolicy to change the replication factor after creation. You specify the maximum data replication factor when you create the file system (1,2,3), but any given file can have its replication factor set to 1, 2 or 3, up to that maximum. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From kums at us.ibm.com Wed Apr 19 23:03:33 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Wed, 19 Apr 2017 18:03:33 -0400 Subject: [gpfsug-discuss] RAID config for SSD's - potential pitfalls In-Reply-To: References: Message-ID: Hi, >> As I've mentioned before, RAID choices for GPFS are not so simple. Here are a couple points to consider, I'm sure there's more. And if I'm wrong, someone will please correct me - but I believe the two biggest pitfalls are: >>Some RAID configurations (classically 5 and 6) work best with large, full block writes. When the file system does a partial block write, RAID may have to read a full "stripe" from several devices, compute the differences and then write back the modified data to several devices. >>This is certainly true with RAID that is configured over several storage devices, with error correcting codes. SO, you do NOT want to put GPFS metadata (system pool!) on RAID configured with large stripes and error correction. This is the Read-Modify-Write Raid pitfall. As you pointed out, the RAID choices for GPFS may not be simple; we need to take into consideration storage subsystem configuration and capabilities, such as whether all drives are homogeneous or there is a mix of drive types. If all the drives are homogeneous, then create dataAndMetadata NSDs across RAID-6, and if the storage controller supports write-cache + write-cache mirroring (WC + WM) then enable it; WC + WM can alleviate read-modify-write for small writes (typical of metadata). If there is a mix of SSD and HDD (e.g. 15K RPM), then we need to take into consideration the aggregate IOPS of RAID-1 SSD volumes vs. RAID-6 HDDs before separating data and metadata into separate media. For example, if the storage subsystem has 2 x SSDs and ~300 x 15K RPM or NL_SAS HDDs then most likely the aggregate IOPS of the RAID-6 HDD volumes will be higher than that of the RAID-1 SSD volumes. It would also be recommended to assess the I/O performance of the different configurations (dataAndMetadata vs dataOnly/metadataOnly NSDs) with representative application workloads and production scenarios before deploying the final solution. >> GPFS has built-in replication features - consider using those instead of RAID replication (classically Raid-1).
GPFS replication can work with storage devices that are in different racks, separated by significant physical space, and from different manufacturers. This can be more >>robust than RAID in a single box or single rack. Consider a fire scenario, or exploding power supply or similar physical disaster. Consider that storage devices and controllers from the same manufacturer may have the same bugs, defects, failures. For high-resiliency (for e.g. metadataOnly) and if there are multiple storage across different failure domains (different racks/rooms/DC etc), it will be good to enable BOTH hardware RAID-1 as well as GPFS metadata replication enabled (at the minimum, -m 2). If there is single shared storage for GPFS file-system storage and metadata is separated from data, then RAID-1 would minimize administrative overhead compared to GPFS replication in the event of drive failure (since with GPFS replication across single SSD would require mmdeldisk/mmdelnsd/mmcrnsd/mmadddisk every time disk goes faulty and needs to be replaced). Best, -Kums From: Marc A Kaplan/Watson/IBM at IBMUS To: gpfsug main discussion list Date: 04/19/2017 04:50 PM Subject: Re: [gpfsug-discuss] RAID config for SSD's - potential pitfalls Sent by: gpfsug-discuss-bounces at spectrumscale.org As I've mentioned before, RAID choices for GPFS are not so simple. Here are a couple points to consider, I'm sure there's more. And if I'm wrong, someone will please correct me - but I believe the two biggest pitfalls are: Some RAID configurations (classically 5 and 6) work best with large, full block writes. When the file system does a partial block write, RAID may have to read a full "stripe" from several devices, compute the differences and then write back the modified data to several devices. This is certainly true with RAID that is configured over several storage devices, with error correcting codes. SO, you do NOT want to put GPFS metadata (system pool!) on RAID configured with large stripes and error correction. This is the Read-Modify-Write Raid pitfall. GPFS has built-in replication features - consider using those instead of RAID replication (classically Raid-1). GPFS replication can work with storage devices that are in different racks, separated by significant physical space, and from different manufacturers. This can be more robust than RAID in a single box or single rack. Consider a fire scenario, or exploding power supply or similar physical disaster. Consider that storage devices and controllers from the same manufacturer may have the same bugs, defects, failures. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Apr 19 23:41:19 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 19 Apr 2017 18:41:19 -0400 Subject: [gpfsug-discuss] RAID config for SSD's - potential pitfalls In-Reply-To: References: Message-ID: Kums is our performance guru, so weigh that appropriately and relative to my own remarks... Nevertheless, I still think RAID-5or6 is a poor choice for GPFS metadata. The write cache will NOT mitigate the read-modify-write problem of a workload that has a random or hop-scotch access pattern of small writes. In the end you've still got to read and write several times more disk blocks than you actually set out to modify. 
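To put rough numbers on it (illustrative only, and assuming the array falls back to the classic small-write path): updating a single block that is smaller than the RAID-6 stripe means reading the old data strip plus the old P and Q parity strips, then writing the new data, new P and new Q -- six disk I/Os to change one block, versus two writes on a RAID-1 pair. Once the writes are smaller than a full stripe and effectively random, no amount of controller cache changes that arithmetic; it only delays it.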
Same goes for any large amount of data that will be written in a pattern of non-sequential small writes. (Define a small write as less than a full RAID stripe). For sure, non-volatile write caches are a good thing - but not a be all end all solution. Relying on RAID-1 to protect your metadata may well be easier to administer, but still GPFS replication can be more robust. Doing both - belt and suspenders is fine -- if you can afford it. Either is buying 2x storage, both is 4x. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Apr 20 00:16:08 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 19 Apr 2017 23:16:08 +0000 Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Message-ID: <3F3E9259-1601-4473-A827-7CD5418B8C58@nuance.com> I assume the counter has wrapped on some of these - would a PMR fix this? (4.2.1) [root at cnt-r01r07u15 ~]# mmfsadm vfsstats vfs statistics currently enabled started at: Fri Jan 27 16:22:02.702 2017 duration: 7091405.800 sec name calls time per call total time -------------------- -------- -------------- -------------- access 8472691 0.006672 56529.863993 close 1460175509 0.000034 49854.695358 create 2101110 0.073797 155055.263775 fsync 20 0.001214 0.024288 getattr 859449161 0.000118 101183.699413 link 2175473 0.000287 625.343799 lockctl 17326 0.000302 5.229828 lookup 200369809 0.005999 1201980.046683 map_lloff 220850355 0.000039 8561.791963 mkdir 817894 0.265793 217390.095681 mknod 3 0.000474 0.001422 open 1460169409 0.000092 134811.724068 read -412143552 0.001023 3971403.879911 write 164739329 0.000829 136616.948900 mmapRead 17108252 0.000623 10665.877349 readdir 142261835 0.000049 6999.159121 readlink 485335656 0.000004 2111.627292 readpage -648839570 0.000004 14346.195128 remove 4239806 0.022000 93277.124289 rename 350671 0.055135 19334.226490 rmdir 342019 0.008000 2736.037074 setattr 3709237 0.004573 16963.899331 symlink 160610 0.053061 8522.185175 unmap -365476297 0.000000 1735.669373 setxattr 119 0.000009 0.001042 getxattr -218316996 0.000154 628416.355002 removexattr 15 0.000003 0.000042 statfs 2624067 0.000082 214.306646 fastOpen 1456944934 0.000000 0.000000 fastClose 1515612004 0.000000 0.000000 fastLookup 77981387 0.000000 0.000000 fastRead -922882405 0.000000 0.000000 fastWrite 102606402 0.000000 0.000000 revalidate 899677 0.000000 0.000000 aio write sync 21331080 0.000061 1309.773528 Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Apr 20 01:10:51 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 19 Apr 2017 20:10:51 -0400 Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values In-Reply-To: <3F3E9259-1601-4473-A827-7CD5418B8C58@nuance.com> References: <3F3E9259-1601-4473-A827-7CD5418B8C58@nuance.com> Message-ID: Bob, I also noticed this recently. I think it may be a simple matter of a printf()-like statement in the code that handles "mmfsadm vfsstats" using an incorrect conversion specifier --- one that treats the counter as signed instead of unsigned and treats the counter as being smaller than it really is. 
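As a back-of-the-envelope check of that idea: the read counter shows -412143552, and reinterpreting the same 32-bit pattern as an unsigned value gives 4294967296 - 412143552 = 3882823744, i.e. roughly 3.9 billion read calls -- quite plausible for a counter that has been accumulating for the ~82 days (7091405 seconds) shown in the header -- whereas the signed interpretation goes negative as soon as the count passes 2^31 (2147483648). The counters that still look sane (e.g. close at 1460175509) are simply the ones that have not crossed that boundary yet.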
To help confirm that hypothesis, could you please run the following commands on the node, at the same time, so the output can be compared: # mmfsadm vfsstats # mmfsadm eventsExporter mmpmon vfss I believe the code that handles "mmfsadm eventsExporter mmpmon vfss" uses the correct printf()-like conversion specifier. So, it should so good numbers where "mmfsadm vfsstats" shows negative numbers. Regards, The Spectrum Scale (GPFS) team Eric Agar ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 04/19/2017 07:16 PM Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Sent by: gpfsug-discuss-bounces at spectrumscale.org I assume the counter has wrapped on some of these - would a PMR fix this? (4.2.1) [root at cnt-r01r07u15 ~]# mmfsadm vfsstats vfs statistics currently enabled started at: Fri Jan 27 16:22:02.702 2017 duration: 7091405.800 sec name calls time per call total time -------------------- -------- -------------- -------------- access 8472691 0.006672 56529.863993 close 1460175509 0.000034 49854.695358 create 2101110 0.073797 155055.263775 fsync 20 0.001214 0.024288 getattr 859449161 0.000118 101183.699413 link 2175473 0.000287 625.343799 lockctl 17326 0.000302 5.229828 lookup 200369809 0.005999 1201980.046683 map_lloff 220850355 0.000039 8561.791963 mkdir 817894 0.265793 217390.095681 mknod 3 0.000474 0.001422 open 1460169409 0.000092 134811.724068 read -412143552 0.001023 3971403.879911 write 164739329 0.000829 136616.948900 mmapRead 17108252 0.000623 10665.877349 readdir 142261835 0.000049 6999.159121 readlink 485335656 0.000004 2111.627292 readpage -648839570 0.000004 14346.195128 remove 4239806 0.022000 93277.124289 rename 350671 0.055135 19334.226490 rmdir 342019 0.008000 2736.037074 setattr 3709237 0.004573 16963.899331 symlink 160610 0.053061 8522.185175 unmap -365476297 0.000000 1735.669373 setxattr 119 0.000009 0.001042 getxattr -218316996 0.000154 628416.355002 removexattr 15 0.000003 0.000042 statfs 2624067 0.000082 214.306646 fastOpen 1456944934 0.000000 0.000000 fastClose 1515612004 0.000000 0.000000 fastLookup 77981387 0.000000 0.000000 fastRead -922882405 0.000000 0.000000 fastWrite 102606402 0.000000 0.000000 revalidate 899677 0.000000 0.000000 aio write sync 21331080 0.000061 1309.773528 Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Robert.Oesterlin at nuance.com Thu Apr 20 01:21:04 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 20 Apr 2017 00:21:04 +0000 Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Message-ID: Hi Eric Looks like your assumption is correct - no negative values from ?mmfsadm eventsExporter mmpmon vfss?. I don?t normally view these via ?mmfsadm?, I use the zimon stats. But, It?s a bug that should be fixed. What?s the best way to get this fixed? root at cnt-r01r07u15 ~]# mmfsadm eventsExporter mmpmon vfss _response_ begin mmpmon vfss _mmpmon::vfss_ _n_ 10.30.100.193 _nn_ cnt-r01r07u15 _rc_ 0 _t_ 1492647309 _tu_ 311964 _access_ 8472897 56529.874886 _close_ 1460223848 49854.938090 _create_ 2101927 155055.515041 _fclear_ 0 0.000000 _fsync_ 20 0.024288 _fsync_range_ 0 0.000000 _ftrunc_ 0 0.000000 _getattr_ 859626332 101183.720281 _link_ 2175473 625.343799 _lockctl_ 17326 5.229828 _lookup_ 200378610 1201985.264220 _map_lloff_ 220854519 8561.860515 _mkdir_ 817943 217390.170859 _mknod_ 3 0.001422 _open_ 1460217712 134812.649162 _read_ 3883163461 3971457.463527 _write_ 186078410 137927.496812 _mmapRead_ 17108947 10665.929860 _mmapWrite_ 0 0.000000 _aioRead_ 0 0.000000 _aioWrite_ 0 0.000000 _readdir_ 142262897 6999.189450 _readlink_ 485337171 2111.634286 _readpage_ 3646233600 14346.331414 _remove_ 4241324 93277.463798 _rename_ 350679 19334.235924 _rmdir_ 342042 2736.048976 _setacl_ 0 0.000000 _setattr_ 3709289 16963.901179 _symlink_ 161336 8522.670079 _unmap_ 3929805828 1735.740690 _writepage_ 0 0.000000 _tsfattr_ 0 0.000000 _tsfsattr_ 0 0.000000 _flock_ 0 0.000000 _setxattr_ 119 0.001042 _getxattr_ 4077218348 628418.213008 _listxattr_ 0 0.000000 _removexattr_ 15 0.000042 _encode_fh_ 0 0.000000 _decode_fh_ 0 0.000000 _get_dentry_ 0 0.000000 _get_parent_ 0 0.000000 _mount_ 0 0.000000 _statfs_ 2625497 214.309671 _sync_ 0 0.000000 _vget_ 0 0.000000 _response_ end Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Wednesday, April 19, 2017 at 7:10 PM To: gpfsug main discussion list Cc: "gpfsug-discuss-bounces at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Bob, I also noticed this recently. I think it may be a simple matter of a printf()-like statement in the code that handles "mmfsadm vfsstats" using an incorrect conversion specifier --- one that treats the counter as signed instead of unsigned and treats the counter as being smaller than it really is. To help confirm that hypothesis, could you please run the following commands on the node, at the same time, so the output can be compared: # mmfsadm vfsstats # mmfsadm eventsExporter mmpmon vfss I believe the code that handles "mmfsadm eventsExporter mmpmon vfss" uses the correct printf()-like conversion specifier. So, it should so good numbers where "mmfsadm vfsstats" shows negative numbers. Regards, The Spectrum Scale (GPFS) team Eric Agar ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 04/19/2017 07:16 PM Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I assume the counter has wrapped on some of these - would a PMR fix this? (4.2.1) [root at cnt-r01r07u15 ~]# mmfsadm vfsstats vfs statistics currently enabled started at: Fri Jan 27 16:22:02.702 2017 duration: 7091405.800 sec name calls time per call total time -------------------- -------- -------------- -------------- access 8472691 0.006672 56529.863993 close 1460175509 0.000034 49854.695358 create 2101110 0.073797 155055.263775 fsync 20 0.001214 0.024288 getattr 859449161 0.000118 101183.699413 link 2175473 0.000287 625.343799 lockctl 17326 0.000302 5.229828 lookup 200369809 0.005999 1201980.046683 map_lloff 220850355 0.000039 8561.791963 mkdir 817894 0.265793 217390.095681 mknod 3 0.000474 0.001422 open 1460169409 0.000092 134811.724068 read -412143552 0.001023 3971403.879911 write 164739329 0.000829 136616.948900 mmapRead 17108252 0.000623 10665.877349 readdir 142261835 0.000049 6999.159121 readlink 485335656 0.000004 2111.627292 readpage -648839570 0.000004 14346.195128 remove 4239806 0.022000 93277.124289 rename 350671 0.055135 19334.226490 rmdir 342019 0.008000 2736.037074 setattr 3709237 0.004573 16963.899331 symlink 160610 0.053061 8522.185175 unmap -365476297 0.000000 1735.669373 setxattr 119 0.000009 0.001042 getxattr -218316996 0.000154 628416.355002 removexattr 15 0.000003 0.000042 statfs 2624067 0.000082 214.306646 fastOpen 1456944934 0.000000 0.000000 fastClose 1515612004 0.000000 0.000000 fastLookup 77981387 0.000000 0.000000 fastRead -922882405 0.000000 0.000000 fastWrite 102606402 0.000000 0.000000 revalidate 899677 0.000000 0.000000 aio write sync 21331080 0.000061 1309.773528 Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Apr 20 02:03:16 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 19 Apr 2017 21:03:16 -0400 Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values In-Reply-To: References: Message-ID: Thanks Bob. Yes, it looks good for the hypothesis. ZIMon gets its VFSS stats from the mmpmon code that we just exercised with "mmfsadm eventsExporter mmpmon vfss"; so the ZIMon stats are also probably correct. Having said that, I agree with you that the "mmfsadm vfsstats" problem is a bug that should be fixed. If you would like to open a PMR so an APAR gets generated, it might help speed the routing of the PMR if you include in the PMR text our email exchange, and highlight Eric Agar is the GPFS developer with whom you've already discussed this issue. You could also mention that I believe I have no need for a gpfs snap. Having an APAR will help ensure the fix makes it into a PTF for the release you are using. 
If you do not want to open a PMR, I still intend to fix the problem in the development stream. Thanks again. Regards, The Spectrum Scale (GPFS) team Eric Agar ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Cc: IBM Spectrum Scale/Poughkeepsie/IBM at IBMUS Date: 04/19/2017 08:21 PM Subject: Re: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Hi Eric Looks like your assumption is correct - no negative values from ?mmfsadm eventsExporter mmpmon vfss?. I don?t normally view these via ?mmfsadm?, I use the zimon stats. But, It?s a bug that should be fixed. What?s the best way to get this fixed? root at cnt-r01r07u15 ~]# mmfsadm eventsExporter mmpmon vfss _response_ begin mmpmon vfss _mmpmon::vfss_ _n_ 10.30.100.193 _nn_ cnt-r01r07u15 _rc_ 0 _t_ 1492647309 _tu_ 311964 _access_ 8472897 56529.874886 _close_ 1460223848 49854.938090 _create_ 2101927 155055.515041 _fclear_ 0 0.000000 _fsync_ 20 0.024288 _fsync_range_ 0 0.000000 _ftrunc_ 0 0.000000 _getattr_ 859626332 101183.720281 _link_ 2175473 625.343799 _lockctl_ 17326 5.229828 _lookup_ 200378610 1201985.264220 _map_lloff_ 220854519 8561.860515 _mkdir_ 817943 217390.170859 _mknod_ 3 0.001422 _open_ 1460217712 134812.649162 _read_ 3883163461 3971457.463527 _write_ 186078410 137927.496812 _mmapRead_ 17108947 10665.929860 _mmapWrite_ 0 0.000000 _aioRead_ 0 0.000000 _aioWrite_ 0 0.000000 _readdir_ 142262897 6999.189450 _readlink_ 485337171 2111.634286 _readpage_ 3646233600 14346.331414 _remove_ 4241324 93277.463798 _rename_ 350679 19334.235924 _rmdir_ 342042 2736.048976 _setacl_ 0 0.000000 _setattr_ 3709289 16963.901179 _symlink_ 161336 8522.670079 _unmap_ 3929805828 1735.740690 _writepage_ 0 0.000000 _tsfattr_ 0 0.000000 _tsfsattr_ 0 0.000000 _flock_ 0 0.000000 _setxattr_ 119 0.001042 _getxattr_ 4077218348 628418.213008 _listxattr_ 0 0.000000 _removexattr_ 15 0.000042 _encode_fh_ 0 0.000000 _decode_fh_ 0 0.000000 _get_dentry_ 0 0.000000 _get_parent_ 0 0.000000 _mount_ 0 0.000000 _statfs_ 2625497 214.309671 _sync_ 0 0.000000 _vget_ 0 0.000000 _response_ end Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Wednesday, April 19, 2017 at 7:10 PM To: gpfsug main discussion list Cc: "gpfsug-discuss-bounces at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Bob, I also noticed this recently. I think it may be a simple matter of a printf()-like statement in the code that handles "mmfsadm vfsstats" using an incorrect conversion specifier --- one that treats the counter as signed instead of unsigned and treats the counter as being smaller than it really is. 
To help confirm that hypothesis, could you please run the following commands on the node, at the same time, so the output can be compared: # mmfsadm vfsstats # mmfsadm eventsExporter mmpmon vfss I believe the code that handles "mmfsadm eventsExporter mmpmon vfss" uses the correct printf()-like conversion specifier. So, it should so good numbers where "mmfsadm vfsstats" shows negative numbers. Regards, The Spectrum Scale (GPFS) team Eric Agar ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 04/19/2017 07:16 PM Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Sent by: gpfsug-discuss-bounces at spectrumscale.org I assume the counter has wrapped on some of these - would a PMR fix this? (4.2.1) [root at cnt-r01r07u15 ~]# mmfsadm vfsstats vfs statistics currently enabled started at: Fri Jan 27 16:22:02.702 2017 duration: 7091405.800 sec name calls time per call total time -------------------- -------- -------------- -------------- access 8472691 0.006672 56529.863993 close 1460175509 0.000034 49854.695358 create 2101110 0.073797 155055.263775 fsync 20 0.001214 0.024288 getattr 859449161 0.000118 101183.699413 link 2175473 0.000287 625.343799 lockctl 17326 0.000302 5.229828 lookup 200369809 0.005999 1201980.046683 map_lloff 220850355 0.000039 8561.791963 mkdir 817894 0.265793 217390.095681 mknod 3 0.000474 0.001422 open 1460169409 0.000092 134811.724068 read -412143552 0.001023 3971403.879911 write 164739329 0.000829 136616.948900 mmapRead 17108252 0.000623 10665.877349 readdir 142261835 0.000049 6999.159121 readlink 485335656 0.000004 2111.627292 readpage -648839570 0.000004 14346.195128 remove 4239806 0.022000 93277.124289 rename 350671 0.055135 19334.226490 rmdir 342019 0.008000 2736.037074 setattr 3709237 0.004573 16963.899331 symlink 160610 0.053061 8522.185175 unmap -365476297 0.000000 1735.669373 setxattr 119 0.000009 0.001042 getxattr -218316996 0.000154 628416.355002 removexattr 15 0.000003 0.000042 statfs 2624067 0.000082 214.306646 fastOpen 1456944934 0.000000 0.000000 fastClose 1515612004 0.000000 0.000000 fastLookup 77981387 0.000000 0.000000 fastRead -922882405 0.000000 0.000000 fastWrite 102606402 0.000000 0.000000 revalidate 899677 0.000000 0.000000 aio write sync 21331080 0.000061 1309.773528 Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From UWEFALKE at de.ibm.com Thu Apr 20 09:11:15 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 20 Apr 2017 10:11:15 +0200 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: References: Message-ID: Some thoughts: you give typical cumulative usage values. However, a fast pool might matter most for spikes of the traffic. Do you have spikes driving your current system to the edge? Then: using the SSD pool for writes is straightforward (placement), using it for reads will only pay off if data are either pre-fetched to the pool somehow, or read more than once before getting migrated back to the HDD pool(s). Write traffic is less than read as you wrote. RAID1 vs RAID6: RMW penalty of parity-based RAIDs was mentioned, which strikes at writes smaller than the full stripe width of your RAID - what type of write I/O do you have (or expect)? (This may also be important for choosing the quality of SSDs, with RMW in mind you will have a comparably huge amount of data written on the SSD devices if your I/O traffic consists of myriads of small IOs and you organized the SSDs in a RAID5 or RAID6) I suppose your current system is well set to provide the required aggregate throughput. Now, what kind of improvement do you expect? How are the clients connected? Would they have sufficient network bandwidth to see improvements at all? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 gpfsug-discuss-bounces at spectrumscale.org wrote on 04/19/2017 09:53:42 PM: > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/19/2017 09:54 PM > Subject: [gpfsug-discuss] RAID config for SSD's used for data > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Hi All, > > We currently have what I believe is a fairly typical setup ? > metadata for our GPFS filesystems is the only thing in the system > pool and it?s on SSD, while data is on spinning disk (RAID 6 LUNs). > Everything connected via 8 Gb FC SAN. 8 NSD servers. Roughly 1 PB > usable space. > > Now lets just say that you have a little bit of money to spend. > Your I/O demands aren?t great - in fact, they?re way on the low end > ? typical (cumulative) usage is 200 - 600 MB/sec read, less than > that for writes. But while GPFS has always been great and therefore > you don?t need to Make GPFS Great Again, you do want to provide your > users with the best possible environment. > > So you?re considering the purchase of a dual-controller FC storage > array with 12 or so 1.8 TB SSD?s in it, with the idea being that > that storage would be in its? own storage pool and that pool would > be the default location for I/O for your main filesystem ? at least > for smaller files. You intend to use mmapplypolicy nightly to move > data to / from this pool and the spinning disk pools. > > Given all that ? 
would you configure those disks as 6 RAID 1 mirrors > and have 6 different primary NSD servers or would it be feasible to > configure one big RAID 6 LUN? I?m thinking the latter is not a good > idea as there could only be one primary NSD server for that one LUN, > but given that: 1) I have no experience with this, and 2) I have > been wrong once or twice before (), I?m looking for advice. Thanks! > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Thu Apr 20 10:25:40 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 20 Apr 2017 10:25:40 +0100 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: <4f16617c-0ae9-18ef-bfb5-206507762fd9@stanford.edu> References: <4f16617c-0ae9-18ef-bfb5-206507762fd9@stanford.edu> Message-ID: <1492680340.4102.120.camel@buzzard.me.uk> On Wed, 2017-04-19 at 14:23 -0700, Alex Chekholko wrote: > On 04/19/2017 12:53 PM, Buterbaugh, Kevin L wrote: > > > > So you?re considering the purchase of a dual-controller FC storage array > > with 12 or so 1.8 TB SSD?s in it, with the idea being that that storage > > would be in its? own storage pool and that pool would be the default > > location for I/O for your main filesystem ? at least for smaller files. > > You intend to use mmapplypolicy nightly to move data to / from this > > pool and the spinning disk pools. > > We did this and failed in interesting (but in retrospect obvious) ways. > You will want to ensure that your users cannot fill your write target > pool within a day. The faster the storage, the more likely that is to > happen. Or else your users will get ENOSPC. Eh? Seriously you should have a fail over rule so that when your "fast" pool is filled up it starts allocating in the "slow" pool (nice good names that are descriptive and less than 8 characters including termination character). Now there are issues when you get close to very full so you need to set the fail over to as sizeable bit less than the full size, 95% is a good starting point. The pool names size is important because if the fast pool is less than eight characters and the slow is more because you called in "nearline" (which is 9 including termination character) once the files get moved they get backed up again by TSM, yeah!!! The 95% bit comes about from this. Imagine you had 12KB left in the fast pool and you go to write a file. You open the file with 0B in size and then start writing. At 12KB you run out of space in the fast pool and as the file can only be in one pool you get a ENOSPC, and the file gets canned. This then starts repeating on a regular basis. So if you start allocating at significantly less than 100%, say 95% where that 5% is larger than the largest file you expect that file works, but all subsequent files get allocated in the slow pool, till you flush the fast pool. Something like this as the last two rules in your policy should do the trick. /* by default new files to the fast disk unless full, then to slow */ RULE 'new' SET POOL 'fast' LIMIT(95) RULE 'spillover' SET POOL 'slow' However in general your fast pool needs to have sufficient capacity to take your daily churn and then some. JAB. -- Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From jonathan at buzzard.me.uk Thu Apr 20 10:32:20 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 20 Apr 2017 10:32:20 +0100 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: References: Message-ID: <1492680740.4102.126.camel@buzzard.me.uk> On Wed, 2017-04-19 at 20:05 +0000, Simon Thompson (IT Research Support) wrote: > By having many LUNs, you get many IO queues for Linux to play with. Also the raid6 overhead can be quite significant, so it might be better to go with raid1 anyway depending on the controller... > > And if only gpfs had some sort of auto tier back up the pools for hot or data caching :-) > If you have sized the "fast" pool correctly then the "slow" pool will be spending most of its time doing diddly squat, aka under 10 IOPS, unless you are flushing the pool of old files to make space. I have graphs that show this. Then one of two things happens. If you are just reading the file then fine: it is probably coming from the cache, or the disks are not very busy anyway, so you won't notice. If you happen to *change* the file and start doing things actively with it again, the changed version ends up on the fast disk by virtue of being a new file, because most programs approach this by creating an entirely new file with a temporary name and then doing a rename-and-delete shuffle, so that a crash will leave you with a valid file somewhere. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From p.childs at qmul.ac.uk Thu Apr 20 12:38:09 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 20 Apr 2017 11:38:09 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories In-Reply-To: References: Message-ID: Simon, We've managed to resolve this issue by switching off quotas, switching them back on again and rebuilding the quota file. Can I check whether you run quotas on your cluster? See you in 2 weeks in Manchester. Thanks in advance. Peter Childs Research Storage Expert ITS Research Infrastructure Queen Mary, University of London Phone: 020 7882 8393 ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (IT Research Support) Sent: Tuesday, April 11, 2017 4:55:35 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Slow to create directories We actually saw this for a while on one of our clusters which was new. But by the time I'd got round to looking deeper, it had gone; maybe we were using the NSDs more heavily, or possibly we'd upgraded. We are at 4.2.2-2, so it might be worth trying to bump the version and see if it goes away. We saw it on the NSD servers directly as well, so it's not some client trying to talk to it; maybe there was some buggy code? Simon On 11/04/2017, 16:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: >There are so many things to look at and many tools for doing so (iostat, >htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc).
I would >recommend a review of the presentation that Yuri gave at the most recent >GPFS User Group: >https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs > >Cheers, >-Bryan > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter >Childs >Sent: Tuesday, April 11, 2017 3:58 AM >To: gpfsug main discussion list >Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories > >This is a curious issue which I'm trying to get to the bottom of. > >We currently have two Spectrum Scale file systems, both are running GPFS >4.2.1-1 some of the servers have been upgraded to 4.2.1-2. > >The older one which was upgraded from GPFS 3.5 works find create a >directory is always fast and no issue. > >The new one, which has nice new SSD for metadata and hence should be >faster. can take up to 30 seconds to create a directory but usually takes >less than a second, The longer directory creates usually happen on busy >nodes that have not used the new storage in a while. (Its new so we've >not moved much of the data over yet) But it can also happen randomly >anywhere, including from the NSD servers them selves. (times of 3-4 >seconds from the NSD servers have been seen, on a single directory create) > >We've been pointed at the network and suggested we check all network >settings, and its been suggested to build an admin network, but I'm not >sure I entirely understand why and how this would help. Its a mixed >1G/10G network with the NSD servers connected at 40G with an MTU of 9000. > >However as I say, the older filesystem is fine, and it does not matter if >the nodes are connected to the old GPFS cluster or the new one, (although >the delay is worst on the old gpfs cluster), So I'm really playing spot >the difference. and the network is not really an obvious difference. > >Its been suggested to look at a trace when it occurs but as its difficult >to recreate collecting one is difficult. > >Any ideas would be most helpful. > >Thanks > > > >Peter Childs >ITS Research Infrastructure >Queen Mary, University of London >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >________________________________ > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. 
>_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kenneth.waegeman at ugent.be Thu Apr 20 15:53:29 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Thu, 20 Apr 2017 16:53:29 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> Message-ID: <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: > I just had a similar experience from a sandisk infiniflash system > SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for > writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on > the order of 2 Gbyte/s. > > After a bit head scratching snd fumbling around I found out that > reducing maxMBpS from 10000 to 100 fixed the problem! Digging further > I found that reducing prefetchThreads from default=72 to 32 also fixed > it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. > > Could something like this be the problem on your box as well? > > > > -jf > fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister > >: > > Well, I'm somewhat scrounging for hardware. This is in our test > environment :) And yep, it's got the 2U gpu-tray in it although even > without the riser it has 2 PCIe slots onboard (excluding the on-board > dual-port mezz card) so I think it would make a fine NSD server even > without the riser. > > -Aaron > > On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > Maybe its related to interrupt handlers somehow? You drive the > load up on one socket, you push all the interrupt handling to the > other socket where the fabric card is attached? > > > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD > servers, I assume its some 2U gpu-tray riser one or something !) > > > > Simon > > ________________________________________ > > From: gpfsug-discuss-bounces at spectrumscale.org > > [gpfsug-discuss-bounces at spectrumscale.org > ] on behalf of > Aaron Knister [aaron.s.knister at nasa.gov > ] > > Sent: 17 February 2017 15:52 > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] bizarre performance behavior > > > > This is a good one. I've got an NSD server with 4x 16GB fibre > > connections coming in and 1x FDR10 and 1x QDR connection going > out to > > the clients. I was having a really hard time getting anything > resembling > > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > > reads). The back-end is a DDN SFA12K and I *know* it can do > better than > > that. 
> > > > I don't remember quite how I figured this out but simply by running > > "openssl speed -multi 16" on the nsd server to drive up the load > I saw > > an almost 4x performance jump which is pretty much goes against > every > > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated > crap to > > quadruple your i/o performance"). > > > > This feels like some type of C-states frequency scaling > shenanigans that > > I haven't quite ironed down yet. I booted the box with the following > > kernel parameters "intel_idle.max_cstate=0 > processor.max_cstate=0" which > > didn't seem to make much of a difference. I also tried setting the > > frequency governer to userspace and setting the minimum frequency to > > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I > still have > > to run something to drive up the CPU load and then performance > improves. > > > > I'm wondering if this could be an issue with the C1E state? I'm > curious > > if anyone has seen anything like this. The node is a dx360 M4 > > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Thu Apr 20 16:04:20 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Thu, 20 Apr 2017 15:04:20 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> , <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID: <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). 
We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister >: Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
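Since the C1E / frequency-scaling theory keeps coming up: a minimal sketch of how one might check it on an NSD server (this assumes the cpupower utility from the kernel-tools package is installed; the commands are purely illustrative):

    # current governor, available frequencies and idle (C-)states
    cpupower frequency-info
    cpupower idle-info
    # watch the cores while a sequential read test is running from a client
    cpupower monitor
    # temporarily pin the governor to 'performance' to rule scaling out
    cpupower frequency-set -g performance

If the cores only reach their nominal frequency while something like the "openssl speed" trick above is running, the power-management setup (governor, BIOS profile, C1E) is the likely culprit rather than GPFS itself.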
> > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Apr 20 16:07:32 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 20 Apr 2017 17:07:32 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID: Hi Kennmeth, is prefetching off or on at your storage backend? Raw sequential is very different from GPFS sequential at the storage device ! GPFS does its own prefetching, the storage would never know what sectors sequential read at GPFS level maps to at storage level! Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Kenneth Waegeman To: gpfsug main discussion list Date: 04/20/2017 04:53 PM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. 
After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister : Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
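For reference, a minimal sketch of how the two knobs Jan-Frode mentions could be tried with mmchconfig. The values are simply the ones quoted above, not recommendations, the node name is a placeholder, and whether a change needs a GPFS restart depends on the release:

    # show what is currently set
    mmlsconfig maxMBpS
    mmlsconfig prefetchThreads
    # try a much lower maxMBpS on the nodes being tested
    mmchconfig maxMBpS=100 -i -N testclient01
    # or reduce prefetchThreads instead; this one typically needs mmshutdown/mmstartup
    mmchconfig prefetchThreads=32 -N testclient01

Running mmdiag --config on the node afterwards confirms what the daemon is actually running with.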
> > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From marcusk at nz1.ibm.com Fri Apr 21 02:21:51 2017 From: marcusk at nz1.ibm.com (Marcus Koenig1) Date: Fri, 21 Apr 2017 14:21:51 +1300 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID: Hi Kennmeth, we also had similar performance numbers in our tests. Native was far quicker than through GPFS. When we learned though that the client tested the performance on the FS at a big blocksize (512k) with small files - we were able to speed it up significantly using a smaller FS blocksize (obviously we had to recreate the FS). So really depends on how you do your tests. Cheers, Marcus Koenig Lab Services Storage & Power Specialist IBM Australia & New Zealand Advanced Technical Skills IBM Systems-Hardware |---------------+------------------------------------------+--------------------------------------------------------------------------------> | | | | |---------------+------------------------------------------+--------------------------------------------------------------------------------> >--------------------------------------------------------------------------------| | | >--------------------------------------------------------------------------------| |---------------+------------------------------------------+--------------------------------------------------------------------------------> | | | | | |Mobile: +64 21 67 34 27 | | | |E-mail: marcusk at nz1.ibm.com | | | | | | | | | | | | | | | |82 Wyndham Street | | | |Auckland, AUK 1010 | | | |New Zealand | | | | | | | | | | | | | | | | | | | | | | |---------------+------------------------------------------+--------------------------------------------------------------------------------> >--------------------------------------------------------------------------------| | | >--------------------------------------------------------------------------------| |---------------+------------------------------------------+--------------------------------------------------------------------------------> | | | | |---------------+------------------------------------------+--------------------------------------------------------------------------------> >--------------------------------------------------------------------------------| | | >--------------------------------------------------------------------------------| From: "Uwe Falke" To: gpfsug main discussion list Date: 
04/21/2017 03:07 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Kennmeth, is prefetching off or on at your storage backend? Raw sequential is very different from GPFS sequential at the storage device ! GPFS does its own prefetching, the storage would never know what sectors sequential read at GPFS level maps to at storage level! Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Kenneth Waegeman To: gpfsug main discussion list Date: 04/20/2017 04:53 PM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister : Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) 
> > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17773863.gif Type: image/gif Size: 3720 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 17405449.jpg Type: image/jpeg Size: 2741 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17997200.gif Type: image/gif Size: 13421 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From olaf.weiser at de.ibm.com Fri Apr 21 08:25:22 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 21 Apr 2017 09:25:22 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 3720 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 2741 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 13421 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 105 bytes Desc: not available URL: From kenneth.waegeman at ugent.be Fri Apr 21 10:43:25 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Fri, 21 Apr 2017 11:43:25 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> Message-ID: <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! 
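As a pointer for anyone wanting to run the same test: nsdperf ships as source (nsdperf.C) under /usr/lpp/mmfs/samples/net and has to be compiled first, per the notes in that directory. The flow below is only a rough sketch of the wiki page linked earlier, with testclient01 as a placeholder client name, so check that page for the authoritative syntax:

    ./nsdperf -s          # server mode, started on nsd00 and nsd02
    ./nsdperf             # interactive mode, on a test client
    > server nsd00 nsd02
    > client testclient01
    > test
    > quit

This measures the GPFS network path without touching the disks, which helps separate a network/verbs problem from a storage or prefetch problem.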
K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > Interesting. Could you share a little more about your architecture? Is > it possible to mount the fs on an NSD server and do some dd's from the > fs on the NSD server? If that gives you decent performance perhaps try > NSDPERF next > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf > > > -Aaron > > > > > On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman > wrote: >> >> Hi, >> >> >> Having an issue that looks the same as this one: >> >> We can do sequential writes to the filesystem at 7,8 GB/s total , >> which is the expected speed for our current storage >> backend. While we have even better performance with sequential reads >> on raw storage LUNS, using GPFS we can only reach 1GB/s in total >> (each nsd server seems limited by 0,5GB/s) independent of the number >> of clients >> (1,2,4,..) or ways we tested (fio,dd). We played with blockdev >> params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as >> discussed in this thread, but nothing seems to impact this read >> performance. >> >> Any ideas? >> >> Thanks! >> >> Kenneth >> >> On 17/02/17 19:29, Jan-Frode Myklebust wrote: >>> I just had a similar experience from a sandisk infiniflash system >>> SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for >>> writes. and 250-300 Mbyte/s on sequential reads!! Random reads were >>> on the order of 2 Gbyte/s. >>> >>> After a bit head scratching snd fumbling around I found out that >>> reducing maxMBpS from 10000 to 100 fixed the problem! Digging >>> further I found that reducing prefetchThreads from default=72 to 32 >>> also fixed it, while leaving maxMBpS at 10000. Can now also read at >>> 3,2 GByte/s. >>> >>> Could something like this be the problem on your box as well? >>> >>> >>> >>> -jf >>> fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister >>> >: >>> >>> Well, I'm somewhat scrounging for hardware. This is in our test >>> environment :) And yep, it's got the 2U gpu-tray in it although even >>> without the riser it has 2 PCIe slots onboard (excluding the >>> on-board >>> dual-port mezz card) so I think it would make a fine NSD server even >>> without the riser. >>> >>> -Aaron >>> >>> On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT >>> Services) >>> wrote: >>> > Maybe its related to interrupt handlers somehow? You drive the >>> load up on one socket, you push all the interrupt handling to >>> the other socket where the fabric card is attached? >>> > >>> > Dunno ... (Though I am intrigued you use idataplex nodes as >>> NSD servers, I assume its some 2U gpu-tray riser one or something !) >>> > >>> > Simon >>> > ________________________________________ >>> > From: gpfsug-discuss-bounces at spectrumscale.org >>> >>> [gpfsug-discuss-bounces at spectrumscale.org >>> ] on behalf of >>> Aaron Knister [aaron.s.knister at nasa.gov >>> ] >>> > Sent: 17 February 2017 15:52 >>> > To: gpfsug main discussion list >>> > Subject: [gpfsug-discuss] bizarre performance behavior >>> > >>> > This is a good one. I've got an NSD server with 4x 16GB fibre >>> > connections coming in and 1x FDR10 and 1x QDR connection going >>> out to >>> > the clients. I was having a really hard time getting anything >>> resembling >>> > sensible performance out of it (4-5Gb/s writes but maybe >>> 1.2Gb/s for >>> > reads). The back-end is a DDN SFA12K and I *know* it can do >>> better than >>> > that. 
>>> > >>> > I don't remember quite how I figured this out but simply by >>> running >>> > "openssl speed -multi 16" on the nsd server to drive up the >>> load I saw >>> > an almost 4x performance jump which is pretty much goes >>> against every >>> > sysadmin fiber in me (i.e. "drive up the cpu load with >>> unrelated crap to >>> > quadruple your i/o performance"). >>> > >>> > This feels like some type of C-states frequency scaling >>> shenanigans that >>> > I haven't quite ironed down yet. I booted the box with the >>> following >>> > kernel parameters "intel_idle.max_cstate=0 >>> processor.max_cstate=0" which >>> > didn't seem to make much of a difference. I also tried setting the >>> > frequency governer to userspace and setting the minimum >>> frequency to >>> > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I >>> still have >>> > to run something to drive up the CPU load and then performance >>> improves. >>> > >>> > I'm wondering if this could be an issue with the C1E state? >>> I'm curious >>> > if anyone has seen anything like this. The node is a dx360 M4 >>> > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. >>> > >>> > -Aaron >>> > >>> > -- >>> > Aaron Knister >>> > NASA Center for Climate Simulation (Code 606.2) >>> > Goddard Space Flight Center >>> > (301) 286-2776 >>> > _______________________________________________ >>> > gpfsug-discuss mailing list >>> > gpfsug-discuss at spectrumscale.org >>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> > _______________________________________________ >>> > gpfsug-discuss mailing list >>> > gpfsug-discuss at spectrumscale.org >>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> > >>> >>> -- >>> Aaron Knister >>> NASA Center for Climate Simulation (Code 606.2) >>> Goddard Space Flight Center >>> (301) 286-2776 >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.waegeman at ugent.be Fri Apr 21 10:50:55 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Fri, 21 Apr 2017 11:50:55 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID: <2b0824a1-e1a2-8dd8-4a55-a57d7b00e09f@ugent.be> Hi, prefetching was already disabled at our storage backend, but a good thing to recheck :) thanks! On 20/04/17 17:07, Uwe Falke wrote: > Hi Kennmeth, > > is prefetching off or on at your storage backend? > Raw sequential is very different from GPFS sequential at the storage > device ! > GPFS does its own prefetching, the storage would never know what sectors > sequential read at GPFS level maps to at storage level! > > > Mit freundlichen Gr??en / Kind regards > > > Dr. 
Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Andreas Hasse, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: Kenneth Waegeman > To: gpfsug main discussion list > Date: 04/20/2017 04:53 PM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi, > > Having an issue that looks the same as this one: > We can do sequential writes to the filesystem at 7,8 GB/s total , which is > the expected speed for our current storage > backend. While we have even better performance with sequential reads on > raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd > server seems limited by 0,5GB/s) independent of the number of clients > (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, > MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in > this thread, but nothing seems to impact this read performance. > Any ideas? > Thanks! > > Kenneth > > On 17/02/17 19:29, Jan-Frode Myklebust wrote: > I just had a similar experience from a sandisk infiniflash system > SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. > and 250-300 Mbyte/s on sequential reads!! Random reads were on the order > of 2 Gbyte/s. > > After a bit head scratching snd fumbling around I found out that reducing > maxMBpS from 10000 to 100 fixed the problem! Digging further I found that > reducing prefetchThreads from default=72 to 32 also fixed it, while > leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. > > Could something like this be the problem on your box as well? > > > > -jf > fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister > : > Well, I'm somewhat scrounging for hardware. This is in our test > environment :) And yep, it's got the 2U gpu-tray in it although even > without the riser it has 2 PCIe slots onboard (excluding the on-board > dual-port mezz card) so I think it would make a fine NSD server even > without the riser. > > -Aaron > > On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: >> Maybe its related to interrupt handlers somehow? You drive the load up > on one socket, you push all the interrupt handling to the other socket > where the fabric card is attached? >> Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, > I assume its some 2U gpu-tray riser one or something !) >> Simon >> ________________________________________ >> From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ > aaron.s.knister at nasa.gov] >> Sent: 17 February 2017 15:52 >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] bizarre performance behavior >> >> This is a good one. I've got an NSD server with 4x 16GB fibre >> connections coming in and 1x FDR10 and 1x QDR connection going out to >> the clients. 
I was having a really hard time getting anything resembling >> sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for >> reads). The back-end is a DDN SFA12K and I *know* it can do better than >> that. >> >> I don't remember quite how I figured this out but simply by running >> "openssl speed -multi 16" on the nsd server to drive up the load I saw >> an almost 4x performance jump which is pretty much goes against every >> sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to >> quadruple your i/o performance"). >> >> This feels like some type of C-states frequency scaling shenanigans that >> I haven't quite ironed down yet. I booted the box with the following >> kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which >> didn't seem to make much of a difference. I also tried setting the >> frequency governer to userspace and setting the minimum frequency to >> 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have >> to run something to drive up the CPU load and then performance improves. >> >> I'm wondering if this could be an issue with the C1E state? I'm curious >> if anyone has seen anything like this. The node is a dx360 M4 >> (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. >> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kenneth.waegeman at ugent.be Fri Apr 21 10:52:58 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Fri, 21 Apr 2017 11:52:58 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID: <94f2ca6e-cf6b-ef6a-1b27-45d7a449a379@ugent.be> Hi, Tried these settings, but sadly I'm not seeing any changes. Thanks, Kenneth On 21/04/17 09:25, Olaf Weiser wrote: > pls check > workerThreads (assuming you 're > 4.2.2) start with 128 .. increase > iteratively > pagepool at least 8 G > ignorePrefetchLunCount=yes (1) > > then you won't see a difference and GPFS is as fast or even faster .. 
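For anyone who wants to try the same combination, a minimal sketch of Olaf's suggestion in mmchconfig form. The values are the ones he quotes; exact parameter spelling and restart requirements should be checked against the mmchconfig documentation for your release, and 'nsdNodes' / 'testClients' stand in for whatever node classes or node lists apply:

    mmchconfig workerThreads=128 -N nsdNodes,testClients
    mmchconfig pagepool=8G -N nsdNodes,testClients
    mmchconfig ignorePrefetchLUNCount=yes -N nsdNodes
    # then recheck what the daemon actually picked up
    mmdiag --config | egrep -i 'workerThreads|pagepool|ignorePrefetch'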
> > > > From: "Marcus Koenig1" > To: gpfsug main discussion list > Date: 04/21/2017 03:24 AM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Hi Kennmeth, > > we also had similar performance numbers in our tests. Native was far > quicker than through GPFS. When we learned though that the client > tested the performance on the FS at a big blocksize (512k) with small > files - we were able to speed it up significantly using a smaller FS > blocksize (obviously we had to recreate the FS). > > So really depends on how you do your tests. > > *Cheers,* > * > Marcus Koenig* > Lab Services Storage & Power Specialist/ > IBM Australia & New Zealand Advanced Technical Skills/ > IBM Systems-Hardware > ------------------------------------------------------------------------ > > *Mobile:*+64 21 67 34 27* > E-mail:*_marcusk at nz1.ibm.com_ > > 82 Wyndham Street > Auckland, AUK 1010 > New Zealand > > > > > > > > > > Inactive hide details for "Uwe Falke" ---04/21/2017 03:07:48 AM---Hi > Kennmeth, is prefetching off or on at your storage backe"Uwe Falke" > ---04/21/2017 03:07:48 AM---Hi Kennmeth, is prefetching off or on at > your storage backend? > > From: "Uwe Falke" > To: gpfsug main discussion list > Date: 04/21/2017 03:07 AM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > Hi Kennmeth, > > is prefetching off or on at your storage backend? > Raw sequential is very different from GPFS sequential at the storage > device ! > GPFS does its own prefetching, the storage would never know what sectors > sequential read at GPFS level maps to at storage level! > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Andreas Hasse, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: Kenneth Waegeman > To: gpfsug main discussion list > Date: 04/20/2017 04:53 PM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi, > > Having an issue that looks the same as this one: > We can do sequential writes to the filesystem at 7,8 GB/s total , > which is > the expected speed for our current storage > backend. While we have even better performance with sequential reads on > raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd > server seems limited by 0,5GB/s) independent of the number of clients > (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, > MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in > this thread, but nothing seems to impact this read performance. > Any ideas? > Thanks! 
> > Kenneth > > On 17/02/17 19:29, Jan-Frode Myklebust wrote: > I just had a similar experience from a sandisk infiniflash system > SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. > and 250-300 Mbyte/s on sequential reads!! Random reads were on the order > of 2 Gbyte/s. > > After a bit head scratching snd fumbling around I found out that reducing > maxMBpS from 10000 to 100 fixed the problem! Digging further I found that > reducing prefetchThreads from default=72 to 32 also fixed it, while > leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. > > Could something like this be the problem on your box as well? > > > > -jf > fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister >: > Well, I'm somewhat scrounging for hardware. This is in our test > environment :) And yep, it's got the 2U gpu-tray in it although even > without the riser it has 2 PCIe slots onboard (excluding the on-board > dual-port mezz card) so I think it would make a fine NSD server even > without the riser. > > -Aaron > > On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > Maybe its related to interrupt handlers somehow? You drive the load up > on one socket, you push all the interrupt handling to the other socket > where the fabric card is attached? > > > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD > servers, > I assume its some 2U gpu-tray riser one or something !) > > > > Simon > > ________________________________________ > > From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ > aaron.s.knister at nasa.gov] > > Sent: 17 February 2017 15:52 > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] bizarre performance behavior > > > > This is a good one. I've got an NSD server with 4x 16GB fibre > > connections coming in and 1x FDR10 and 1x QDR connection going out to > > the clients. I was having a really hard time getting anything resembling > > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > > reads). The back-end is a DDN SFA12K and I *know* it can do better than > > that. > > > > I don't remember quite how I figured this out but simply by running > > "openssl speed -multi 16" on the nsd server to drive up the load I saw > > an almost 4x performance jump which is pretty much goes against every > > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > > quadruple your i/o performance"). > > > > This feels like some type of C-states frequency scaling shenanigans that > > I haven't quite ironed down yet. I booted the box with the following > > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > > didn't seem to make much of a difference. I also tried setting the > > frequency governer to userspace and setting the minimum frequency to > > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > > to run something to drive up the CPU load and then performance improves. > > > > I'm wondering if this could be an issue with the C1E state? I'm curious > > if anyone has seen anything like this. The node is a dx360 M4 > > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
> > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 3720 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 2741 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 13421 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Fri Apr 21 13:58:26 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 21 Apr 2017 08:58:26 -0400 Subject: [gpfsug-discuss] bizarre performance behavior - prefetchThreads In-Reply-To: <94f2ca6e-cf6b-ef6a-1b27-45d7a449a379@ugent.be> References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <94f2ca6e-cf6b-ef6a-1b27-45d7a449a379@ugent.be> Message-ID: Seems counter-logical, but we have testimony that you may need to reduce the prefetchThreads parameter. Of all the parameters, that's the one that directly affects prefetching, so worth trying. Jan-Frode Myklebust wrote: ...Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s.... I can speculate that having prefetchThreads to high may create a contention situation where more threads causes overall degradation in system performance. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Fri Apr 21 14:10:49 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Fri, 21 Apr 2017 13:10:49 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov>, <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. -Aaron On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman wrote: Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? 
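A short sketch of the local read test being asked for here, run directly on one of the NSD servers against its own GPFS mount (paths are placeholders; bs should be at least the filesystem block size):

    # write a large test file first, then read it back bypassing the page cache
    dd if=/dev/zero of=/gpfs/fs0/ddtest bs=16M count=2048 oflag=direct
    dd if=/gpfs/fs0/ddtest of=/dev/null bs=16M iflag=direct

If the NSD server can read at close to the raw-LUN speed locally but the clients cannot, the bottleneck is somewhere between the NSD server and the client (network/verbs or client-side prefetch) rather than in the storage backend.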
If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister <aaron.s.knister at nasa.gov>: Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. 
I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Apr 21 14:18:34 2017 From: david_johnson at brown.edu (David D Johnson) Date: Fri, 21 Apr 2017 09:18:34 -0400 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: <02C0BD31-E743-4F1C-91E7-20555099CBF5@brown.edu> We had some luck making the client and server IB performance consistently decent by configuring tuned with the profile "latency-performance". The key is the line /usr/libexec/tuned/pmqos-static.py cpu_dma_latency=1 which prevents cpu from going to sleep just when the next burst of IB traffic is about to arrive. -- ddj Dave Johnson Brown University CCV On Apr 21, 2017, at 9:10 AM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > > Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. > > A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. > > -Aaron > > > > > On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman wrote: >> Hi, >> We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. 
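On the tuned "latency-performance" suggestion above, a minimal sketch of applying and verifying it (tuned must be installed and running; the cpu_dma_latency line is the one Dave quotes from the profile, shown here only to illustrate what the profile does):

    tuned-adm profile latency-performance
    tuned-adm active
    # the profile keeps cpu_dma_latency pinned low, e.g. via
    # /usr/libexec/tuned/pmqos-static.py cpu_dma_latency=1

This is worth applying on both the NSD servers and the test clients before re-running the dd/fio/nsdperf tests, since a core dropping into a deep sleep state on either side of a verbs connection can cap the whole stream.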
>> We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. >> When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. >> >> I'll use the nsdperf tool to see if we can find anything, >> >> thanks! >> >> K >> >> On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: >>> Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf >>> >>> -Aaron >>> >>> >>> >>> >>> On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: >>>> Hi, >>>> >>>> >>>> Having an issue that looks the same as this one: >>>> >>>> We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage >>>> backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients >>>> (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. >>>> Any ideas? >>>> >>>> Thanks! >>>> >>>> Kenneth >>>> >>>> On 17/02/17 19:29, Jan-Frode Myklebust wrote: >>>>> I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. >>>>> >>>>> After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. >>>>> >>>>> Could something like this be the problem on your box as well? >>>>> >>>>> >>>>> >>>>> -jf >>>>> fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister < aaron.s.knister at nasa.gov >: >>>>> Well, I'm somewhat scrounging for hardware. This is in our test >>>>> environment :) And yep, it's got the 2U gpu-tray in it although even >>>>> without the riser it has 2 PCIe slots onboard (excluding the on-board >>>>> dual-port mezz card) so I think it would make a fine NSD server even >>>>> without the riser. >>>>> >>>>> -Aaron >>>>> >>>>> On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) >>>>> wrote: >>>>> > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? >>>>> > >>>>> > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) 
>>>>> > >>>>> > Simon >>>>> > ________________________________________ >>>>> > From: gpfsug-discuss-bounces at spectrumscale.org [ gpfsug-discuss-bounces at spectrumscale.org ] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov ] >>>>> > Sent: 17 February 2017 15:52 >>>>> > To: gpfsug main discussion list >>>>> > Subject: [gpfsug-discuss] bizarre performance behavior >>>>> > >>>>> > This is a good one. I've got an NSD server with 4x 16GB fibre >>>>> > connections coming in and 1x FDR10 and 1x QDR connection going out to >>>>> > the clients. I was having a really hard time getting anything resembling >>>>> > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for >>>>> > reads). The back-end is a DDN SFA12K and I *know* it can do better than >>>>> > that. >>>>> > >>>>> > I don't remember quite how I figured this out but simply by running >>>>> > "openssl speed -multi 16" on the nsd server to drive up the load I saw >>>>> > an almost 4x performance jump which is pretty much goes against every >>>>> > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to >>>>> > quadruple your i/o performance"). >>>>> > >>>>> > This feels like some type of C-states frequency scaling shenanigans that >>>>> > I haven't quite ironed down yet. I booted the box with the following >>>>> > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which >>>>> > didn't seem to make much of a difference. I also tried setting the >>>>> > frequency governer to userspace and setting the minimum frequency to >>>>> > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have >>>>> > to run something to drive up the CPU load and then performance improves. >>>>> > >>>>> > I'm wondering if this could be an issue with the C1E state? I'm curious >>>>> > if anyone has seen anything like this. The node is a dx360 M4 >>>>> > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. >>>>> > >>>>> > -Aaron >>>>> > >>>>> > -- >>>>> > Aaron Knister >>>>> > NASA Center for Climate Simulation (Code 606.2) >>>>> > Goddard Space Flight Center >>>>> > (301) 286-2776 >>>>> > _______________________________________________ >>>>> > gpfsug-discuss mailing list >>>>> > gpfsug-discuss at spectrumscale.org >>>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> > _______________________________________________ >>>>> > gpfsug-discuss mailing list >>>>> > gpfsug-discuss at spectrumscale.org >>>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> > >>>>> >>>>> -- >>>>> Aaron Knister >>>>> NASA Center for Climate Simulation (Code 606.2) >>>>> Goddard Space Flight Center >>>>> (301) 286-2776 >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kums at us.ibm.com Fri Apr 21 15:01:33 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Fri, 21 Apr 2017 14:01:33 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be><67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov>, <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: Hi, Try enabling the following in the BIOS of the NSD servers (screen shots below) Turbo Mode - Enable QPI Link Frequency - Max Performance Operating Mode - Maximum Performance >>>>While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients >>We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. Also, It will be good to verify that all the GPFS nodes have Verbs RDMA started using "mmfsadm test verbs status" and that the NSD client-server communication from client to server during "dd" is actually using Verbs RDMA using "mmfsadm test verbs conn" command (on NSD client doing dd). If not, then GPFS might be using TCP/IP network over which the cluster is configured impacting performance (If this is the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and resolve). Regards, -Kums From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" To: gpfsug main discussion list Date: 04/21/2017 09:11 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. -Aaron On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman wrote: Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? 
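To make the suggestions above concrete, a sketch of the checks in command form; the file name and node list are placeholders, and the mmchconfig values are simply the ones Jan-Frode reported earlier in the thread, not a recommendation:

    # Confirm verbs RDMA is up on the client and the NSD servers, and that the dd traffic really uses it
    mmfsadm test verbs status
    mmfsadm test verbs conn
    # Local sequential read of the mounted filesystem on an NSD server
    # (use a file larger than RAM, or drop caches first, so you measure disk rather than pagecache)
    echo 3 > /proc/sys/vm/drop_caches
    dd if=/gpfs/fs0/bigfile of=/dev/null bs=16M
    # The maxMBpS / prefetchThreads experiment from earlier in the thread
    # (prefetchThreads only takes effect after restarting GPFS on those nodes)
    mmchconfig maxMBpS=100 -N <client_nodes>
    mmchconfig prefetchThreads=32 -N <client_nodes>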
If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister : Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. 
I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 61023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 85131 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 84819 bytes Desc: not available URL: From bbanister at jumptrading.com Fri Apr 21 16:01:54 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 21 Apr 2017 15:01:54 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be><67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov>, <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: <7dcbac92e19043faa7968702d852668f@jumptrading.com> I think we have a new topic and new speaker for the next UG meeting at SC! Kums presenting "Performance considerations for Spectrum Scale"!! Kums, I have to say you do have a lot to offer here... 
;o) -Bryan Disclaimer: There are some selfish reasons of me wanting to hang out with you again involved in this suggestion From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kumaran Rajaram Sent: Friday, April 21, 2017 9:02 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] bizarre performance behavior Hi, Try enabling the following in the BIOS of the NSD servers (screen shots below) * Turbo Mode - Enable * QPI Link Frequency - Max Performance * Operating Mode - Maximum Performance * >>>>While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients >>We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. Also, It will be good to verify that all the GPFS nodes have Verbs RDMA started using "mmfsadm test verbs status" and that the NSD client-server communication from client to server during "dd" is actually using Verbs RDMA using "mmfsadm test verbs conn" command (on NSD client doing dd). If not, then GPFS might be using TCP/IP network over which the cluster is configured impacting performance (If this is the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and resolve). * [cid:image001.gif at 01D2BA86.4D4B4C10] [cid:image002.gif at 01D2BA86.4D4B4C10] [cid:image003.gif at 01D2BA86.4D4B4C10] Regards, -Kums From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > To: gpfsug main discussion list > Date: 04/21/2017 09:11 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. -Aaron On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman > wrote: Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? 
If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister >: Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. 
I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 61023 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.gif Type: image/gif Size: 85131 bytes Desc: image002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 84819 bytes Desc: image003.gif URL: From g.mangeot at gmail.com Fri Apr 21 16:04:58 2017 From: g.mangeot at gmail.com (Guillaume Mangeot) Date: Fri, 21 Apr 2017 17:04:58 +0200 Subject: [gpfsug-discuss] HA on snapshot scheduling in GPFS GUI Message-ID: Hi, I'm looking for a way to get the GUI working in HA to schedule snapshots. I have 2 servers with gpfs.gui service running on them. I checked a bit with lssnaprule in /usr/lpp/mmfs/gui/cli and the file /var/lib/mmfs/gui/snapshots.json But it doesn't look to be shared between all the GUI servers. 
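A quick way to confirm that observation across both GUI nodes, sketched with placeholder host names; the paths are the ones named above, and calling lssnaprule with no arguments to list the rules is an assumption:

    for n in gui-node1 gui-node2; do
        echo "== $n =="
        ssh $n /usr/lpp/mmfs/gui/cli/lssnaprule        # which snapshot rules this GUI instance knows about
        ssh $n md5sum /var/lib/mmfs/gui/snapshots.json # does the local schedule state differ between nodes?
    done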
Is there a way to get GPFS GUI working in HA to schedule snapshots? (keeping the coherency: avoiding to trigger snapshots on both servers in the same time) Regards, Guillaume Mangeot DDN Storage -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.waegeman at ugent.be Fri Apr 21 16:33:16 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Fri, 21 Apr 2017 17:33:16 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: <41475044-c195-5561-c94a-b54ee30c7e68@ugent.be> On 21/04/17 15:10, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > Fantastic news! It might also be worth running "cpupower monitor" or > "turbostat" on your NSD servers while you're running dd tests from the > clients to see what CPU frequency your cores are actually running at. Thanks! I verified with turbostat and cpuinfo, our cpus are running in high performance mode and frequency is always at highest level. > > A typical NSD server workload (especially with IB verbs and for reads) > can be pretty light on CPU which might not prompt your CPU crew > governor to up the frequency (which can affect throughout). If your > frequency scaling governor isn't kicking up the frequency of your CPUs > I've seen that cause this behavior in my testing. > > -Aaron > > > > > On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman > wrote: >> >> Hi, >> >> We are running a test setup with 2 NSD Servers backed by 4 Dell >> Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of >> the 4 powervaults, nsd02 is primary serving LUNS of controller B. >> >> We are testing from 2 testing machines connected to the nsds with >> infiniband, verbs enabled. >> >> When we do dd from the NSD servers, we see indeed performance going >> to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is >> able to get the data at a decent speed. Since we can write from the >> clients at a good speed, I didn't suspect the communication between >> clients and nsds being the issue, especially since total performance >> stays the same using 1 or multiple clients. >> >> I'll use the nsdperf tool to see if we can find anything, >> >> thanks! >> >> K >> >> On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE >> CORP] wrote: >>> Interesting. Could you share a little more about your architecture? >>> Is it possible to mount the fs on an NSD server and do some dd's >>> from the fs on the NSD server? If that gives you decent performance >>> perhaps try NSDPERF next >>> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf >>> >>> >>> >>> -Aaron >>> >>> >>> >>> >>> On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman >>> wrote: >>>> >>>> Hi, >>>> >>>> >>>> Having an issue that looks the same as this one: >>>> >>>> We can do sequential writes to the filesystem at 7,8 GB/s total , >>>> which is the expected speed for our current storage >>>> backend. While we have even better performance with sequential >>>> reads on raw storage LUNS, using GPFS we can only reach 1GB/s in >>>> total (each nsd server seems limited by 0,5GB/s) independent of the >>>> number of clients >>>> (1,2,4,..) or ways we tested (fio,dd). 
We played with blockdev >>>> params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. >>>> as discussed in this thread, but nothing seems to impact this read >>>> performance. >>>> >>>> Any ideas? >>>> >>>> Thanks! >>>> >>>> Kenneth >>>> >>>> On 17/02/17 19:29, Jan-Frode Myklebust wrote: >>>>> I just had a similar experience from a sandisk infiniflash system >>>>> SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for >>>>> writes. and 250-300 Mbyte/s on sequential reads!! Random reads >>>>> were on the order of 2 Gbyte/s. >>>>> >>>>> After a bit head scratching snd fumbling around I found out that >>>>> reducing maxMBpS from 10000 to 100 fixed the problem! Digging >>>>> further I found that reducing prefetchThreads from default=72 to >>>>> 32 also fixed it, while leaving maxMBpS at 10000. Can now also >>>>> read at 3,2 GByte/s. >>>>> >>>>> Could something like this be the problem on your box as well? >>>>> >>>>> >>>>> >>>>> -jf >>>>> fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister >>>>> >: >>>>> >>>>> Well, I'm somewhat scrounging for hardware. This is in our test >>>>> environment :) And yep, it's got the 2U gpu-tray in it >>>>> although even >>>>> without the riser it has 2 PCIe slots onboard (excluding the >>>>> on-board >>>>> dual-port mezz card) so I think it would make a fine NSD >>>>> server even >>>>> without the riser. >>>>> >>>>> -Aaron >>>>> >>>>> On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT >>>>> Services) >>>>> wrote: >>>>> > Maybe its related to interrupt handlers somehow? You drive >>>>> the load up on one socket, you push all the interrupt handling >>>>> to the other socket where the fabric card is attached? >>>>> > >>>>> > Dunno ... (Though I am intrigued you use idataplex nodes as >>>>> NSD servers, I assume its some 2U gpu-tray riser one or >>>>> something !) >>>>> > >>>>> > Simon >>>>> > ________________________________________ >>>>> > From: gpfsug-discuss-bounces at spectrumscale.org >>>>> >>>>> [gpfsug-discuss-bounces at spectrumscale.org >>>>> ] on behalf >>>>> of Aaron Knister [aaron.s.knister at nasa.gov >>>>> ] >>>>> > Sent: 17 February 2017 15:52 >>>>> > To: gpfsug main discussion list >>>>> > Subject: [gpfsug-discuss] bizarre performance behavior >>>>> > >>>>> > This is a good one. I've got an NSD server with 4x 16GB fibre >>>>> > connections coming in and 1x FDR10 and 1x QDR connection >>>>> going out to >>>>> > the clients. I was having a really hard time getting >>>>> anything resembling >>>>> > sensible performance out of it (4-5Gb/s writes but maybe >>>>> 1.2Gb/s for >>>>> > reads). The back-end is a DDN SFA12K and I *know* it can do >>>>> better than >>>>> > that. >>>>> > >>>>> > I don't remember quite how I figured this out but simply by >>>>> running >>>>> > "openssl speed -multi 16" on the nsd server to drive up the >>>>> load I saw >>>>> > an almost 4x performance jump which is pretty much goes >>>>> against every >>>>> > sysadmin fiber in me (i.e. "drive up the cpu load with >>>>> unrelated crap to >>>>> > quadruple your i/o performance"). >>>>> > >>>>> > This feels like some type of C-states frequency scaling >>>>> shenanigans that >>>>> > I haven't quite ironed down yet. I booted the box with the >>>>> following >>>>> > kernel parameters "intel_idle.max_cstate=0 >>>>> processor.max_cstate=0" which >>>>> > didn't seem to make much of a difference. I also tried >>>>> setting the >>>>> > frequency governer to userspace and setting the minimum >>>>> frequency to >>>>> > 2.6ghz (it's a 2.6ghz cpu). 
None of that really matters-- I >>>>> still have >>>>> > to run something to drive up the CPU load and then >>>>> performance improves. >>>>> > >>>>> > I'm wondering if this could be an issue with the C1E state? >>>>> I'm curious >>>>> > if anyone has seen anything like this. The node is a dx360 M4 >>>>> > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. >>>>> > >>>>> > -Aaron >>>>> > >>>>> > -- >>>>> > Aaron Knister >>>>> > NASA Center for Climate Simulation (Code 606.2) >>>>> > Goddard Space Flight Center >>>>> > (301) 286-2776 >>>>> > _______________________________________________ >>>>> > gpfsug-discuss mailing list >>>>> > gpfsug-discuss at spectrumscale.org >>>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> > _______________________________________________ >>>>> > gpfsug-discuss mailing list >>>>> > gpfsug-discuss at spectrumscale.org >>>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> > >>>>> >>>>> -- >>>>> Aaron Knister >>>>> NASA Center for Climate Simulation (Code 606.2) >>>>> Goddard Space Flight Center >>>>> (301) 286-2776 >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.waegeman at ugent.be Fri Apr 21 16:42:34 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Fri, 21 Apr 2017 17:42:34 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: <7f7349c9-bdd3-5847-1cca-d98d221489fe@ugent.be> Hi, We already verified this on our nsds: [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --QpiSpeed QpiSpeed=maxdatarate [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --turbomode turbomode=enable [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg ?-SysProfile SysProfile=perfoptimized so sadly this is not the issue. Also the output of the verbs commands look ok, there are connections from the client to the nsds are there is data being read and writen. Thanks again! Kenneth On 21/04/17 16:01, Kumaran Rajaram wrote: > Hi, > > Try enabling the following in the BIOS of the NSD servers (screen > shots below) > > * Turbo Mode - Enable > * QPI Link Frequency - Max Performance > * Operating Mode - Maximum Performance > * > > >>>>While we have even better performance with sequential reads on > raw storage LUNS, using GPFS we can only reach 1GB/s in total > (each nsd server seems limited by 0,5GB/s) independent of the > number of clients > > >>We are testing from 2 testing machines connected to the nsds > with infiniband, verbs enabled. 
> > > Also, It will be good to verify that all the GPFS nodes have Verbs > RDMA started using "mmfsadm test verbs status" and that the NSD > client-server communication from client to server during "dd" is > actually using Verbs RDMA using "mmfsadm test verbs conn" command (on > NSD client doing dd). If not, then GPFS might be using TCP/IP network > over which the cluster is configured impacting performance (If this is > the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and > resolve). > > * > > > > > > > Regards, > -Kums > > > > > > > From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > > To: gpfsug main discussion list > Date: 04/21/2017 09:11 AM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Fantastic news! It might also be worth running "cpupower monitor" or > "turbostat" on your NSD servers while you're running dd tests from the > clients to see what CPU frequency your cores are actually running at. > > A typical NSD server workload (especially with IB verbs and for reads) > can be pretty light on CPU which might not prompt your CPU crew > governor to up the frequency (which can affect throughout). If your > frequency scaling governor isn't kicking up the frequency of your CPUs > I've seen that cause this behavior in my testing. > > -Aaron > > > > > On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman > wrote: > > Hi, > > We are running a test setup with 2 NSD Servers backed by 4 Dell > Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of > the 4 powervaults, nsd02 is primary serving LUNS of controller B. > > We are testing from 2 testing machines connected to the nsds with > infiniband, verbs enabled. > > When we do dd from the NSD servers, we see indeed performance going to > 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is > able to get the data at a decent speed. Since we can write from the > clients at a good speed, I didn't suspect the communication between > clients and nsds being the issue, especially since total performance > stays the same using 1 or multiple clients. > > I'll use the nsdperf tool to see if we can find anything, > > thanks! > > K > > On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE > CORP] wrote: > Interesting. Could you share a little more about your architecture? Is > it possible to mount the fs on an NSD server and do some dd's from the > fs on the NSD server? If that gives you decent performance perhaps try > NSDPERF next > _https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf_ > > > -Aaron > > > > > On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman > __ wrote: > > Hi, > > Having an issue that looks the same as this one: > > We can do sequential writes to the filesystem at 7,8 GB/s total , > which is the expected speed for our current storage > backend. While we have even better performance with sequential reads > on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each > nsd server seems limited by 0,5GB/s) independent of the number of clients > (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, > MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed > in this thread, but nothing seems to impact this read performance. > > Any ideas? > > Thanks! 
> > Kenneth > > On 17/02/17 19:29, Jan-Frode Myklebust wrote: > I just had a similar experience from a sandisk infiniflash system > SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for > writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on > the order of 2 Gbyte/s. > > After a bit head scratching snd fumbling around I found out that > reducing maxMBpS from 10000 to 100 fixed the problem! Digging further > I found that reducing prefetchThreads from default=72 to 32 also fixed > it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. > > Could something like this be the problem on your box as well? > > > > -jf > fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister > <_aaron.s.knister at nasa.gov_ >: > Well, I'm somewhat scrounging for hardware. This is in our test > environment :) And yep, it's got the 2U gpu-tray in it although even > without the riser it has 2 PCIe slots onboard (excluding the on-board > dual-port mezz card) so I think it would make a fine NSD server even > without the riser. > > -Aaron > > On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > Maybe its related to interrupt handlers somehow? You drive the load > up on one socket, you push all the interrupt handling to the other > socket where the fabric card is attached? > > > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD > servers, I assume its some 2U gpu-tray riser one or something !) > > > > Simon > > ________________________________________ > > From: _gpfsug-discuss-bounces at spectrumscale.org_ > [_gpfsug-discuss-bounces at spectrumscale.org_ > ] on behalf of Aaron > Knister [_aaron.s.knister at nasa.gov_ ] > > Sent: 17 February 2017 15:52 > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] bizarre performance behavior > > > > This is a good one. I've got an NSD server with 4x 16GB fibre > > connections coming in and 1x FDR10 and 1x QDR connection going out to > > the clients. I was having a really hard time getting anything resembling > > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > > reads). The back-end is a DDN SFA12K and I *know* it can do better than > > that. > > > > I don't remember quite how I figured this out but simply by running > > "openssl speed -multi 16" on the nsd server to drive up the load I saw > > an almost 4x performance jump which is pretty much goes against every > > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > > quadruple your i/o performance"). > > > > This feels like some type of C-states frequency scaling shenanigans that > > I haven't quite ironed down yet. I booted the box with the following > > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > > didn't seem to make much of a difference. I also tried setting the > > frequency governer to userspace and setting the minimum frequency to > > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > > to run something to drive up the CPU load and then performance improves. > > > > I'm wondering if this could be an issue with the C1E state? I'm curious > > if anyone has seen anything like this. The node is a dx360 M4 > > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
> > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at _spectrumscale.org_ > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at _spectrumscale.org_ > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at _spectrumscale.org_ _ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 61023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 85131 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 84819 bytes Desc: not available URL: From kums at us.ibm.com Fri Apr 21 21:27:49 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Fri, 21 Apr 2017 20:27:49 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: <7f7349c9-bdd3-5847-1cca-d98d221489fe@ugent.be> References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be><67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov><9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> <7f7349c9-bdd3-5847-1cca-d98d221489fe@ugent.be> Message-ID: Hi Kenneth, As it was mentioned earlier, it will be good to first verify the raw network performance between the NSD client and NSD server using the nsdperf tool that is built with RDMA support. g++ -O2 -DRDMA -o nsdperf -lpthread -lrt -libverbs -lrdmacm nsdperf.C In addition, since you have 2 x NSD servers it will be good to perform NSD client file-system performance test with just single NSD server (mmshutdown the other server, assuming all the NSDs have primary, server NSD server configured + Quorum will be intact when a NSD server is brought down) to see if it helps to improve the read performance + if there are variations in the file-system read bandwidth results between NSD_server#1 'active' vs. NSD_server #2 'active' (with other NSD server in GPFS "down" state). If there is significant variation, it can help to isolate the issue to particular NSD server (HW or IB issue?). 
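For reference, a rough sketch of how such an nsdperf run is usually driven once the binary above is built; host names are placeholders and the interactive command names should be checked against the usage text in nsdperf.C:

    # Start the listener on the nodes being measured (NSD server and client)
    ./nsdperf -s
    # From a driver node, describe the two sides and run timed write/read tests over RDMA
    # (the lines below are typed at the nsdperf prompt, not at the shell)
    ./nsdperf
    server nsd01
    client client01
    rdma on
    ttime 30
    test write read
    quit
    # The single-server isolation test described above: stop GPFS on one NSD server
    # (only if quorum and the NSD server definitions allow it), rerun the client dd,
    # then bring the server back
    mmshutdown -N nsd02
    # ... rerun the dd test from the client here ...
    mmstartup -N nsd02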
You can issue "mmdiag --waiters" on NSD client as well as NSD servers during your dd test, to verify if there are unsual long GPFS waiters. In addition, you may issue Linux "perf top -z" command on the GPFS node to see if there is high CPU usage by any particular call/event (for e.g., If GPFS config parameter verbsRdmaMaxSendBytes has been set to low value from the default 16M, then it can cause RDMA completion threads to go CPU bound ). Please verify some performance scenarios detailed in Chapter 22 in Spectrum Scale Problem Determination Guide (link below). https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/pdf/scale_pdg.pdf?view=kc Thanks, -Kums From: Kenneth Waegeman To: gpfsug main discussion list Date: 04/21/2017 11:43 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We already verified this on our nsds: [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --QpiSpeed QpiSpeed=maxdatarate [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --turbomode turbomode=enable [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg ?-SysProfile SysProfile=perfoptimized so sadly this is not the issue. Also the output of the verbs commands look ok, there are connections from the client to the nsds are there is data being read and writen. Thanks again! Kenneth On 21/04/17 16:01, Kumaran Rajaram wrote: Hi, Try enabling the following in the BIOS of the NSD servers (screen shots below) Turbo Mode - Enable QPI Link Frequency - Max Performance Operating Mode - Maximum Performance >>>>While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients >>We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. Also, It will be good to verify that all the GPFS nodes have Verbs RDMA started using "mmfsadm test verbs status" and that the NSD client-server communication from client to server during "dd" is actually using Verbs RDMA using "mmfsadm test verbs conn" command (on NSD client doing dd). If not, then GPFS might be using TCP/IP network over which the cluster is configured impacting performance (If this is the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and resolve). Regards, -Kums From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" To: gpfsug main discussion list Date: 04/21/2017 09:11 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. -Aaron On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman wrote: Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. 
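A small helper along those lines, purely illustrative, meant to be run on the client and on each NSD server while the read test is going:

    # Watch for unusually long waiters during the dd run (Ctrl-C to stop)
    while true; do
        date
        mmdiag --waiters | head -25
        sleep 2
    done
    # In another terminal, check whether any mmfsd threads (e.g. RDMA completion threads) are CPU bound
    perf top -z
    # And confirm verbsRdmaMaxSendBytes has not been lowered from its 16M default
    mmlsconfig verbsRdmaMaxSendBytes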
When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister : Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org[ gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. 
I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 61023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 85131 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 84819 bytes Desc: not available URL: From frank.tower at outlook.com Thu Apr 20 13:27:13 2017 From: frank.tower at outlook.com (Frank Tower) Date: Thu, 20 Apr 2017 12:27:13 +0000 Subject: [gpfsug-discuss] Protocol node recommendations Message-ID: Hi, We have here around 2PB GPFS where users access oney through GPFS client (used by an HPC cluster), but we will have to setup protocols nodes. We will have to share GPFS data to ~ 1000 users, where each users will have different access usage, meaning: - some will do large I/O (e.g: store 1TB files) - some will read/write more than 10k files in a raw - other will do only sequential read I already read the following wiki page: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node IBM Spectrum Scale Wiki - Sizing Guidance for Protocol Node www.ibm.com developerWorks wikis allow groups of people to jointly create and maintain content through contribution and collaboration. Wikis apply the wisdom of crowds to ... But I wondering if some people have recommendations regarding hardware sizing and software tuning for such situation ? Or better, if someone already such setup ? Thank you by advance, Frank. -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Sat Apr 22 05:30:29 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Sat, 22 Apr 2017 00:30:29 -0400 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: Message-ID: <52354.1492835429@turing-police.cc.vt.edu> On Thu, 20 Apr 2017 12:27:13 -0000, Frank Tower said: > - some will do large I/O (e.g: store 1TB files) > - some will read/write more than 10k files in a raw > - other will do only sequential read > But I wondering if some people have recommendations regarding hardware sizing > and software tuning for such situation ? The three most critical pieces of info are missing here: 1) Do you mean 1,000 human users, or 1,000 machines doing NFS/CIFS mounts? 2) How many of the users are likely to be active at the same time? 1,000 users, each of whom are active an hour a week is entirely different from 200 users that are each active 140 hours a week. 3) What SLA/performance target are they expecting? If they want large 1TB I/O and 100MB/sec is acceptable, that's different than if they have a business need to go at 1.2GB/sec.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From frank.tower at outlook.com Sat Apr 22 07:34:44 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sat, 22 Apr 2017 06:34:44 +0000 Subject: [gpfsug-discuss] Protocol node recommendations Message-ID: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. 
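For reference, parameters like the ones listed just below are normally applied cluster-side with mmchconfig; a minimal, hedged sketch, assuming the protocol nodes sit in the automatically created cesNodes node class and treating the values as placeholders only:

  mmchconfig pagepool=64G,maxFilesToCache=1000000,workerThreads=512 -N cesNodes
  # most of these only take effect after restarting GPFS on those nodes
  mmdiag --config | grep -iE 'pagepool|maxfilestocache|workerthreads'   # verify on one node
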
Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Sat Apr 22 09:50:11 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sat, 22 Apr 2017 08:50:11 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: Message-ID: That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower : > Hi, > > We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with > GPFS client on each node. > > We will have to open GPFS to all our users over CIFS and kerberized NFS > with ACL support for both protocol for around +1000 users > > All users have different use case and needs: > - some will do random I/O through a large set of opened files (~5k files) > - some will do large write with 500GB-1TB files > - other will arrange sequential I/O with ~10k opened files > > NFS and CIFS will share the same server, so I through to use SSD drive, at > least 128GB memory with 2 sockets. > > Regarding tuning parameters, I thought at: > > maxFilesToCache 10000 > syncIntervalStrict yes > workerThreads (8*core) > prefetchPct 40 (for now and update if needed) > > I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering > if someone could share his experience/best practice regarding hardware > sizing and/or tuning parameters. > > Thank by advance, > Frank > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank.tower at outlook.com Sat Apr 22 19:47:59 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sat, 22 Apr 2017 18:47:59 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: <52354.1492835429@turing-police.cc.vt.edu> References: , <52354.1492835429@turing-police.cc.vt.edu> Message-ID: Hi, Thank for your answer. > 1) Do you mean 1,000 human users, or 1,000 machines doing NFS/CIFS mounts? True, here the list: - 800 users that have 1 workstation through 1Gb/s ethernet and will use NFS/CIFS - 200 users that have 2 workstation through 1Gb/s ethernet, few have 10Gb/s ethernet and will use NFS/CIFS > 2) How many of the users are likely to be active at the same time? 1,000 > users, each of whom are active an hour a week is entirely different from > 200 users that are each active 140 hours a week. 
True again, around 200 users will actively use GPFS through NFS/CIFS during night and day, but we cannot control if people will use 2 workstations or more :( We will have peak during day with an average of 700 'workstations' > 3) What SLA/performance target are they expecting? If they want > large 1TB I/O and 100MB/sec is acceptable, that's different than if they > have a business need to go at 1.2GB/sec.... We just want to provide at normal throughput through an 1GB/s network. Users are aware of such situation and will mainly use HPC cluster for high speed and heavy computation. But they would like to do 'light' computation on their desktop. The main topic here is to sustain 'normal' throughput for all users during peak. Thank for your help. ________________________________ From: valdis.kletnieks at vt.edu Sent: Saturday, April 22, 2017 6:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Protocol node recommendations On Thu, 20 Apr 2017 12:27:13 -0000, Frank Tower said: > - some will do large I/O (e.g: store 1TB files) > - some will read/write more than 10k files in a raw > - other will do only sequential read > But I wondering if some people have recommendations regarding hardware sizing > and software tuning for such situation ? The three most critical pieces of info are missing here: 1) Do you mean 1,000 human users, or 1,000 machines doing NFS/CIFS mounts? 2) How many of the users are likely to be active at the same time? 1,000 users, each of whom are active an hour a week is entirely different from 200 users that are each active 140 hours a week. 3) What SLA/performance target are they expecting? If they want large 1TB I/O and 100MB/sec is acceptable, that's different than if they have a business need to go at 1.2GB/sec.... -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank.tower at outlook.com Sat Apr 22 20:22:23 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sat, 22 Apr 2017 19:22:23 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: , Message-ID: Hi, Thank for the recommendations. Now we deal with the situation of: - take 3 nodes with round robin DNS that handle both protocols - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and NFS services. Regarding your recommendations, 256GB memory node could be a plus if we mix both protocols for such case. Is the spreadsheet publicly available or do we need to ask IBM ? Thank for your help, Frank. ________________________________ From: Jan-Frode Myklebust Sent: Saturday, April 22, 2017 10:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower >: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. 
We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Sun Apr 23 11:07:38 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sun, 23 Apr 2017 10:07:38 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: Message-ID: The protocol sizing tool should be available from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node/version/70a4c7c0-a5c6-4dde-b391-8f91c542dd7d , but I'm getting 404 now. I think 128GB should be enough for both protocols on same nodes, and I think your 3 node suggestion is best. Better load sharing with not dedicating subset of nodes to each protocol. -jf l?r. 22. apr. 2017 kl. 21.22 skrev Frank Tower : > Hi, > > > Thank for the recommendations. > > Now we deal with the situation of: > > > - take 3 nodes with round robin DNS that handle both protocols > > - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and > NFS services. > > > Regarding your recommendations, 256GB memory node could be a plus if we > mix both protocols for such case. > > > Is the spreadsheet publicly available or do we need to ask IBM ? > > > Thank for your help, > > Frank. > > > ------------------------------ > *From:* Jan-Frode Myklebust > *Sent:* Saturday, April 22, 2017 10:50 AM > *To:* gpfsug-discuss at spectrumscale.org > *Subject:* Re: [gpfsug-discuss] Protocol node recommendations > > That's a tiny maxFilesToCache... > > I would start by implementing the settings from > /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your > protocoll nodes, and leave further tuning to when you see you have issues. > > Regarding sizing, we have a spreadsheet somewhere where you can input some > workload parameters and get an idea for how many nodes you'll need. Your > node config seems fine, but one node seems too few to serve 1000+ users. We > support max 3000 SMB connections/node, and I believe the recommendation is > 4000 NFS connections/node. > > > -jf > l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower : > >> Hi, >> >> We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with >> GPFS client on each node. 
>> >> We will have to open GPFS to all our users over CIFS and kerberized NFS >> with ACL support for both protocol for around +1000 users >> >> All users have different use case and needs: >> - some will do random I/O through a large set of opened files (~5k files) >> - some will do large write with 500GB-1TB files >> - other will arrange sequential I/O with ~10k opened files >> >> NFS and CIFS will share the same server, so I through to use SSD >> drive, at least 128GB memory with 2 sockets. >> >> Regarding tuning parameters, I thought at: >> >> maxFilesToCache 10000 >> syncIntervalStrict yes >> workerThreads (8*core) >> prefetchPct 40 (for now and update if needed) >> >> I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering >> if someone could share his experience/best practice regarding hardware >> sizing and/or tuning parameters. >> >> Thank by advance, >> Frank >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rreuscher at verizon.net Sun Apr 23 17:43:44 2017 From: rreuscher at verizon.net (Robert Reuscher) Date: Sun, 23 Apr 2017 11:43:44 -0500 Subject: [gpfsug-discuss] LUN expansion Message-ID: <4CBF459B-4008-4CA2-904F-1A48882F021E@verizon.net> We run GPFS on z/Linux and have been using ECKD devices for disks. We are looking at implementing some new filesystems on FCP LUNS. One of the features of a LUN is we can expand a LUN instead of adding new LUNS, where as with ECKD devices. From what I?ve found searching to see if GPFS filesystem can be expanding to see the expanded LUN, it doesn?t seem that this will work, you have to add new LUNS (or new disks) and then add them to the filesystem. Everything I?ve found is at least 2-3 old (most of it much older), and just want to check that this is still is true before we make finalize our LUN/GPFS procedures. Robert Reuscher NR5AR -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank.tower at outlook.com Sun Apr 23 22:27:50 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sun, 23 Apr 2017 21:27:50 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: , Message-ID: Hi, Nice ! didn't pay attention at the revision and the spreadsheet. If someone still have a copy somewhere it could be useful, Google didn't help :( We will follow your advise and start with 3 protocol nodes equipped with 128GB memory, 2 x 12 cores (maybe E5-2680 or E5-2670). >From what I read, NFS-Ganesha mainly depend of the hardware, Linux on a SSD should be a big plus in our case. Best, Frank ________________________________ From: Jan-Frode Myklebust Sent: Sunday, April 23, 2017 12:07:38 PM To: Frank Tower; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations The protocol sizing tool should be available from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node/version/70a4c7c0-a5c6-4dde-b391-8f91c542dd7d , but I'm getting 404 now. I think 128GB should be enough for both protocols on same nodes, and I think your 3 node suggestion is best. Better load sharing with not dedicating subset of nodes to each protocol. -jf l?r. 22. apr. 2017 kl. 21.22 skrev Frank Tower >: Hi, Thank for the recommendations. 
Now we deal with the situation of: - take 3 nodes with round robin DNS that handle both protocols - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and NFS services. Regarding your recommendations, 256GB memory node could be a plus if we mix both protocols for such case. Is the spreadsheet publicly available or do we need to ask IBM ? Thank for your help, Frank. ________________________________ From: Jan-Frode Myklebust > Sent: Saturday, April 22, 2017 10:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower >: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfadden at us.ibm.com Sun Apr 23 23:44:56 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Sun, 23 Apr 2017 22:44:56 +0000 Subject: [gpfsug-discuss] LUN expansion In-Reply-To: <4CBF459B-4008-4CA2-904F-1A48882F021E@verizon.net> References: <4CBF459B-4008-4CA2-904F-1A48882F021E@verizon.net> Message-ID: An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Apr 24 10:11:25 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 24 Apr 2017 09:11:25 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: , Message-ID: What's your SSD going to help with... will you implement it as a LROC device? Otherwise I can't see the benefit to using it to boot off. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Frank Tower Sent: 23 April 2017 22:28 To: Jan-Frode Myklebust ; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations Hi, Nice ! didn't pay attention at the revision and the spreadsheet. 
If someone still have a copy somewhere it could be useful, Google didn't help :( We will follow your advise and start with 3 protocol nodes equipped with 128GB memory, 2 x 12 cores (maybe E5-2680 or E5-2670). >From what I read, NFS-Ganesha mainly depend of the hardware, Linux on a SSD should be a big plus in our case. Best, Frank ________________________________ From: Jan-Frode Myklebust > Sent: Sunday, April 23, 2017 12:07:38 PM To: Frank Tower; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations The protocol sizing tool should be available from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node/version/70a4c7c0-a5c6-4dde-b391-8f91c542dd7d , but I'm getting 404 now. I think 128GB should be enough for both protocols on same nodes, and I think your 3 node suggestion is best. Better load sharing with not dedicating subset of nodes to each protocol. -jf l?r. 22. apr. 2017 kl. 21.22 skrev Frank Tower >: Hi, Thank for the recommendations. Now we deal with the situation of: - take 3 nodes with round robin DNS that handle both protocols - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and NFS services. Regarding your recommendations, 256GB memory node could be a plus if we mix both protocols for such case. Is the spreadsheet publicly available or do we need to ask IBM ? Thank for your help, Frank. ________________________________ From: Jan-Frode Myklebust > Sent: Saturday, April 22, 2017 10:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower >: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From service at metamodul.com Mon Apr 24 11:28:08 2017 From: service at metamodul.com (Hans-Joachim Ehlers) Date: Mon, 24 Apr 2017 12:28:08 +0200 (CEST) Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale Message-ID: <416417651.114582.1493029688959@email.1und1.de> An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Mon Apr 24 12:14:17 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Apr 2017 12:14:17 +0100 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: <416417651.114582.1493029688959@email.1und1.de> References: <416417651.114582.1493029688959@email.1und1.de> Message-ID: <1493032457.11896.20.camel@buzzard.me.uk> On Mon, 2017-04-24 at 12:28 +0200, Hans-Joachim Ehlers wrote: > @All > > > does anybody uses virtualization technologies for GPFS Server ? If yes > what kind and why have you selected your soulution. > > I think currently about using Linux on Power using 40G SR-IOV for > Network and NPIV/Dedidcated FC Adater for storage. As a plus i can > also assign only a certain amount of CPUs to GPFS. ( Lower license > cost / You pay for what you use) > > > I must admit that i am not familar how "good" KVM/ESX in respect to > direct assignment of hardware is. Thus the question to the group > For the most part GPFS is used at scale and in general all the components are redundant. As such why you would want to allocate less than a whole server into a production GPFS system in somewhat beyond me. That is you will have a bunch of NSD servers in the system and if one crashes, well the other NSD's take over. Similar for protocol nodes, and in general the total file system size is going to hundreds of TB otherwise why bother with GPFS. I guess there is currently potential value at sticking the GUI into a virtual machine to get redundancy. On the other hand if you want a test rig, then virtualization works wonders. I have put GPFS on a single Linux box, using LV's for the disks and mapping them into virtual machines under KVM. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From service at metamodul.com Mon Apr 24 13:21:09 2017 From: service at metamodul.com (service at metamodul.com) Date: Mon, 24 Apr 2017 14:21:09 +0200 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale Message-ID: Hi Jonathan todays hardware is so powerful that imho it might make sense to split a CEC into more "piece". For example the IBM S822L has up to 2x12 cores, 9 PCI3 slots ( 4?16 lans & 5?8 lan ). I think that such a server is a little bit to big ?just to be a single NSD server. Note that i use for each GPFS service a dedicated node. So if i would go for 4 NSD server, 6 protocol nodes and 2 tsm backup nodes and at least 3 test server a total of 11 server is needed. Inhm 4xS822L could handle this and a little bit more quite well. Of course blade technology could be used or 1U server. With kind regards Hajo --? Unix Systems Engineer MetaModul GmbH +49 177 4393994
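As a concrete illustration of the single-box KVM test rig Jonathan describes in the message quoted below, a rough sketch (the volume group, guest and device names are made up):

  # on the KVM host: carve one LV per simulated disk and hand it to a guest
  lvcreate -L 50G -n gpfs_nsd1 vg_data
  virsh attach-disk gpfs-test1 /dev/vg_data/gpfs_nsd1 vdb --persistent --subdriver raw
  # repeat per guest/LV, then build the test cluster inside the guests
  # with mmcrcluster / mmcrnsd / mmcrfs as usual
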
-------- Original message --------
From: Jonathan Buzzard
Date: 2017.04.24 13:14 (GMT+01:00)
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale
On Mon, 2017-04-24 at 12:28 +0200, Hans-Joachim Ehlers wrote: > @All > > > does anybody uses virtualization technologies for GPFS Server ? If yes > what kind and why have you selected your soulution. > > I think currently about using Linux on Power using 40G SR-IOV for > Network and NPIV/Dedidcated FC Adater for storage. As a plus i can > also assign only a certain amount of CPUs to GPFS. ( Lower license > cost / You pay for what you use) > > > I must admit that i am not familar how "good" KVM/ESX in respect to > direct assignment of hardware is. Thus the question to the group > For the most part GPFS is used at scale and in general all the components are redundant. As such why you would want to allocate less than a whole server into a production GPFS system in somewhat beyond me. That is you will have a bunch of NSD servers in the system and if one crashes, well the other NSD's take over. Similar for protocol nodes, and in general the total file system size is going to hundreds of TB otherwise why bother with GPFS. I guess there is currently potential value at sticking the GUI into a virtual machine to get redundancy. On the other hand if you want a test rig, then virtualization works wonders. I have put GPFS on a single Linux box, using LV's for the disks and mapping them into virtual machines under KVM. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Mon Apr 24 13:42:51 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 24 Apr 2017 15:42:51 +0300 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: References: Message-ID: Hi As tastes vary, I would not partition it so much for the backend. Assuming there is little to nothing overhead on the CPU at PHYP level, which it depends. On the protocols nodes, due the CTDB keeping locks together across all nodes (SMB), you would get more performance on bigger & less number of CES nodes than more and smaller. Certainly a 822 is quite a server if we go back to previous generations but I would still keep a simple backend (NSd servers), simple CES (less number of nodes the merrier) & then on the client part go as micro partitions as you like/can as the effect on the cluster is less relevant in the case of resources starvation. But, it depends on workloads, SLA and money so I say try, establish a baseline and it fills the requirements, go for it. If not change till does. Have fun From: "service at metamodul.com" To: gpfsug main discussion list Date: 24/04/2017 15:21 Subject: Re: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Jonathan todays hardware is so powerful that imho it might make sense to split a CEC into more "piece". For example the IBM S822L has up to 2x12 cores, 9 PCI3 slots ( 4?16 lans & 5?8 lan ). I think that such a server is a little bit to big just to be a single NSD server. Note that i use for each GPFS service a dedicated node. So if i would go for 4 NSD server, 6 protocol nodes and 2 tsm backup nodes and at least 3 test server a total of 11 server is needed. Inhm 4xS822L could handle this and a little bit more quite well. Of course blade technology could be used or 1U server. 
With kind regards Hajo -- Unix Systems Engineer MetaModul GmbH +49 177 4393994 -------- Urspr?ngliche Nachricht -------- Von: Jonathan Buzzard Datum:2017.04.24 13:14 (GMT+01:00) An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale On Mon, 2017-04-24 at 12:28 +0200, Hans-Joachim Ehlers wrote: > @All > > > does anybody uses virtualization technologies for GPFS Server ? If yes > what kind and why have you selected your soulution. > > I think currently about using Linux on Power using 40G SR-IOV for > Network and NPIV/Dedidcated FC Adater for storage. As a plus i can > also assign only a certain amount of CPUs to GPFS. ( Lower license > cost / You pay for what you use) > > > I must admit that i am not familar how "good" KVM/ESX in respect to > direct assignment of hardware is. Thus the question to the group > For the most part GPFS is used at scale and in general all the components are redundant. As such why you would want to allocate less than a whole server into a production GPFS system in somewhat beyond me. That is you will have a bunch of NSD servers in the system and if one crashes, well the other NSD's take over. Similar for protocol nodes, and in general the total file system size is going to hundreds of TB otherwise why bother with GPFS. I guess there is currently potential value at sticking the GUI into a virtual machine to get redundancy. On the other hand if you want a test rig, then virtualization works wonders. I have put GPFS on a single Linux box, using LV's for the disks and mapping them into virtual machines under KVM. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Mon Apr 24 14:04:26 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Apr 2017 14:04:26 +0100 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: References: Message-ID: <1493039066.11896.30.camel@buzzard.me.uk> On Mon, 2017-04-24 at 14:21 +0200, service at metamodul.com wrote: > Hi Jonathan > todays hardware is so powerful that imho it might make sense to split > a CEC into more "piece". For example the IBM S822L has up to 2x12 > cores, 9 PCI3 slots ( 4?16 lans & 5?8 lan ). > I think that such a server is a little bit to big just to be a single > NSD server. So don't buy it for an NSD server then :-) > Note that i use for each GPFS service a dedicated node. > So if i would go for 4 NSD server, 6 protocol nodes and 2 tsm backup > nodes and at least 3 test server a total of 11 server is needed. > Inhm 4xS822L could handle this and a little bit more quite well. > I think you are missing the point somewhat. Well by several country miles and quite possibly an ocean or two to be honest. Spectrum scale is supposed to be a "scale out" solution. More storage required add more arrays. More bandwidth add more servers etc. 
If you are just going to scale it all up on a *single* server then you might as well forget GPFS and do an old school standard scale up solution. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From janfrode at tanso.net Mon Apr 24 14:14:20 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 24 Apr 2017 15:14:20 +0200 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: References: Message-ID: I agree with Luis -- why so many nodes? """ So if i would go for 4 NSD server, 6 protocol nodes and 2 tsm backup nodes and at least 3 test server a total of 11 server is needed. """ If this is your whole cluster, why not just 3x P822L/P812L running single partition per node, hosting a cluster of 3x protocol-nodes that does both direct FC for disk access, and also run backups on same nodes ? No complications, full hw performance. Then separate node for test, or separate partition on same nodes with dedicated adapters. But back to your original question. My experience is that LPAR/NPIV works great, but it's a bit annoying having to also have VIOs. Hope we'll get FC SR-IOV eventually.. Also LPAR/Dedicated-adapters naturally works fine. VMWare/RDM can be a challenge in some failure situations. It likes to pause VMs in APD or PDL situations, which will affect all VMs with access to it :-o VMs without direct disk access is trivial. -jf On Mon, Apr 24, 2017 at 2:42 PM, Luis Bolinches wrote: > Hi > > As tastes vary, I would not partition it so much for the backend. Assuming > there is little to nothing overhead on the CPU at PHYP level, which it > depends. On the protocols nodes, due the CTDB keeping locks together across > all nodes (SMB), you would get more performance on bigger & less number of > CES nodes than more and smaller. > > Certainly a 822 is quite a server if we go back to previous generations > but I would still keep a simple backend (NSd servers), simple CES (less > number of nodes the merrier) & then on the client part go as micro > partitions as you like/can as the effect on the cluster is less relevant in > the case of resources starvation. > > But, it depends on workloads, SLA and money so I say try, establish a > baseline and it fills the requirements, go for it. If not change till does. > Have fun > > > > From: "service at metamodul.com" > To: gpfsug main discussion list > Date: 24/04/2017 15:21 > Subject: Re: [gpfsug-discuss] Used virtualization technologies for > GPFS/Spectrum Scale > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Jonathan > todays hardware is so powerful that imho it might make sense to split a > CEC into more "piece". For example the IBM S822L has up to 2x12 cores, 9 > PCI3 slots ( 4?16 lans & 5?8 lan ). > I think that such a server is a little bit to big just to be a single NSD > server. > Note that i use for each GPFS service a dedicated node. > So if i would go for 4 NSD server, 6 protocol nodes and 2 tsm backup nodes > and at least 3 test server a total of 11 server is needed. > Inhm 4xS822L could handle this and a little bit more quite well. > > Of course blade technology could be used or 1U server. 
> > With kind regards > Hajo > > -- > Unix Systems Engineer > MetaModul GmbH > +49 177 4393994 <+49%20177%204393994> > > > -------- Urspr?ngliche Nachricht -------- > Von: Jonathan Buzzard > Datum:2017.04.24 13:14 (GMT+01:00) > An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Used virtualization technologies for > GPFS/Spectrum Scale > > On Mon, 2017-04-24 at 12:28 +0200, Hans-Joachim Ehlers wrote: > > @All > > > > > > does anybody uses virtualization technologies for GPFS Server ? If yes > > what kind and why have you selected your soulution. > > > > I think currently about using Linux on Power using 40G SR-IOV for > > Network and NPIV/Dedidcated FC Adater for storage. As a plus i can > > also assign only a certain amount of CPUs to GPFS. ( Lower license > > cost / You pay for what you use) > > > > > > I must admit that i am not familar how "good" KVM/ESX in respect to > > direct assignment of hardware is. Thus the question to the group > > > > For the most part GPFS is used at scale and in general all the > components are redundant. As such why you would want to allocate less > than a whole server into a production GPFS system in somewhat beyond me. > > That is you will have a bunch of NSD servers in the system and if one > crashes, well the other NSD's take over. Similar for protocol nodes, and > in general the total file system size is going to hundreds of TB > otherwise why bother with GPFS. > > I guess there is currently potential value at sticking the GUI into a > virtual machine to get redundancy. > > On the other hand if you want a test rig, then virtualization works > wonders. I have put GPFS on a single Linux box, using LV's for the disks > and mapping them into virtual machines under KVM. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______ > ________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Mon Apr 24 16:29:56 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Apr 2017 11:29:56 -0400 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: References: Message-ID: <131241.1493047796@turing-police.cc.vt.edu> On Mon, 24 Apr 2017 14:21:09 +0200, "service at metamodul.com" said: > todays hardware is so powerful that imho it might make sense to split a CEC > into more "piece". For example the IBM S822L has up to 2x12 cores, 9 PCI3 slots > ( 4?16 lans & 5?8 lan ). We look at it the other way around: Today's hardware is so powerful that you can build a cluster out of a stack of fairly low-end 1U servers (we have one cluster that's built out of Dell r630s). 
And it's more robust against hardware failures than a VM based solution - although the 822 seems to allow hot-swap of PCI cards, a dead socket or DIMM will still kill all the VMs when you go to replace it. If one 1U out of 4 goes down due to a bad DIMM (which has happened to us more often than a bad PCI card) you can just power it down and replace it.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From service at metamodul.com Mon Apr 24 17:11:25 2017 From: service at metamodul.com (Hans-Joachim Ehlers) Date: Mon, 24 Apr 2017 18:11:25 +0200 (CEST) Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: References: Message-ID: <1961501377.286669.1493050285874@email.1und1.de> > Jan-Frode Myklebust hat am 24. April 2017 um 15:14 geschrieben: > I agree with Luis -- why so many nodes? Many ? IMHO it is not that much. I do not like to have one server doing more than one task. Thus a NSD Server does only serves GPFS. A Protocol server serves either NFS or SMB but not both except IBM says it would be better to run NFS/SMB on the same node. A backup server runs also on its "own" hardware. So i would need at least 4 NSD Server since if 1 fails i am losing only 25% of my "performance" and still having a 4/5 quorum. Nice in case an Update of a NSD failed. Each protocol service requires at least 2 nodes and the backup service as well. I can only say that with that approach i never had problems. I have be running into problems each time i did not followed that apporach. But of course YMMV But keep in mind that each service might requires different GPFS configuration or even slightly different hardware. Saying so i am a fan of having many GPFS Server ( NSD, Protocol , Backup a.s.o ) and i do not understand why not to use many nodes ^_^ Cheers Hajo From jonathan at buzzard.me.uk Mon Apr 24 17:24:29 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Apr 2017 17:24:29 +0100 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: <131241.1493047796@turing-police.cc.vt.edu> References: <131241.1493047796@turing-police.cc.vt.edu> Message-ID: <1493051069.11896.39.camel@buzzard.me.uk> On Mon, 2017-04-24 at 11:29 -0400, valdis.kletnieks at vt.edu wrote: > On Mon, 24 Apr 2017 14:21:09 +0200, "service at metamodul.com" said: > > > todays hardware is so powerful that imho it might make sense to split a CEC > > into more "piece". For example the IBM S822L has up to 2x12 cores, 9 PCI3 slots > > ( 4?16 lans & 5?8 lan ). > > We look at it the other way around: Today's hardware is so powerful that > you can build a cluster out of a stack of fairly low-end 1U servers (we > have one cluster that's built out of Dell r630s). And it's more robust > against hardware failures than a VM based solution - although the 822 seems > to allow hot-swap of PCI cards, a dead socket or DIMM will still kill all > the VMs when you go to replace it. If one 1U out of 4 goes down due to > a bad DIMM (which has happened to us more often than a bad PCI card) you > can just power it down and replace it.... Hate to say but the 822 will happily keep trucking when the CPU (assuming it has more than one) fails and similar with the DIMM's. In fact mirrored DIMM's is reasonably common on x86 machines these days, though very few people ever use it. That said CPU failures are incredibly rare in my experience. 
The only time I have ever come across a failed CPU was on a pSeries machine and then it was only because the backup was running really slow (it was running TSM) that prompted us to look closer and see what had happened. Monitoring (Zenoss) was not setup to register the event because like when does a CPU fail and the machine keep running! I am not 100% sure on the 822 put I suspect that the DIMM's and any socketed CPU's can be hot swapped in addition to the PCI card's which I have personally done on pSeries machines. However it is a stupidly over priced solution to run GPFS, because there are better or at the very least vastly cheaper ways to get the same level of reliability. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From valdis.kletnieks at vt.edu Mon Apr 24 18:58:17 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Apr 2017 13:58:17 -0400 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: <1493051069.11896.39.camel@buzzard.me.uk> References: <131241.1493047796@turing-police.cc.vt.edu> <1493051069.11896.39.camel@buzzard.me.uk> Message-ID: <7337.1493056697@turing-police.cc.vt.edu> On Mon, 24 Apr 2017 17:24:29 +0100, Jonathan Buzzard said: > Hate to say but the 822 will happily keep trucking when the CPU > (assuming it has more than one) fails and similar with the DIMM's. In How about when you go to replace the DIMM? You able to hot-swap the memory without anything losing its mind? (I know this is possible in the Z/series world, but those usually have at least 2-3 more zeros in the price tag). -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From luis.bolinches at fi.ibm.com Mon Apr 24 19:08:32 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 24 Apr 2017 21:08:32 +0300 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: <7337.1493056697@turing-police.cc.vt.edu> References: <131241.1493047796@turing-police.cc.vt.edu> <1493051069.11896.39.camel@buzzard.me.uk> <7337.1493056697@turing-police.cc.vt.edu> Message-ID: Hi 822 is an entry scale out Power machine, it has limited RAS compared with the high end ones (870/880). The 822 needs to be down for CPU / DIMM replacement: https://www.ibm.com/support/knowledgecenter/5148-21L/p8eg3/p8eg3_83x_8rx_kickoff.htm . And it is not a end user task. You can argue that, I owuld but it is the current statement and you pay for support for these kind of stuff. From: valdis.kletnieks at vt.edu To: gpfsug main discussion list Date: 24/04/2017 20:58 Subject: Re: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale Sent by: gpfsug-discuss-bounces at spectrumscale.org On Mon, 24 Apr 2017 17:24:29 +0100, Jonathan Buzzard said: > Hate to say but the 822 will happily keep trucking when the CPU > (assuming it has more than one) fails and similar with the DIMM's. In How about when you go to replace the DIMM? You able to hot-swap the memory without anything losing its mind? (I know this is possible in the Z/series world, but those usually have at least 2-3 more zeros in the price tag). [attachment "attqolcz.dat" deleted by Luis Bolinches/Finland/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank.tower at outlook.com Mon Apr 24 22:12:14 2017 From: frank.tower at outlook.com (Frank Tower) Date: Mon, 24 Apr 2017 21:12:14 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: , , Message-ID: >From what I've read from the Wiki: 'The NFS protocol performance is largely dependent on the base system performance of the protocol node hardware and network. This includes multiple factors the type and number of CPUs, the size of the main memory in the nodes, the type of disk drives used (HDD, SSD, etc.) and the disk configuration (RAID-level, replication etc.). In addition, NFS protocol performance can be impacted by the overall load of the node (such as number of clients accessing, snapshot creation/deletion and more) and administrative tasks (for example filesystem checks or online re-striping of disk arrays).' Nowadays, SSD is worst to invest. LROC could be an option in the future, but we need to quantify NFS/CIFS workload first. Are you using LROC with your GPFS installation ? Best, Frank. ________________________________ From: Sobey, Richard A Sent: Monday, April 24, 2017 11:11 AM To: gpfsug main discussion list; Jan-Frode Myklebust Subject: Re: [gpfsug-discuss] Protocol node recommendations What?s your SSD going to help with? will you implement it as a LROC device? Otherwise I can?t see the benefit to using it to boot off. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Frank Tower Sent: 23 April 2017 22:28 To: Jan-Frode Myklebust ; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations Hi, Nice ! didn't pay attention at the revision and the spreadsheet. If someone still have a copy somewhere it could be useful, Google didn't help :( We will follow your advise and start with 3 protocol nodes equipped with 128GB memory, 2 x 12 cores (maybe E5-2680 or E5-2670). >From what I read, NFS-Ganesha mainly depend of the hardware, Linux on a SSD should be a big plus in our case. Best, Frank ________________________________ From: Jan-Frode Myklebust > Sent: Sunday, April 23, 2017 12:07:38 PM To: Frank Tower; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations The protocol sizing tool should be available from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node/version/70a4c7c0-a5c6-4dde-b391-8f91c542dd7d , but I'm getting 404 now. I think 128GB should be enough for both protocols on same nodes, and I think your 3 node suggestion is best. Better load sharing with not dedicating subset of nodes to each protocol. -jf l?r. 22. apr. 2017 kl. 21.22 skrev Frank Tower >: Hi, Thank for the recommendations. Now we deal with the situation of: - take 3 nodes with round robin DNS that handle both protocols - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and NFS services. Regarding your recommendations, 256GB memory node could be a plus if we mix both protocols for such case. Is the spreadsheet publicly available or do we need to ask IBM ? Thank for your help, Frank. 
________________________________ From: Jan-Frode Myklebust > Sent: Saturday, April 22, 2017 10:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower >: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Apr 25 09:19:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 25 Apr 2017 08:19:10 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: , , Message-ID: I tried it on one node but investing in what could be up to ?5000 in SSDs when we don't know the gains isn't something I can argue. Not that LROC will hurt the environment but my users may not see any benefit. My cluster is the complete opposite of busy (relative to people saying they're seeing sustained 800MB/sec throughput), I just need it stable. Richard From: Frank Tower [mailto:frank.tower at outlook.com] Sent: 24 April 2017 22:12 To: Sobey, Richard A ; gpfsug main discussion list ; Jan-Frode Myklebust Subject: Re: [gpfsug-discuss] Protocol node recommendations >From what I've read from the Wiki: 'The NFS protocol performance is largely dependent on the base system performance of the protocol node hardware and network. This includes multiple factors the type and number of CPUs, the size of the main memory in the nodes, the type of disk drives used (HDD, SSD, etc.) and the disk configuration (RAID-level, replication etc.). In addition, NFS protocol performance can be impacted by the overall load of the node (such as number of clients accessing, snapshot creation/deletion and more) and administrative tasks (for example filesystem checks or online re-striping of disk arrays).' Nowadays, SSD is worst to invest. LROC could be an option in the future, but we need to quantify NFS/CIFS workload first. 
Are you using LROC with your GPFS installation ? Best, Frank. ________________________________ From: Sobey, Richard A > Sent: Monday, April 24, 2017 11:11 AM To: gpfsug main discussion list; Jan-Frode Myklebust Subject: Re: [gpfsug-discuss] Protocol node recommendations What's your SSD going to help with... will you implement it as a LROC device? Otherwise I can't see the benefit to using it to boot off. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Frank Tower Sent: 23 April 2017 22:28 To: Jan-Frode Myklebust >; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations Hi, Nice ! didn't pay attention at the revision and the spreadsheet. If someone still have a copy somewhere it could be useful, Google didn't help :( We will follow your advise and start with 3 protocol nodes equipped with 128GB memory, 2 x 12 cores (maybe E5-2680 or E5-2670). >From what I read, NFS-Ganesha mainly depend of the hardware, Linux on a SSD should be a big plus in our case. Best, Frank ________________________________ From: Jan-Frode Myklebust > Sent: Sunday, April 23, 2017 12:07:38 PM To: Frank Tower; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations The protocol sizing tool should be available from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node/version/70a4c7c0-a5c6-4dde-b391-8f91c542dd7d , but I'm getting 404 now. I think 128GB should be enough for both protocols on same nodes, and I think your 3 node suggestion is best. Better load sharing with not dedicating subset of nodes to each protocol. -jf l?r. 22. apr. 2017 kl. 21.22 skrev Frank Tower >: Hi, Thank for the recommendations. Now we deal with the situation of: - take 3 nodes with round robin DNS that handle both protocols - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and NFS services. Regarding your recommendations, 256GB memory node could be a plus if we mix both protocols for such case. Is the spreadsheet publicly available or do we need to ask IBM ? Thank for your help, Frank. ________________________________ From: Jan-Frode Myklebust > Sent: Saturday, April 22, 2017 10:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower >: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. 
We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Tue Apr 25 09:23:32 2017 From: chair at spectrumscale.org (Spectrum Scale UG Chair (Simon Thompson)) Date: Tue, 25 Apr 2017 09:23:32 +0100 Subject: [gpfsug-discuss] User group meeting May 9th/10th 2017 Message-ID: The UK user group is now just 2 weeks away! Its time to register ... https://www.eventbrite.com/e/spectrum-scalegpfs-user-group-spring-2017-regi stration-32113696932 (or https://goo.gl/tRptru) Remember user group meetings are free to attend, and this year's 2 day meeting is packed full of sessions and several of the breakout sessions are cloud-focussed looking at how Spectrum Scale can be used with cloud deployments. And as usual, we have the ever popular Sven speaking with his views from the Research topics. Thanks to our sponsors Arcastream, DDN, Ellexus, Lenovo, IBM, Mellanox, OCF and Seagate for helping make this happen! We need to finalise numbers for the evening event soon, so make sure you book your place now! Simon From S.J.Thompson at bham.ac.uk Tue Apr 25 12:20:39 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 25 Apr 2017 11:20:39 +0000 Subject: [gpfsug-discuss] NFS issues Message-ID: Hi, We have recently started deploying NFS in addition our existing SMB exports on our protocol nodes. We use a RR DNS name that points to 4 VIPs for SMB services and failover seems to work fine with SMB clients. We figured we could use the same name and IPs and run Ganesha on the protocol servers, however we are seeing issues with NFS clients when IP failover occurs. In normal operation on a client, we might see several mounts from different IPs obviously due to the way the DNS RR is working, but it all works fine. In a failover situation, the IP will move to another node and some clients will carry on, others will hang IO to the mount points referred to by the IP which has moved. We can *sometimes* trigger this by manually suspending a CES node, but not always and some clients mounting from the IP moving will be fine, others won't. If we resume a node an it fails back, the clients that are hanging will usually recover fine. We can reboot a client prior to failback and it will be fine, stopping and starting the ganesha service on a protocol node will also sometimes resolve the issues. So, has anyone seen this sort of issue and any suggestions for how we could either debug more or workaround? 
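A starting point for that kind of debugging, assuming a standard 4.2.x CES setup, might be the commands below (a sketch, not a definitive procedure):

  # which protocol node currently holds each floating CES address
  mmces address list

  # NFS/SMB service state across the protocol nodes, and CES component health
  mmces service list -a
  mmhealth node show CES

  # on an affected client: RPC retransmissions and per-operation counters
  nfsstat -rc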
We are currently running the packages nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). At one point we were seeing it a lot, and could track it back to an underlying GPFS network issue that was causing protocol nodes to be expelled occasionally, we resolved that and the issues became less apparent, but maybe we just fixed one failure mode so see it less often. On the clients, we use -o sync,hard BTW as in the IBM docs. On a client showing the issues, we'll see in dmesg, NFS related messages like: [Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not responding, timed out Which explains the client hang on certain mount points. The symptoms feel very much like those logged in this Gluster/ganesha bug: https://bugzilla.redhat.com/show_bug.cgi?id=1354439 Thanks Simon From Mark.Bush at siriuscom.com Tue Apr 25 14:27:38 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Tue, 25 Apr 2017 13:27:38 +0000 Subject: [gpfsug-discuss] Perfmon and GUI Message-ID: <321F04D4-5F3A-443F-A598-0616642C9F96@siriuscom.com> Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. What am I missing? Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = 
"Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Apr 25 14:44:59 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 25 Apr 2017 13:44:59 +0000 Subject: [gpfsug-discuss] Perfmon and GUI In-Reply-To: <321F04D4-5F3A-443F-A598-0616642C9F96@siriuscom.com> References: <321F04D4-5F3A-443F-A598-0616642C9F96@siriuscom.com> Message-ID: I would have thought this would be fixed by now as this happened to me in 4.2.1-(0?) ? here?s what support said. Can you try? I think you?ve already got the relevant bits in your .cfg files so it should just be a case of copying the files across and restarting pmsensors and pmcollector. Again bear in mind this affected me on 4.2.1 and you?re using 4.2.3 so ymmv.. ? I spoke with development and normally these files would be copied over to /opt/IBM/zimon when using the automatic installer but since this case doesn't use the installer we have to copy them over manually. We acknowledge this should be in the docs, and the reason it is not included in pmsensors rpm is due to the fact these do not come from the zimon team. The following files can be copied over to /opt/IBM/zimon [root at node1 default]# pwd /usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default [root at node1 default]# ls CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg SMBSensors.cfg SMBStats.cfg ZIMonCollector.cfg ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 14:28 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Perfmon and GUI Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. What am I missing? 
Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = "Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From j.ouwehand at vumc.nl Tue Apr 25 14:51:22 2017 From: j.ouwehand at vumc.nl (Ouwehand, JJ) Date: Tue, 25 Apr 2017 13:51:22 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: Message-ID: <5594921EA5B3674AB44AD9276126AAF40170DD3159@sp-mx-mbx42> Hello, At first a short introduction. My name is Jaap Jan Ouwehand, I work at a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical (office, research and clinical data) business process. 
We have three large GPFS filesystems for different purposes. We also had such a situation with cNFS. A failover (IPtakeover) was technically good, only clients experienced "stale filehandles". We opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few months later, the solution appeared to be in the fsid option. An NFS filehandle is built by a combination of fsid and a hash function on the inode. After a failover, the fsid value can be different and the client has a "stale filehandle". To avoid this, the fsid value can be statically specified. See: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_nfslin.htm Maybe there is also a value in Ganesha that changes after a failover. Certainly since most sessions will be re-established after a failback. Maybe you see more debug information with tcpdump. Kind regards, ? Jaap Jan Ouwehand ICT Specialist (Storage & Linux) VUmc - ICT E: jj.ouwehand at vumc.nl W: www.vumc.com -----Oorspronkelijk bericht----- Van: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Namens Simon Thompson (IT Research Support) Verzonden: dinsdag 25 april 2017 13:21 Aan: gpfsug-discuss at spectrumscale.org Onderwerp: [gpfsug-discuss] NFS issues Hi, We have recently started deploying NFS in addition our existing SMB exports on our protocol nodes. We use a RR DNS name that points to 4 VIPs for SMB services and failover seems to work fine with SMB clients. We figured we could use the same name and IPs and run Ganesha on the protocol servers, however we are seeing issues with NFS clients when IP failover occurs. In normal operation on a client, we might see several mounts from different IPs obviously due to the way the DNS RR is working, but it all works fine. In a failover situation, the IP will move to another node and some clients will carry on, others will hang IO to the mount points referred to by the IP which has moved. We can *sometimes* trigger this by manually suspending a CES node, but not always and some clients mounting from the IP moving will be fine, others won't. If we resume a node an it fails back, the clients that are hanging will usually recover fine. We can reboot a client prior to failback and it will be fine, stopping and starting the ganesha service on a protocol node will also sometimes resolve the issues. So, has anyone seen this sort of issue and any suggestions for how we could either debug more or workaround? We are currently running the packages nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). At one point we were seeing it a lot, and could track it back to an underlying GPFS network issue that was causing protocol nodes to be expelled occasionally, we resolved that and the issues became less apparent, but maybe we just fixed one failure mode so see it less often. On the clients, we use -o sync,hard BTW as in the IBM docs. On a client showing the issues, we'll see in dmesg, NFS related messages like: [Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not responding, timed out Which explains the client hang on certain mount points. 
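Picking up the tcpdump suggestion above, a capture on the protocol node that is about to give up the address (and on a hanging client) during a controlled failover would look roughly like this; the interface name and address are placeholders:

  tcpdump -i bond0 -s 0 -w /tmp/ces-failover.pcap host 10.10.10.10 and port 2049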
The symptoms feel very much like those logged in this Gluster/ganesha bug: https://bugzilla.redhat.com/show_bug.cgi?id=1354439 Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Tue Apr 25 15:06:04 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 25 Apr 2017 14:06:04 +0000 Subject: [gpfsug-discuss] NFS issues Message-ID: Hi, >From what I can see, Ganesha uses the Export_Id option in the config file (which is managed by CES) for this. I did find some reference in the Ganesha devs list that if its not set, then it would read the FSID from the GPFS file-system, either way they should surely be consistent across all the nodes. The posts I found were from someone with an IBM email address, so I guess someone in the IBM teams. I checked a couple of my protocol nodes and they use the same Export_Id consistently, though I guess that might not be the same as the FSID value. Perhaps someone from IBM could comment on if FSID is likely to the cause of my problems? Thanks Simon On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ouwehand, JJ" wrote: >Hello, > >At first a short introduction. My name is Jaap Jan Ouwehand, I work at a >Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM >Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical >(office, research and clinical data) business process. We have three >large GPFS filesystems for different purposes. > >We also had such a situation with cNFS. A failover (IPtakeover) was >technically good, only clients experienced "stale filehandles". We opened >a PMR at IBM and after testing, deliver logs, tcpdumps and a few months >later, the solution appeared to be in the fsid option. > >An NFS filehandle is built by a combination of fsid and a hash function >on the inode. After a failover, the fsid value can be different and the >client has a "stale filehandle". To avoid this, the fsid value can be >statically specified. See: > >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum. >scale.v4r22.doc/bl1adm_nfslin.htm > >Maybe there is also a value in Ganesha that changes after a failover. >Certainly since most sessions will be re-established after a failback. >Maybe you see more debug information with tcpdump. > > >Kind regards, > >Jaap Jan Ouwehand >ICT Specialist (Storage & Linux) >VUmc - ICT >E: jj.ouwehand at vumc.nl >W: www.vumc.com > > > >-----Oorspronkelijk bericht----- >Van: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] Namens Simon Thompson >(IT Research Support) >Verzonden: dinsdag 25 april 2017 13:21 >Aan: gpfsug-discuss at spectrumscale.org >Onderwerp: [gpfsug-discuss] NFS issues > >Hi, > >We have recently started deploying NFS in addition our existing SMB >exports on our protocol nodes. > >We use a RR DNS name that points to 4 VIPs for SMB services and failover >seems to work fine with SMB clients. We figured we could use the same >name and IPs and run Ganesha on the protocol servers, however we are >seeing issues with NFS clients when IP failover occurs. > >In normal operation on a client, we might see several mounts from >different IPs obviously due to the way the DNS RR is working, but it all >works fine. 
> >In a failover situation, the IP will move to another node and some >clients will carry on, others will hang IO to the mount points referred >to by the IP which has moved. We can *sometimes* trigger this by manually >suspending a CES node, but not always and some clients mounting from the >IP moving will be fine, others won't. > >If we resume a node an it fails back, the clients that are hanging will >usually recover fine. We can reboot a client prior to failback and it >will be fine, stopping and starting the ganesha service on a protocol >node will also sometimes resolve the issues. > >So, has anyone seen this sort of issue and any suggestions for how we >could either debug more or workaround? > >We are currently running the packages >nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). > >At one point we were seeing it a lot, and could track it back to an >underlying GPFS network issue that was causing protocol nodes to be >expelled occasionally, we resolved that and the issues became less >apparent, but maybe we just fixed one failure mode so see it less often. > >On the clients, we use -o sync,hard BTW as in the IBM docs. > >On a client showing the issues, we'll see in dmesg, NFS related messages >like: >[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not >responding, timed out > >Which explains the client hang on certain mount points. > >The symptoms feel very much like those logged in this Gluster/ganesha bug: >https://bugzilla.redhat.com/show_bug.cgi?id=1354439 > > >Thanks > >Simon > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Tue Apr 25 15:13:58 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Tue, 25 Apr 2017 14:13:58 +0000 Subject: [gpfsug-discuss] Perfmon and GUI Message-ID: <2A0DC44A-D9FF-428B-8B02-FC6EC504BD34@siriuscom.com> Interesting. Some files were indeed already there but it was missing a few NFSIO.cfg being the most notable to me. I?ve gone ahead and copied those to all my nodes (just three in this cluster) and restarted services. Still no luck. I?m going to restart the GUI service next to see if that makes a difference. Interestingly I can do things like mmperfmon query smb2 and that tends to work and give me real data so not sure where the breakdown is in the GUI. Mark From: "Sobey, Richard A" Reply-To: gpfsug main discussion list Date: Tuesday, April 25, 2017 at 8:44 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI I would have thought this would be fixed by now as this happened to me in 4.2.1-(0?) ? here?s what support said. Can you try? I think you?ve already got the relevant bits in your .cfg files so it should just be a case of copying the files across and restarting pmsensors and pmcollector. Again bear in mind this affected me on 4.2.1 and you?re using 4.2.3 so ymmv.. ? I spoke with development and normally these files would be copied over to /opt/IBM/zimon when using the automatic installer but since this case doesn't use the installer we have to copy them over manually. We acknowledge this should be in the docs, and the reason it is not included in pmsensors rpm is due to the fact these do not come from the zimon team. 
The following files can be copied over to /opt/IBM/zimon [root at node1 default]# pwd /usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default [root at node1 default]# ls CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg SMBSensors.cfg SMBStats.cfg ZIMonCollector.cfg ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 14:28 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Perfmon and GUI Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. What am I missing? Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = "Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. 
This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Tue Apr 25 15:29:07 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Tue, 25 Apr 2017 14:29:07 +0000 Subject: [gpfsug-discuss] Perfmon and GUI In-Reply-To: <2A0DC44A-D9FF-428B-8B02-FC6EC504BD34@siriuscom.com> References: <2A0DC44A-D9FF-428B-8B02-FC6EC504BD34@siriuscom.com> Message-ID: Update: So SMB monitoring is now working after copying all files per Richard?s recommendation (thank you sir) and restarting pmsensors, pmcollector, and gpfsfui. Sadly, NFS monitoring isn?t. It doesn?t work from the cli either though. So clearly, something is up with that part. I continue to troubleshoot. From: Mark Bush Reply-To: gpfsug main discussion list Date: Tuesday, April 25, 2017 at 9:13 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI Interesting. Some files were indeed already there but it was missing a few NFSIO.cfg being the most notable to me. I?ve gone ahead and copied those to all my nodes (just three in this cluster) and restarted services. Still no luck. I?m going to restart the GUI service next to see if that makes a difference. Interestingly I can do things like mmperfmon query smb2 and that tends to work and give me real data so not sure where the breakdown is in the GUI. Mark From: "Sobey, Richard A" Reply-To: gpfsug main discussion list Date: Tuesday, April 25, 2017 at 8:44 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI I would have thought this would be fixed by now as this happened to me in 4.2.1-(0?) ? here?s what support said. Can you try? I think you?ve already got the relevant bits in your .cfg files so it should just be a case of copying the files across and restarting pmsensors and pmcollector. Again bear in mind this affected me on 4.2.1 and you?re using 4.2.3 so ymmv.. ? I spoke with development and normally these files would be copied over to /opt/IBM/zimon when using the automatic installer but since this case doesn't use the installer we have to copy them over manually. We acknowledge this should be in the docs, and the reason it is not included in pmsensors rpm is due to the fact these do not come from the zimon team. The following files can be copied over to /opt/IBM/zimon [root at node1 default]# pwd /usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default [root at node1 default]# ls CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg SMBSensors.cfg SMBStats.cfg ZIMonCollector.cfg ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 14:28 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Perfmon and GUI Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? 
I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. What am I missing? Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = "Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Tue Apr 25 15:31:13 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 25 Apr 2017 14:31:13 +0000 Subject: [gpfsug-discuss] Perfmon and GUI In-Reply-To: References: <2A0DC44A-D9FF-428B-8B02-FC6EC504BD34@siriuscom.com> Message-ID: No worries Mark. We don?t use NFS here (yet) so I can?t help there. Glad I could help. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 15:29 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI Update: So SMB monitoring is now working after copying all files per Richard?s recommendation (thank you sir) and restarting pmsensors, pmcollector, and gpfsfui. Sadly, NFS monitoring isn?t. It doesn?t work from the cli either though. So clearly, something is up with that part. I continue to troubleshoot. From: Mark Bush > Reply-To: gpfsug main discussion list > Date: Tuesday, April 25, 2017 at 9:13 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Perfmon and GUI Interesting. Some files were indeed already there but it was missing a few NFSIO.cfg being the most notable to me. I?ve gone ahead and copied those to all my nodes (just three in this cluster) and restarted services. Still no luck. I?m going to restart the GUI service next to see if that makes a difference. Interestingly I can do things like mmperfmon query smb2 and that tends to work and give me real data so not sure where the breakdown is in the GUI. Mark From: "Sobey, Richard A" > Reply-To: gpfsug main discussion list > Date: Tuesday, April 25, 2017 at 8:44 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Perfmon and GUI I would have thought this would be fixed by now as this happened to me in 4.2.1-(0?) ? here?s what support said. Can you try? I think you?ve already got the relevant bits in your .cfg files so it should just be a case of copying the files across and restarting pmsensors and pmcollector. Again bear in mind this affected me on 4.2.1 and you?re using 4.2.3 so ymmv.. ? I spoke with development and normally these files would be copied over to /opt/IBM/zimon when using the automatic installer but since this case doesn't use the installer we have to copy them over manually. We acknowledge this should be in the docs, and the reason it is not included in pmsensors rpm is due to the fact these do not come from the zimon team. The following files can be copied over to /opt/IBM/zimon [root at node1 default]# pwd /usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default [root at node1 default]# ls CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg SMBSensors.cfg SMBStats.cfg ZIMonCollector.cfg ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 14:28 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Perfmon and GUI Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. 
What am I missing? Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = "Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Apr 25 18:04:41 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 25 Apr 2017 17:04:41 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: Message-ID: I *think* I've seen this, and that we then had open TCP connection from client to NFS server according to netstat, but these connections were not visible from netstat on NFS-server side. Unfortunately I don't remember what the fix was.. -jf tir. 25. apr. 2017 kl. 
16.06 skrev Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk>: > Hi, > > From what I can see, Ganesha uses the Export_Id option in the config file > (which is managed by CES) for this. I did find some reference in the > Ganesha devs list that if its not set, then it would read the FSID from > the GPFS file-system, either way they should surely be consistent across > all the nodes. The posts I found were from someone with an IBM email > address, so I guess someone in the IBM teams. > > I checked a couple of my protocol nodes and they use the same Export_Id > consistently, though I guess that might not be the same as the FSID value. > > Perhaps someone from IBM could comment on if FSID is likely to the cause > of my problems? > > Thanks > > Simon > > On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Ouwehand, JJ" j.ouwehand at vumc.nl> wrote: > > >Hello, > > > >At first a short introduction. My name is Jaap Jan Ouwehand, I work at a > >Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM > >Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical > >(office, research and clinical data) business process. We have three > >large GPFS filesystems for different purposes. > > > >We also had such a situation with cNFS. A failover (IPtakeover) was > >technically good, only clients experienced "stale filehandles". We opened > >a PMR at IBM and after testing, deliver logs, tcpdumps and a few months > >later, the solution appeared to be in the fsid option. > > > >An NFS filehandle is built by a combination of fsid and a hash function > >on the inode. After a failover, the fsid value can be different and the > >client has a "stale filehandle". To avoid this, the fsid value can be > >statically specified. See: > > > >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum > . > >scale.v4r22.doc/bl1adm_nfslin.htm > > > >Maybe there is also a value in Ganesha that changes after a failover. > >Certainly since most sessions will be re-established after a failback. > >Maybe you see more debug information with tcpdump. > > > > > >Kind regards, > > > >Jaap Jan Ouwehand > >ICT Specialist (Storage & Linux) > >VUmc - ICT > >E: jj.ouwehand at vumc.nl > >W: www.vumc.com > > > > > > > >-----Oorspronkelijk bericht----- > >Van: gpfsug-discuss-bounces at spectrumscale.org > >[mailto:gpfsug-discuss-bounces at spectrumscale.org] Namens Simon Thompson > >(IT Research Support) > >Verzonden: dinsdag 25 april 2017 13:21 > >Aan: gpfsug-discuss at spectrumscale.org > >Onderwerp: [gpfsug-discuss] NFS issues > > > >Hi, > > > >We have recently started deploying NFS in addition our existing SMB > >exports on our protocol nodes. > > > >We use a RR DNS name that points to 4 VIPs for SMB services and failover > >seems to work fine with SMB clients. We figured we could use the same > >name and IPs and run Ganesha on the protocol servers, however we are > >seeing issues with NFS clients when IP failover occurs. > > > >In normal operation on a client, we might see several mounts from > >different IPs obviously due to the way the DNS RR is working, but it all > >works fine. > > > >In a failover situation, the IP will move to another node and some > >clients will carry on, others will hang IO to the mount points referred > >to by the IP which has moved. We can *sometimes* trigger this by manually > >suspending a CES node, but not always and some clients mounting from the > >IP moving will be fine, others won't. 
> > > >If we resume a node an it fails back, the clients that are hanging will > >usually recover fine. We can reboot a client prior to failback and it > >will be fine, stopping and starting the ganesha service on a protocol > >node will also sometimes resolve the issues. > > > >So, has anyone seen this sort of issue and any suggestions for how we > >could either debug more or workaround? > > > >We are currently running the packages > >nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). > > > >At one point we were seeing it a lot, and could track it back to an > >underlying GPFS network issue that was causing protocol nodes to be > >expelled occasionally, we resolved that and the issues became less > >apparent, but maybe we just fixed one failure mode so see it less often. > > > >On the clients, we use -o sync,hard BTW as in the IBM docs. > > > >On a client showing the issues, we'll see in dmesg, NFS related messages > >like: > >[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not > >responding, timed out > > > >Which explains the client hang on certain mount points. > > > >The symptoms feel very much like those logged in this Gluster/ganesha bug: > >https://bugzilla.redhat.com/show_bug.cgi?id=1354439 > > > > > >Thanks > > > >Simon > > > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoang.nguyen at seagate.com Tue Apr 25 18:12:19 2017 From: hoang.nguyen at seagate.com (Hoang Nguyen) Date: Tue, 25 Apr 2017 10:12:19 -0700 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: Message-ID: I have a customer with a slightly different issue but sounds somewhat related. If you stop and stop the NFS service on a CES node or update an existing export which will restart Ganesha. Some of their NFS clients do not reconnect in a very similar fashion as you described. I haven't been able to reproduce it on my test system repeatedly but using soft NFS mounts seems to help. Seems like it happens more often to clients currently running NFS IO during the outage. But I'm interested to see what you guys uncover. Thanks, Hoang On Tue, Apr 25, 2017 at 7:06 AM, Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk> wrote: > Hi, > > From what I can see, Ganesha uses the Export_Id option in the config file > (which is managed by CES) for this. I did find some reference in the > Ganesha devs list that if its not set, then it would read the FSID from > the GPFS file-system, either way they should surely be consistent across > all the nodes. The posts I found were from someone with an IBM email > address, so I guess someone in the IBM teams. > > I checked a couple of my protocol nodes and they use the same Export_Id > consistently, though I guess that might not be the same as the FSID value. > > Perhaps someone from IBM could comment on if FSID is likely to the cause > of my problems? 
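For reference, a Ganesha export stanza of the kind CES manages typically looks something like the sketch below. Export_Id is the field referred to above, and Filesystem_Id is the Ganesha option that plays the role fsid= does for kernel NFS exports. Option names are as I understand them for Ganesha 2.3; the path and values are placeholders:

  EXPORT {
      Export_Id = 1;                # must be identical on every CES node
      Path = /gpfs/gpfs0/data;      # placeholder export path
      Pseudo = /gpfs/gpfs0/data;
      Access_Type = RW;
      Squash = no_root_squash;
      Filesystem_Id = 192.168;      # fixed major.minor so file handles survive an IP move
      FSAL {
          Name = GPFS;
      }
  }

  # kernel NFS / cNFS equivalent, per the fsid advice earlier in the thread (values are examples):
  # /gpfs/gpfs0   *(rw,sync,fsid=745)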
> > Thanks > > Simon > > On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Ouwehand, JJ" j.ouwehand at vumc.nl> wrote: > > >Hello, > > > >At first a short introduction. My name is Jaap Jan Ouwehand, I work at a > >Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM > >Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical > >(office, research and clinical data) business process. We have three > >large GPFS filesystems for different purposes. > > > >We also had such a situation with cNFS. A failover (IPtakeover) was > >technically good, only clients experienced "stale filehandles". We opened > >a PMR at IBM and after testing, deliver logs, tcpdumps and a few months > >later, the solution appeared to be in the fsid option. > > > >An NFS filehandle is built by a combination of fsid and a hash function > >on the inode. After a failover, the fsid value can be different and the > >client has a "stale filehandle". To avoid this, the fsid value can be > >statically specified. See: > > > >https://urldefense.proofpoint.com/v2/url?u=https-3A__www.ibm.com_support_ > knowledgecenter_STXKQY-5F4.2.2_com.ibm.spectrum&d=DwICAg&c= > IGDlg0lD0b-nebmJJ0Kp8A&r=erT0ET1g1dsvTDYndRRTAAZ6Dneebt > G6F47PIUMDXFw&m=K3iXrW2N_HcdrGDuKmRWFjypuPLPJDIm9VosFIIsFoI&s= > PIXnA0UQbneTHMRxvUcmsvZK6z5V2XU4jR_GIVaZP5Q&e= . > >scale.v4r22.doc/bl1adm_nfslin.htm > > > >Maybe there is also a value in Ganesha that changes after a failover. > >Certainly since most sessions will be re-established after a failback. > >Maybe you see more debug information with tcpdump. > > > > > >Kind regards, > > > >Jaap Jan Ouwehand > >ICT Specialist (Storage & Linux) > >VUmc - ICT > >E: jj.ouwehand at vumc.nl > >W: www.vumc.com > > > > > > > >-----Oorspronkelijk bericht----- > >Van: gpfsug-discuss-bounces at spectrumscale.org > >[mailto:gpfsug-discuss-bounces at spectrumscale.org] Namens Simon Thompson > >(IT Research Support) > >Verzonden: dinsdag 25 april 2017 13:21 > >Aan: gpfsug-discuss at spectrumscale.org > >Onderwerp: [gpfsug-discuss] NFS issues > > > >Hi, > > > >We have recently started deploying NFS in addition our existing SMB > >exports on our protocol nodes. > > > >We use a RR DNS name that points to 4 VIPs for SMB services and failover > >seems to work fine with SMB clients. We figured we could use the same > >name and IPs and run Ganesha on the protocol servers, however we are > >seeing issues with NFS clients when IP failover occurs. > > > >In normal operation on a client, we might see several mounts from > >different IPs obviously due to the way the DNS RR is working, but it all > >works fine. > > > >In a failover situation, the IP will move to another node and some > >clients will carry on, others will hang IO to the mount points referred > >to by the IP which has moved. We can *sometimes* trigger this by manually > >suspending a CES node, but not always and some clients mounting from the > >IP moving will be fine, others won't. > > > >If we resume a node an it fails back, the clients that are hanging will > >usually recover fine. We can reboot a client prior to failback and it > >will be fine, stopping and starting the ganesha service on a protocol > >node will also sometimes resolve the issues. > > > >So, has anyone seen this sort of issue and any suggestions for how we > >could either debug more or workaround? > > > >We are currently running the packages > >nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). 
> > > >At one point we were seeing it a lot, and could track it back to an > >underlying GPFS network issue that was causing protocol nodes to be > >expelled occasionally, we resolved that and the issues became less > >apparent, but maybe we just fixed one failure mode so see it less often. > > > >On the clients, we use -o sync,hard BTW as in the IBM docs. > > > >On a client showing the issues, we'll see in dmesg, NFS related messages > >like: > >[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not > >responding, timed out > > > >Which explains the client hang on certain mount points. > > > >The symptoms feel very much like those logged in this Gluster/ganesha bug: > >https://urldefense.proofpoint.com/v2/url?u=https- > 3A__bugzilla.redhat.com_show-5Fbug.cgi-3Fid-3D1354439&d= > DwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r=erT0ET1g1dsvTDYndRRTAAZ6Dneebt > G6F47PIUMDXFw&m=K3iXrW2N_HcdrGDuKmRWFjypuPLPJDIm9VosFII > sFoI&s=KN5WKk1vLEt0Y_17nVQeDi1lK5mSQUZQ7lPtQK3FBG4&e= > > > > > >Thanks > > > >Simon > > > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_ > listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r= > erT0ET1g1dsvTDYndRRTAAZ6DneebtG6F47PIUMDXFw&m=K3iXrW2N_ > HcdrGDuKmRWFjypuPLPJDIm9VosFIIsFoI&s=rvZX6mp5gZr7h3QuwTM2EVZaG- > d1VXwSDKDhKVyQurw&e= > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_ > listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r= > erT0ET1g1dsvTDYndRRTAAZ6DneebtG6F47PIUMDXFw&m=K3iXrW2N_ > HcdrGDuKmRWFjypuPLPJDIm9VosFIIsFoI&s=rvZX6mp5gZr7h3QuwTM2EVZaG- > d1VXwSDKDhKVyQurw&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug. > org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r= > erT0ET1g1dsvTDYndRRTAAZ6DneebtG6F47PIUMDXFw&m=K3iXrW2N_ > HcdrGDuKmRWFjypuPLPJDIm9VosFIIsFoI&s=rvZX6mp5gZr7h3QuwTM2EVZaG- > d1VXwSDKDhKVyQurw&e= > -- Hoang Nguyen *? *Sr Staff Engineer Seagate Technology office: +1 (858) 751-4487 mobile: +1 (858) 284-7846 www.seagate.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Apr 25 18:30:40 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 25 Apr 2017 17:30:40 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: , Message-ID: I did some digging in the mmcesfuncs to see what happens server side on fail over. Basically the server losing the IP is supposed to terminate all sessions and the receiver server sends ACK tickles. My current supposition is that for whatever reason, the losing server isn't releasing something and the client still has hold of a connection which is mostly dead. The tickle then fails to the client from the new server. This would explain why failing the IP back to the original server usually brings the client back to life. This is only my working theory at the moment as we can't reliably reproduce this. Next time it happens we plan to grab some netstat from each side. Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the server that received the IP and see if that fixes it (i.e. the receiver server didn't tickle properly). 
(Usage extracted from mmcesfuncs which is ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd) for anyone interested. Then try and kill he sessions on the losing server to check if there is stuff still open and re-tickle the client. If we can get steps to workaround, I'll log a PMR. I suppose I could do that now, but given its non deterministic and we want to be 100% sure it's not us doing something wrong, I'm inclined to wait until we do some more testing. I agree with the suggestion that it's probably IO pending nodes that are affected, but don't have any data to back that up yet. We did try with a read workload on a client, but may we need either long IO blocked reads or writes (from the GPFS end). We also originally had soft as the default option, but saw issues then and the docs suggested hard, so we switched and also enabled sync (we figured maybe it was NFS client with uncommited writes), but neither have resolved the issues entirely. Difficult for me to say if they improved the issue though given its sporadic. Appreciate people's suggestions! Thanks Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode Myklebust [janfrode at tanso.net] Sent: 25 April 2017 18:04 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NFS issues I *think* I've seen this, and that we then had open TCP connection from client to NFS server according to netstat, but these connections were not visible from netstat on NFS-server side. Unfortunately I don't remember what the fix was.. -jf tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) >: Hi, >From what I can see, Ganesha uses the Export_Id option in the config file (which is managed by CES) for this. I did find some reference in the Ganesha devs list that if its not set, then it would read the FSID from the GPFS file-system, either way they should surely be consistent across all the nodes. The posts I found were from someone with an IBM email address, so I guess someone in the IBM teams. I checked a couple of my protocol nodes and they use the same Export_Id consistently, though I guess that might not be the same as the FSID value. Perhaps someone from IBM could comment on if FSID is likely to the cause of my problems? Thanks Simon On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ouwehand, JJ" on behalf of j.ouwehand at vumc.nl> wrote: >Hello, > >At first a short introduction. My name is Jaap Jan Ouwehand, I work at a >Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM >Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical >(office, research and clinical data) business process. We have three >large GPFS filesystems for different purposes. > >We also had such a situation with cNFS. A failover (IPtakeover) was >technically good, only clients experienced "stale filehandles". We opened >a PMR at IBM and after testing, deliver logs, tcpdumps and a few months >later, the solution appeared to be in the fsid option. > >An NFS filehandle is built by a combination of fsid and a hash function >on the inode. After a failover, the fsid value can be different and the >client has a "stale filehandle". To avoid this, the fsid value can be >statically specified. See: > >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum. 
>scale.v4r22.doc/bl1adm_nfslin.htm > >Maybe there is also a value in Ganesha that changes after a failover. >Certainly since most sessions will be re-established after a failback. >Maybe you see more debug information with tcpdump. > > >Kind regards, > >Jaap Jan Ouwehand >ICT Specialist (Storage & Linux) >VUmc - ICT >E: jj.ouwehand at vumc.nl >W: www.vumc.com > > > >-----Oorspronkelijk bericht----- >Van: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] Namens Simon Thompson >(IT Research Support) >Verzonden: dinsdag 25 april 2017 13:21 >Aan: gpfsug-discuss at spectrumscale.org >Onderwerp: [gpfsug-discuss] NFS issues > >Hi, > >We have recently started deploying NFS in addition our existing SMB >exports on our protocol nodes. > >We use a RR DNS name that points to 4 VIPs for SMB services and failover >seems to work fine with SMB clients. We figured we could use the same >name and IPs and run Ganesha on the protocol servers, however we are >seeing issues with NFS clients when IP failover occurs. > >In normal operation on a client, we might see several mounts from >different IPs obviously due to the way the DNS RR is working, but it all >works fine. > >In a failover situation, the IP will move to another node and some >clients will carry on, others will hang IO to the mount points referred >to by the IP which has moved. We can *sometimes* trigger this by manually >suspending a CES node, but not always and some clients mounting from the >IP moving will be fine, others won't. > >If we resume a node an it fails back, the clients that are hanging will >usually recover fine. We can reboot a client prior to failback and it >will be fine, stopping and starting the ganesha service on a protocol >node will also sometimes resolve the issues. > >So, has anyone seen this sort of issue and any suggestions for how we >could either debug more or workaround? > >We are currently running the packages >nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). > >At one point we were seeing it a lot, and could track it back to an >underlying GPFS network issue that was causing protocol nodes to be >expelled occasionally, we resolved that and the issues became less >apparent, but maybe we just fixed one failure mode so see it less often. > >On the clients, we use -o sync,hard BTW as in the IBM docs. > >On a client showing the issues, we'll see in dmesg, NFS related messages >like: >[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not >responding, timed out > >Which explains the client hang on certain mount points. 
> >The symptoms feel very much like those logged in this Gluster/ganesha bug: >https://bugzilla.redhat.com/show_bug.cgi?id=1354439 > > >Thanks > >Simon > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Greg.Lehmann at csiro.au Wed Apr 26 00:46:35 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Tue, 25 Apr 2017 23:46:35 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: , Message-ID: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> Are you using infiniband or Ethernet? I'm wondering if IBM have solved the gratuitous arp issue which we see with our non-protocols NFS implementation. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Wednesday, 26 April 2017 3:31 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NFS issues I did some digging in the mmcesfuncs to see what happens server side on fail over. Basically the server losing the IP is supposed to terminate all sessions and the receiver server sends ACK tickles. My current supposition is that for whatever reason, the losing server isn't releasing something and the client still has hold of a connection which is mostly dead. The tickle then fails to the client from the new server. This would explain why failing the IP back to the original server usually brings the client back to life. This is only my working theory at the moment as we can't reliably reproduce this. Next time it happens we plan to grab some netstat from each side. Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the server that received the IP and see if that fixes it (i.e. the receiver server didn't tickle properly). (Usage extracted from mmcesfuncs which is ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd) for anyone interested. Then try and kill he sessions on the losing server to check if there is stuff still open and re-tickle the client. If we can get steps to workaround, I'll log a PMR. I suppose I could do that now, but given its non deterministic and we want to be 100% sure it's not us doing something wrong, I'm inclined to wait until we do some more testing. I agree with the suggestion that it's probably IO pending nodes that are affected, but don't have any data to back that up yet. We did try with a read workload on a client, but may we need either long IO blocked reads or writes (from the GPFS end). We also originally had soft as the default option, but saw issues then and the docs suggested hard, so we switched and also enabled sync (we figured maybe it was NFS client with uncommited writes), but neither have resolved the issues entirely. Difficult for me to say if they improved the issue though given its sporadic. Appreciate people's suggestions! 
Thanks Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode Myklebust [janfrode at tanso.net] Sent: 25 April 2017 18:04 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NFS issues I *think* I've seen this, and that we then had open TCP connection from client to NFS server according to netstat, but these connections were not visible from netstat on NFS-server side. Unfortunately I don't remember what the fix was.. -jf tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) >: Hi, >From what I can see, Ganesha uses the Export_Id option in the config file (which is managed by CES) for this. I did find some reference in the Ganesha devs list that if its not set, then it would read the FSID from the GPFS file-system, either way they should surely be consistent across all the nodes. The posts I found were from someone with an IBM email address, so I guess someone in the IBM teams. I checked a couple of my protocol nodes and they use the same Export_Id consistently, though I guess that might not be the same as the FSID value. Perhaps someone from IBM could comment on if FSID is likely to the cause of my problems? Thanks Simon On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ouwehand, JJ" on behalf of j.ouwehand at vumc.nl> wrote: >Hello, > >At first a short introduction. My name is Jaap Jan Ouwehand, I work at >a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of >IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our >critical (office, research and clinical data) business process. We have >three large GPFS filesystems for different purposes. > >We also had such a situation with cNFS. A failover (IPtakeover) was >technically good, only clients experienced "stale filehandles". We >opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few >months later, the solution appeared to be in the fsid option. > >An NFS filehandle is built by a combination of fsid and a hash function >on the inode. After a failover, the fsid value can be different and the >client has a "stale filehandle". To avoid this, the fsid value can be >statically specified. See: > >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum. >scale.v4r22.doc/bl1adm_nfslin.htm > >Maybe there is also a value in Ganesha that changes after a failover. >Certainly since most sessions will be re-established after a failback. >Maybe you see more debug information with tcpdump. > > >Kind regards, > >Jaap Jan Ouwehand >ICT Specialist (Storage & Linux) >VUmc - ICT >E: jj.ouwehand at vumc.nl >W: www.vumc.com > > > >-----Oorspronkelijk bericht----- >Van: >gpfsug-discuss-bounces at spectrumscale.orgspectrumscale.org> >[mailto:gpfsug-discuss-bounces at spectrumscale.orgbounces at spectrumscale.org>] Namens Simon Thompson (IT Research Support) >Verzonden: dinsdag 25 april 2017 13:21 >Aan: >gpfsug-discuss at spectrumscale.orgg> >Onderwerp: [gpfsug-discuss] NFS issues > >Hi, > >We have recently started deploying NFS in addition our existing SMB >exports on our protocol nodes. > >We use a RR DNS name that points to 4 VIPs for SMB services and >failover seems to work fine with SMB clients. We figured we could use >the same name and IPs and run Ganesha on the protocol servers, however >we are seeing issues with NFS clients when IP failover occurs. 
> >In normal operation on a client, we might see several mounts from >different IPs obviously due to the way the DNS RR is working, but it >all works fine. > >In a failover situation, the IP will move to another node and some >clients will carry on, others will hang IO to the mount points referred >to by the IP which has moved. We can *sometimes* trigger this by >manually suspending a CES node, but not always and some clients >mounting from the IP moving will be fine, others won't. > >If we resume a node an it fails back, the clients that are hanging will >usually recover fine. We can reboot a client prior to failback and it >will be fine, stopping and starting the ganesha service on a protocol >node will also sometimes resolve the issues. > >So, has anyone seen this sort of issue and any suggestions for how we >could either debug more or workaround? > >We are currently running the packages >nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). > >At one point we were seeing it a lot, and could track it back to an >underlying GPFS network issue that was causing protocol nodes to be >expelled occasionally, we resolved that and the issues became less >apparent, but maybe we just fixed one failure mode so see it less often. > >On the clients, we use -o sync,hard BTW as in the IBM docs. > >On a client showing the issues, we'll see in dmesg, NFS related >messages >like: >[Wed Apr 12 16:59:53 2017] nfs: server >MYNFSSERVER.bham.ac.uk not responding, >timed out > >Which explains the client hang on certain mount points. > >The symptoms feel very much like those logged in this Gluster/ganesha bug: >https://bugzilla.redhat.com/show_bug.cgi?id=1354439 > > >Thanks > >Simon > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Wed Apr 26 14:26:08 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Wed, 26 Apr 2017 13:26:08 +0000 Subject: [gpfsug-discuss] Perfmon and GUI In-Reply-To: References: <2A0DC44A-D9FF-428B-8B02-FC6EC504BD34@siriuscom.com> Message-ID: My saga has come to an end. Turns out to get perf stats for NFS you need the gpfs.pm-ganesha package - duh. I typically do manual installs of scale so I just missed this one as it was buried in /usr/lpp/mmfs/4.2.3.0/zimon_rpms/rhel7. Anyway, package installed and now I get NFS stats in the gui and from cli. From: "Sobey, Richard A" Reply-To: gpfsug main discussion list Date: Tuesday, April 25, 2017 at 9:31 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI No worries Mark. We don?t use NFS here (yet) so I can?t help there. Glad I could help. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 15:29 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI Update: So SMB monitoring is now working after copying all files per Richard?s recommendation (thank you sir) and restarting pmsensors, pmcollector, and gpfsfui. Sadly, NFS monitoring isn?t. It doesn?t work from the cli either though. So clearly, something is up with that part. I continue to troubleshoot. From: Mark Bush > Reply-To: gpfsug main discussion list > Date: Tuesday, April 25, 2017 at 9:13 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Perfmon and GUI Interesting. Some files were indeed already there but it was missing a few NFSIO.cfg being the most notable to me. I?ve gone ahead and copied those to all my nodes (just three in this cluster) and restarted services. Still no luck. I?m going to restart the GUI service next to see if that makes a difference. Interestingly I can do things like mmperfmon query smb2 and that tends to work and give me real data so not sure where the breakdown is in the GUI. Mark From: "Sobey, Richard A" > Reply-To: gpfsug main discussion list > Date: Tuesday, April 25, 2017 at 8:44 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Perfmon and GUI I would have thought this would be fixed by now as this happened to me in 4.2.1-(0?) ? here?s what support said. Can you try? I think you?ve already got the relevant bits in your .cfg files so it should just be a case of copying the files across and restarting pmsensors and pmcollector. Again bear in mind this affected me on 4.2.1 and you?re using 4.2.3 so ymmv.. ? I spoke with development and normally these files would be copied over to /opt/IBM/zimon when using the automatic installer but since this case doesn't use the installer we have to copy them over manually. We acknowledge this should be in the docs, and the reason it is not included in pmsensors rpm is due to the fact these do not come from the zimon team. The following files can be copied over to /opt/IBM/zimon [root at node1 default]# pwd /usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default [root at node1 default]# ls CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg SMBSensors.cfg SMBStats.cfg ZIMonCollector.cfg ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 14:28 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Perfmon and GUI Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. What am I missing? 
Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = "Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Apr 26 15:20:30 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 26 Apr 2017 14:20:30 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> References: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> Message-ID: Nope, the clients are all L3 connected, so not an arp issue. Two things we have observed: 1. It triggers when one of the CES IPs moves and quickly moves back again. 
The move occurs because the NFS server goes into grace: 2017-04-25 20:36:49 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60 2017-04-25 20:36:49 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server recovery event 2 nodeid -1 ip 2017-04-25 20:36:49 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4 recovery release ip 2017-04-25 20:36:49 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE 2017-04-25 20:37:42 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60 2017-04-25 20:37:44 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60 2017-04-25 20:37:44 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server recovery event 4 nodeid 2 ip We can't see in any of the logs WHY ganesha is going into grace. Any suggestions on how to debug this further? (I.e. If we can stop the grace issues, we can solve the problem mostly). 2. Our clients are using LDAP which is bound to the CES IPs. If we shutdown nslcd on the client we can get the client to recover once all the TIME_WAIT connections have gone. Maybe this was a bad choice on our side to bind to the CES IPs - we figured it would handily move the IPs for us, but I guess the mmcesfuncs isn't aware of this and so doesn't kill the connections to the IP as it goes away. So two approaches we are going to try. Reconfigure the nslcd on a couple of clients and see if they still show up the issues when fail-over occurs. Second is to work out why the NFS servers are going into grace in the first place. Simon On 26/04/2017, 00:46, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Greg.Lehmann at csiro.au" wrote: >Are you using infiniband or Ethernet? I'm wondering if IBM have solved >the gratuitous arp issue which we see with our non-protocols NFS >implementation. > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >Thompson (IT Research Support) >Sent: Wednesday, 26 April 2017 3:31 AM >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] NFS issues > >I did some digging in the mmcesfuncs to see what happens server side on >fail over. > >Basically the server losing the IP is supposed to terminate all sessions >and the receiver server sends ACK tickles. > >My current supposition is that for whatever reason, the losing server >isn't releasing something and the client still has hold of a connection >which is mostly dead. The tickle then fails to the client from the new >server. > >This would explain why failing the IP back to the original server usually >brings the client back to life. > >This is only my working theory at the moment as we can't reliably >reproduce this. Next time it happens we plan to grab some netstat from >each side. > >Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the >server that received the IP and see if that fixes it (i.e. the receiver >server didn't tickle properly). (Usage extracted from mmcesfuncs which is >ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd) >for anyone interested. > >Then try and kill he sessions on the losing server to check if there is >stuff still open and re-tickle the client. 
> >If we can get steps to workaround, I'll log a PMR. I suppose I could do >that now, but given its non deterministic and we want to be 100% sure >it's not us doing something wrong, I'm inclined to wait until we do some >more testing. > >I agree with the suggestion that it's probably IO pending nodes that are >affected, but don't have any data to back that up yet. We did try with a >read workload on a client, but may we need either long IO blocked reads >or writes (from the GPFS end). > >We also originally had soft as the default option, but saw issues then >and the docs suggested hard, so we switched and also enabled sync (we >figured maybe it was NFS client with uncommited writes), but neither have >resolved the issues entirely. Difficult for me to say if they improved >the issue though given its sporadic. > >Appreciate people's suggestions! > >Thanks > >Simon >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org >[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode >Myklebust [janfrode at tanso.net] >Sent: 25 April 2017 18:04 >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] NFS issues > >I *think* I've seen this, and that we then had open TCP connection from >client to NFS server according to netstat, but these connections were not >visible from netstat on NFS-server side. > >Unfortunately I don't remember what the fix was.. > > > > -jf > >tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) >>: >Hi, > >From what I can see, Ganesha uses the Export_Id option in the config file >(which is managed by CES) for this. I did find some reference in the >Ganesha devs list that if its not set, then it would read the FSID from >the GPFS file-system, either way they should surely be consistent across >all the nodes. The posts I found were from someone with an IBM email >address, so I guess someone in the IBM teams. > >I checked a couple of my protocol nodes and they use the same Export_Id >consistently, though I guess that might not be the same as the FSID value. > >Perhaps someone from IBM could comment on if FSID is likely to the cause >of my problems? > >Thanks > >Simon > >On 25/04/2017, 14:51, >"gpfsug-discuss-bounces at spectrumscale.orgectrumscale.org> on behalf of Ouwehand, JJ" >ectrumscale.org> on behalf of >j.ouwehand at vumc.nl> wrote: > >>Hello, >> >>At first a short introduction. My name is Jaap Jan Ouwehand, I work at >>a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of >>IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our >>critical (office, research and clinical data) business process. We have >>three large GPFS filesystems for different purposes. >> >>We also had such a situation with cNFS. A failover (IPtakeover) was >>technically good, only clients experienced "stale filehandles". We >>opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few >>months later, the solution appeared to be in the fsid option. >> >>An NFS filehandle is built by a combination of fsid and a hash function >>on the inode. After a failover, the fsid value can be different and the >>client has a "stale filehandle". To avoid this, the fsid value can be >>statically specified. See: >> >>https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum >>. >>scale.v4r22.doc/bl1adm_nfslin.htm >> >>Maybe there is also a value in Ganesha that changes after a failover. >>Certainly since most sessions will be re-established after a failback. 
>>Maybe you see more debug information with tcpdump. >> >> >>Kind regards, >> >>Jaap Jan Ouwehand >>ICT Specialist (Storage & Linux) >>VUmc - ICT >>E: jj.ouwehand at vumc.nl >>W: www.vumc.com >> >> >> >>-----Oorspronkelijk bericht----- >>Van: >>gpfsug-discuss-bounces at spectrumscale.org>spectrumscale.org> >>[mailto:gpfsug-discuss-bounces at spectrumscale.org>bounces at spectrumscale.org>] Namens Simon Thompson (IT Research Support) >>Verzonden: dinsdag 25 april 2017 13:21 >>Aan: >>gpfsug-discuss at spectrumscale.org>g> >>Onderwerp: [gpfsug-discuss] NFS issues >> >>Hi, >> >>We have recently started deploying NFS in addition our existing SMB >>exports on our protocol nodes. >> >>We use a RR DNS name that points to 4 VIPs for SMB services and >>failover seems to work fine with SMB clients. We figured we could use >>the same name and IPs and run Ganesha on the protocol servers, however >>we are seeing issues with NFS clients when IP failover occurs. >> >>In normal operation on a client, we might see several mounts from >>different IPs obviously due to the way the DNS RR is working, but it >>all works fine. >> >>In a failover situation, the IP will move to another node and some >>clients will carry on, others will hang IO to the mount points referred >>to by the IP which has moved. We can *sometimes* trigger this by >>manually suspending a CES node, but not always and some clients >>mounting from the IP moving will be fine, others won't. >> >>If we resume a node an it fails back, the clients that are hanging will >>usually recover fine. We can reboot a client prior to failback and it >>will be fine, stopping and starting the ganesha service on a protocol >>node will also sometimes resolve the issues. >> >>So, has anyone seen this sort of issue and any suggestions for how we >>could either debug more or workaround? >> >>We are currently running the packages >>nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). >> >>At one point we were seeing it a lot, and could track it back to an >>underlying GPFS network issue that was causing protocol nodes to be >>expelled occasionally, we resolved that and the issues became less >>apparent, but maybe we just fixed one failure mode so see it less often. >> >>On the clients, we use -o sync,hard BTW as in the IBM docs. >> >>On a client showing the issues, we'll see in dmesg, NFS related >>messages >>like: >>[Wed Apr 12 16:59:53 2017] nfs: server >>MYNFSSERVER.bham.ac.uk not responding, >>timed out >> >>Which explains the client hang on certain mount points. 
>> >>The symptoms feel very much like those logged in this Gluster/ganesha >>bug: >>https://bugzilla.redhat.com/show_bug.cgi?id=1354439 >> >> >>Thanks >> >>Simon >> >>_______________________________________________ >>gpfsug-discuss mailing list >>gpfsug-discuss at spectrumscale.org >>http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>_______________________________________________ >>gpfsug-discuss mailing list >>gpfsug-discuss at spectrumscale.org >>http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From janfrode at tanso.net Wed Apr 26 15:27:03 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 26 Apr 2017 14:27:03 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> Message-ID: Would it help to lower the grace time? mmnfs configuration change LEASE_LIFETIME=10 mmnfs configuration change GRACE_PERIOD=10 -jf ons. 26. apr. 2017 kl. 16.20 skrev Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk>: > Nope, the clients are all L3 connected, so not an arp issue. > > Two things we have observed: > > 1. It triggers when one of the CES IPs moves and quickly moves back again. > The move occurs because the NFS server goes into grace: > > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 2 nodeid -1 ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4 > recovery release ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE > 2017-04-25 20:37:42 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 4 nodeid 2 ip > > > > We can't see in any of the logs WHY ganesha is going into grace. Any > suggestions on how to debug this further? (I.e. If we can stop the grace > issues, we can solve the problem mostly). > > > 2. Our clients are using LDAP which is bound to the CES IPs. If we > shutdown nslcd on the client we can get the client to recover once all the > TIME_WAIT connections have gone. Maybe this was a bad choice on our side > to bind to the CES IPs - we figured it would handily move the IPs for us, > but I guess the mmcesfuncs isn't aware of this and so doesn't kill the > connections to the IP as it goes away. > > > So two approaches we are going to try. Reconfigure the nslcd on a couple > of clients and see if they still show up the issues when fail-over occurs. 
> Second is to work out why the NFS servers are going into grace in the > first place. > > Simon > > On 26/04/2017, 00:46, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Greg.Lehmann at csiro.au" behalf of Greg.Lehmann at csiro.au> wrote: > > >Are you using infiniband or Ethernet? I'm wondering if IBM have solved > >the gratuitous arp issue which we see with our non-protocols NFS > >implementation. > > > >-----Original Message----- > >From: gpfsug-discuss-bounces at spectrumscale.org > >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon > >Thompson (IT Research Support) > >Sent: Wednesday, 26 April 2017 3:31 AM > >To: gpfsug main discussion list > >Subject: Re: [gpfsug-discuss] NFS issues > > > >I did some digging in the mmcesfuncs to see what happens server side on > >fail over. > > > >Basically the server losing the IP is supposed to terminate all sessions > >and the receiver server sends ACK tickles. > > > >My current supposition is that for whatever reason, the losing server > >isn't releasing something and the client still has hold of a connection > >which is mostly dead. The tickle then fails to the client from the new > >server. > > > >This would explain why failing the IP back to the original server usually > >brings the client back to life. > > > >This is only my working theory at the moment as we can't reliably > >reproduce this. Next time it happens we plan to grab some netstat from > >each side. > > > >Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the > >server that received the IP and see if that fixes it (i.e. the receiver > >server didn't tickle properly). (Usage extracted from mmcesfuncs which is > >ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd) > >for anyone interested. > > > >Then try and kill he sessions on the losing server to check if there is > >stuff still open and re-tickle the client. > > > >If we can get steps to workaround, I'll log a PMR. I suppose I could do > >that now, but given its non deterministic and we want to be 100% sure > >it's not us doing something wrong, I'm inclined to wait until we do some > >more testing. > > > >I agree with the suggestion that it's probably IO pending nodes that are > >affected, but don't have any data to back that up yet. We did try with a > >read workload on a client, but may we need either long IO blocked reads > >or writes (from the GPFS end). > > > >We also originally had soft as the default option, but saw issues then > >and the docs suggested hard, so we switched and also enabled sync (we > >figured maybe it was NFS client with uncommited writes), but neither have > >resolved the issues entirely. Difficult for me to say if they improved > >the issue though given its sporadic. > > > >Appreciate people's suggestions! > > > >Thanks > > > >Simon > >________________________________________ > >From: gpfsug-discuss-bounces at spectrumscale.org > >[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode > >Myklebust [janfrode at tanso.net] > >Sent: 25 April 2017 18:04 > >To: gpfsug main discussion list > >Subject: Re: [gpfsug-discuss] NFS issues > > > >I *think* I've seen this, and that we then had open TCP connection from > >client to NFS server according to netstat, but these connections were not > >visible from netstat on NFS-server side. > > > >Unfortunately I don't remember what the fix was.. > > > > > > > > -jf > > > >tir. 25. apr. 2017 kl. 
16.06 skrev Simon Thompson (IT Research Support) > >>: > >Hi, > > > >From what I can see, Ganesha uses the Export_Id option in the config file > >(which is managed by CES) for this. I did find some reference in the > >Ganesha devs list that if its not set, then it would read the FSID from > >the GPFS file-system, either way they should surely be consistent across > >all the nodes. The posts I found were from someone with an IBM email > >address, so I guess someone in the IBM teams. > > > >I checked a couple of my protocol nodes and they use the same Export_Id > >consistently, though I guess that might not be the same as the FSID value. > > > >Perhaps someone from IBM could comment on if FSID is likely to the cause > >of my problems? > > > >Thanks > > > >Simon > > > >On 25/04/2017, 14:51, > >"gpfsug-discuss-bounces at spectrumscale.org gpfsug-discuss-bounces at sp > >ectrumscale.org> on behalf of Ouwehand, JJ" > > gpfsug-discuss-bounces at sp > >ectrumscale.org> on behalf of > >j.ouwehand at vumc.nl> wrote: > > > >>Hello, > >> > >>At first a short introduction. My name is Jaap Jan Ouwehand, I work at > >>a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of > >>IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our > >>critical (office, research and clinical data) business process. We have > >>three large GPFS filesystems for different purposes. > >> > >>We also had such a situation with cNFS. A failover (IPtakeover) was > >>technically good, only clients experienced "stale filehandles". We > >>opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few > >>months later, the solution appeared to be in the fsid option. > >> > >>An NFS filehandle is built by a combination of fsid and a hash function > >>on the inode. After a failover, the fsid value can be different and the > >>client has a "stale filehandle". To avoid this, the fsid value can be > >>statically specified. See: > >> > >> > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum > >>. > >>scale.v4r22.doc/bl1adm_nfslin.htm > >> > >>Maybe there is also a value in Ganesha that changes after a failover. > >>Certainly since most sessions will be re-established after a failback. > >>Maybe you see more debug information with tcpdump. > >> > >> > >>Kind regards, > >> > >>Jaap Jan Ouwehand > >>ICT Specialist (Storage & Linux) > >>VUmc - ICT > >>E: jj.ouwehand at vumc.nl > >>W: www.vumc.com > >> > >> > >> > >>-----Oorspronkelijk bericht----- > >>Van: > >>gpfsug-discuss-bounces at spectrumscale.org >>spectrumscale.org> > >>[mailto:gpfsug-discuss-bounces at spectrumscale.org >>bounces at spectrumscale.org>] Namens Simon Thompson (IT Research Support) > >>Verzonden: dinsdag 25 april 2017 13:21 > >>Aan: > >>gpfsug-discuss at spectrumscale.org >>g> > >>Onderwerp: [gpfsug-discuss] NFS issues > >> > >>Hi, > >> > >>We have recently started deploying NFS in addition our existing SMB > >>exports on our protocol nodes. > >> > >>We use a RR DNS name that points to 4 VIPs for SMB services and > >>failover seems to work fine with SMB clients. We figured we could use > >>the same name and IPs and run Ganesha on the protocol servers, however > >>we are seeing issues with NFS clients when IP failover occurs. > >> > >>In normal operation on a client, we might see several mounts from > >>different IPs obviously due to the way the DNS RR is working, but it > >>all works fine. 
> >> > >>In a failover situation, the IP will move to another node and some > >>clients will carry on, others will hang IO to the mount points referred > >>to by the IP which has moved. We can *sometimes* trigger this by > >>manually suspending a CES node, but not always and some clients > >>mounting from the IP moving will be fine, others won't. > >> > >>If we resume a node an it fails back, the clients that are hanging will > >>usually recover fine. We can reboot a client prior to failback and it > >>will be fine, stopping and starting the ganesha service on a protocol > >>node will also sometimes resolve the issues. > >> > >>So, has anyone seen this sort of issue and any suggestions for how we > >>could either debug more or workaround? > >> > >>We are currently running the packages > >>nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). > >> > >>At one point we were seeing it a lot, and could track it back to an > >>underlying GPFS network issue that was causing protocol nodes to be > >>expelled occasionally, we resolved that and the issues became less > >>apparent, but maybe we just fixed one failure mode so see it less often. > >> > >>On the clients, we use -o sync,hard BTW as in the IBM docs. > >> > >>On a client showing the issues, we'll see in dmesg, NFS related > >>messages > >>like: > >>[Wed Apr 12 16:59:53 2017] nfs: server > >>MYNFSSERVER.bham.ac.uk not responding, > >>timed out > >> > >>Which explains the client hang on certain mount points. > >> > >>The symptoms feel very much like those logged in this Gluster/ganesha > >>bug: > >>https://bugzilla.redhat.com/show_bug.cgi?id=1354439 > >> > >> > >>Thanks > >> > >>Simon > >> > >>_______________________________________________ > >>gpfsug-discuss mailing list > >>gpfsug-discuss at spectrumscale.org > >>http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >>_______________________________________________ > >>gpfsug-discuss mailing list > >>gpfsug-discuss at spectrumscale.org > >>http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From peserocka at gmail.com Wed Apr 26 18:53:51 2017 From: peserocka at gmail.com (Peter Serocka) Date: Wed, 26 Apr 2017 19:53:51 +0200 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> Message-ID: > On 2017 Apr 26 Wed, at 16:20, Simon Thompson (IT Research Support) wrote: > > Nope, the clients are all L3 connected, so not an arp issue. ...not on the client, but the server-facing L3 switch still need to manage its ARP table, and might miss the IP moving to a new MAC. Cisco switches have a default ARP cache timeout of 4 hours, fwiw. Can your network team provide you the ARP status from the switch when you see a fail-over being stuck? ? 
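For example (interface name and address below are just placeholders for one of your CES IPs), something like this run from another host on the same subnet as the CES IPs would show whether that IP is still resolving to the old node's MAC while the hang is happening:

arping -c 3 -I eth0 10.0.0.50     # which MAC currently answers ARP for the CES IP
ip neigh | grep 10.0.0.50         # what is cached locally for that IP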
Peter > > Two things we have observed: > > 1. It triggers when one of the CES IPs moves and quickly moves back again. > The move occurs because the NFS server goes into grace: > > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 2 nodeid -1 ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4 > recovery release ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE > 2017-04-25 20:37:42 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 4 nodeid 2 ip > > > > We can't see in any of the logs WHY ganesha is going into grace. Any > suggestions on how to debug this further? (I.e. If we can stop the grace > issues, we can solve the problem mostly). > > > 2. Our clients are using LDAP which is bound to the CES IPs. If we > shutdown nslcd on the client we can get the client to recover once all the > TIME_WAIT connections have gone. Maybe this was a bad choice on our side > to bind to the CES IPs - we figured it would handily move the IPs for us, > but I guess the mmcesfuncs isn't aware of this and so doesn't kill the > connections to the IP as it goes away. > > > So two approaches we are going to try. Reconfigure the nslcd on a couple > of clients and see if they still show up the issues when fail-over occurs. > Second is to work out why the NFS servers are going into grace in the > first place. > > Simon > > On 26/04/2017, 00:46, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Greg.Lehmann at csiro.au" behalf of Greg.Lehmann at csiro.au> wrote: > >> Are you using infiniband or Ethernet? I'm wondering if IBM have solved >> the gratuitous arp issue which we see with our non-protocols NFS >> implementation. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >> Thompson (IT Research Support) >> Sent: Wednesday, 26 April 2017 3:31 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] NFS issues >> >> I did some digging in the mmcesfuncs to see what happens server side on >> fail over. >> >> Basically the server losing the IP is supposed to terminate all sessions >> and the receiver server sends ACK tickles. >> >> My current supposition is that for whatever reason, the losing server >> isn't releasing something and the client still has hold of a connection >> which is mostly dead. The tickle then fails to the client from the new >> server. >> >> This would explain why failing the IP back to the original server usually >> brings the client back to life. >> >> This is only my working theory at the moment as we can't reliably >> reproduce this. Next time it happens we plan to grab some netstat from >> each side. >> >> Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the >> server that received the IP and see if that fixes it (i.e. 
the receiver >> server didn't tickle properly). (Usage extracted from mmcesfuncs which is >> ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd) >> for anyone interested. >> >> Then try and kill he sessions on the losing server to check if there is >> stuff still open and re-tickle the client. >> >> If we can get steps to workaround, I'll log a PMR. I suppose I could do >> that now, but given its non deterministic and we want to be 100% sure >> it's not us doing something wrong, I'm inclined to wait until we do some >> more testing. >> >> I agree with the suggestion that it's probably IO pending nodes that are >> affected, but don't have any data to back that up yet. We did try with a >> read workload on a client, but may we need either long IO blocked reads >> or writes (from the GPFS end). >> >> We also originally had soft as the default option, but saw issues then >> and the docs suggested hard, so we switched and also enabled sync (we >> figured maybe it was NFS client with uncommited writes), but neither have >> resolved the issues entirely. Difficult for me to say if they improved >> the issue though given its sporadic. >> >> Appreciate people's suggestions! >> >> Thanks >> >> Simon >> ________________________________________ >> From: gpfsug-discuss-bounces at spectrumscale.org >> [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode >> Myklebust [janfrode at tanso.net] >> Sent: 25 April 2017 18:04 >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] NFS issues >> >> I *think* I've seen this, and that we then had open TCP connection from >> client to NFS server according to netstat, but these connections were not >> visible from netstat on NFS-server side. >> >> Unfortunately I don't remember what the fix was.. >> >> >> >> -jf >> >> tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) >> >: >> Hi, >> >> From what I can see, Ganesha uses the Export_Id option in the config file >> (which is managed by CES) for this. I did find some reference in the >> Ganesha devs list that if its not set, then it would read the FSID from >> the GPFS file-system, either way they should surely be consistent across >> all the nodes. The posts I found were from someone with an IBM email >> address, so I guess someone in the IBM teams. >> >> I checked a couple of my protocol nodes and they use the same Export_Id >> consistently, though I guess that might not be the same as the FSID value. >> >> Perhaps someone from IBM could comment on if FSID is likely to the cause >> of my problems? >> >> Thanks >> >> Simon >> >> On 25/04/2017, 14:51, >> "gpfsug-discuss-bounces at spectrumscale.org> ectrumscale.org> on behalf of Ouwehand, JJ" >> > ectrumscale.org> on behalf of >> j.ouwehand at vumc.nl> wrote: >> >>> Hello, >>> >>> At first a short introduction. My name is Jaap Jan Ouwehand, I work at >>> a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of >>> IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our >>> critical (office, research and clinical data) business process. We have >>> three large GPFS filesystems for different purposes. >>> >>> We also had such a situation with cNFS. A failover (IPtakeover) was >>> technically good, only clients experienced "stale filehandles". We >>> opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few >>> months later, the solution appeared to be in the fsid option. >>> >>> An NFS filehandle is built by a combination of fsid and a hash function >>> on the inode. 
After a failover, the fsid value can be different and the >>> client has a "stale filehandle". To avoid this, the fsid value can be >>> statically specified. See: >>> >>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum >>> . >>> scale.v4r22.doc/bl1adm_nfslin.htm >>> >>> Maybe there is also a value in Ganesha that changes after a failover. >>> Certainly since most sessions will be re-established after a failback. >>> Maybe you see more debug information with tcpdump. >>> >>> >>> Kind regards, >>> >>> Jaap Jan Ouwehand >>> ICT Specialist (Storage & Linux) >>> VUmc - ICT >>> E: jj.ouwehand at vumc.nl >>> W: www.vumc.com >>> >>> >>> >>> -----Oorspronkelijk bericht----- >>> Van: >>> gpfsug-discuss-bounces at spectrumscale.org>> spectrumscale.org> >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org>> bounces at spectrumscale.org>] Namens Simon Thompson (IT Research Support) >>> Verzonden: dinsdag 25 april 2017 13:21 >>> Aan: >>> gpfsug-discuss at spectrumscale.org>> g> >>> Onderwerp: [gpfsug-discuss] NFS issues >>> >>> Hi, >>> >>> We have recently started deploying NFS in addition our existing SMB >>> exports on our protocol nodes. >>> >>> We use a RR DNS name that points to 4 VIPs for SMB services and >>> failover seems to work fine with SMB clients. We figured we could use >>> the same name and IPs and run Ganesha on the protocol servers, however >>> we are seeing issues with NFS clients when IP failover occurs. >>> >>> In normal operation on a client, we might see several mounts from >>> different IPs obviously due to the way the DNS RR is working, but it >>> all works fine. >>> >>> In a failover situation, the IP will move to another node and some >>> clients will carry on, others will hang IO to the mount points referred >>> to by the IP which has moved. We can *sometimes* trigger this by >>> manually suspending a CES node, but not always and some clients >>> mounting from the IP moving will be fine, others won't. >>> >>> If we resume a node an it fails back, the clients that are hanging will >>> usually recover fine. We can reboot a client prior to failback and it >>> will be fine, stopping and starting the ganesha service on a protocol >>> node will also sometimes resolve the issues. >>> >>> So, has anyone seen this sort of issue and any suggestions for how we >>> could either debug more or workaround? >>> >>> We are currently running the packages >>> nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). >>> >>> At one point we were seeing it a lot, and could track it back to an >>> underlying GPFS network issue that was causing protocol nodes to be >>> expelled occasionally, we resolved that and the issues became less >>> apparent, but maybe we just fixed one failure mode so see it less often. >>> >>> On the clients, we use -o sync,hard BTW as in the IBM docs. >>> >>> On a client showing the issues, we'll see in dmesg, NFS related >>> messages >>> like: >>> [Wed Apr 12 16:59:53 2017] nfs: server >>> MYNFSSERVER.bham.ac.uk not responding, >>> timed out >>> >>> Which explains the client hang on certain mount points. 
>>> >>> The symptoms feel very much like those logged in this Gluster/ganesha >>> bug: >>> https://bugzilla.redhat.com/show_bug.cgi?id=1354439 >>> >>> >>> Thanks >>> >>> Simon >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Wed Apr 26 19:00:06 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 26 Apr 2017 18:00:06 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> , Message-ID: We have no issues with L3 SMB accessing clients, so I'm pretty sure it's not arp. And some of the boxes on the other side of the L3 gateway don't see the issues. We don't use Cisco kit. I posted in a different update that we think it's related to connections to other ports on the same IP which get left open when the IP quickly gets moved away and back again. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Peter Serocka [peserocka at gmail.com] Sent: 26 April 2017 18:53 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NFS issues > On 2017 Apr 26 Wed, at 16:20, Simon Thompson (IT Research Support) wrote: > > Nope, the clients are all L3 connected, so not an arp issue. ...not on the client, but the server-facing L3 switch still need to manage its ARP table, and might miss the IP moving to a new MAC. Cisco switches have a default ARP cache timeout of 4 hours, fwiw. Can your network team provide you the ARP status from the switch when you see a fail-over being stuck? ? Peter > > Two things we have observed: > > 1. It triggers when one of the CES IPs moves and quickly moves back again. 
> The move occurs because the NFS server goes into grace: > > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 2 nodeid -1 ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4 > recovery release ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE > 2017-04-25 20:37:42 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 4 nodeid 2 ip > > > > We can't see in any of the logs WHY ganesha is going into grace. Any > suggestions on how to debug this further? (I.e. If we can stop the grace > issues, we can solve the problem mostly). > > > 2. Our clients are using LDAP which is bound to the CES IPs. If we > shutdown nslcd on the client we can get the client to recover once all the > TIME_WAIT connections have gone. Maybe this was a bad choice on our side > to bind to the CES IPs - we figured it would handily move the IPs for us, > but I guess the mmcesfuncs isn't aware of this and so doesn't kill the > connections to the IP as it goes away. > > > So two approaches we are going to try. Reconfigure the nslcd on a couple > of clients and see if they still show up the issues when fail-over occurs. > Second is to work out why the NFS servers are going into grace in the > first place. > > Simon > > On 26/04/2017, 00:46, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Greg.Lehmann at csiro.au" behalf of Greg.Lehmann at csiro.au> wrote: > >> Are you using infiniband or Ethernet? I'm wondering if IBM have solved >> the gratuitous arp issue which we see with our non-protocols NFS >> implementation. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >> Thompson (IT Research Support) >> Sent: Wednesday, 26 April 2017 3:31 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] NFS issues >> >> I did some digging in the mmcesfuncs to see what happens server side on >> fail over. >> >> Basically the server losing the IP is supposed to terminate all sessions >> and the receiver server sends ACK tickles. >> >> My current supposition is that for whatever reason, the losing server >> isn't releasing something and the client still has hold of a connection >> which is mostly dead. The tickle then fails to the client from the new >> server. >> >> This would explain why failing the IP back to the original server usually >> brings the client back to life. >> >> This is only my working theory at the moment as we can't reliably >> reproduce this. Next time it happens we plan to grab some netstat from >> each side. >> >> Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the >> server that received the IP and see if that fixes it (i.e. the receiver >> server didn't tickle properly). (Usage extracted from mmcesfuncs which is >> ksh of course). ... 
CesIPPort is colon separated IP:portnumber (of NFSd) >> for anyone interested. >> >> Then try and kill he sessions on the losing server to check if there is >> stuff still open and re-tickle the client. >> >> If we can get steps to workaround, I'll log a PMR. I suppose I could do >> that now, but given its non deterministic and we want to be 100% sure >> it's not us doing something wrong, I'm inclined to wait until we do some >> more testing. >> >> I agree with the suggestion that it's probably IO pending nodes that are >> affected, but don't have any data to back that up yet. We did try with a >> read workload on a client, but may we need either long IO blocked reads >> or writes (from the GPFS end). >> >> We also originally had soft as the default option, but saw issues then >> and the docs suggested hard, so we switched and also enabled sync (we >> figured maybe it was NFS client with uncommited writes), but neither have >> resolved the issues entirely. Difficult for me to say if they improved >> the issue though given its sporadic. >> >> Appreciate people's suggestions! >> >> Thanks >> >> Simon >> ________________________________________ >> From: gpfsug-discuss-bounces at spectrumscale.org >> [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode >> Myklebust [janfrode at tanso.net] >> Sent: 25 April 2017 18:04 >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] NFS issues >> >> I *think* I've seen this, and that we then had open TCP connection from >> client to NFS server according to netstat, but these connections were not >> visible from netstat on NFS-server side. >> >> Unfortunately I don't remember what the fix was.. >> >> >> >> -jf >> >> tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) >> >: >> Hi, >> >> From what I can see, Ganesha uses the Export_Id option in the config file >> (which is managed by CES) for this. I did find some reference in the >> Ganesha devs list that if its not set, then it would read the FSID from >> the GPFS file-system, either way they should surely be consistent across >> all the nodes. The posts I found were from someone with an IBM email >> address, so I guess someone in the IBM teams. >> >> I checked a couple of my protocol nodes and they use the same Export_Id >> consistently, though I guess that might not be the same as the FSID value. >> >> Perhaps someone from IBM could comment on if FSID is likely to the cause >> of my problems? >> >> Thanks >> >> Simon >> >> On 25/04/2017, 14:51, >> "gpfsug-discuss-bounces at spectrumscale.org> ectrumscale.org> on behalf of Ouwehand, JJ" >> > ectrumscale.org> on behalf of >> j.ouwehand at vumc.nl> wrote: >> >>> Hello, >>> >>> At first a short introduction. My name is Jaap Jan Ouwehand, I work at >>> a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of >>> IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our >>> critical (office, research and clinical data) business process. We have >>> three large GPFS filesystems for different purposes. >>> >>> We also had such a situation with cNFS. A failover (IPtakeover) was >>> technically good, only clients experienced "stale filehandles". We >>> opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few >>> months later, the solution appeared to be in the fsid option. >>> >>> An NFS filehandle is built by a combination of fsid and a hash function >>> on the inode. After a failover, the fsid value can be different and the >>> client has a "stale filehandle". 
To avoid this, the fsid value can be >>> statically specified. See: >>> >>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum >>> . >>> scale.v4r22.doc/bl1adm_nfslin.htm >>> >>> Maybe there is also a value in Ganesha that changes after a failover. >>> Certainly since most sessions will be re-established after a failback. >>> Maybe you see more debug information with tcpdump. >>> >>> >>> Kind regards, >>> >>> Jaap Jan Ouwehand >>> ICT Specialist (Storage & Linux) >>> VUmc - ICT >>> E: jj.ouwehand at vumc.nl >>> W: www.vumc.com >>> >>> >>> >>> -----Oorspronkelijk bericht----- >>> Van: >>> gpfsug-discuss-bounces at spectrumscale.org>> spectrumscale.org> >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org>> bounces at spectrumscale.org>] Namens Simon Thompson (IT Research Support) >>> Verzonden: dinsdag 25 april 2017 13:21 >>> Aan: >>> gpfsug-discuss at spectrumscale.org>> g> >>> Onderwerp: [gpfsug-discuss] NFS issues >>> >>> Hi, >>> >>> We have recently started deploying NFS in addition our existing SMB >>> exports on our protocol nodes. >>> >>> We use a RR DNS name that points to 4 VIPs for SMB services and >>> failover seems to work fine with SMB clients. We figured we could use >>> the same name and IPs and run Ganesha on the protocol servers, however >>> we are seeing issues with NFS clients when IP failover occurs. >>> >>> In normal operation on a client, we might see several mounts from >>> different IPs obviously due to the way the DNS RR is working, but it >>> all works fine. >>> >>> In a failover situation, the IP will move to another node and some >>> clients will carry on, others will hang IO to the mount points referred >>> to by the IP which has moved. We can *sometimes* trigger this by >>> manually suspending a CES node, but not always and some clients >>> mounting from the IP moving will be fine, others won't. >>> >>> If we resume a node an it fails back, the clients that are hanging will >>> usually recover fine. We can reboot a client prior to failback and it >>> will be fine, stopping and starting the ganesha service on a protocol >>> node will also sometimes resolve the issues. >>> >>> So, has anyone seen this sort of issue and any suggestions for how we >>> could either debug more or workaround? >>> >>> We are currently running the packages >>> nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). >>> >>> At one point we were seeing it a lot, and could track it back to an >>> underlying GPFS network issue that was causing protocol nodes to be >>> expelled occasionally, we resolved that and the issues became less >>> apparent, but maybe we just fixed one failure mode so see it less often. >>> >>> On the clients, we use -o sync,hard BTW as in the IBM docs. >>> >>> On a client showing the issues, we'll see in dmesg, NFS related >>> messages >>> like: >>> [Wed Apr 12 16:59:53 2017] nfs: server >>> MYNFSSERVER.bham.ac.uk not responding, >>> timed out >>> >>> Which explains the client hang on certain mount points. 
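For anyone wanting to capture the state described above the next time a client hangs, a minimal sketch of the "netstat from each side" comparison mentioned earlier in this thread (the CES address below is a placeholder, and 2049 assumes NFS over TCP on the standard port):

  CESIP=10.10.10.21                      # example CES address that just moved
  # On the hanging client: sessions to the NFS port on that address.
  netstat -tn | grep "$CESIP:2049"
  # On the protocol node that now owns the address: the matching sessions.
  netstat -tn | grep "$CESIP:2049"
  # A connection that shows ESTABLISHED on the client but has no counterpart
  # on the server is the half-dead session suspected above, and is the case
  # where the "mmcmi tcpack" re-tickle quoted above is meant to help.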
>>> >>> The symptoms feel very much like those logged in this Gluster/ganesha >>> bug: >>> https://bugzilla.redhat.com/show_bug.cgi?id=1354439 >>> >>> >>> Thanks >>> >>> Simon >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From valdis.kletnieks at vt.edu Thu Apr 27 00:44:44 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 26 Apr 2017 19:44:44 -0400 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> Message-ID: <52226.1493250284@turing-police.cc.vt.edu> On Wed, 26 Apr 2017 14:20:30 -0000, "Simon Thompson (IT Research Support)" said: > We can't see in any of the logs WHY ganesha is going into grace. Any > suggestions on how to debug this further? (I.e. If we can stop the grace > issues, we can solve the problem mostly). After over 3 decades of experience with 'exportfs' being totally safe to run in real time with both userspace and kernel NFSD implementations, it came as quite a surprise when we did 'mmnfs eport change --nfsadd='... and it bounced the NFS server on all 4 protocol nodes. At the same time. Fortunately for us, the set of client nodes only changes once every 2-3 months. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From secretary at gpfsug.org Thu Apr 27 09:29:41 2017 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Thu, 27 Apr 2017 09:29:41 +0100 Subject: [gpfsug-discuss] Meet other spectrum scale users in May Message-ID: <1f483faa9cb61dcdc80afb187e908745@webmail.gpfsug.org> Dear Members, Please join us and other spectrum scale users for 2 days of great talks and networking! WHEN: 9-10th May 2017 WHERE: Macdonald Manchester Hotel & Spa, Manchester, UK (right by Manchester Piccadilly train station) WHO? The event is free to attend, is open to members from all industries and welcomes users with a little and a lot of experience using Spectrum Scale. The SSUG brings together the Spectrum Scale User Community including Spectrum Scale developers and architects to share knowledge, experiences and future plans. Topics include transparent cloud tiering, AFM, automation and security best practices, Docker and HDFS support, problem determination, and an update on Elastic Storage Server (ESS). 
Our popular forum includes interactive problem solving, a best practices discussion and networking. We're very excited to welcome back Doris Conti the Director for Spectrum Scale (GPFS) and HPC SW Product Development at IBM. The May meeting is sponsored by IBM, DDN, Lenovo, Mellanox, Seagate, Arcastream, Ellexus, and OCF. It is an excellent opportunity to learn more and get your questions answered. Register your place today at the Eventbrite page https://goo.gl/tRptru [1] We hope to see you there! -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org Links: ------ [1] https://goo.gl/tRptru -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert at strubi.ox.ac.uk Thu Apr 27 12:46:09 2017 From: robert at strubi.ox.ac.uk (Robert Esnouf) Date: Thu, 27 Apr 2017 12:46:09 +0100 (BST) Subject: [gpfsug-discuss] Two high-performance research computing posts in Oxford University Medical Sciences Message-ID: <201704271146.061978@mail.strubi.ox.ac.uk> Dear All, I hope that it is allowed to put job postings on this discussion list... sorry if I've broken a rule but it does mention SpectrumScale! I'd like to advertise the availability two exciting and challenging new opportunities to work in research computing/high-performance computing at Oxford University within the Nuffield Department of Medicine. The first is a Grade 8 position to expand the current Research Computing Core team at the Wellcome Trust Centre for Human Genetics. The Core now runs a cluster of about ~3800 high-memory compute cores, a further ~700 cores outside the cluster, a (growing) smattering of GPU-enabled and KNL nodes, 4PB high-performance SpectrumScale (GPFS) storage and about 4PB of lower grade (mostly XFS) storage. The facility has an FDR InfiniBand fabric providing for access to storage at up to 20GB/s and supporting MPI workloads. We mainly support the statistical genetics work of the Centre and other departments around Oxford, the work of the sequencing and bioinformatics cores and electron microscopy, but the workload is varied and interesting! Further significant update and expansion of this facility will occur during 2017 and beyond and means that we are expanding the team. http://www.well.ox.ac.uk/home http://www.well.ox.ac.uk/research-8 https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.display_form?p_company=10&p_internal_external=E&p_display_in_irish=N&p_process_type=&p_applicant_no=&p_form_profile_detail=&p_display_apply_ind=Y&p_refresh_search=Y&p_recruitment_id=126748 The second is a Grade 9 post at the newly opened Big Data Institute next door to the WTCHG - to work with me to establish a brand new Research Computing facility. The Big Data Institute Building has 32 shiny new racks ready to be filled with up to 320kW of IT load - and we won't stop there! The current plans envisage a virtualized infrastructure for secure access, a high-performance cluster supporting traditional workloads and containers, high-performance filesystem storage, a hyperconverged infrastructure supporting (OpenStack, project VMs, containers and distributed computing plaforms such as Apache Spark), a significant GPU-based artificial intelligence/deep learning platform and a large, multisite object store for managing research data in the long term. 
https://www.bdi.ox.ac.uk/ https://www.ndm.ox.ac.uk/current-job-vacancies/vacancy/128486-BDI-Research-Computing-Manager https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.display_form?p_company=10&p_internal_external=E&p_display_in_irish=N&p_process_type=&p_applicant_no=&p_form_profile_detail=&p_display_apply_ind=Y&p_refresh_search=Y&p_recruitment_id=128486 It is expected that the Wellcome Trust Centre and Big Data Institute facilities will develop independently for now, but in a complementary and supportive fashion given the overlap in science and technology that is likely to exist. The Research Computing support teams will therefore work extremely closely together to address the challenges facing computing in the medical sciences. If either (or both) of these vacancies seem interesting then please feel free to contact the Head of the Research Computing Core at the WTCHG (me) or the Director of Research Computing at the BDI (me). Deadline for the WTCHG post is 31st May and for the BDI post is 24th May. Please feel free to circulate this email to anyone who might be interested and apologies for any cross postings! Regards, Robert -- Dr Robert Esnouf University Research Lecturer, Director of Research Computing BDI, Head of Research Computing Core WTCHG, NDM Research Computing Strategy Officer Main office: Room 10/028, Wellcome Trust Centre for Human Genetics, Old Road Campus, Roosevelt Drive, Oxford OX3 7BN, UK Emails: robert at strubi.ox.ac.uk / robert at well.ox.ac.uk / robert.esnouf at bdi.ox.ac.uk Tel: (+44) - 1865 - 287783
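On the 'Can't delete filesystem' question that the next reply returns to: a cross-check that does not depend on df or mount output on the clients is to ask GPFS itself which nodes it still counts as having the filesystem mounted; a short sketch, with a made-up filesystem name:

  # List, per node, where GPFS still records the filesystem as mounted
  # (closer to what mmdelfs is checking than the clients' df output).
  mmlsmount gpfs23 -L
  # If HSM is involved, dsmrecalld can also keep the filesystem busy:
  mmdsh -N all "pidof dsmrecalld"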
From janfrode at tanso.net Wed Apr 5 22:51:15 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 05 Apr 2017 21:51:15 +0000 Subject: [gpfsug-discuss] Can't delete filesystem In-Reply-To: <0F877E25-6C58-4790-86CD-7E2108EC8EB5@vanderbilt.edu> References: <20E4B082-2BBB-478B-B1E1-2BC8125FE50F@vanderbilt.edu> <0F877E25-6C58-4790-86CD-7E2108EC8EB5@vanderbilt.edu> Message-ID: Maybe try mmumount -f on the remaining 4 nodes? -jf ons. 5. apr. 2017 kl. 18.54 skrev Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu>: > Hi Simon, > > No, I do not. > > Let me also add that this is a filesystem that I migrated users off of and > to another GPFS filesystem. I moved the last users this morning and then > ran an ?mmunmount? across the whole cluster via mmdsh. Therefore, if the > simple solution is to use the ?-p? option to mmdelfs I?m fine with that. > I?m just not sure what the right course of action is at this point. > > Thanks again? > > Kevin > > > On Apr 5, 2017, at 11:47 AM, Simon Thompson (Research Computing - IT > Services) wrote: > > > > Do you have ILM (dsmrecalld and friends) running? > > > > They can also stop the filesystem being released (e.g. mmshutdown fails > if they are up). > > > > Simon > > ________________________________________ > > From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin > L [Kevin.Buterbaugh at Vanderbilt.Edu] > > Sent: 05 April 2017 17:40 > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] Can't delete filesystem > > > > Hi All, > > > > First off, I can open a PMR on this if I need to? > > > > I am trying to delete a GPFS filesystem but mmdelfs is telling me that > the filesystem is still mounted on 14 nodes and therefore can?t be > deleted. 10 of those nodes are my 10 GPFS servers and they have an > ?internal mount? still mounted.
IIRC, it?s the other 4 (client) nodes I > need to concentrate on ? i.e. once those other 4 clients no longer have it > mounted the internal mounts will resolve themselves. Correct me if I?m > wrong on that, please. > > > > So, I have gone to all of the 4 clients and none of them say they have > it mounted according to either ?df? or ?mount?. I?ve gone ahead and run > both ?mmunmount? and ?umount -l? on the filesystem anyway, but the mmdelfs > still fails saying that they have it mounted. > > > > What do I need to do to resolve this issue on those 4 clients? Thanks? > > > > Kevin > > > > ? > > Kevin Buterbaugh - Senior System Administrator > > Vanderbilt University - Advanced Computing Center for Research and > Education > > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Apr 6 02:54:07 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Thu, 6 Apr 2017 01:54:07 +0000 Subject: [gpfsug-discuss] AFM misunderstanding Message-ID: When I setup a AFM relationship (let?s just say I?m doing RO), does prefetch bring bits of the actual file over to the cache or is it only ever metadata? I know there is a ?metadata-only switch but it appears that if I try a mmafmctl prefetch operation and then I do a ls ?ltrs on the cache it?s still 0 bytes. I do see the queue increasing when I do a mmafmctl getstate. I realize that the data truly only flows once the file is requested (I just do a dd if=mycachedfile of=/dev/null). But this is just my test env. How to I get the bits to flow before I request them assuming that I will at some point need them? Or do I just misunderstand AFM altogether? I?m more used to mirroring so maybe that?s my frame of reference and it?s not the AFM architecture. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Apr 6 09:20:31 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 6 Apr 2017 08:20:31 +0000 Subject: [gpfsug-discuss] Spectrum Scale Encryption Message-ID: We are currently looking at adding encryption to our deployment for some of our data sets and for some of our nodes. 
Apologies in advance if some of this is a bit vague, we're not yet at the point where we can test this stuff out, so maybe some of it will become clear when we try it out. For a node that we don't want to have access to any encrypted data, what do we need to set up? According to the docs: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_encryption_prep.htm "After the file system is configured with encryption policy rules, the file system is considered encrypted. From that point on, each node that has access to that file system must have an RKM.conf file present. Otherwise, the file system might not be mounted or might become unmounted." So on a node which I don't want to have access to any encrypted files, do I just need to have an empty RKM.conf file? (If this is the case, would be good to have this added to the docs) Secondly ... (and maybe I'm misunderstanding the docs here) For the Policy https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectru m.scale.v4r22.doc/bl1adv_encryptionpolicyrules.htm KEYS ('Keyname'[, 'Keyname', ... ]) KeyId:RkmId RkmId should match the stanza name in RKM.conf? If so, it would be useful if the docs used the same names in the examples (RKMKMIP3 vs rkmname3) And KeyId should match a "Key UUID" in SKLM? Third. My understanding from talking to various IBM people is that we need ISKLM entitlements for NSD Servers, Protocol nodes and AFM gateways (probably), do we have to do any kind of node registration in ISKLM? Or is this purely based on the certificates being distributed to clients and keys are mapped in ISKLM to the client cert to determine if the node is able to request the key? Thanks Simon From vpuvvada at in.ibm.com Thu Apr 6 11:45:37 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Thu, 6 Apr 2017 16:15:37 +0530 Subject: [gpfsug-discuss] AFM misunderstanding In-Reply-To: References: Message-ID: Could you explain "bits of actual file" mentioned below ? Prefetch with ?metadata-only pulls everything (xattrs, ACLs etc..) except data. Doing " ls ?ltrs" shows file allocation size as zero if data prefetch not yet completed on them. ~Venkat (vpuvvada at in.ibm.com) From: Mark Bush To: gpfsug main discussion list Date: 04/06/2017 07:24 AM Subject: [gpfsug-discuss] AFM misunderstanding Sent by: gpfsug-discuss-bounces at spectrumscale.org When I setup a AFM relationship (let?s just say I?m doing RO), does prefetch bring bits of the actual file over to the cache or is it only ever metadata? I know there is a ?metadata-only switch but it appears that if I try a mmafmctl prefetch operation and then I do a ls ?ltrs on the cache it?s still 0 bytes. I do see the queue increasing when I do a mmafmctl getstate. I realize that the data truly only flows once the file is requested (I just do a dd if=mycachedfile of=/dev/null). But this is just my test env. How to I get the bits to flow before I request them assuming that I will at some point need them? Or do I just misunderstand AFM altogether? I?m more used to mirroring so maybe that?s my frame of reference and it?s not the AFM architecture. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. 
If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Apr 6 13:28:40 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Thu, 6 Apr 2017 12:28:40 +0000 Subject: [gpfsug-discuss] AFM misunderstanding In-Reply-To: References: Message-ID: <425C32E7-B752-4B61-BDF5-83C219D89ADB@siriuscom.com> I think I was missing a key piece in that I thought that just doing a mmafmctl fs1 prefetch ?j cache would start grabbing everything (data and metadata) but it appears that the ?list-file myfiles.txt is the trigger for the prefetch to work properly. I mistakenly assumed that omitting the ?list-file switch would prefetch all the data in the fileset. From: on behalf of Venkateswara R Puvvada Reply-To: gpfsug main discussion list Date: Thursday, April 6, 2017 at 5:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM misunderstanding Could you explain "bits of actual file" mentioned below ? Prefetch with ?metadata-onlypulls everything (xattrs, ACLs etc..) except data. Doing "ls ?ltrs" shows file allocation size as zero if data prefetch not yet completed on them. ~Venkat (vpuvvada at in.ibm.com) From: Mark Bush To: gpfsug main discussion list Date: 04/06/2017 07:24 AM Subject: [gpfsug-discuss] AFM misunderstanding Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ When I setup a AFM relationship (let?s just say I?m doing RO), does prefetch bring bits of the actual file over to the cache or is it only ever metadata? I know there is a ?metadata-only switch but it appears that if I try a mmafmctl prefetch operation and then I do a ls ?ltrs on the cache it?s still 0 bytes. I do see the queue increasing when I do a mmafmctl getstate. I realize that the data truly only flows once the file is requested (I just do a dd if=mycachedfile of=/dev/null). But this is just my test env. How to I get the bits to flow before I request them assuming that I will at some point need them? Or do I just misunderstand AFM altogether? I?m more used to mirroring so maybe that?s my frame of reference and it?s not the AFM architecture. Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. 
If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Apr 6 15:33:18 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 6 Apr 2017 14:33:18 +0000 Subject: [gpfsug-discuss] Can't delete filesystem In-Reply-To: References: <20E4B082-2BBB-478B-B1E1-2BC8125FE50F@vanderbilt.edu> <0F877E25-6C58-4790-86CD-7E2108EC8EB5@vanderbilt.edu> Message-ID: Hi JF, I actually tried that - to no effect. Yesterday evening I rebooted the 4 clients and, as expected, the 10 servers released their internal mounts as well ? and then I was able to delete the filesystem successfully. Thanks for the suggestions, all? Kevin On Apr 5, 2017, at 4:51 PM, Jan-Frode Myklebust > wrote: Maybe try mmumount -f on the remaining 4 nodes? -jf ons. 5. apr. 2017 kl. 18.54 skrev Buterbaugh, Kevin L >: Hi Simon, No, I do not. Let me also add that this is a filesystem that I migrated users off of and to another GPFS filesystem. I moved the last users this morning and then ran an ?mmunmount? across the whole cluster via mmdsh. Therefore, if the simple solution is to use the ?-p? option to mmdelfs I?m fine with that. I?m just not sure what the right course of action is at this point. Thanks again? Kevin > On Apr 5, 2017, at 11:47 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > Do you have ILM (dsmrecalld and friends) running? > > They can also stop the filesystem being released (e.g. mmshutdown fails if they are up). > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] > Sent: 05 April 2017 17:40 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Can't delete filesystem > > Hi All, > > First off, I can open a PMR on this if I need to? > > I am trying to delete a GPFS filesystem but mmdelfs is telling me that the filesystem is still mounted on 14 nodes and therefore can?t be deleted. 10 of those nodes are my 10 GPFS servers and they have an ?internal mount? still mounted. IIRC, it?s the other 4 (client) nodes I need to concentrate on ? i.e. once those other 4 clients no longer have it mounted the internal mounts will resolve themselves. Correct me if I?m wrong on that, please. > > So, I have gone to all of the 4 clients and none of them say they have it mounted according to either ?df? or ?mount?. I?ve gone ahead and run both ?mmunmount? and ?umount -l? on the filesystem anyway, but the mmdelfs still fails saying that they have it mounted. > > What do I need to do to resolve this issue on those 4 clients? Thanks? > > Kevin > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Thu Apr 6 15:54:42 2017 From: ewahl at osc.edu (Wahl, Edward) Date: Thu, 6 Apr 2017 14:54:42 +0000 Subject: [gpfsug-discuss] Spectrum Scale Encryption In-Reply-To: References: Message-ID: <9DA9EC7A281AC7428A9618AFDC490499591F4BDB@CIO-KRC-D1MBX02.osuad.osu.edu> This is rather dependant on SS version. So what used to happen before 4.2.2.* is that a client would be unable to mount the filesystem in question and would give an error in the mmfs.log.latest for an SGPanic, In 4.2.2.* It appears it will now mount the file system and then give errors on file access instead. (just tested this on 4.2.2.3) I'll have to read through the changelogs looking for this one. Depending on your policy for encryption then, this might be exactly what you want, but I REALLY REALLY dislike this behaviour. To me this means clients can now mount an encrypted FS now and then fail during operation. If I get a client node that comes up improperly, user work will start, and it will fail with "Operation not permitted" errors on file access. I imagine my batch system could run through a massive amount of jobs on a bad client without anyone noticing immeadiately. Yet another thing we now have to monitor now I guess. *shrug* A couple other gotcha's we've seen with Encryption: Encrypted file systems do not store data in large MD blocks. Makes sense. This means large MD blocks aren't as useful as they are in unencrypted FS, if you are using this. Having at least one backup SKLM server is a good idea. "kmipServerUri[N+1]" in the conf. While the documentation claims the FS can continue operation once it caches the MEK if an SKLM server goes away, in operation this does NOT work as you may expect. Your users still need access to the FEKs for the files your clients work on. Logs will fill with Key could not be fetched. errors. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Simon Thompson (Research Computing - IT Services) [S.J.Thompson at bham.ac.uk] Sent: Thursday, April 06, 2017 4:20 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Spectrum Scale Encryption We are currently looking at adding encryption to our deployment for some of our data sets and for some of our nodes. Apologies in advance if some of this is a bit vague, we're not yet at the point where we can test this stuff out, so maybe some of it will become clear when we try it out. For a node that we don't want to have access to any encrypted data, what do we need to set up? 
According to the docs: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_encryption_prep.htm "After the file system is configured with encryption policy rules, the file system is considered encrypted. From that point on, each node that has access to that file system must have an RKM.conf file present. Otherwise, the file system might not be mounted or might become unmounted." So on a node which I don't want to have access to any encrypted files, do I just need to have an empty RKM.conf file? (If this is the case, would be good to have this added to the docs) Secondly ... (and maybe I'm misunderstanding the docs here) For the Policy https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectru m.scale.v4r22.doc/bl1adv_encryptionpolicyrules.htm KEYS ('Keyname'[, 'Keyname', ... ]) KeyId:RkmId RkmId should match the stanza name in RKM.conf? If so, it would be useful if the docs used the same names in the examples (RKMKMIP3 vs rkmname3) And KeyId should match a "Key UUID" in SKLM? Third. My understanding from talking to various IBM people is that we need ISKLM entitlements for NSD Servers, Protocol nodes and AFM gateways (probably), do we have to do any kind of node registration in ISKLM? Or is this purely based on the certificates being distributed to clients and keys are mapped in ISKLM to the client cert to determine if the node is able to request the key? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Thu Apr 6 16:11:38 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 6 Apr 2017 15:11:38 +0000 Subject: [gpfsug-discuss] Spectrum Scale Encryption Message-ID: Hi Ed, Thanks. We already have several SKLM servers (tape backups). For me, we plan to encrypt specific parts of the FS (probably by file-set), so as long as all that is needed is an empty RKM.conf file, sounds like it will work. I suppose I could have an MEK that is granted to all clients, but then never actually use it for encryption if RKM.conf needs at least one key (hack hack hack). (We are at 4.2.2-2 (mostly) or higher (a few nodes)). I *thought* the FEK was wrapped in the metadata with the MEK (possibly multiple times with different MEKs), so what the docs say about operation continuing with no SKLM server sounds sensible, but of course that might not be what actually happens I guess... Simon On 06/04/2017, 15:54, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Wahl, Edward" wrote: >This is rather dependant on SS version. > >So what used to happen before 4.2.2.* is that a client would be unable to >mount the filesystem in question and would give an error in the >mmfs.log.latest for an SGPanic, In 4.2.2.* It appears it will now mount >the file system and then give errors on file access instead. (just >tested this on 4.2.2.3) I'll have to read through the changelogs looking >for this one. > >Depending on your policy for encryption then, this might be exactly what >you want, but I REALLY REALLY dislike this behaviour. > >To me this means clients can now mount an encrypted FS now and then fail >during operation. If I get a client node that comes up improperly, user >work will start, and it will fail with "Operation not permitted" errors >on file access. 
I imagine my batch system could run through a massive >amount of jobs on a bad client without anyone noticing immeadiately. Yet >another thing we now have to monitor now I guess. *shrug* > >A couple other gotcha's we've seen with Encryption: > >Encrypted file systems do not store data in large MD blocks. Makes >sense. This means large MD blocks aren't as useful as they are in >unencrypted FS, if you are using this. > >Having at least one backup SKLM server is a good idea. >"kmipServerUri[N+1]" in the conf. > >While the documentation claims the FS can continue operation once it >caches the MEK if an SKLM server goes away, in operation this does NOT >work as you may expect. Your users still need access to the FEKs for the >files your clients work on. Logs will fill with Key could not be >fetched. errors. > >Ed Wahl >OSC > >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org >[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Simon Thompson >(Research Computing - IT Services) [S.J.Thompson at bham.ac.uk] >Sent: Thursday, April 06, 2017 4:20 AM >To: gpfsug-discuss at spectrumscale.org >Subject: [gpfsug-discuss] Spectrum Scale Encryption > >We are currently looking at adding encryption to our deployment for some >of our data sets and for some of our nodes. Apologies in advance if some >of this is a bit vague, we're not yet at the point where we can test this >stuff out, so maybe some of it will become clear when we try it out. > > >For a node that we don't want to have access to any encrypted data, what >do we need to set up? > >According to the docs: >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum. >s >cale.v4r22.doc/bl1adv_encryption_prep.htm > > >"After the file system is configured with encryption policy rules, the >file system is considered encrypted. From that point on, each node that >has access to that file system must have an RKM.conf file present. >Otherwise, the file system might not be mounted or might become >unmounted." > >So on a node which I don't want to have access to any encrypted files, do >I just need to have an empty RKM.conf file? > >(If this is the case, would be good to have this added to the docs) > > >Secondly ... (and maybe I'm misunderstanding the docs here) > >For the Policy >https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectr >u >m.scale.v4r22.doc/bl1adv_encryptionpolicyrules.htm > > >KEYS ('Keyname'[, 'Keyname', ... ]) > > >KeyId:RkmId > > >RkmId should match the stanza name in RKM.conf? > >If so, it would be useful if the docs used the same names in the examples >(RKMKMIP3 vs rkmname3) > >And KeyId should match a "Key UUID" in SKLM? > > >Third. My understanding from talking to various IBM people is that we need >ISKLM entitlements for NSD Servers, Protocol nodes and AFM gateways >(probably), do we have to do any kind of node registration in ISKLM? Or is >this purely based on the certificates being distributed to clients and >keys are mapped in ISKLM to the client cert to determine if the node is >able to request the key? 
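To make the Keyname format quoted above concrete, a sketch of what a minimal pair of encryption policy rules could look like -- the key UUID, specification name and fileset pattern are all invented for illustration, and the exact syntax should be checked against the KC page linked above:

  /* Encryption specification: which algorithm set to use and which MEK to
     wrap the FEK with. 'KEY-...' stands in for a Key UUID from SKLM, and
     'rkmname3' must match a stanza name in RKM.conf on the nodes. */
  RULE 'encSpec' ENCRYPTION 'E1' IS
       ALGO 'DEFAULTNISTSP800131A'
       KEYS('KEY-0c12f48a-93b2-4d31-9a67-2e3f5c1a7b10:rkmname3')

  /* Apply it to newly created files in the filesets that should be encrypted. */
  RULE 'applyEnc' SET ENCRYPTION 'E1'
       WHERE FILESET_NAME LIKE 'secure%'

The part after the colon is the RKM.conf stanza name and the part before it is the key UUID held in SKLM, which matches what the later reply from IBM in this thread confirms.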
> >Thanks > >Simon > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Jon.Edwards at newbase.com.au Fri Apr 7 05:56:33 2017 From: Jon.Edwards at newbase.com.au (Jon Edwards) Date: Fri, 7 Apr 2017 04:56:33 +0000 Subject: [gpfsug-discuss] Spectrum scale sending cluster traffic across the management network Message-ID: <7929c064d6df4d7b88065b4d882daa98@newbase.com.au> Hi All, Just getting started with spectrum scale, Just wondering if anyone has come across the issue where when doing a mmcrfs or mmdelfs you get the error Failed to connect to file system daemon: Connection timed out mmdelfs: tsdelfs failed. mmdelfs: Command failed. Examine previous error messages to determine cause. When viewing the logs in /var/mmfs/gen/mmfslog on a node other than the one I am running the command on i get: 2017-04-07_14:03:13.354+1000: [N] Filtered log entry: 'connect to node 192.168.0.1:1191' occurred 10 times between 2017-04-07_11:38:19.058+1000 and 2017-04-07_11:54:58.649+1000 192.168.0.0/24 In this case is the management network configured on eth0 of all the nodes. It is failing because port 1191 is not allowed on this interface. The dns and hostname for each node resolves to a dedicated cluster network, lets say 10.0.0.0/24 (ETH1) For some reason when I run the mmcrfs or mmdelfs it tries to talk back over the management network instead of the cluster network which fails to connect due to firewall blocking cluster traffic over management. Anyone seen this before? Kind Regards, Jon Edwards Senior Systems Engineer NewBase Email: jon.edwards at newbase.com.au Ph: + 61 7 3216 0776 Fax: + 61 7 3216 0779 http://www.newbase.com.au Opinions contained in this e-mail do not necessarily reflect the opinions of NewBase Computer Services Pty Ltd. This e-mail is for the exclusive use of the addressee and should not be disseminated further or copied without permission of the sender. If you have received this message in error, please immediately notify the sender and delete the message from your computer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jon.Edwards at newbase.com.au Fri Apr 7 06:26:56 2017 From: Jon.Edwards at newbase.com.au (Jon Edwards) Date: Fri, 7 Apr 2017 05:26:56 +0000 Subject: [gpfsug-discuss] Spectrum scale sending cluster traffic across the management network Message-ID: <6e02ed91cb404d46b7b5cd3515ad8fe9@newbase.com.au> Please disregard, found the solution. Found the subnets= parameter for the cluster config mmchconfig subnets="192.168.0.0/24 192.168.1.0/24" Which forces it to use this subnet. Kind Regards, Jon Edwards | Senior Systems Engineer NewBase Ph: + 61 7 3216 0776 | Email: jon.edwards at newbase.com.au http://www.newbase.com.au From: Jon Edwards Sent: Friday, 7 April 2017 2:56 PM To: 'gpfsug-discuss at spectrumscale.org' Cc: 'Andrew Beattie' Subject: Spectrum scale sending cluster traffic across the management network Hi All, Just getting started with spectrum scale, Just wondering if anyone has come across the issue where when doing a mmcrfs or mmdelfs you get the error Failed to connect to file system daemon: Connection timed out mmdelfs: tsdelfs failed. mmdelfs: Command failed. Examine previous error messages to determine cause. 
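For anyone who ends up at the same subnets= fix described above, two read-only checks that help confirm the daemon traffic has actually moved to the intended network (treat this as a sketch; output formats differ between releases):

  # Show the subnets value now stored in the cluster configuration.
  mmlsconfig subnets
  # Show the daemon's node-to-node connections and the addresses in use;
  # port 1191 has to be reachable on whichever interface these land on.
  mmdiag --network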
When viewing the logs in /var/mmfs/gen/mmfslog on a node other than the one I am running the command on i get: 2017-04-07_14:03:13.354+1000: [N] Filtered log entry: 'connect to node 192.168.0.1:1191' occurred 10 times between 2017-04-07_11:38:19.058+1000 and 2017-04-07_11:54:58.649+1000 192.168.0.0/24 In this case is the management network configured on eth0 of all the nodes. It is failing because port 1191 is not allowed on this interface. The dns and hostname for each node resolves to a dedicated cluster network, lets say 10.0.0.0/24 (ETH1) For some reason when I run the mmcrfs or mmdelfs it tries to talk back over the management network instead of the cluster network which fails to connect due to firewall blocking cluster traffic over management. Anyone seen this before? Kind Regards, Jon Edwards Senior Systems Engineer NewBase Email: jon.edwards at newbase.com.au Ph: + 61 7 3216 0776 Fax: + 61 7 3216 0779 http://www.newbase.com.au Opinions contained in this e-mail do not necessarily reflect the opinions of NewBase Computer Services Pty Ltd. This e-mail is for the exclusive use of the addressee and should not be disseminated further or copied without permission of the sender. If you have received this message in error, please immediately notify the sender and delete the message from your computer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Apr 7 15:00:09 2017 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 7 Apr 2017 10:00:09 -0400 Subject: [gpfsug-discuss] Spectrum Scale Encryption In-Reply-To: <9DA9EC7A281AC7428A9618AFDC490499591F4BDB@CIO-KRC-D1MBX02.osuad.osu.edu> References: <9DA9EC7A281AC7428A9618AFDC490499591F4BDB@CIO-KRC-D1MBX02.osuad.osu.edu> Message-ID: All, A few comments on the topics raised below. 1) All nodes that mount an encrypted file system, and also the nodes with management roles on the file system will need access to the keys have the proper setup (RKM.conf, etc). Edward is correct that there was some change in behavior, introduced in 4.2.1 . Before the change, a mount would fail unless RKM.conf is present on the node. In addition, once a policy with encryption rules was applied, nodes without the proper encryption setup would unmount the file system. With the change, the error gets delayed to when encrypted files are accessed. The change in behavior was introduced based on feedback that unmounting the file system in that case was too drastic in that scenario. >> So on a node which I don't want to have access to any encrypted files, do I just need to have an empty RKM.conf file? All nodes which mount an encrypted file system should have proper setup for encryption, even for a node from where only unencrypted files are being accessed. 2) >> Encrypted file systems do not store data in large MD blocks. Makes sense. This means large MD blocks aren't as useful as they are in unencrypted FS, if you are using this. Correct. Data is not stored in the inode for encrypted files. On the other hand, since encryption metadata is stored as an extended attribute in the inode, 4K inodes are still recommended -- especially in cases where a more complicated encryption policy is used. 3) >> Having at least one backup SKLM server is a good idea. "kmipServerUri[N+1]" in the conf. While the documentation claims the FS can continue operation once it caches the MEK if an SKLM server goes away, in operation this does NOT work as you may expect. Your users still need access to the FEKs for the files your clients work on. 
Logs will fill with Key could not be fetched. errors. Using a backup key server is strongly recommended. While it's true that the files may still be accessed for a while if the key server becomes unreachable, this was not something to be counted on. First because keys (MEK) may expire at any time, requiring the key to be retrieved from the key server again. Second because a file may require a key may be needed that has not been cached before. 4) >> Secondly ... (and maybe I'm misunderstanding the docs here) For the Policy https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectru m.scale.v4r22.doc/bl1adv_encryptionpolicyrules.htm KEYS ('Keyname'[, 'Keyname', ... ]) KeyId:RkmId RkmId should match the stanza name in RKM.conf? Correct. >> If so, it would be useful if the docs used the same names in the examples (RKMKMIP3 vs rkmname3) And KeyId should match a "Key UUID" in SKLM? Correct. We'll review the documentation to ensure that the meaning of the RkmId in the examples is clear. 5) >> Third. My understanding from talking to various IBM people is that we need ISKLM entitlements for NSD Servers, Protocol nodes and AFM gateways (probably), do we have to do any kind of node registration in ISKLM? Or is this purely based on the certificates being distributed to clients and keys are mapped in ISKLM to the client cert to determine if the node is able to request the key? I'll work on getting clarifications from the ISKLM folks on this aspect. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Wahl, Edward" To: gpfsug main discussion list Date: 04/06/2017 10:55 AM Subject: Re: [gpfsug-discuss] Spectrum Scale Encryption Sent by: gpfsug-discuss-bounces at spectrumscale.org This is rather dependant on SS version. So what used to happen before 4.2.2.* is that a client would be unable to mount the filesystem in question and would give an error in the mmfs.log.latest for an SGPanic, In 4.2.2.* It appears it will now mount the file system and then give errors on file access instead. (just tested this on 4.2.2.3) I'll have to read through the changelogs looking for this one. Depending on your policy for encryption then, this might be exactly what you want, but I REALLY REALLY dislike this behaviour. To me this means clients can now mount an encrypted FS now and then fail during operation. If I get a client node that comes up improperly, user work will start, and it will fail with "Operation not permitted" errors on file access. I imagine my batch system could run through a massive amount of jobs on a bad client without anyone noticing immeadiately. Yet another thing we now have to monitor now I guess. *shrug* A couple other gotcha's we've seen with Encryption: Encrypted file systems do not store data in large MD blocks. Makes sense. This means large MD blocks aren't as useful as they are in unencrypted FS, if you are using this. Having at least one backup SKLM server is a good idea. "kmipServerUri[N+1]" in the conf. While the documentation claims the FS can continue operation once it caches the MEK if an SKLM server goes away, in operation this does NOT work as you may expect. Your users still need access to the FEKs for the files your clients work on. Logs will fill with Key could not be fetched. errors. 
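Given the 'yet another thing we now have to monitor' comment above, a crude probe for this failure mode is to watch for the key-fetch errors in the GPFS log and test the KMIP port from each node -- the log path, host name and port here are common defaults rather than anything taken from this thread:

  # Count key-fetch failures of the kind described above.
  grep -c "could not be fetched" /var/adm/ras/mmfs.log.latest
  # Check that the key server's KMIP port answers from this node
  # (5696 is the conventional KMIP port; use the host/port from kmipServerUri).
  timeout 5 bash -c '</dev/tcp/sklm01.example.com/5696' && echo "key server reachable"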
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Simon Thompson (Research Computing - IT Services) [S.J.Thompson at bham.ac.uk] Sent: Thursday, April 06, 2017 4:20 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Spectrum Scale Encryption We are currently looking at adding encryption to our deployment for some of our data sets and for some of our nodes. Apologies in advance if some of this is a bit vague, we're not yet at the point where we can test this stuff out, so maybe some of it will become clear when we try it out. For a node that we don't want to have access to any encrypted data, what do we need to set up? According to the docs: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s cale.v4r22.doc/bl1adv_encryption_prep.htm "After the file system is configured with encryption policy rules, the file system is considered encrypted. From that point on, each node that has access to that file system must have an RKM.conf file present. Otherwise, the file system might not be mounted or might become unmounted." So on a node which I don't want to have access to any encrypted files, do I just need to have an empty RKM.conf file? (If this is the case, would be good to have this added to the docs) Secondly ... (and maybe I'm misunderstanding the docs here) For the Policy https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectru m.scale.v4r22.doc/bl1adv_encryptionpolicyrules.htm KEYS ('Keyname'[, 'Keyname', ... ]) KeyId:RkmId RkmId should match the stanza name in RKM.conf? If so, it would be useful if the docs used the same names in the examples (RKMKMIP3 vs rkmname3) And KeyId should match a "Key UUID" in SKLM? Third. My understanding from talking to various IBM people is that we need ISKLM entitlements for NSD Servers, Protocol nodes and AFM gateways (probably), do we have to do any kind of node registration in ISKLM? Or is this purely based on the certificates being distributed to clients and keys are mapped in ISKLM to the client cert to determine if the node is able to request the key? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Fri Apr 7 15:58:29 2017 From: mweil at wustl.edu (Matt Weil) Date: Fri, 7 Apr 2017 09:58:29 -0500 Subject: [gpfsug-discuss] AFM gateways Message-ID: Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. 
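For anyone wondering how the gateway role is assigned in the first place: it is a per-node designation rather than a per-fileset one, which is part of the discussion in the replies that follow. A rough sketch, with made-up node and file system names:
---
# designate dedicated gateway nodes (rather than the NSD servers)
mmchnode --gateway -N afmgw01,afmgw02

# confirm the designation and see which gateway each cache fileset is using
mmlscluster
mmafmctl fs0 getstate
---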
From vpuvvada at in.ibm.com Mon Apr 10 11:56:16 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 10 Apr 2017 16:26:16 +0530 Subject: [gpfsug-discuss] AFM gateways In-Reply-To: References: Message-ID: It is not recommended to make NSD servers as gateway nodes for native GPFS protocol. Unresponsive remote cluster mount might cause gateway node to hang on synchronous operations (ex. Lookup, Read, Open etc..), this will affect NSD server functionality. More information is documented @ https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_NFSvsGPFSAFM.htm ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil To: gpfsug main discussion list Date: 04/07/2017 08:28 PM Subject: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sandra.McLaughlin at astrazeneca.com Mon Apr 10 12:20:53 2017 From: Sandra.McLaughlin at astrazeneca.com (McLaughlin, Sandra M) Date: Mon, 10 Apr 2017 11:20:53 +0000 Subject: [gpfsug-discuss] AFM gateways In-Reply-To: References: Message-ID: Hi, I agree with Venkat. I did exactly what you said below, enabled my NSD servers as gateways to get additional throughput (with both native gpfs protocol and NFS protocol), which worked well; we definitely got the increased traffic. However, I wouldn't do it again through choice. As Venkat says, if there is a problem with the remote cluster, that can affect any of the gateway nodes (if using gpfs protocol), but also, we had a problem with one of the gateway nodes, where it kept crashing (which is now resolved) and then all filesets for which that node was the gateway had to failover to other gateway servers and this really messes everything up while the failover is taking place. I am also, stupidly, serving NFS and samba from the NSD servers (via ctdb) which I also, would not do again ! It would be nice if there was a way to specify which gateway server is the primary gateway for a specific fileset. Regards, Sandra From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: 10 April 2017 11:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM gateways It is not recommended to make NSD servers as gateway nodes for native GPFS protocol. Unresponsive remote cluster mount might cause gateway node to hang on synchronous operations (ex. Lookup, Read, Open etc..), this will affect NSD server functionality. 
More information is documented @ https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_NFSvsGPFSAFM.htm ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil > To: gpfsug main discussion list > Date: 04/07/2017 08:28 PM Subject: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From Christian.Fey at sva.de Mon Apr 10 17:04:31 2017 From: Christian.Fey at sva.de (Fey, Christian) Date: Mon, 10 Apr 2017 16:04:31 +0000 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: References: Message-ID: <455e54150cd04cd8808619acbf7d8d2b@sva.de> Hi, I'm just dealing with a maybe similar issue that also seems to be related to the output of "tsctl shownodes up" (before CES i actually never had to do with this command). In my case the output of a "mmlscluster" for example shows the nodes like "node1.acme.local" but in " tsctl shownodes up" they are displayed as "node1.acme.local.acme.local" for example. This maybe causes a fresh CES implementation in a existing GPFS cluster to also not spread ip-adresses. It instead loops in the same way as it did in your case @jonathon. I think it tries to search for "node1.acme.local" but doesn't find it since tsctl shows it with doubled suffix. Can anyone explain, from where the "tsctl shownodes up" reads the data? Additionally does anyone have an idea why the dns suffix is doubled? Kind regards Christian -----Urspr?ngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Jonathon A Anderson Gesendet: Donnerstag, 23. M?rz 2017 16:02 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Achtung! Die Absender-Adresse ist m?glicherweise gef?lscht. 
Bitte ?berpr?fen Sie die Plausibilit?t der Email und lassen bei enthaltenen Anh?ngen und Links besondere Vorsicht walten. Wenden Sie sich im Zweifelsfall an das CIT unter cit at sva.de oder 06122 536 350. (Stichwort: DKIM Test Fehlgeschlagen) ---------------------------------------------------------------------------------------------------------------- Thanks! I?m looking forward to upgrading our CES nodes and resuming work on the project. ~jonathon On 3/23/17, 8:24 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser" wrote: the issue is fixed, an APAR will be released soon - IV93100 From: Olaf Weiser/Germany/IBM at IBMDE To: "gpfsug main discussion list" Cc: "gpfsug main discussion list" Date: 01/31/2017 11:47 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________________ Yeah... depending on the #nodes you 're affected or not. ..... So if your remote ces cluster is small enough in terms of the #nodes ... you'll neuer hit into this issue Gesendet von IBM Verse Simon Thompson (Research Computing - IT Services) --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von:"Simon Thompson (Research Computing - IT Services)" An:"gpfsug main discussion list" Datum:Di. 31.01.2017 21:07Betreff:Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________________ We use multicluster for our environment, storage systems in a separate cluster to hpc nodes on a separate cluster from protocol nodes. According to the docs, this isn't supported, but we haven't seen any issues. Note unsupported as opposed to broken. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 31 January 2017 17:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Yeah, I searched around for places where ` tsctl shownodes up` appears in the GPFS code I have access to (i.e., the ksh and python stuff); but it?s only in CES. I suspect there just haven?t been that many people exporting CES out of an HPC cluster environment. ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 10:45 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes I ll open a pmr here for my env ... the issue may hurt you in a ces env. only... but needs to be fixed in core gpfs.base i thi k Gesendet von IBM Verse Jonathon A Anderson --- Re: [gpfsug-discuss] CES doesn't assign addresses to nodes --- Von: "Jonathon A Anderson" An: "gpfsug main discussion list" Datum: Di. 31.01.2017 17:32 Betreff: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ________________________________ No, I?m having trouble getting this through DDN support because, while we have a GPFS server license and GRIDScaler support, apparently we don?t have ?protocol node? support, so they?ve pushed back on supporting this as an overall CES-rooted effort. I do have a DDN case open, though: 78804. If you are (as I suspect) a GPFS developer, do you mind if I cite your info from here in my DDN case to get them to open a PMR? Thanks. 
~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 8:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes ok.. so obviously ... it seems , that we have several issues.. the 3983 characters is obviously a defect have you already raised a PMR , if so , can you send me the number ? From: Jonathon A Anderson To: gpfsug main discussion list Date: 01/31/2017 04:14 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ The tail isn?t the issue; that? my addition, so that I didn?t have to paste the hundred or so line nodelist into the thread. The actual command is tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile But you can see in my tailed output that the last hostname listed is cut-off halfway through the hostname. Less obvious in the example, but true, is the fact that it?s only showing the first 120 hosts, when we have 403 nodes in our gpfs cluster. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | wc -l 120 [root at sgate2 ~]# mmlscluster | grep '\-opa' | wc -l 403 Perhaps more explicitly, it looks like `tsctl shownodes up` can only transmit 3983 characters. [root at sgate2 ~]# tsctl shownodes up | wc -c 3983 Again, I?m convinced this is a bug not only because the command doesn?t actually produce a list of all of the up nodes in our cluster; but because the last name listed is incomplete. [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail -n 1 shas0260-opa.rc.int.col[root at sgate2 ~]# I?d continue my investigation within tsctl itself but, alas, it?s a binary with no source code available to me. :) I?m trying to get this opened as a bug / PMR; but I?m still working through the DDN support infrastructure. Thanks for reporting it, though. For the record: [root at sgate2 ~]# rpm -qa | grep -i gpfs gpfs.base-4.2.1-2.x86_64 gpfs.msg.en_US-4.2.1-2.noarch gpfs.gplbin-3.10.0-327.el7.x86_64-4.2.1-0.x86_64 gpfs.gskit-8.0.50-57.x86_64 gpfs.gpl-4.2.1-2.noarch nfs-ganesha-gpfs-2.3.2-0.ibm24.el7.x86_64 gpfs.ext-4.2.1-2.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.1-2.x86_64 gpfs.docs-4.2.1-2.noarch ~jonathon From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Tuesday, January 31, 2017 at 1:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Hi ...same thing here.. everything after 10 nodes will be truncated.. though I don't have an issue with it ... I 'll open a PMR .. and I recommend you to do the same thing.. ;-) the reason seems simple.. it is the "| tail" .at the end of the command.. .. which truncates the output to the last 10 items... should be easy to fix.. cheers olaf From: Jonathon A Anderson To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2017 11:11 PM Subject: Re: [gpfsug-discuss] CES doesn't assign addresses to nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ In trying to figure this out on my own, I?m relatively certain I?ve found a bug in GPFS related to the truncation of output from `tsctl shownodes up`. Any chance someone in development can confirm? 
Here are the details of my investigation: ## GPFS is up on sgate2 [root at sgate2 ~]# mmgetstate Node number Node name GPFS state ------------------------------------------ 414 sgate2-opa active ## but if I tell ces to explicitly put one of our ces addresses on that node, it says that GPFS is down [root at sgate2 ~]# mmces address move --ces-ip 10.225.71.102 --ces-node sgate2-opa mmces address move: GPFS is down on this node. mmces address move: Command failed. Examine previous error messages to determine cause. ## the ?GPFS is down on this node? message is defined as code 109 in mmglobfuncs [root at sgate2 ~]# grep --before-context=1 "GPFS is down on this node." /usr/lpp/mmfs/bin/mmglobfuncs 109 ) msgTxt=\ "%s: GPFS is down on this node." ## and is generated by printErrorMsg in mmcesnetmvaddress when it detects that the current node is identified as ?down? by getDownCesNodeList [root at sgate2 ~]# grep --before-context=5 'printErrorMsg 109' /usr/lpp/mmfs/bin/mmcesnetmvaddress downNodeList=$(getDownCesNodeList) for downNode in $downNodeList do if [[ $toNodeName == $downNode ]] then printErrorMsg 109 "$mmcmd" ## getDownCesNodeList is the intersection of all ces nodes with GPFS cluster nodes listed in `tsctl shownodes up` [root at sgate2 ~]# grep --after-context=16 '^function getDownCesNodeList' /usr/lpp/mmfs/bin/mmcesfuncs function getDownCesNodeList { typeset sourceFile="mmcesfuncs.sh" [[ -n $DEBUG || -n $DEBUGgetDownCesNodeList ]] &&set -x $mmTRACE_ENTER "$*" typeset upnodefile=${cmdTmpDir}upnodefile typeset downNodeList # get all CES nodes $sort -o $nodefile $mmfsCesNodes.dae $tsctl shownodes up | $tr ',' '\n' | $sort -o $upnodefile downNodeList=$($comm -23 $nodefile $upnodefile) print -- $downNodeList } #----- end of function getDownCesNodeList -------------------- ## but not only are the sgate nodes not listed by `tsctl shownodes up`; its output is obviously and erroneously truncated [root at sgate2 ~]# tsctl shownodes up | tr ',' '\n' | tail shas0251-opa.rc.int.colorado.edu shas0252-opa.rc.int.colorado.edu shas0253-opa.rc.int.colorado.edu shas0254-opa.rc.int.colorado.edu shas0255-opa.rc.int.colorado.edu shas0256-opa.rc.int.colorado.edu shas0257-opa.rc.int.colorado.edu shas0258-opa.rc.int.colorado.edu shas0259-opa.rc.int.colorado.edu shas0260-opa.rc.int.col[root at sgate2 ~]# ## I expect that this is a bug in GPFS, likely related to a maximum output buffer for `tsctl shownodes up`. On 1/24/17, 12:48 PM, "Jonathon A Anderson" wrote: I think I'm having the same issue described here: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2016-October/002288.html Any advice or further troubleshooting steps would be much appreciated. Full disclosure: I also have a DDN case open. (78804) We've got a four-node (snsd{1..4}) DDN gridscaler system. I'm trying to add two CES protocol nodes (sgate{1,2}) to serve NFS. Here's the steps I took: --- mmcrnodeclass protocol -N sgate1-opa,sgate2-opa mmcrnodeclass nfs -N sgate1-opa,sgate2-opa mmchconfig cesSharedRoot=/gpfs/summit/ces mmchcluster --ccr-enable mmchnode --ces-enable -N protocol mmces service enable NFS mmces service start NFS -N nfs mmces address add --ces-ip 10.225.71.104,10.225.71.105 mmces address policy even-coverage mmces address move --rebalance --- This worked the very first time I ran it, but the CES addresses weren't re-distributed after restarting GPFS or a node reboot. 
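When chasing this sort of thing, a few commands that show how CES itself sees the nodes and addresses can be useful alongside the log entries below. This is only a sketch and the output format varies a little between releases:
---
mmces node list          # which nodes CES considers eligible / suspended
mmces address list       # where each CES IP is currently assigned
mmces state show -a      # per-service state on every protocol node

# compare what the daemon reports as up with the cluster membership;
# this is exactly where the truncation described above shows up
tsctl shownodes up | tr ',' '\n' | wc -l
mmlscluster | grep -c opa   # node count (matches the '-opa' names in this cluster)
---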
Things I've tried: * disabling ces on the sgate nodes and re-running the above procedure * moving the cluster and filesystem managers to different snsd nodes * deleting and re-creating the cesSharedRoot directory Meanwhile, the following log entry appears in mmfs.log.latest every ~30s: --- Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.104 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Found unassigned address 10.225.71.105 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: handleNetworkProblem with lock held: assignIP 10.225.71.104_0-_+,10.225.71.105_0-_+ 1 Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: Assigning addresses: 10.225.71.104_0-_+,10.225.71.105_0-_+ Mon Jan 23 20:31:20 MST 2017: mmcesnetworkmonitor: moveCesIPs: 10.225.71.104_0-_+,10.225.71.105_0-_+ --- Also notable, whenever I add or remove addresses now, I see this in mmsysmonitor.log (among a lot of other entries): --- 2017-01-23T20:40:56.363 sgate1 D ET_cesnetwork Entity state without requireUnique: ces_network_ips_down WARNING No CES relevant NICs detected - Service.calculateAndUpdateState:275 2017-01-23T20:40:11.364 sgate1 D ET_cesnetwork Update multiple entities at once {'p2p2': 1, 'bond0': 1, 'p2p1': 1} - Service.setLocalState:333 --- For the record, here's the interface I expect to get the address on sgate1: --- 11: bond0: mtu 9000 qdisc noqueue state UP link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff inet 10.225.71.107/20 brd 10.225.79.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::3efd:feff:fe08:a7c0/64 scope link valid_lft forever preferred_lft forever --- which is a bond of p2p1 and p2p2. --- 6: p2p1: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff 7: p2p2: mtu 9000 qdisc mq master bond0 state UP qlen 1000 link/ether 3c:fd:fe:08:a7:c0 brd ff:ff:ff:ff:ff:ff --- A similar bond0 exists on sgate2. I crawled around in /usr/lpp/mmfs/lib/mmsysmon/CESNetworkService.py for a while trying to figure it out, but have been unsuccessful so far. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5467 bytes Desc: not available URL: From service at metamodul.com Mon Apr 10 17:47:41 2017 From: service at metamodul.com (Hans-Joachim Ehlers) Date: Mon, 10 Apr 2017 18:47:41 +0200 (CEST) Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network Message-ID: <788130355.197989.1491842861235@email.1und1.de> An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Mon Apr 10 17:58:36 2017 From: eric.wonderley at vt.edu (J. 
Eric Wonderley) Date: Mon, 10 Apr 2017 12:58:36 -0400 Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network In-Reply-To: <788130355.197989.1491842861235@email.1und1.de> References: <788130355.197989.1491842861235@email.1und1.de> Message-ID: 1) You want more that one quorum node on your server cluster. The non-quorum node does need a daemon network interface exposed to the client cluster as does the quorum nodes. 2) No. Admin network is for intra cluster communications...not inter cluster(between clusters). Daemon interface(port 1191) is used for communications between clusters. I think there is little benefit gained by having designated an admin network...maybe someone can point out benefits of an admin network. Eric Wonderley On Mon, Apr 10, 2017 at 12:47 PM, Hans-Joachim Ehlers wrote: > My understanding of the GPFS networks is not quite clear. > > For an GPFS setup i would like to use 2 Networks > > 1 Daemon (data) network using port 1191 using for example. 10.1.1.0/24 > > 2 Admin Network using for example: 192.168.1.0/24 network > > Questions > > 1) Thus in a 2+1 Cluster ( 2 GPFS Server + 1 Quorum Server ) Config - > Does the Tiebreaker Node needs to have access to the daemon(data) 10.1.1. > network or is it sufficient for the tiebreaker node to be configured as > part of the admin 192.168.1 network ? > > 2) Does a remote cluster needs access to the GPFS Admin 192.168.1 > network or is it sufficient for the remote cluster to access the 10.1.1 > network ? If so i assume that remotecluster commands and ping to/from > remote cluster are going via the Daemon network ? > > Note: > > I am aware and read https://www.ibm.com/developerworks/community/ > wikis/home?lang=en#!/wiki/General%20Parallel%20File% > 20System%20(GPFS)/page/GPFS%20Network%20Communication%20Overview > > -- > Unix Systems Engineer > -------------------------------------------------- > MetaModul GmbH > S?derstr. 12 > 25336 Elmshorn > HRB: 11873 PI > UstID: DE213701983 > Mobil: + 49 177 4393994 <+49%20177%204393994> > Mail: service at metamodul.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From laurence at qsplace.co.uk Mon Apr 10 18:13:08 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Mon, 10 Apr 2017 18:13:08 +0100 Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network In-Reply-To: References: <788130355.197989.1491842861235@email.1und1.de> Message-ID: <3a8f72c6-407a-0f4d-cf3c-f4698ca7b8e5@qsplace.co.uk> All nodes in a GPFS cluster need to be able to communicate over the data and admin network with the exception of remote clusters which can have their own separate admin network (for their own cluster that they are a member of) but still require communications over the daemon network. The networks can be routed and on different subnets, however the each member of the cluster will need to be able to communicate with every other member. With this in mind: 1) The quorum node will need to be accessible on both the 10.1.1.0/24 and 192.168.1.0/24 however again the network that the quorum node is on could be routed. 2) Remote clusters don't need access to the home clusters admin network, as they will use their own clusters admin network. 
As Eric has mentioned I would double check your 2+1 cluster suggestion, do you mean 2 x Servers with NSD's (with a quorum role) and 1 quorum node without NSD's? which gives you 3 quorum, or are you only going to have 1 quorum? If the latter that I would suggest using all 3 servers for quorum as they should be licensed as GPFS servers anyway due to their roles. -- Lauz On 10/04/2017 17:58, J. Eric Wonderley wrote: > 1) You want more that one quorum node on your server cluster. The > non-quorum node does need a daemon network interface exposed to the > client cluster as does the quorum nodes. > > 2) No. Admin network is for intra cluster communications...not inter > cluster(between clusters). Daemon interface(port 1191) is used for > communications between clusters. I think there is little benefit > gained by having designated an admin network...maybe someone can point > out benefits of an admin network. > > > > Eric Wonderley > > On Mon, Apr 10, 2017 at 12:47 PM, Hans-Joachim Ehlers > > wrote: > > My understanding of the GPFS networks is not quite clear. > > For an GPFS setup i would like to use 2 Networks > > 1 Daemon (data) network using port 1191 using for example. > 10.1.1.0/24 > > 2 Admin Network using for example: 192.168.1.0/24 > network > > Questions > > 1) Thus in a 2+1 Cluster ( 2 GPFS Server + 1 Quorum Server ) > Config - Does the Tiebreaker Node needs to have access to the > daemon(data) 10.1.1. network or is it sufficient for the > tiebreaker node to be configured as part of the admin 192.168.1 > network ? > > 2) Does a remote cluster needs access to the GPFS Admin 192.168.1 > network or is it sufficient for the remote cluster to access the > 10.1.1 network ? If so i assume that remotecluster commands and > ping to/from remote cluster are going via the Daemon network ? > > Note: > > I am aware and read > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/GPFS%20Network%20Communication%20Overview > > > -- > Unix Systems Engineer > -------------------------------------------------- > MetaModul GmbH > S?derstr. 12 > 25336 Elmshorn > HRB: 11873 PI > UstID: DE213701983 > Mobil: + 49 177 4393994 > Mail: service at metamodul.com > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Apr 10 18:26:42 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 10 Apr 2017 17:26:42 +0000 Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network In-Reply-To: References: <788130355.197989.1491842861235@email.1und1.de>, Message-ID: If you have network congestion, then a separate admin network is of benefit. Maybe less important if you have 10GbE networks, but if (for example), you normally rely on IB to talk data, and gpfs fails back to the Ethernet (which may be only 1GbE), then you may have cluster issues, for example missing gpfs pings. Having a separate physical admin network can protect you from this. Having been bitten by this several years back, it's a good idea IMHO to have a separate admin network. 
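For reference, a sketch of how a separate admin network is normally expressed (host names here are invented; check the mmcrcluster and mmchnode man pages for your release): each node descriptor carries both a daemon node name and an admin node name, and an existing node can be switched later with mmchnode.
---
# node descriptor format for mmcrcluster / mmaddnode:
#   DaemonNodeName:Designation:AdminNodeName
nsd01-data.example.com:quorum-manager:nsd01-admin.example.com
nsd02-data.example.com:quorum-manager:nsd02-admin.example.com

# change the admin interface of an existing node
# (GPFS may need to be stopped on that node first; check the mmchnode man page)
mmchnode --admin-interface=nsd01-admin.example.com -N nsd01-data.example.com
---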
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of J. Eric Wonderley [eric.wonderley at vt.edu] Sent: 10 April 2017 17:58 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network 1) You want more that one quorum node on your server cluster. The non-quorum node does need a daemon network interface exposed to the client cluster as does the quorum nodes. 2) No. Admin network is for intra cluster communications...not inter cluster(between clusters). Daemon interface(port 1191) is used for communications between clusters. I think there is little benefit gained by having designated an admin network...maybe someone can point out benefits of an admin network. Eric Wonderley On Mon, Apr 10, 2017 at 12:47 PM, Hans-Joachim Ehlers > wrote: My understanding of the GPFS networks is not quite clear. For an GPFS setup i would like to use 2 Networks 1 Daemon (data) network using port 1191 using for example. 10.1.1.0/24 2 Admin Network using for example: 192.168.1.0/24 network Questions 1) Thus in a 2+1 Cluster ( 2 GPFS Server + 1 Quorum Server ) Config - Does the Tiebreaker Node needs to have access to the daemon(data) 10.1.1. network or is it sufficient for the tiebreaker node to be configured as part of the admin 192.168.1 network ? 2) Does a remote cluster needs access to the GPFS Admin 192.168.1 network or is it sufficient for the remote cluster to access the 10.1.1 network ? If so i assume that remotecluster commands and ping to/from remote cluster are going via the Daemon network ? Note: I am aware and read https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/GPFS%20Network%20Communication%20Overview -- Unix Systems Engineer -------------------------------------------------- MetaModul GmbH S?derstr. 12 25336 Elmshorn HRB: 11873 PI UstID: DE213701983 Mobil: + 49 177 4393994 Mail: service at metamodul.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From service at metamodul.com Mon Apr 10 18:44:47 2017 From: service at metamodul.com (Hans-Joachim Ehlers) Date: Mon, 10 Apr 2017 19:44:47 +0200 (CEST) Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network In-Reply-To: References: <788130355.197989.1491842861235@email.1und1.de>, Message-ID: <795203366.199195.1491846287405@email.1und1.de> An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Mon Apr 10 19:02:30 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 10 Apr 2017 21:02:30 +0300 Subject: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network In-Reply-To: <795203366.199195.1491846287405@email.1und1.de> References: <788130355.197989.1491842861235@email.1und1.de>, <795203366.199195.1491846287405@email.1und1.de> Message-ID: Hi Out of curiosity. Are you using Failure groups and doing replication of data/metadata too? If you you do need to deal with the file system descriptors as well on the 3rd node. Thanks From: Hans-Joachim Ehlers To: gpfsug main discussion list Date: 10/04/2017 20:44 Subject: Re: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network Sent by: gpfsug-discuss-bounces at spectrumscale.org Sorry for not being clear. 
The setup is of course a 3 Node Cluster where each node is a quorum node - 2 NSD Server and 1 TieBreaker/Quorum Buster node. For me it was not clear if the Tiebreaker/Quorum Buster node - which does nothing in terms of data serving - must be part of the daemon/data network or not. So i get the understanding that a Tiebreaker Node must be also part of the Daemon network. Thx a lot to all Hajo "Simon Thompson (IT Research Support)" hat am 10. April 2017 um 19:26 geschrieben: If you have network congestion, then a separate admin network is of benefit. Maybe less important if you have 10GbE networks, but if (for example), you normally rely on IB to talk data, and gpfs fails back to the Ethernet (which may be only 1GbE), then you may have cluster issues, for example missing gpfs pings. Having a separate physical admin network can protect you from this. Having been bitten by this several years back, it's a good idea IMHO to have a separate admin network. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of J. Eric Wonderley [eric.wonderley at vt.edu] Sent: 10 April 2017 17:58 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Network Configuration - 1 Daemon Network , 1 Admin Network 1) You want more that one quorum node on your server cluster. The non-quorum node does need a daemon network interface exposed to the client cluster as does the quorum nodes. 2) No. Admin network is for intra cluster communications...not inter cluster(between clusters). Daemon interface(port 1191) is used for communications between clusters. I think there is little benefit gained by having designated an admin network...maybe someone can point out benefits of an admin network. Eric Wonderley On Mon, Apr 10, 2017 at 12:47 PM, Hans-Joachim Ehlers > wrote: My understanding of the GPFS networks is not quite clear. For an GPFS setup i would like to use 2 Networks 1 Daemon (data) network using port 1191 using for example. 10.1.1.0/24< http://10.1.1.0/24> 2 Admin Network using for example: 192.168.1.0/24 network Questions 1) Thus in a 2+1 Cluster ( 2 GPFS Server + 1 Quorum Server ) Config - Does the Tiebreaker Node needs to have access to the daemon(data) 10.1.1. network or is it sufficient for the tiebreaker node to be configured as part of the admin 192.168.1 network ? 2) Does a remote cluster needs access to the GPFS Admin 192.168.1 network or is it sufficient for the remote cluster to access the 10.1.1 network ? If so i assume that remotecluster commands and ping to/from remote cluster are going via the Daemon network ? Note: I am aware and read https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/GPFS%20Network%20Communication%20Overview -- Unix Systems Engineer -------------------------------------------------- MetaModul GmbH S?derstr. 12 25336 Elmshorn HRB: 11873 PI UstID: DE213701983 Mobil: + 49 177 4393994 Mail: service at metamodul.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Mon Apr 10 21:15:38 2017 From: mweil at wustl.edu (Matt Weil) Date: Mon, 10 Apr 2017 15:15:38 -0500 Subject: [gpfsug-discuss] AFM gateways In-Reply-To: References: Message-ID: <524d253e-b825-4e6a-7cbf-884af394ddc5@wustl.edu> Thanks for the answers.. For fail over I believe we will want to keep it separate then. Next question. Is it licensed as a client or a server? On 4/10/17 6:20 AM, McLaughlin, Sandra M wrote: Hi, I agree with Venkat. I did exactly what you said below, enabled my NSD servers as gateways to get additional throughput (with both native gpfs protocol and NFS protocol), which worked well; we definitely got the increased traffic. However, I wouldn?t do it again through choice. As Venkat says, if there is a problem with the remote cluster, that can affect any of the gateway nodes (if using gpfs protocol), but also, we had a problem with one of the gateway nodes, where it kept crashing (which is now resolved) and then all filesets for which that node was the gateway had to failover to other gateway servers and this really messes everything up while the failover is taking place. I am also, stupidly, serving NFS and samba from the NSD servers (via ctdb) which I also, would not do again ! It would be nice if there was a way to specify which gateway server is the primary gateway for a specific fileset. Regards, Sandra From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: 10 April 2017 11:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM gateways It is not recommended to make NSD servers as gateway nodes for native GPFS protocol. Unresponsive remote cluster mount might cause gateway node to hang on synchronous operations (ex. Lookup, Read, Open etc..), this will affect NSD server functionality. More information is documented @ https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_NFSvsGPFSAFM.htm ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil > To: gpfsug main discussion list > Date: 04/07/2017 08:28 PM Subject: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. 
This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mitsugi at linux.vnet.ibm.com Tue Apr 11 05:29:16 2017 From: mitsugi at linux.vnet.ibm.com (Masanori Mitsugi) Date: Tue, 11 Apr 2017 13:29:16 +0900 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM Message-ID: Hello, Does anyone have experience to do mmapplypolicy against billion files for ILM/HSM? Currently I'm planning/designing * 1 Scale filesystem (5-10 PB) * 10-20 filesets which includes 1 billion files each And our biggest concern is "How log does it take for mmapplypolicy policy scan against billion files?" I know it depends on how to write the policy, but I don't have no billion files policy scan experience, so I'd like to know the order of time (min/hour/day...). It would be helpful if anyone has experience of such large number of files scan and let me know any considerations or points for policy design. -- Masanori Mitsugi mitsugi at linux.vnet.ibm.com From zgiles at gmail.com Tue Apr 11 05:49:10 2017 From: zgiles at gmail.com (Zachary Giles) Date: Tue, 11 Apr 2017 00:49:10 -0400 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: It's definitely doable, and these days not too hard. Flash for metadata is the key. The basics of it are: * Latest GPFS for performance benefits. * A few 10's of TBs of flash ( or more ! ) setup in a good design.. lots of SAS, well balanced RAID that can consume the flash fully, tuned for IOPs, and available in parallel from multiple servers. * Tune up mmapplypolicy with -g somewhere-on-gpfs; --choice-algorithm fast; -a, -m and -n to reasonable values ( number of cores on the servers ); -A to ~1000 * Test first on a smaller fileset to confirm you like it. -I test should work well and be around the same speed minus the migration phase. * Then throw ~8 well tuned Infiniband attached nodes at it using -N, If they're the same as the NSD servers serving the flash, even better. Should be able to do 1B in 5-30m depending on the idiosyncrasies of above choices. Even 60m isn't bad and quite respectable if less gear is used or if they system is busy while the policy is running. Parallel metadata, it's a beautiful thing. 
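Putting those flags together, a sketch of the kind of invocation described above. The file system name, node class, policy file name and work directory are placeholders, and the thread counts should be sized to the machines actually running the scan:
---
# dry-run scan driven from a dedicated node class, with the global work
# directory kept on GPFS so the sort buckets can be shared across nodes
mmapplypolicy gpfs0 -P listall.pol -I test -L 1 \
    -N policyNodes \
    -g /gpfs0/.policytmp \
    --choice-algorithm fast \
    -a 16 -m 24 -n 24 -A 1000

# listall.pol: a do-nothing external list rule, handy for timing the scan itself
RULE EXTERNAL LIST 'all' EXEC ''
RULE 'listEverything' LIST 'all'
---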
On Tue, Apr 11, 2017 at 12:29 AM, Masanori Mitsugi wrote: > Hello, > > Does anyone have experience to do mmapplypolicy against billion files for > ILM/HSM? > > Currently I'm planning/designing > > * 1 Scale filesystem (5-10 PB) > * 10-20 filesets which includes 1 billion files each > > And our biggest concern is "How log does it take for mmapplypolicy policy > scan against billion files?" > > I know it depends on how to write the policy, > but I don't have no billion files policy scan experience, > so I'd like to know the order of time (min/hour/day...). > > It would be helpful if anyone has experience of such large number of files > scan and let me know any considerations or points for policy design. > > -- > Masanori Mitsugi > mitsugi at linux.vnet.ibm.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com From olaf.weiser at de.ibm.com Tue Apr 11 07:51:48 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 11 Apr 2017 08:51:48 +0200 Subject: [gpfsug-discuss] CES doesn't assign addresses to nodes In-Reply-To: <455e54150cd04cd8808619acbf7d8d2b@sva.de> References: <455e54150cd04cd8808619acbf7d8d2b@sva.de> Message-ID: An HTML attachment was scrubbed... URL: From ckrafft at de.ibm.com Tue Apr 11 09:24:35 2017 From: ckrafft at de.ibm.com (Christoph Krafft) Date: Tue, 11 Apr 2017 10:24:35 +0200 Subject: [gpfsug-discuss] Does SVC / Spectrum Virtualize support IBM Spectrum Scale with SCSI-3 Persistent Reservations? Message-ID: Hi folks, there is a list of storage devices that support SCSI-3 PR in the GPFS FAQ Doc (see Answer 4.5). https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html#scsi3 Since this list contains IBM V-model storage subsystems that include Storage Virtualization - I was wondering if SVC / Spectrum Virtualize can also support SCSI-3 PR (although not explicitly on the list)? Any hints and help is warmla welcome - thank you in advance. Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Am Weiher 24 Email: ckrafft at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Nicole Reimer, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Stefan Lutz Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A788784.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From p.childs at qmul.ac.uk Tue Apr 11 09:57:44 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 11 Apr 2017 08:57:44 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories Message-ID: This is a curious issue which I'm trying to get to the bottom of. We currently have two Spectrum Scale file systems, both are running GPFS 4.2.1-1 some of the servers have been upgraded to 4.2.1-2. 
The older one, which was upgraded from GPFS 3.5, works fine: creating a directory is always fast with no issue. The new one, which has nice new SSDs for metadata and hence should be faster, can take up to 30 seconds to create a directory, although it usually takes less than a second. The longer directory creates usually happen on busy nodes that have not used the new storage in a while (it's new, so we've not moved much of the data over yet), but it can also happen randomly anywhere, including from the NSD servers themselves (times of 3-4 seconds for a single directory create have been seen from the NSD servers). We've been pointed at the network and told to check all network settings, and it's been suggested that we build an admin network, but I'm not sure I entirely understand why and how this would help. It's a mixed 1G/10G network with the NSD servers connected at 40G with an MTU of 9000. However, as I say, the older filesystem is fine, and it does not matter whether the nodes are connected to the old GPFS cluster or the new one (although the delay is worst on the old GPFS cluster). So I'm really playing spot the difference, and the network is not really an obvious difference. It's been suggested that we look at a trace when it occurs, but as it's difficult to recreate, collecting one is difficult. Any ideas would be most helpful. Thanks Peter Childs ITS Research Infrastructure Queen Mary, University of London
From jonathan at buzzard.me.uk Tue Apr 11 11:21:05 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 11 Apr 2017 11:21:05 +0100 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: <1491906065.4102.87.camel@buzzard.me.uk>
On Tue, 2017-04-11 at 00:49 -0400, Zachary Giles wrote: [SNIP] > * Then throw ~8 well tuned Infiniband attached nodes at it using -N, > If they're the same as the NSD servers serving the flash, even better. >
Exactly how much are you going to gain from Infiniband over 40Gbps or even 100Gbps Ethernet? Not a lot I would have thought. Even with flash, all your latency is going to be in the flash, not the Ethernet.
Unless you have a compute cluster and need Infiniband for the MPI traffic, it is surely better to stick to Ethernet. Infiniband is rather esoteric, what I call a minority sport best avoided if at all possible.
Even if you have an Infiniband fabric, I would argue that, given current core counts and price points for 10Gbps Ethernet, you are actually better off keeping your storage traffic on the Ethernet and reserving the Infiniband for MPI duties. That is 10Gbps Ethernet to the compute nodes and 40/100Gbps Ethernet on the storage nodes.
JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom.
From zgiles at gmail.com Tue Apr 11 12:50:26 2017 From: zgiles at gmail.com (Zachary Giles) Date: Tue, 11 Apr 2017 07:50:26 -0400 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: <1491906065.4102.87.camel@buzzard.me.uk> References: <1491906065.4102.87.camel@buzzard.me.uk> Message-ID:
Yeah, that can be true. I was just trying to show the size/shape that can achieve this. There's a good chance 10G or 40G Ethernet would yield similar results, especially if you're running the policy on the NSD servers.
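If it helps with either the Ethernet-versus-Infiniband question or the directory-create latency above, a couple of commands show what the daemon is actually using rather than what the cabling suggests. A sketch only, not a tuning guide:
---
mmlsconfig verbsRdma      # is RDMA enabled at all?
mmlsconfig subnets        # is a separate daemon subnet configured?
mmdiag --network          # which addresses/fabrics each connection is really on
mmdiag --waiters          # long waiters are usually more telling than bandwidth
                          # for multi-second mkdir-style latency
---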
On Tue, Apr 11, 2017 at 6:21 AM, Jonathan Buzzard wrote: > On Tue, 2017-04-11 at 00:49 -0400, Zachary Giles wrote: > > [SNIP] > >> * Then throw ~8 well tuned Infiniband attached nodes at it using -N, >> If they're the same as the NSD servers serving the flash, even better. >> > > Exactly how much are you going to gain from Infiniband over 40Gbps or > even 100Gbps Ethernet? Not a lot I would have thought. Even with flash > all your latency is going to be in the flash not the Ethernet. > > Unless you have a compute cluster and need Infiniband for the MPI > traffic, it is surely better to stick to Ethernet. Infiniband is rather > esoteric, what I call a minority sport best avoided if at all possible. > > Even if you have an Infiniband fabric, I would argue that give current > core counts and price points for 10Gbps Ethernet, that actually you are > better off keeping your storage traffic on the Ethernet, and reserving > the Infiniband for MPI duties. That is 10Gbps Ethernet to the compute > nodes and 40/100Gbps Ethernet on the storage nodes. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com From stockf at us.ibm.com Tue Apr 11 12:53:33 2017 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 11 Apr 2017 07:53:33 -0400 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: As Zachary noted the location of your metadata is the key and for the scanning you have planned flash is necessary. If you have the resources you may consider setting up your flash in a mirrored RAID configuration (RAID1/RAID10) and have GPFS only keep one copy of metadata since the underlying storage is replicating it via the RAID. This should improve metadata write performance but likely has little impact on your scanning, assuming you are just reading through the metadata. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Zachary Giles To: gpfsug main discussion list Date: 04/11/2017 12:49 AM Subject: Re: [gpfsug-discuss] Policy scan against billion files for ILM/HSM Sent by: gpfsug-discuss-bounces at spectrumscale.org It's definitely doable, and these days not too hard. Flash for metadata is the key. The basics of it are: * Latest GPFS for performance benefits. * A few 10's of TBs of flash ( or more ! ) setup in a good design.. lots of SAS, well balanced RAID that can consume the flash fully, tuned for IOPs, and available in parallel from multiple servers. * Tune up mmapplypolicy with -g somewhere-on-gpfs; --choice-algorithm fast; -a, -m and -n to reasonable values ( number of cores on the servers ); -A to ~1000 * Test first on a smaller fileset to confirm you like it. -I test should work well and be around the same speed minus the migration phase. * Then throw ~8 well tuned Infiniband attached nodes at it using -N, If they're the same as the NSD servers serving the flash, even better. Should be able to do 1B in 5-30m depending on the idiosyncrasies of above choices. Even 60m isn't bad and quite respectable if less gear is used or if they system is busy while the policy is running. Parallel metadata, it's a beautiful thing. 
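A sketch of what the mirrored-flash, single-metadata-copy suggestion looks like in practice (device names, server names and file system name are placeholders): the flash LUNs are declared metadataOnly in the system pool, and the file system is created with one metadata replica while leaving room to raise it later.
---
%nsd: nsd=md_flash01
  device=/dev/mapper/flash01
  servers=nsd01,nsd02
  usage=metadataOnly
  failureGroup=1
  pool=system

mmcrfs gpfs1 -F nsd_stanzas.txt -m 1 -M 2 -r 1 -R 2
---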
On Tue, Apr 11, 2017 at 12:29 AM, Masanori Mitsugi wrote: > Hello, > > Does anyone have experience to do mmapplypolicy against billion files for > ILM/HSM? > > Currently I'm planning/designing > > * 1 Scale filesystem (5-10 PB) > * 10-20 filesets which includes 1 billion files each > > And our biggest concern is "How log does it take for mmapplypolicy policy > scan against billion files?" > > I know it depends on how to write the policy, > but I don't have no billion files policy scan experience, > so I'd like to know the order of time (min/hour/day...). > > It would be helpful if anyone has experience of such large number of files > scan and let me know any considerations or points for policy design. > > -- > Masanori Mitsugi > mitsugi at linux.vnet.ibm.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Tue Apr 11 16:18:01 2017 From: chair at spectrumscale.org (Spectrum Scale UG Chair (Simon Thompson)) Date: Tue, 11 Apr 2017 16:18:01 +0100 Subject: [gpfsug-discuss] May Meeting Registration Message-ID: Hi all, Just a reminder that the next UK user group meeting is taking place on 9th/10th May. If you are planning on attending, please do register at: https://www.eventbrite.com/e/spectrum-scalegpfs-user-group-spring-2017-regi stration-32113696932 (or try https://goo.gl/tRptru ) As last year, this is a 2 day event and we're planning a fun evening event on the Tuesday night at Manchester Museum of Science. Thanks to our sponsors Arcastream, DDN, Ellexus, Lenovo, IBM, Mellanox, OCF and Seagate for helping make this happen! We also still have some customer talk slots to fill, so please let me know if you are interested in speaking. Thanks Simon From bbanister at jumptrading.com Tue Apr 11 16:29:25 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Apr 2017 15:29:25 +0000 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: <1e86aa0c2e4344f19cb5eedf8f03efa9@jumptrading.com> A word of caution, be careful about where you run this kind of policy scan as the sort process can consume all memory on your hosts and that could lead to issues with the OS deciding to kill off GPFS or other similar bad things can occur. I recommend restricting the ILM policy scan to a subset of servers, no quorum nodes, and ensuring at least one NSD server is available for all NSDs in the file system(s). Watch the memory consumption on your nodes during the sort operations to see if you need to tune that down in the mmapplypolicy options. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Frederick Stock Sent: Tuesday, April 11, 2017 6:54 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Policy scan against billion files for ILM/HSM As Zachary noted the location of your metadata is the key and for the scanning you have planned flash is necessary. 
If you have the resources you may consider setting up your flash in a mirrored RAID configuration (RAID1/RAID10) and have GPFS only keep one copy of metadata since the underlying storage is replicating it via the RAID. This should improve metadata write performance but likely has little impact on your scanning, assuming you are just reading through the metadata. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Zachary Giles > To: gpfsug main discussion list > Date: 04/11/2017 12:49 AM Subject: Re: [gpfsug-discuss] Policy scan against billion files for ILM/HSM Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ It's definitely doable, and these days not too hard. Flash for metadata is the key. The basics of it are: * Latest GPFS for performance benefits. * A few 10's of TBs of flash ( or more ! ) setup in a good design.. lots of SAS, well balanced RAID that can consume the flash fully, tuned for IOPs, and available in parallel from multiple servers. * Tune up mmapplypolicy with -g somewhere-on-gpfs; --choice-algorithm fast; -a, -m and -n to reasonable values ( number of cores on the servers ); -A to ~1000 * Test first on a smaller fileset to confirm you like it. -I test should work well and be around the same speed minus the migration phase. * Then throw ~8 well tuned Infiniband attached nodes at it using -N, If they're the same as the NSD servers serving the flash, even better. Should be able to do 1B in 5-30m depending on the idiosyncrasies of above choices. Even 60m isn't bad and quite respectable if less gear is used or if they system is busy while the policy is running. Parallel metadata, it's a beautiful thing. On Tue, Apr 11, 2017 at 12:29 AM, Masanori Mitsugi > wrote: > Hello, > > Does anyone have experience to do mmapplypolicy against billion files for > ILM/HSM? > > Currently I'm planning/designing > > * 1 Scale filesystem (5-10 PB) > * 10-20 filesets which includes 1 billion files each > > And our biggest concern is "How log does it take for mmapplypolicy policy > scan against billion files?" > > I know it depends on how to write the policy, > but I don't have no billion files policy scan experience, > so I'd like to know the order of time (min/hour/day...). > > It would be helpful if anyone has experience of such large number of files > scan and let me know any considerations or points for policy design. > > -- > Masanori Mitsugi > mitsugi at linux.vnet.ibm.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From k.leach at ed.ac.uk Tue Apr 11 16:32:41 2017 From: k.leach at ed.ac.uk (Kieran Leach) Date: Tue, 11 Apr 2017 16:32:41 +0100 Subject: [gpfsug-discuss] May Meeting Registration In-Reply-To: References: Message-ID: <275b54d9-6779-774e-69bb-d26fead278a2@ed.ac.uk> Hi Simon, would you be interested in a customer talk about the RDF (http://rdf.ac.uk/). We manage the RDF at EPCC, providing a 23PB filestore to complement ARCHER (the national research HPC service) and other UK Research HPC services. This is of course a GPFS system. If you've any questions or want more info please let me know but I thought I'd get an email off to you while I remember. Cheers Kieran On 11/04/17 16:18, Spectrum Scale UG Chair (Simon Thompson) wrote: > Hi all, > > Just a reminder that the next UK user group meeting is taking place on > 9th/10th May. If you are planning on attending, please do register at: > > https://www.eventbrite.com/e/spectrum-scalegpfs-user-group-spring-2017-regi > stration-32113696932 > > > (or try https://goo.gl/tRptru ) > > As last year, this is a 2 day event and we're planning a fun evening event > on the Tuesday night at Manchester Museum of Science. > > Thanks to our sponsors Arcastream, DDN, Ellexus, Lenovo, IBM, Mellanox, > OCF and Seagate for helping make this happen! > > We also still have some customer talk slots to fill, so please let me know > if you are interested in speaking. > > Thanks > > Simon > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From k.leach at ed.ac.uk Tue Apr 11 16:33:29 2017 From: k.leach at ed.ac.uk (Kieran Leach) Date: Tue, 11 Apr 2017 16:33:29 +0100 Subject: [gpfsug-discuss] May Meeting Registration In-Reply-To: <275b54d9-6779-774e-69bb-d26fead278a2@ed.ac.uk> References: <275b54d9-6779-774e-69bb-d26fead278a2@ed.ac.uk> Message-ID: Apologies all, wrong reply button. Cheers Kieran On 11/04/17 16:32, Kieran Leach wrote: > Hi Simon, > would you be interested in a customer talk about the RDF > (http://rdf.ac.uk/). We manage the RDF at EPCC, providing a 23PB > filestore to complement ARCHER (the national research HPC service) and > other UK Research HPC services. This is of course a GPFS system. If > you've any questions or want more info please let me know but I > thought I'd get an email off to you while I remember. > > Cheers > > Kieran > > On 11/04/17 16:18, Spectrum Scale UG Chair (Simon Thompson) wrote: >> Hi all, >> >> Just a reminder that the next UK user group meeting is taking place on >> 9th/10th May. If you are planning on attending, please do register at: >> >> https://www.eventbrite.com/e/spectrum-scalegpfs-user-group-spring-2017-regi >> >> stration-32113696932 >> >> >> (or try https://goo.gl/tRptru ) >> >> As last year, this is a 2 day event and we're planning a fun evening >> event >> on the Tuesday night at Manchester Museum of Science. >> >> Thanks to our sponsors Arcastream, DDN, Ellexus, Lenovo, IBM, Mellanox, >> OCF and Seagate for helping make this happen! 
>> >> We also still have some customer talk slots to fill, so please let me >> know >> if you are interested in speaking. >> >> Thanks >> >> Simon >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From makaplan at us.ibm.com Tue Apr 11 16:36:47 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 11 Apr 2017 11:36:47 -0400 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: As primary developer of mmapplypolicy, please allow me to comment: 1) Fast access to metadata in system pool is most important, as several have commented on. These days SSD is the favorite, but you can still go with "spinning" media. If you do go with disks, it's extremely important to spread your metadata over independent disk "arms" -- so you can have many concurrent seeks in progress at the same time. IOW, if there is a virtualization/mapping layer, watchout that your logical disks don't get mapped to the same physical disk. 2) Crucial to use both -g and -N :: -g /gpfs-not-necessarily-the-same-fs-as-Im-scanning/tempdir and -N several-nodes-that-will-be-accessing-the-system-pool 3a) If at all possible, encourage your data and application designers to "pack" their directories with lots of files. Keep in mind that, mmapplypolicy will read every directory. The more directories, the more seeks, more time spent waiting for IO. OTOH, in more typical Unix/Linux usage, we tend to low average number of files per directory. 3b) As admin, you may not be able to change your data design to pack hundreds of files per directory, BUT you can make sure you are running a sufficiently modern release of Spectrum Scale that supports "data in inode" -- "Data in inode" also means "directory entries in inode" -- which means practically any small directory, up to a few hundred files, will fit in an an inode -- which means mmapplypolicy can read small directories with one seek, instead of two. (Someone will please remind us of the release number that first supported "directories in inode".) 4) Sorry, Fred, but the recommendation to use RAID mirroring of metadata on SSD, is not necessarily, important for metadata scanning. In fact it may work against you. If you use GPFS replication of metadata - that can work for you -- since then GPFS can direct read operations to either copy, preferring a locally attached copy, depending on how storage is attached to node, etc, etc. Choice of how to replicate metadata - either using GPFS replication or the RAID controller - is probably best made based on reliability and recoverability requirements. 5) YMMV - We'd love to hear/see your performance results for mmapplypolicy, especially if they're good. Even if they're bad, come back here for more tuning tips! -- marc of Spectrum Scale (ne GPFS) -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Tue Apr 11 16:51:56 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Apr 2017 15:51:56 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories In-Reply-To: References: Message-ID: There are so many things to look at and many tools for doing so (iostat, htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). 
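For a first pass (illustrative only; options and output vary a little by release), that usually means something along the lines of:

    mmdiag --waiters      # long waiters usually point at the node or disk holding things up
    mmdiag --iohist       # recent I/O service times as GPFS sees them
    mmhealth node show    # component health, available on 4.2.1 and later
    mmlsconfig pagepool   # sanity-check key tunables; likewise maxFilesToCache etc.
    iostat -xm 5          # confirm whether the underlying LUNs are actually busy

run on both an NSD server and one of the slow clients while the problem is happening.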
I would recommend a review of the presentation that Yuri gave at the most recent GPFS User Group: https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: Tuesday, April 11, 2017 3:58 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories This is a curious issue which I'm trying to get to the bottom of. We currently have two Spectrum Scale file systems, both are running GPFS 4.2.1-1 some of the servers have been upgraded to 4.2.1-2. The older one which was upgraded from GPFS 3.5 works find create a directory is always fast and no issue. The new one, which has nice new SSD for metadata and hence should be faster. can take up to 30 seconds to create a directory but usually takes less than a second, The longer directory creates usually happen on busy nodes that have not used the new storage in a while. (Its new so we've not moved much of the data over yet) But it can also happen randomly anywhere, including from the NSD servers them selves. (times of 3-4 seconds from the NSD servers have been seen, on a single directory create) We've been pointed at the network and suggested we check all network settings, and its been suggested to build an admin network, but I'm not sure I entirely understand why and how this would help. Its a mixed 1G/10G network with the NSD servers connected at 40G with an MTU of 9000. However as I say, the older filesystem is fine, and it does not matter if the nodes are connected to the old GPFS cluster or the new one, (although the delay is worst on the old gpfs cluster), So I'm really playing spot the difference. and the network is not really an obvious difference. Its been suggested to look at a trace when it occurs but as its difficult to recreate collecting one is difficult. Any ideas would be most helpful. Thanks Peter Childs ITS Research Infrastructure Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From S.J.Thompson at bham.ac.uk Tue Apr 11 16:55:35 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 11 Apr 2017 15:55:35 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories Message-ID: We actually saw this for a while on one of our clusters which was new. But by the time I'd got round to looking deeper, it had gone, maybe we were using the NSDs more heavily, or possibly we'd upgraded. 
We are at 4.2.2-2, so might be worth trying to bump the version and see if it goes away. We saw it on the NSD servers directly as well, so not some client trying to talk to it, so maybe there was some buggy code? Simon On 11/04/2017, 16:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: >There are so many things to look at and many tools for doing so (iostat, >htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). I would >recommend a review of the presentation that Yuri gave at the most recent >GPFS User Group: >https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs > >Cheers, >-Bryan > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter >Childs >Sent: Tuesday, April 11, 2017 3:58 AM >To: gpfsug main discussion list >Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories > >This is a curious issue which I'm trying to get to the bottom of. > >We currently have two Spectrum Scale file systems, both are running GPFS >4.2.1-1 some of the servers have been upgraded to 4.2.1-2. > >The older one which was upgraded from GPFS 3.5 works find create a >directory is always fast and no issue. > >The new one, which has nice new SSD for metadata and hence should be >faster. can take up to 30 seconds to create a directory but usually takes >less than a second, The longer directory creates usually happen on busy >nodes that have not used the new storage in a while. (Its new so we've >not moved much of the data over yet) But it can also happen randomly >anywhere, including from the NSD servers them selves. (times of 3-4 >seconds from the NSD servers have been seen, on a single directory create) > >We've been pointed at the network and suggested we check all network >settings, and its been suggested to build an admin network, but I'm not >sure I entirely understand why and how this would help. Its a mixed >1G/10G network with the NSD servers connected at 40G with an MTU of 9000. > >However as I say, the older filesystem is fine, and it does not matter if >the nodes are connected to the old GPFS cluster or the new one, (although >the delay is worst on the old gpfs cluster), So I'm really playing spot >the difference. and the network is not really an obvious difference. > >Its been suggested to look at a trace when it occurs but as its difficult >to recreate collecting one is difficult. > >Any ideas would be most helpful. > >Thanks > > > >Peter Childs >ITS Research Infrastructure >Queen Mary, University of London >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >________________________________ > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. 
This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Tue Apr 11 16:56:56 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Tue, 11 Apr 2017 15:56:56 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories Message-ID: Bryan, That looks like a really useful set of presentation slides! Thanks for sharing! Which one in particular is the one Yuri gave that you?re referring to? ~jonathon On 4/11/17, 9:51 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: There are so many things to look at and many tools for doing so (iostat, htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). I would recommend a review of the presentation that Yuri gave at the most recent GPFS User Group: https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: Tuesday, April 11, 2017 3:58 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories This is a curious issue which I'm trying to get to the bottom of. We currently have two Spectrum Scale file systems, both are running GPFS 4.2.1-1 some of the servers have been upgraded to 4.2.1-2. The older one which was upgraded from GPFS 3.5 works find create a directory is always fast and no issue. The new one, which has nice new SSD for metadata and hence should be faster. can take up to 30 seconds to create a directory but usually takes less than a second, The longer directory creates usually happen on busy nodes that have not used the new storage in a while. (Its new so we've not moved much of the data over yet) But it can also happen randomly anywhere, including from the NSD servers them selves. (times of 3-4 seconds from the NSD servers have been seen, on a single directory create) We've been pointed at the network and suggested we check all network settings, and its been suggested to build an admin network, but I'm not sure I entirely understand why and how this would help. Its a mixed 1G/10G network with the NSD servers connected at 40G with an MTU of 9000. However as I say, the older filesystem is fine, and it does not matter if the nodes are connected to the old GPFS cluster or the new one, (although the delay is worst on the old gpfs cluster), So I'm really playing spot the difference. and the network is not really an obvious difference. Its been suggested to look at a trace when it occurs but as its difficult to recreate collecting one is difficult. Any ideas would be most helpful. Thanks Peter Childs ITS Research Infrastructure Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Tue Apr 11 16:59:51 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Apr 2017 15:59:51 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories In-Reply-To: References: Message-ID: Problem Determination and GPFS Internals. My security group won't let me go to the google docs site from my work compute... I'm sure there is malicious malware on that site!! j/k, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: Tuesday, April 11, 2017 10:57 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Slow to create directories Bryan, That looks like a really useful set of presentation slides! Thanks for sharing! Which one in particular is the one Yuri gave that you?re referring to? ~jonathon On 4/11/17, 9:51 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: There are so many things to look at and many tools for doing so (iostat, htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). I would recommend a review of the presentation that Yuri gave at the most recent GPFS User Group: https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: Tuesday, April 11, 2017 3:58 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories This is a curious issue which I'm trying to get to the bottom of. We currently have two Spectrum Scale file systems, both are running GPFS 4.2.1-1 some of the servers have been upgraded to 4.2.1-2. The older one which was upgraded from GPFS 3.5 works find create a directory is always fast and no issue. The new one, which has nice new SSD for metadata and hence should be faster. can take up to 30 seconds to create a directory but usually takes less than a second, The longer directory creates usually happen on busy nodes that have not used the new storage in a while. (Its new so we've not moved much of the data over yet) But it can also happen randomly anywhere, including from the NSD servers them selves. (times of 3-4 seconds from the NSD servers have been seen, on a single directory create) We've been pointed at the network and suggested we check all network settings, and its been suggested to build an admin network, but I'm not sure I entirely understand why and how this would help. Its a mixed 1G/10G network with the NSD servers connected at 40G with an MTU of 9000. 
However as I say, the older filesystem is fine, and it does not matter if the nodes are connected to the old GPFS cluster or the new one, (although the delay is worst on the old gpfs cluster), So I'm really playing spot the difference. and the network is not really an obvious difference.

Its been suggested to look at a trace when it occurs but as its difficult to recreate collecting one is difficult.

Any ideas would be most helpful.

Thanks

Peter Childs
ITS Research Infrastructure
Queen Mary, University of London
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

________________________________

Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.

From p.childs at qmul.ac.uk Tue Apr 11 20:35:40 2017
From: p.childs at qmul.ac.uk (Peter Childs)
Date: Tue, 11 Apr 2017 19:35:40 +0000
Subject: Re: [gpfsug-discuss] Spectrum Scale Slow to create directories
In-Reply-To:
References:
Message-ID:

Can you remember what version you were running? Don't worry if you can't remember.

It looks like IBM may have withdrawn 4.2.1 from Fix Central and wish to forget its existence. Never a good sign; 4.2.0, 4.2.2, 4.2.3 and even 3.5 are still there, so maybe upgrading is worth a try.

I've looked at all the standard troubleshooting guides and got nowhere, hence why I asked. But another set of slides always helps.

Thank you for the help, still head scratching.... Which only makes the issue more random.
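For illustration only (node names are hypothetical and the exact mmtracectl options vary by release), catching one slow create in the act might look something like:

    mmtracectl --start -N client01,nsd01    # start tracing on a suspect client and NSD server
    time mkdir /gpfs/newfs/testdir.$$       # try to reproduce, and note how long it takes
    mmdiag --waiters                        # run on the affected node while the mkdir stalls
    mmtracectl --stop -N client01,nsd01     # stop, then gather the resulting trace files

Even a single capture plus the waiters output is usually enough to show where the time is going.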
Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Simon Thompson (IT Research Support) wrote ---- We actually saw this for a while on one of our clusters which was new. But by the time I'd got round to looking deeper, it had gone, maybe we were using the NSDs more heavily, or possibly we'd upgraded. We are at 4.2.2-2, so might be worth trying to bump the version and see if it goes away. We saw it on the NSD servers directly as well, so not some client trying to talk to it, so maybe there was some buggy code? Simon On 11/04/2017, 16:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: >There are so many things to look at and many tools for doing so (iostat, >htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). I would >recommend a review of the presentation that Yuri gave at the most recent >GPFS User Group: >https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs > >Cheers, >-Bryan > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter >Childs >Sent: Tuesday, April 11, 2017 3:58 AM >To: gpfsug main discussion list >Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories > >This is a curious issue which I'm trying to get to the bottom of. > >We currently have two Spectrum Scale file systems, both are running GPFS >4.2.1-1 some of the servers have been upgraded to 4.2.1-2. > >The older one which was upgraded from GPFS 3.5 works find create a >directory is always fast and no issue. > >The new one, which has nice new SSD for metadata and hence should be >faster. can take up to 30 seconds to create a directory but usually takes >less than a second, The longer directory creates usually happen on busy >nodes that have not used the new storage in a while. (Its new so we've >not moved much of the data over yet) But it can also happen randomly >anywhere, including from the NSD servers them selves. (times of 3-4 >seconds from the NSD servers have been seen, on a single directory create) > >We've been pointed at the network and suggested we check all network >settings, and its been suggested to build an admin network, but I'm not >sure I entirely understand why and how this would help. Its a mixed >1G/10G network with the NSD servers connected at 40G with an MTU of 9000. > >However as I say, the older filesystem is fine, and it does not matter if >the nodes are connected to the old GPFS cluster or the new one, (although >the delay is worst on the old gpfs cluster), So I'm really playing spot >the difference. and the network is not really an obvious difference. > >Its been suggested to look at a trace when it occurs but as its difficult >to recreate collecting one is difficult. > >Any ideas would be most helpful. > >Thanks > > > >Peter Childs >ITS Research Infrastructure >Queen Mary, University of London >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >________________________________ > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. 
Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mitsugi at linux.vnet.ibm.com Wed Apr 12 02:51:03 2017 From: mitsugi at linux.vnet.ibm.com (Masanori Mitsugi) Date: Wed, 12 Apr 2017 10:51:03 +0900 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: <0851d194-088e-d93a-303d-ceb0de3dbaa8@linux.vnet.ibm.com> Marc, Zachary, Fred, Bryan, Thank you for providing great advice! It's pretty useful for me to tune our policy with best performance. As for "directories in inode", we plan to use latest version, so I believe we can leverage this function. -- Masanori Mitsugi mitsugi at linux.vnet.ibm.com From vpuvvada at in.ibm.com Wed Apr 12 10:53:25 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Wed, 12 Apr 2017 15:23:25 +0530 Subject: [gpfsug-discuss] AFM gateways In-Reply-To: <524d253e-b825-4e6a-7cbf-884af394ddc5@wustl.edu> References: <524d253e-b825-4e6a-7cbf-884af394ddc5@wustl.edu> Message-ID: Gateway node requires server license. ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil To: Date: 04/11/2017 01:46 AM Subject: Re: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks for the answers.. For fail over I believe we will want to keep it separate then. Next question. Is it licensed as a client or a server? On 4/10/17 6:20 AM, McLaughlin, Sandra M wrote: Hi, I agree with Venkat. I did exactly what you said below, enabled my NSD servers as gateways to get additional throughput (with both native gpfs protocol and NFS protocol), which worked well; we definitely got the increased traffic. However, I wouldn?t do it again through choice. As Venkat says, if there is a problem with the remote cluster, that can affect any of the gateway nodes (if using gpfs protocol), but also, we had a problem with one of the gateway nodes, where it kept crashing (which is now resolved) and then all filesets for which that node was the gateway had to failover to other gateway servers and this really messes everything up while the failover is taking place. I am also, stupidly, serving NFS and samba from the NSD servers (via ctdb) which I also, would not do again ! It would be nice if there was a way to specify which gateway server is the primary gateway for a specific fileset. Regards, Sandra From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: 10 April 2017 11:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM gateways It is not recommended to make NSD servers as gateway nodes for native GPFS protocol. Unresponsive remote cluster mount might cause gateway node to hang on synchronous operations (ex. 
Lookup, Read, Open etc..), this will affect NSD server functionality. More information is documented @ https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_NFSvsGPFSAFM.htm ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil To: gpfsug main discussion list Date: 04/07/2017 08:28 PM Subject: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Apr 12 15:52:48 2017 From: mweil at wustl.edu (Matt Weil) Date: Wed, 12 Apr 2017 09:52:48 -0500 Subject: [gpfsug-discuss] AFM gateways In-Reply-To: References: <524d253e-b825-4e6a-7cbf-884af394ddc5@wustl.edu> Message-ID: yes it tells you that when you attempt to make the node a gateway and is does not have a server license designation. On 4/12/17 4:53 AM, Venkateswara R Puvvada wrote: Gateway node requires server license. ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil To: Date: 04/11/2017 01:46 AM Subject: Re: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Thanks for the answers.. For fail over I believe we will want to keep it separate then. Next question. Is it licensed as a client or a server? 
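For what it's worth, designating a dedicated (non-NSD) gateway looks roughly like this -- the node name is hypothetical, and the server license has to be in place before the gateway role is accepted:

    mmchlicense server --accept -N afmgw01    # gateway nodes need a server license
    mmchnode --gateway -N afmgw01             # mark the node as an AFM gateway
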
On 4/10/17 6:20 AM, McLaughlin, Sandra M wrote: Hi, I agree with Venkat. I did exactly what you said below, enabled my NSD servers as gateways to get additional throughput (with both native gpfs protocol and NFS protocol), which worked well; we definitely got the increased traffic. However, I wouldn?t do it again through choice. As Venkat says, if there is a problem with the remote cluster, that can affect any of the gateway nodes (if using gpfs protocol), but also, we had a problem with one of the gateway nodes, where it kept crashing (which is now resolved) and then all filesets for which that node was the gateway had to failover to other gateway servers and this really messes everything up while the failover is taking place. I am also, stupidly, serving NFS and samba from the NSD servers (via ctdb) which I also, would not do again ! It would be nice if there was a way to specify which gateway server is the primary gateway for a specific fileset. Regards, Sandra From: gpfsug-discuss-bounces at spectrumscale.org[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: 10 April 2017 11:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM gateways It is not recommended to make NSD servers as gateway nodes for native GPFS protocol. Unresponsive remote cluster mount might cause gateway node to hang on synchronous operations (ex. Lookup, Read, Open etc..), this will affect NSD server functionality. More information is documented @ https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_NFSvsGPFSAFM.htm ~Venkat (vpuvvada at in.ibm.com) From: Matt Weil > To: gpfsug main discussion list > Date: 04/07/2017 08:28 PM Subject: [gpfsug-discuss] AFM gateways Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, any reason to not enable all NSD servers as gateway when using native gpfs AFM? Will they all pass traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. 
For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chekh at stanford.edu Wed Apr 12 22:01:45 2017 From: chekh at stanford.edu (Alex Chekholko) Date: Wed, 12 Apr 2017 14:01:45 -0700 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: References: Message-ID: <284246a2-b14b-0a73-6dad-4c73caef58c9@stanford.edu> On 4/11/17 8:36 AM, Marc A Kaplan wrote: > > 5) YMMV - We'd love to hear/see your performance results for > mmapplypolicy, especially if they're good. Even if they're bad, come > back here for more tuning tips! I have a filesystem that currently has 267919775 (roughly quarter billion, 250 million) used inodes. The metadata is on SSD behind a DDN 12K. We do use 4K inodes, and files smaller than 4K fit into the inodes. Here is the command I use to apply a policy: mmapplypolicy gsfs0 -P policy.txt -N scg-gs0,scg-gs1,scg-gs2,scg-gs3,scg-gs4,scg-gs5,scg-gs6,scg-gs7 -g /srv/gsfs0/admin_stuff/ -I test -B 500 -A 61 -a 4 That takes approximately 10 minutes to do the whole scan. The "-B 500 -A 61 -a 4" numbers we determined just by trying different values with the same policy file and seeing the resulting scan duration. 10mins is short enough to do almost "interactive" type of file list policies and look at the results. E.g. list all files over 1TB in size. This was a couple of years ago, probably on a different GPFS version, but on same storage and NSD hardware, so now I just copy those parameters. You should probably not just copy them but try some other values yourself. 
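As an aside, the "list all files over 1TB in size" example mentioned above can be done with a tiny policy file along these lines (the rule and list names are arbitrary, and the threshold is just 1 TB expressed in bytes):

    /* illustrative only: list every file larger than 1 TB */
    RULE EXTERNAL LIST 'biglist' EXEC ''
    RULE 'bigfiles' LIST 'biglist' WHERE FILE_SIZE > 1099511627776

run with the same -N/-g/-B/-A/-a options as the command above, plus something like "-I defer -f /srv/gsfs0/admin_stuff/biglist" so the matching paths land in a plain text file rather than being acted on.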
Regards, Alex From makaplan at us.ibm.com Wed Apr 12 23:43:20 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 12 Apr 2017 18:43:20 -0400 Subject: [gpfsug-discuss] Policy scan against billion files for ILM/HSM In-Reply-To: <284246a2-b14b-0a73-6dad-4c73caef58c9@stanford.edu> References: <284246a2-b14b-0a73-6dad-4c73caef58c9@stanford.edu> Message-ID: >>>Here is the command I use to apply a policy: mmapplypolicy gsfs0 -P policy.txt -N scg-gs0,scg-gs1,scg-gs2,scg-gs3,scg-gs4,scg-gs5,scg-gs6,scg-gs7 -g /srv/gsfs0/admin_stuff/ -I test -B 500 -A 61 -a 4 That takes approximately 10 minutes to do the whole scan. The "-B 500 -A 61 -a 4" numbers we determined just by trying different values with the same policy file and seeing the resulting scan duration. <<< That's pretty good. BUT, FYI, the -A number-of-buckets parameter should be scaled with the total number of files you expect to find in the argument filesystem or directory. If you don't set it the command will default to number-of-inodes-allocated / million, but capped at a minimum of 7 and a maximum of 4096. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Thu Apr 13 11:35:19 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 13 Apr 2017 10:35:19 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories In-Reply-To: References: , Message-ID: After a load more debugging, and switching off the quota's the issue looks to be quota related. in that the issue has gone away since I switched quota's off. I will need to switch them back on, but at least we know the issue is not the network and is likely to be fixed by upgrading..... Peter Childs ITS Research Infrastructure Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Peter Childs Sent: Tuesday, April 11, 2017 8:35:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Slow to create directories Can you remember what version you were running? Don't worry if you can't remember. It looks like ibm may have withdrawn 4.2.1 from fix central and wish to forget its existences. Never a good sign, 4.2.0, 4.2.2, 4.2.3 and even 3.5, so maybe upgrading is worth a try. I've looked at all the standard trouble shouting guides and got nowhere hence why I asked. But another set of slides always helps. Thank-you for the help, still head scratching.... Which only makes the issue more random. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Simon Thompson (IT Research Support) wrote ---- We actually saw this for a while on one of our clusters which was new. But by the time I'd got round to looking deeper, it had gone, maybe we were using the NSDs more heavily, or possibly we'd upgraded. We are at 4.2.2-2, so might be worth trying to bump the version and see if it goes away. We saw it on the NSD servers directly as well, so not some client trying to talk to it, so maybe there was some buggy code? Simon On 11/04/2017, 16:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: >There are so many things to look at and many tools for doing so (iostat, >htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). 
I would >recommend a review of the presentation that Yuri gave at the most recent >GPFS User Group: >https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs > >Cheers, >-Bryan > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter >Childs >Sent: Tuesday, April 11, 2017 3:58 AM >To: gpfsug main discussion list >Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories > >This is a curious issue which I'm trying to get to the bottom of. > >We currently have two Spectrum Scale file systems, both are running GPFS >4.2.1-1 some of the servers have been upgraded to 4.2.1-2. > >The older one which was upgraded from GPFS 3.5 works find create a >directory is always fast and no issue. > >The new one, which has nice new SSD for metadata and hence should be >faster. can take up to 30 seconds to create a directory but usually takes >less than a second, The longer directory creates usually happen on busy >nodes that have not used the new storage in a while. (Its new so we've >not moved much of the data over yet) But it can also happen randomly >anywhere, including from the NSD servers them selves. (times of 3-4 >seconds from the NSD servers have been seen, on a single directory create) > >We've been pointed at the network and suggested we check all network >settings, and its been suggested to build an admin network, but I'm not >sure I entirely understand why and how this would help. Its a mixed >1G/10G network with the NSD servers connected at 40G with an MTU of 9000. > >However as I say, the older filesystem is fine, and it does not matter if >the nodes are connected to the old GPFS cluster or the new one, (although >the delay is worst on the old gpfs cluster), So I'm really playing spot >the difference. and the network is not really an obvious difference. > >Its been suggested to look at a trace when it occurs but as its difficult >to recreate collecting one is difficult. > >Any ideas would be most helpful. > >Thanks > > > >Peter Childs >ITS Research Infrastructure >Queen Mary, University of London >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >________________________________ > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. 
>_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From scale at us.ibm.com Fri Apr 14 08:34:06 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 14 Apr 2017 15:34:06 +0800 Subject: [gpfsug-discuss] Does SVC / Spectrum Virtualize support IBM Spectrum Scale with SCSI-3 Persistent Reservations? In-Reply-To: References: Message-ID: If you can use " mmchconfig usePersistentReserve=yes" successfully, then it is supported, we will check the compatibility during the command, and you can also use "tsprinquiry device(no /dev prefix)" check the vendor output. Thanks. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Christoph Krafft" To: "gpfsug main discussion list" Cc: Achim Christ , Petra Christ Date: 04/11/2017 04:25 PM Subject: [gpfsug-discuss] Does SVC / Spectrum Virtualize support IBM Spectrum Scale with SCSI-3 Persistent Reservations? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi folks, there is a list of storage devices that support SCSI-3 PR in the GPFS FAQ Doc (see Answer 4.5). https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html#scsi3 Since this list contains IBM V-model storage subsystems that include Storage Virtualization - I was wondering if SVC / Spectrum Virtualize can also support SCSI-3 PR (although not explicitly on the list)? Any hints and help is warmla welcome - thank you in advance. Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Am Weiher 24 Email: ckrafft at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Nicole Reimer, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Stefan Lutz Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 1A696179.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: 1A223532.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Sun Apr 16 14:47:20 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Sun, 16 Apr 2017 13:47:20 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Message-ID: Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? 
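(For what it's worth, a quick way to spot-check whether an individual file's data really sits in the pool a policy run claims -- the path below is only an example -- is:

mmlsattr -L /gpfs23/some/old/file     # reports the storage pool holding the file's data, plus flags such as illplaced
mmdf gpfs23 -P gpfs23capacity         # per-disk occupancy of just the capacity pool

keeping in mind that mmdf can lag a little behind recent data movement.)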
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Sun Apr 16 17:20:15 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Sun, 16 Apr 2017 16:20:15 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Message-ID: <252ABBB2-7E94-41F6-AD76-B6D836E5C916@nuance.com> I think the first thing I would do is turn up the ?-L? level to a large value (like ?6?) and see what it tells you about files that are being chosen and which ones aren?t being migrated and why. You could run it in test mode, write the output to a file and see what it says. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Sunday, April 16, 2017 at 8:47 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Sun Apr 16 20:15:40 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 16 Apr 2017 15:15:40 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! 
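(To spell out that prediction with the numbers above: the 'OldStuff' rule would move 67355430720 KB into gpfs23capacity, the 'INeedThatAfterAll' rule would move 236745504 KB back out, and the pool already holds 55365193728 KB, so the prediction is evidently just

   55365193728 + 67355430720 - 236745504 = 122483878944 KB
   122483878944 / 124983549952 = 97.999999993%

which matches the predicted gpfs23capacity utilization shown above exactly; the LIMIT(98) on the first rule is what caps the number of files chosen.)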
From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From makaplan at us.ibm.com Sun Apr 16 20:39:21 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 16 Apr 2017 15:39:21 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: Correction: So that's why it chooses to migrate "only" 67TB.... (67000 GB) -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Apr 17 16:24:02 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 17 Apr 2017 15:24:02 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: Hi Marc, I do understand what you?re saying about mmapplypolicy deciding it only needed to move ~1.8 million files to fill the capacity pool to ~98% full. However, it is now more than 24 hours since the mmapplypolicy finished ?successfully? and: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.66T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.66T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.33T ( 51%) 128.8G ( 0%) And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the partially redacted command line: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And here?s that policy file: define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE (access_age < 14) The one thing that has changed is that formerly I only ran the migration in one direction at a time ? i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that? I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it?s done the other filesystem migration kicks off. I don?t like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-( And yes, there is something else going on ? 
well, was going on - the network switch crash killed this too ? I have been running an rsync on one particular ~80TB directory tree from the old filesystem to gpfs23. I understand that the migration wouldn?t know about those files and that?s fine ? I just don?t understand why mmapplypolicy said it was going to fill the capacity pool to 98% but didn?t do it ? wait, mmapplypolicy hasn?t gone into politics, has it?!? ;-) Thanks - and again, if I should open a PMR for this please let me know... Kevin On Apr 16, 2017, at 2:15 PM, Marc A Kaplan > wrote: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. 
RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chekh at stanford.edu Mon Apr 17 19:49:12 2017 From: chekh at stanford.edu (Alex Chekholko) Date: Mon, 17 Apr 2017 11:49:12 -0700 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: <09e154ef-15ed-3217-db65-51e693e28faa@stanford.edu> Hi Kevin, IMHO, safe to just run it again. You can also run it with '-I test -L 6' again and look through the output. But I don't think you can "break" anything by having it scan and/or move data. Can you post the full command line that you use to run it? The behavior you describe is odd; you say it prints out the "files migrated successfully" message, but the files didn't actually get migrated? Turn up the debug param and have it print every file as it is moving it or something. 
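For example, a rough sketch of a dry run (filesystem and policy file names taken from your earlier mail, the output path is just a placeholder; running on a single node like this is slower than your parallel production run but fine for a test):

/usr/lpp/mmfs/bin/mmapplypolicy gpfs23 \
    -P ~/gpfs/gpfs23_migration.policy \
    -I test -L 3 2>&1 | tee /tmp/gpfs23_policy_test.out

# -I test does the scan and the choosing but skips the actual data movement.
# -L 3 (or higher) prints per-file detail -- which rule matched and whether
# the file was chosen -- so the output can be large, hence the tee to a file.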
Regards, Alex On 4/17/17 8:24 AM, Buterbaugh, Kevin L wrote: > Hi Marc, > > I do understand what you?re saying about mmapplypolicy deciding it only > needed to move ~1.8 million files to fill the capacity pool to ~98% > full. However, it is now more than 24 hours since the mmapplypolicy > finished ?successfully? and: > > Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) > eon35Ansd 58.2T 35 No Yes 29.66T ( > 51%) 64.16G ( 0%) > eon35Dnsd 58.2T 35 No Yes 29.66T ( > 51%) 64.61G ( 0%) > ------------- > -------------------- ------------------- > (pool total) 116.4T 59.33T ( > 51%) 128.8G ( 0%) > > And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the > partially redacted command line: > > /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g another gpfs filesystem> -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy > -N some,list,of,NSD,server,nodes > > And here?s that policy file: > > define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) > define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) > > RULE 'OldStuff' > MIGRATE FROM POOL 'gpfs23data' > TO POOL 'gpfs23capacity' > LIMIT(98) > WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) > > RULE 'INeedThatAfterAll' > MIGRATE FROM POOL 'gpfs23capacity' > TO POOL 'gpfs23data' > LIMIT(75) > WHERE (access_age < 14) > > The one thing that has changed is that formerly I only ran the migration > in one direction at a time ? i.e. I used to have those two rules in two > separate files and would run an mmapplypolicy using the OldStuff rule > the 1st weekend of the month and run the other rule the other weekends > of the month. This is the 1st weekend that I attempted to run an > mmapplypolicy that did both at the same time. Did I mess something up > with that? > > I have not run it again yet because we also run migrations on the other > filesystem that we are still in the process of migrating off of. So > gpfs23 goes 1st and as soon as it?s done the other filesystem migration > kicks off. I don?t like to run two migrations simultaneously if at all > possible. The 2nd migration ran until this morning, when it was > unfortunately terminated by a network switch crash that has also had me > tied up all morning until now. :-( > > And yes, there is something else going on ? well, was going on - the > network switch crash killed this too ? I have been running an rsync on > one particular ~80TB directory tree from the old filesystem to gpfs23. > I understand that the migration wouldn?t know about those files and > that?s fine ? I just don?t understand why mmapplypolicy said it was > going to fill the capacity pool to 98% but didn?t do it ? wait, > mmapplypolicy hasn?t gone into politics, has it?!? ;-) > > Thanks - and again, if I should open a PMR for this please let me know... > > Kevin > >> On Apr 16, 2017, at 2:15 PM, Marc A Kaplan > > wrote: >> >> Let's look at how mmapplypolicy does the reckoning. >> Before it starts, it see your pools as: >> >> [I] GPFS Current Data Pool Utilization in KB and % >> Pool_Name KB_Occupied KB_Total Percent_Occupied >> gpfs23capacity 55365193728 124983549952 44.297984614% >> gpfs23data 166747037696 343753326592 48.507759721% >> system 0 0 >> 0.000000000% (no user data) >> [I] 75142046 of 209715200 inodes used: 35.830520%. >> >> Your rule says you want to migrate data to gpfs23capacity, up to 98% full: >> >> RULE 'OldStuff' >> MIGRATE FROM POOL 'gpfs23data' >> TO POOL 'gpfs23capacity' >> LIMIT(98) WHERE ... >> >> We scan your files and find and reckon... 
>> [I] Summary of Rule Applicability and File Choices: >> Rule# Hit_Cnt KB_Hit Chosen KB_Chosen >> KB_Ill Rule >> 0 5255960 237675081344 1868858 67355430720 >> 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO >> POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) >> >> So yes, 5.25Million files match the rule, but the utility chooses >> 1.868Million files that add up to 67,355GB and figures that if it >> migrates those to gpfs23capacity, >> (and also figuring the other migrations by your second rule)then >> gpfs23 will end up 97.9999% full. >> We show you that with our "predictions" message. >> >> Predicted Data Pool Utilization in KB and %: >> Pool_Name KB_Occupied KB_Total Percent_Occupied >> gpfs23capacity 122483878944 124983549952 97.999999993% >> gpfs23data 104742360032 343753326592 30.470209865% >> >> So that's why it chooses to migrate "only" 67GB.... >> >> See? Makes sense to me. >> >> Questions: >> Did you run with -I yes or -I defer ? >> >> Were some of the files illreplicated or illplaced? >> >> Did you give the cluster-wide space reckoning protocols time to see >> the changes? mmdf is usually "behind" by some non-neglible amount of >> time. >> >> What else is going on? >> If you're moving or deleting or creating data by other means while >> mmapplypolicy is running -- it doesn't "know" about that! >> >> Run it again! >> >> >> >> >> >> From: "Buterbaugh, Kevin L" > > >> To: gpfsug main discussion list >> > > >> Date: 04/16/2017 09:47 AM >> Subject: [gpfsug-discuss] mmapplypolicy didn't migrate >> everything it should have - why not? >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> ------------------------------------------------------------------------ >> >> >> >> Hi All, >> >> First off, I can open a PMR for this if I need to. Second, I am far >> from an mmapplypolicy guru. With that out of the way ? I have an >> mmapplypolicy job that didn?t migrate anywhere close to what it could >> / should have. From the log file I have it create, here is the part >> where it shows the policies I told it to invoke: >> >> [I] Qos 'maintenance' configured as inf >> [I] GPFS Current Data Pool Utilization in KB and % >> Pool_Name KB_Occupied KB_Total Percent_Occupied >> gpfs23capacity 55365193728 124983549952 44.297984614% >> gpfs23data 166747037696 343753326592 48.507759721% >> system 0 0 >> 0.000000000% (no user data) >> [I] 75142046 of 209715200 inodes used: 35.830520%. >> [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. >> Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC >> Parsed 2 policy rules. >> >> RULE 'OldStuff' >> MIGRATE FROM POOL 'gpfs23data' >> TO POOL 'gpfs23capacity' >> LIMIT(98) >> WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND >> (KB_ALLOCATED > 3584)) >> >> RULE 'INeedThatAfterAll' >> MIGRATE FROM POOL 'gpfs23capacity' >> TO POOL 'gpfs23data' >> LIMIT(75) >> WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) >> >> And then the log shows it scanning all the directories and then says, >> "OK, here?s what I?m going to do": >> >> [I] Summary of Rule Applicability and File Choices: >> Rule# Hit_Cnt KB_Hit Chosen KB_Chosen >> KB_Ill Rule >> 0 5255960 237675081344 1868858 67355430720 >> 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO >> POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) >> 1 611 236745504 611 236745504 >> 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL >> 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) >> >> [I] Filesystem objects with no applicable rules: 414911602. 
>> >> [I] GPFS Policy Decisions and File Choice Totals: >> Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; >> Predicted Data Pool Utilization in KB and %: >> Pool_Name KB_Occupied KB_Total Percent_Occupied >> gpfs23capacity 122483878944 124983549952 97.999999993% >> gpfs23data 104742360032 343753326592 30.470209865% >> system 0 0 >> 0.000000000% (no user data) >> >> Notice that it says it?s only going to migrate less than 2 million of >> the 5.25 million candidate files!! And sure enough, that?s all it did: >> >> [I] A total of 1869469 files have been migrated, deleted or processed >> by an EXTERNAL EXEC/script; >> 0 'skipped' files and/or errors. >> >> And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere >> near 98% full: >> >> Disks in storage pool: gpfs23capacity (Maximum disk size allowed is >> 519 TB) >> eon35Ansd 58.2T 35 No Yes 29.54T ( >> 51%) 63.93G ( 0%) >> eon35Dnsd 58.2T 35 No Yes 29.54T ( >> 51%) 64.39G ( 0%) >> ------------- >> -------------------- ------------------- >> (pool total) 116.4T 59.08T ( >> 51%) 128.3G ( 0%) >> >> I don?t understand why it only migrated a small subset of what it >> could / should have? >> >> We are doing a migration from one filesystem (gpfs21) to gpfs23 and I >> really need to stuff my gpfs23capacity pool as full of data as I can >> to keep the migration going. Any ideas anyone? Thanks in advance? >> >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and >> Education >> _Kevin.Buterbaugh at vanderbilt.edu_ >> - (615)875-9633 From makaplan at us.ibm.com Mon Apr 17 21:11:18 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 17 Apr 2017 16:11:18 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: Kevin, 1. Running with both fairly simple rules so that you migrate "in both directions" is fine. It was designed to do that! 2. Glad you understand the logic of "rules hit" vs "files chosen". 3. To begin to understand "what the hxxx is going on" (as our fearless leader liked to say before he was in charge ;-) ) I suggest: (a) Run mmapplypolicy on directory of just a few files `mmapplypolicy /gpfs23/test-directory -I test ...` and check that the [I] ... Current data pool utilization message is consistent with the output of `mmdf gpfs23`. They should be, but if they're not, that's a weird problem right there since they're supposed to be looking at the same metadata! You can do this anytime, should complete almost instantly... (b) When time and resources permit, re-run mmapplypolicy on the full FS with your desired migration policy. Again, do the "Current", "Chosen" and "Predicted" messages make sense, and "add up"? Do the file counts seem reasonable, considering that you recently did migrations/deletions that should have changed the counts compared to previous runs of mmapplypolicy? If you just want to look and not actually change anything, use `-I test` which will skip the migration steps. If you want to see the list of files chosen (c) If you continue to see significant discrepancies between mmapplypolicy and mmdf, let us know. (d) Also at some point you may consider running mmrestripefs with options to make sure every file has its data blocks where they are supposed to be and is replicated as you have specified. Let's see where those steps take us... -- marc of Spectrum Scale (n? 
GPFS) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 04/17/2017 11:25 AM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, I do understand what you?re saying about mmapplypolicy deciding it only needed to move ~1.8 million files to fill the capacity pool to ~98% full. However, it is now more than 24 hours since the mmapplypolicy finished ?successfully? and: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.66T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.66T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.33T ( 51%) 128.8G ( 0%) And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the partially redacted command line: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And here?s that policy file: define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE (access_age < 14) The one thing that has changed is that formerly I only ran the migration in one direction at a time ? i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that? I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it?s done the other filesystem migration kicks off. I don?t like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-( And yes, there is something else going on ? well, was going on - the network switch crash killed this too ? I have been running an rsync on one particular ~80TB directory tree from the old filesystem to gpfs23. I understand that the migration wouldn?t know about those files and that?s fine ? I just don?t understand why mmapplypolicy said it was going to fill the capacity pool to 98% but didn?t do it ? wait, mmapplypolicy hasn?t gone into politics, has it?!? ;-) Thanks - and again, if I should open a PMR for this please let me know... Kevin On Apr 16, 2017, at 2:15 PM, Marc A Kaplan wrote: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... 
[I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. 
[I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Apr 17 21:18:42 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 17 Apr 2017 16:18:42 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: Oops... If you want to see the list of what would be migrated '-I test -L 2' If you want to migrate and see each file migrated '-I yes -L 2' I don't recommend -L 4 or higher, unless you want to see the files that do not match your rules. -L 3 will show you all the files that match the rules, including those that are NOT chosen for migration. See the command gu -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Apr 17 22:16:57 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 17 Apr 2017 21:16:57 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: Message-ID: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> Hi Marc, Alex, all, Thank you for the responses. To answer Alex?s questions first ? 
the full command line I used (except for some stuff I?m redacting but you don?t need the exact details anyway) was: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And yes, it printed out the very normal, ?Hey, I migrated all 1.8 million files I said I would successfully, so I?m done here? message: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. Marc - I ran what you suggest in your response below - section 3a. The output of a ?test? mmapplypolicy and mmdf was very consistent. Therefore, I?m moving on to 3b and running against the full filesystem again ? the only difference between the command line above and what I?m doing now is that I?m running with ?-L 2? this time around. I?m not fond of doing this during the week but I need to figure out what?s going on and I *really* need to get some stuff moved from my ?data? pool to my ?capacity? pool. I will respond back on the list again where there?s something to report. Thanks again, all? Kevin On Apr 17, 2017, at 3:11 PM, Marc A Kaplan > wrote: Kevin, 1. Running with both fairly simple rules so that you migrate "in both directions" is fine. It was designed to do that! 2. Glad you understand the logic of "rules hit" vs "files chosen". 3. To begin to understand "what the hxxx is going on" (as our fearless leader liked to say before he was in charge ;-) ) I suggest: (a) Run mmapplypolicy on directory of just a few files `mmapplypolicy /gpfs23/test-directory -I test ...` and check that the [I] ... Current data pool utilization message is consistent with the output of `mmdf gpfs23`. They should be, but if they're not, that's a weird problem right there since they're supposed to be looking at the same metadata! You can do this anytime, should complete almost instantly... (b) When time and resources permit, re-run mmapplypolicy on the full FS with your desired migration policy. Again, do the "Current", "Chosen" and "Predicted" messages make sense, and "add up"? Do the file counts seem reasonable, considering that you recently did migrations/deletions that should have changed the counts compared to previous runs of mmapplypolicy? If you just want to look and not actually change anything, use `-I test` which will skip the migration steps. If you want to see the list of files chosen (c) If you continue to see significant discrepancies between mmapplypolicy and mmdf, let us know. (d) Also at some point you may consider running mmrestripefs with options to make sure every file has its data blocks where they are supposed to be and is replicated as you have specified. Let's see where those steps take us... -- marc of Spectrum Scale (n? GPFS) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/17/2017 11:25 AM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Marc, I do understand what you?re saying about mmapplypolicy deciding it only needed to move ~1.8 million files to fill the capacity pool to ~98% full. However, it is now more than 24 hours since the mmapplypolicy finished ?successfully? 
and: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.66T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.66T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.33T ( 51%) 128.8G ( 0%) And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the partially redacted command line: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And here?s that policy file: define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE (access_age < 14) The one thing that has changed is that formerly I only ran the migration in one direction at a time ? i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that? I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it?s done the other filesystem migration kicks off. I don?t like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-( And yes, there is something else going on ? well, was going on - the network switch crash killed this too ? I have been running an rsync on one particular ~80TB directory tree from the old filesystem to gpfs23. I understand that the migration wouldn?t know about those files and that?s fine ? I just don?t understand why mmapplypolicy said it was going to fill the capacity pool to 98% but didn?t do it ? wait, mmapplypolicy hasn?t gone into politics, has it?!? ;-) Thanks - and again, if I should open a PMR for this please let me know... Kevin On Apr 16, 2017, at 2:15 PM, Marc A Kaplan > wrote: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. 
Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. 
And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Apr 18 14:31:20 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 18 Apr 2017 13:31:20 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> Message-ID: <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> Hi All, but especially Marc, I ran the mmapplypolicy again last night and, unfortunately, it again did not fill the capacity pool like it said it would. From the log file: [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 3632859 181380873184 1620175 61434283936 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 88 99230048 88 99230048 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 442962867. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 61533513984KB: 1620263 of 3632947 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878464 124983549952 97.999999609% gpfs23data 128885076416 343753326592 37.493477574% system 0 0 0.000000000% (no user data) [I] 2017-04-18 at 02:52:48.402 Policy execution. 0 files dispatched. And the tail end of the log file says that it moved those files: [I] 2017-04-18 at 09:06:51.124 Policy execution. 1620263 files dispatched. [I] A total of 1620263 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. 
But mmdf (and how quickly the mmapplypolicy itself ran) say otherwise: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.73T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.73T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.45T ( 51%) 128.8G ( 0%) Ideas? Or is it time for me to open a PMR? Thanks? Kevin On Apr 17, 2017, at 4:16 PM, Buterbaugh, Kevin L > wrote: Hi Marc, Alex, all, Thank you for the responses. To answer Alex?s questions first ? the full command line I used (except for some stuff I?m redacting but you don?t need the exact details anyway) was: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And yes, it printed out the very normal, ?Hey, I migrated all 1.8 million files I said I would successfully, so I?m done here? message: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. Marc - I ran what you suggest in your response below - section 3a. The output of a ?test? mmapplypolicy and mmdf was very consistent. Therefore, I?m moving on to 3b and running against the full filesystem again ? the only difference between the command line above and what I?m doing now is that I?m running with ?-L 2? this time around. I?m not fond of doing this during the week but I need to figure out what?s going on and I *really* need to get some stuff moved from my ?data? pool to my ?capacity? pool. I will respond back on the list again where there?s something to report. Thanks again, all? Kevin On Apr 17, 2017, at 3:11 PM, Marc A Kaplan > wrote: Kevin, 1. Running with both fairly simple rules so that you migrate "in both directions" is fine. It was designed to do that! 2. Glad you understand the logic of "rules hit" vs "files chosen". 3. To begin to understand "what the hxxx is going on" (as our fearless leader liked to say before he was in charge ;-) ) I suggest: (a) Run mmapplypolicy on directory of just a few files `mmapplypolicy /gpfs23/test-directory -I test ...` and check that the [I] ... Current data pool utilization message is consistent with the output of `mmdf gpfs23`. They should be, but if they're not, that's a weird problem right there since they're supposed to be looking at the same metadata! You can do this anytime, should complete almost instantly... (b) When time and resources permit, re-run mmapplypolicy on the full FS with your desired migration policy. Again, do the "Current", "Chosen" and "Predicted" messages make sense, and "add up"? Do the file counts seem reasonable, considering that you recently did migrations/deletions that should have changed the counts compared to previous runs of mmapplypolicy? If you just want to look and not actually change anything, use `-I test` which will skip the migration steps. If you want to see the list of files chosen (c) If you continue to see significant discrepancies between mmapplypolicy and mmdf, let us know. (d) Also at some point you may consider running mmrestripefs with options to make sure every file has its data blocks where they are supposed to be and is replicated as you have specified. Let's see where those steps take us... -- marc of Spectrum Scale (n? GPFS) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/17/2017 11:25 AM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? 
Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Marc, I do understand what you?re saying about mmapplypolicy deciding it only needed to move ~1.8 million files to fill the capacity pool to ~98% full. However, it is now more than 24 hours since the mmapplypolicy finished ?successfully? and: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.66T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.66T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.33T ( 51%) 128.8G ( 0%) And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the partially redacted command line: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And here?s that policy file: define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE (access_age < 14) The one thing that has changed is that formerly I only ran the migration in one direction at a time ? i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that? I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it?s done the other filesystem migration kicks off. I don?t like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-( And yes, there is something else going on ? well, was going on - the network switch crash killed this too ? I have been running an rsync on one particular ~80TB directory tree from the old filesystem to gpfs23. I understand that the migration wouldn?t know about those files and that?s fine ? I just don?t understand why mmapplypolicy said it was going to fill the capacity pool to 98% but didn?t do it ? wait, mmapplypolicy hasn?t gone into politics, has it?!? ;-) Thanks - and again, if I should open a PMR for this please let me know... Kevin On Apr 16, 2017, at 2:15 PM, Marc A Kaplan > wrote: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... 
[I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. 
[I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgiles at gmail.com Tue Apr 18 14:56:43 2017 From: zgiles at gmail.com (Zachary Giles) Date: Tue, 18 Apr 2017 09:56:43 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> Message-ID: Kevin, Here's a silly theory: Have you tried putting a weight value in? I wonder if during migration it hits some large file that would go over the threshold and stops. With a weight flag you could move all small files in first or by lack of heat etc to pack the tier more tightly. Just something else to try before the PMR process. Zach On Apr 18, 2017 9:32 AM, "Buterbaugh, Kevin L" < Kevin.Buterbaugh at vanderbilt.edu> wrote: Hi All, but especially Marc, I ran the mmapplypolicy again last night and, unfortunately, it again did not fill the capacity pool like it said it would. 
From the log file: [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 3632859 181380873184 1620175 61434283936 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 88 99230048 88 99230048 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 442962867. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 61533513984KB: 1620263 of 3632947 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878464 124983549952 97.999999609% gpfs23data 128885076416 343753326592 37.493477574% system 0 0 0.000000000% (no user data) [I] 2017-04-18 at 02:52:48.402 Policy execution. 0 files dispatched. And the tail end of the log file says that it moved those files: [I] 2017-04-18 at 09:06:51.124 Policy execution. 1620263 files dispatched. [I] A total of 1620263 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. But mmdf (and how quickly the mmapplypolicy itself ran) say otherwise: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.73T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.73T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.45T ( 51%) 128.8G ( 0%) Ideas? Or is it time for me to open a PMR? Thanks? Kevin On Apr 17, 2017, at 4:16 PM, Buterbaugh, Kevin L < Kevin.Buterbaugh at Vanderbilt.Edu> wrote: Hi Marc, Alex, all, Thank you for the responses. To answer Alex?s questions first ? the full command line I used (except for some stuff I?m redacting but you don?t need the exact details anyway) was: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And yes, it printed out the very normal, ?Hey, I migrated all 1.8 million files I said I would successfully, so I?m done here? message: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. Marc - I ran what you suggest in your response below - section 3a. The output of a ?test? mmapplypolicy and mmdf was very consistent. Therefore, I?m moving on to 3b and running against the full filesystem again ? the only difference between the command line above and what I?m doing now is that I?m running with ?-L 2? this time around. I?m not fond of doing this during the week but I need to figure out what?s going on and I *really* need to get some stuff moved from my ?data? pool to my ?capacity? pool. I will respond back on the list again where there?s something to report. Thanks again, all? Kevin On Apr 17, 2017, at 3:11 PM, Marc A Kaplan wrote: Kevin, 1. Running with both fairly simple rules so that you migrate "in both directions" is fine. It was designed to do that! 2. Glad you understand the logic of "rules hit" vs "files chosen". 3. To begin to understand "what the hxxx is going on" (as our fearless leader liked to say before he was in charge ;-) ) I suggest: (a) Run mmapplypolicy on directory of just a few files `mmapplypolicy /gpfs23/test-directory -I test ...` and check that the [I] ... Current data pool utilization message is consistent with the output of `mmdf gpfs23`. 
They should be, but if they're not, that's a weird problem right there since they're supposed to be looking at the same metadata! You can do this anytime, should complete almost instantly... (b) When time and resources permit, re-run mmapplypolicy on the full FS with your desired migration policy. Again, do the "Current", "Chosen" and "Predicted" messages make sense, and "add up"? Do the file counts seem reasonable, considering that you recently did migrations/deletions that should have changed the counts compared to previous runs of mmapplypolicy? If you just want to look and not actually change anything, use `-I test` which will skip the migration steps. If you want to see the list of files chosen (c) If you continue to see significant discrepancies between mmapplypolicy and mmdf, let us know. (d) Also at some point you may consider running mmrestripefs with options to make sure every file has its data blocks where they are supposed to be and is replicated as you have specified. Let's see where those steps take us... -- marc of Spectrum Scale (n? GPFS) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 04/17/2017 11:25 AM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org ------------------------------ Hi Marc, I do understand what you?re saying about mmapplypolicy deciding it only needed to move ~1.8 million files to fill the capacity pool to ~98% full. However, it is now more than 24 hours since the mmapplypolicy finished ?successfully? and: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.66T ( 51%) 64.16G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.66T ( 51%) 64.61G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.33T ( 51%) 128.8G ( 0%) And yes, I did run the mmapplypolicy with ?-I yes? ? here?s the partially redacted command line: /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes And here?s that policy file: define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))) define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0)) RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE (access_age < 14) The one thing that has changed is that formerly I only ran the migration in one direction at a time ? i.e. I used to have those two rules in two separate files and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the month and run the other rule the other weekends of the month. This is the 1st weekend that I attempted to run an mmapplypolicy that did both at the same time. Did I mess something up with that? I have not run it again yet because we also run migrations on the other filesystem that we are still in the process of migrating off of. So gpfs23 goes 1st and as soon as it?s done the other filesystem migration kicks off. I don?t like to run two migrations simultaneously if at all possible. The 2nd migration ran until this morning, when it was unfortunately terminated by a network switch crash that has also had me tied up all morning until now. :-( And yes, there is something else going on ? well, was going on - the network switch crash killed this too ? 
I have been running an rsync on one particular ~80TB directory tree from the old filesystem to gpfs23. I understand that the migration wouldn?t know about those files and that?s fine ? I just don?t understand why mmapplypolicy said it was going to fill the capacity pool to 98% but didn?t do it ? wait, mmapplypolicy hasn?t gone into politics, has it?!? ;-) Thanks - and again, if I should open a PMR for this please let me know... Kevin On Apr 16, 2017, at 2:15 PM, Marc A Kaplan <*makaplan at us.ibm.com* > wrote: Let's look at how mmapplypolicy does the reckoning. Before it starts, it see your pools as: [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. Your rule says you want to migrate data to gpfs23capacity, up to 98% full: RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE ... We scan your files and find and reckon... [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) So yes, 5.25Million files match the rule, but the utility chooses 1.868Million files that add up to 67,355GB and figures that if it migrates those to gpfs23capacity, (and also figuring the other migrations by your second rule)then gpfs23 will end up 97.9999% full. We show you that with our "predictions" message. Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% So that's why it chooses to migrate "only" 67GB.... See? Makes sense to me. Questions: Did you run with -I yes or -I defer ? Were some of the files illreplicated or illplaced? Did you give the cluster-wide space reckoning protocols time to see the changes? mmdf is usually "behind" by some non-neglible amount of time. What else is going on? If you're moving or deleting or creating data by other means while mmapplypolicy is running -- it doesn't "know" about that! Run it again! From: "Buterbaugh, Kevin L" <*Kevin.Buterbaugh at Vanderbilt.Edu* > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > Date: 04/16/2017 09:47 AM Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: *gpfsug-discuss-bounces at spectrumscale.org* ------------------------------ Hi All, First off, I can open a PMR for this if I need to. Second, I am far from an mmapplypolicy guru. With that out of the way ? I have an mmapplypolicy job that didn?t migrate anywhere close to what it could / should have. >From the log file I have it create, here is the part where it shows the policies I told it to invoke: [I] Qos 'maintenance' configured as inf [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 55365193728 124983549952 44.297984614% gpfs23data 166747037696 343753326592 48.507759721% system 0 0 0.000000000% (no user data) [I] 75142046 of 209715200 inodes used: 35.830520%. [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC Parsed 2 policy rules. 
RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)) RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75) WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14) And then the log shows it scanning all the directories and then says, "OK, here?s what I?m going to do": [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 5255960 237675081344 1868858 67355430720 0 RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.) 1 611 236745504 611 236745504 0 RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.) [I] Filesystem objects with no applicable rules: 414911602. [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 67592176224KB: 1869469 of 5256571 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 122483878944 124983549952 97.999999993% gpfs23data 104742360032 343753326592 30.470209865% system 0 0 0.000000000% (no user data) Notice that it says it?s only going to migrate less than 2 million of the 5.25 million candidate files!! And sure enough, that?s all it did: [I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near 98% full: Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 29.54T ( 51%) 63.93G ( 0%) eon35Dnsd 58.2T 35 No Yes 29.54T ( 51%) 64.39G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 59.08T ( 51%) 128.3G ( 0%) I don?t understand why it only migrated a small subset of what it could / should have? We are doing a migration from one filesystem (gpfs21) to gpfs23 and I really need to stuff my gpfs23capacity pool as full of data as I can to keep the migration going. Any ideas anyone? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education *Kevin.Buterbaugh at vanderbilt.edu* - (615)875-9633 <(615)%20875-9633> _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at *spectrumscale.org* *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at *spectrumscale.org* *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? 
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From makaplan at us.ibm.com  Tue Apr 18 16:11:19 2017
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Tue, 18 Apr 2017 11:11:19 -0400
Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?
In-Reply-To: <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu>
References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu>
 <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu>
Message-ID:

ANYONE else reading this saga? Who uses mmapplypolicy to migrate files within multi-TB file systems? Problems? Or all working as expected?

------

Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose total size is 61 Terabytes and migrating those will bring the occupancy of the gpfs23capacity pool to 98% and then we're done.

So now I'm wondering where this is going wrong. Is there some bug in the reckoning inside of mmapplypolicy or somewhere else in GPFS?

Sure you can put in a PMR, and probably should. I'm guessing whoever picks up the PMR will end up calling or emailing me ... but maybe she can do some of the clerical work for us...

While we're waiting for that... Here's what I suggest next.

Add a clause ...

SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK))

before the WHERE clause to each of your rules.

Re-run the command with options '-I test -L 2' and collect the output.

We're not actually going to move any data, but we're going to look at the files and file sizes that are "chosen"...

You should see 1.6 million lines that look kind of like this:

/yy/dat/bigC RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1)

Run a script over the output to add up all the SHOW() values in the lines that contain TO POOL 'gpfs23capacity' and verify that they do indeed add up to 61TB... (The show is in KB so the SHOW numbers should add up to 61 billion).

That sanity checks the policy arithmetic. Let's assume that's okay.

Then the next question is whether the individual numbers are correct... Zach Giles made a suggestion... which I'll interpret as find some of the biggest of those files and check that they really are that big....

At this point, I really don't know, but I'm guessing there are some discrepancies in the reported KB_ALLOCATED numbers for many of the files... and/or they are "illplaced" - the data blocks aren't all in the pool FROM POOL ...

HMMMM.... I just thought about this some more and added the NLINK statistic. It would be unusual for this to be a big problem, but files that are hard linked are not recognized by mmapplypolicy as sharing storage... This has not come to my attention as a significant problem -- does the file system in question have significant GBs of hard linked files?

The truth is that you're the first customer/user/admin in a long time to question/examine how mmapplypolicy does its space reckoning ... Optimistically that means it works fine for most customers...
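A minimal sketch of that summing step - assuming the '-I test -L 2' output has been captured to a file (the path below is hypothetical) and that the per-file lines follow the SHOW( KB n=NLINK ) format shown above - could be a Perl one-liner along these lines:

  # sum the SHOW() KB values on lines that choose the gpfs23capacity pool
  perl -ne 'if (/TO POOL .gpfs23capacity./ and /SHOW\(\s*(\d+)/) { $kb += $1 } END { print "KB chosen for gpfs23capacity: $kb\n" }' /tmp/gpfs23.test.out

If the printed total comes out near 61 billion KB, the policy arithmetic itself checks out and the discrepancy lies somewhere else.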
So sorry, something unusual about your installation or usage... -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Tue Apr 18 16:31:12 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Tue, 18 Apr 2017 11:31:12 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> Message-ID: <4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> I have an observation, which may merely serve to show my ignorance: Is it significant that the words "EXTERNAL EXEC/script? are seen below? If migrating between storage pools within the cluster, I would expect the PIT engine to do the migration. When doing HSM (off cluster, tape libraries, etc) is where I would expect to need a script to actually do the work. > [I] 2017-04-18 at 09:06:51.124 Policy execution. 1620263 files dispatched. > [I] A total of 1620263 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; > 0 'skipped' files and/or errors. ? ddj Dave Johnson Brown University > On Apr 18, 2017, at 11:11 AM, Marc A Kaplan wrote: > > ANYONE else reading this saga? Who uses mmapplypolicy to migrate files within multi-TB file systems? Problems? Or all working as expected? > > ------ > > Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose total size is 61 Terabytes and migrating those will bring the occupancy of gpfs23capacity pool to 98% and then we're done. > > So now I'm wondering where this is going wrong. Is there some bug in the reckoning inside of mmapplypolicy or somewhere else in GPFS? > > Sure you can put in an PMR, and probably should. I'm guessing whoever picks up the PMR will end up calling or emailing me ... but maybe she can do some of the clerical work for us... > > While we're waiting for that... Here's what I suggest next. > > Add a clause ... > > SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK)) > > before the WHERE clause to each of your rules. > > Re-run the command with options '-I test -L 2' and collect the output. > > We're not actually going to move any data, but we're going to look at the files and file sizes that are "chosen"... > > You should see 1.6 million lines that look kind of like this: > > /yy/dat/bigC RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1) > > Run a script over the output to add up all the SHOW() values in the lines that contain TO POOL 'gpfs23capacity' and verify that they do indeed > add up to 61TB... (The show is in KB so the SHOW numbers should add up to 61 billion). > > That sanity checks the policy arithmetic. Let's assume that's okay. > > Then the next question is whether the individual numbers are correct... Zach Giles made a suggestion... which I'll interpret as > find some of the biggest of those files and check that they really are that big.... > > At this point, I really don't know, but I'm guessing there's some discrepances in the reported KB_ALLOCATED numbers for many of the files... > and/or they are "illplaced" - the data blocks aren't all in the pool FROM POOL ... > > HMMMM.... I just thought about this some more and added the NLINK statistic. It would be unusual for this to be a big problem, but files that are hard linked are > not recognized by mmapplypolicy as sharing storage... 
> This has not come to my attention as a significant problem -- does the file system in question have significant GBs of hard linked files?
> The truth is that you're the first customer/user/admin in a long time to question/examine how mmapplypolicy does its space reckoning ...
> Optimistically that means it works fine for most customers...
>
> So sorry, something unusual about your installation or usage...
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From makaplan at us.ibm.com  Tue Apr 18 17:06:16 2017
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Tue, 18 Apr 2017 12:06:16 -0400
Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?
In-Reply-To: <4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu>
References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu>
 <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu>
 <4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu>
Message-ID:

That is a summary message. It says one way or another, the command has dealt with 1.6 million files. For the case under discussion there are no EXTERNAL pools, nor any DELETions, just intra-GPFS MIGRATions.

[I] A total of 1620263 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;
    0 'skipped' files and/or errors.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Kevin.Buterbaugh at Vanderbilt.Edu  Tue Apr 18 17:32:24 2017
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Tue, 18 Apr 2017 16:32:24 +0000
Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?
In-Reply-To:
References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu>
 <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu>
Message-ID: <968C356B-8FDD-44F8-9814-F3D2470369B0@Vanderbilt.Edu>

Hi Marc,

Two things:

1. I have a PMR open now.

2. You *may* have identified the problem - I'm still checking - but files with hard links may be our problem.

I wrote a simple Perl script to iterate over the log file I had mmapplypolicy create. Here's the code (don't laugh, I'm a SysAdmin, not a programmer, and I whipped this out in < 5 minutes - and yes, I realize the fact that I used Perl instead of Python shows my age as well):

#!/usr/bin/perl
#
use strict;
use warnings;

my $InputFile = "/tmp/mmapplypolicy.gpfs23.log";

my $TotalFiles = 0;
my $TotalLinks = 0;
my $TotalSize  = 0;

open INPUT, $InputFile or die "Couldn't open $InputFile for read: $!\n";
while (<INPUT>) {
   next unless /MIGRATED/;
   $TotalFiles++;
   my $FileName = (split / /)[3];
   if ( -f $FileName ) {   # some files may have been deleted since mmapplypolicy ran
      my ($NumLinks, $FileSize) = (stat($FileName))[3,7];   # link count and size fields from stat()
      $TotalLinks += $NumLinks;
      $TotalSize  += $FileSize;
   }
}
close INPUT;

print "Number of files / links = $TotalFiles / $TotalLinks, Total size = $TotalSize\n";
exit 0;

And here's what it kicked out:

Number of files / links = 1620263 / 80818483, Total size = 53966202814094

1.6 million files but 80 million hard links!!! I'm doing some checking right now, but it appears that it is one particular group - and therefore one particular fileset - that is responsible for this - they've got thousands of files with 50 or more hard links each - and they're not inconsequential in size.
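As one quick way to do that kind of checking - confirming where the heavily hard-linked files live - find(1) can select on link count directly; the path here is just a placeholder:

  # list a sample of regular files carrying more than 50 hard links
  find /gpfs23/suspect_fileset -type f -links +50 | head -25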
IIRC (and keep in mind I?m far from a GPFS policy guru), there is a way to say something to the effect of ?and the path does not contain /gpfs23/fileset/path? ? may need a little help getting that right. I?ll post this information to the ticket as well but wanted to update the list. This wouldn?t be the first time we were an ?edge case? for something in GPFS? ;-) Thanks... Kevin On Apr 18, 2017, at 10:11 AM, Marc A Kaplan > wrote: ANYONE else reading this saga? Who uses mmapplypolicy to migrate files within multi-TB file systems? Problems? Or all working as expected? ------ Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose total size is 61 Terabytes and migrating those will bring the occupancy of gpfs23capacity pool to 98% and then we're done. So now I'm wondering where this is going wrong. Is there some bug in the reckoning inside of mmapplypolicy or somewhere else in GPFS? Sure you can put in an PMR, and probably should. I'm guessing whoever picks up the PMR will end up calling or emailing me ... but maybe she can do some of the clerical work for us... While we're waiting for that... Here's what I suggest next. Add a clause ... SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK)) before the WHERE clause to each of your rules. Re-run the command with options '-I test -L 2' and collect the output. We're not actually going to move any data, but we're going to look at the files and file sizes that are "chosen"... You should see 1.6 million lines that look kind of like this: /yy/dat/bigC RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1) Run a script over the output to add up all the SHOW() values in the lines that contain TO POOL 'gpfs23capacity' and verify that they do indeed add up to 61TB... (The show is in KB so the SHOW numbers should add up to 61 billion). That sanity checks the policy arithmetic. Let's assume that's okay. Then the next question is whether the individual numbers are correct... Zach Giles made a suggestion... which I'll interpret as find some of the biggest of those files and check that they really are that big.... At this point, I really don't know, but I'm guessing there's some discrepances in the reported KB_ALLOCATED numbers for many of the files... and/or they are "illplaced" - the data blocks aren't all in the pool FROM POOL ... HMMMM.... I just thought about this some more and added the NLINK statistic. It would be unusual for this to be a big problem, but files that are hard linked are not recognized by mmapplypolicy as sharing storage... This has not come to my attention as a significant problem -- does the file system in question have significant GBs of hard linked files? The truth is that you're the first customer/user/admin in a long time to question/examine how mmapplypolicy does its space reckoning ... Optimistically that means it works fine for most customers... So sorry, something unusual about your installation or usage... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Apr 18 17:56:11 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 18 Apr 2017 12:56:11 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? hard links! 
A workaround In-Reply-To: <968C356B-8FDD-44F8-9814-F3D2470369B0@Vanderbilt.Edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> <968C356B-8FDD-44F8-9814-F3D2470369B0@Vanderbilt.Edu> Message-ID: Kevin, Wow. Never underestimate the power of ... Anyhow try this as a fix. Add the clause SIZE(KB_ALLOCATED/NLINK) to your MIGRATE rules. This spreads the total actual size over each hardlink... From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 04/18/2017 12:33 PM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, Two things: 1. I have a PMR open now. 2. You *may* have identified the problem ? I?m still checking ? but files with hard links may be our problem. I wrote a simple Perl script to interate over the log file I had mmapplypolicy create. Here?s the code (don?t laugh, I?m a SysAdmin, not a programmer, and I whipped this out in < 5 minutes ? and yes, I realize the fact that I used Perl instead of Python shows my age as well ): #!/usr/bin/perl # use strict; use warnings; my $InputFile = "/tmp/mmapplypolicy.gpfs23.log"; my $TotalFiles = 0; my $TotalLinks = 0; my $TotalSize = 0; open INPUT, $InputFile or die "Couldn\'t open $InputFile for read: $!\n"; while () { next unless /MIGRATED/; $TotalFiles++; my $FileName = (split / /)[3]; if ( -f $FileName ) { # some files may have been deleted since mmapplypolicy ran my ($NumLinks, $FileSize) = (stat($FileName))[3,7]; $TotalLinks += $NumLinks; $TotalSize += $FileSize; } } close INPUT; print "Number of files / links = $TotalFiles / $TotalLinks, Total size = $TotalSize\n"; exit 0; And here?s what it kicked out: Number of files / links = 1620263 / 80818483, Total size = 53966202814094 1.6 million files but 80 million hard links!!! I?m doing some checking right now, but it appears that it is one particular group - and therefore one particular fileset - that is responsible for this ? they?ve got thousands of files with 50 or more hard links each ? and they?re not inconsequential in size. IIRC (and keep in mind I?m far from a GPFS policy guru), there is a way to say something to the effect of ?and the path does not contain /gpfs23/fileset/path? ? may need a little help getting that right. I?ll post this information to the ticket as well but wanted to update the list. This wouldn?t be the first time we were an ?edge case? for something in GPFS? ;-) Thanks... Kevin On Apr 18, 2017, at 10:11 AM, Marc A Kaplan wrote: ANYONE else reading this saga? Who uses mmapplypolicy to migrate files within multi-TB file systems? Problems? Or all working as expected? ------ Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose total size is 61 Terabytes and migrating those will bring the occupancy of gpfs23capacity pool to 98% and then we're done. So now I'm wondering where this is going wrong. Is there some bug in the reckoning inside of mmapplypolicy or somewhere else in GPFS? Sure you can put in an PMR, and probably should. I'm guessing whoever picks up the PMR will end up calling or emailing me ... but maybe she can do some of the clerical work for us... While we're waiting for that... Here's what I suggest next. Add a clause ... SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK)) before the WHERE clause to each of your rules. Re-run the command with options '-I test -L 2' and collect the output. 
We're not actually going to move any data, but we're going to look at the files and file sizes that are "chosen"... You should see 1.6 million lines that look kind of like this: /yy/dat/bigC RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1) Run a script over the output to add up all the SHOW() values in the lines that contain TO POOL 'gpfs23capacity' and verify that they do indeed add up to 61TB... (The show is in KB so the SHOW numbers should add up to 61 billion). That sanity checks the policy arithmetic. Let's assume that's okay. Then the next question is whether the individual numbers are correct... Zach Giles made a suggestion... which I'll interpret as find some of the biggest of those files and check that they really are that big.... At this point, I really don't know, but I'm guessing there's some discrepances in the reported KB_ALLOCATED numbers for many of the files... and/or they are "illplaced" - the data blocks aren't all in the pool FROM POOL ... HMMMM.... I just thought about this some more and added the NLINK statistic. It would be unusual for this to be a big problem, but files that are hard linked are not recognized by mmapplypolicy as sharing storage... This has not come to my attention as a significant problem -- does the file system in question have significant GBs of hard linked files? The truth is that you're the first customer/user/admin in a long time to question/examine how mmapplypolicy does its space reckoning ... Optimistically that means it works fine for most customers... So sorry, something unusual about your installation or usage... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Apr 19 14:12:16 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 19 Apr 2017 13:12:16 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: <4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu> <764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu> <4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> Message-ID: <458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> Hi All, I think we *may* be able to wrap this saga up? ;-) Dave - in regards to your question, all I know is that the tail end of the log file is ?normal? for all the successful pool migrations I?ve done in the past few years. It looks like the hard links were the problem. We have one group with a fileset on our filesystem that they use for backing up Linux boxes in their lab. That one fileset has thousands and thousands (I haven?t counted, but based on the output of that Perl script I wrote it could well be millions) of files with anywhere from 50 to 128 hard links each ? those files ranged from a few KB to a few MB in size. From what Marc said, my understanding is that with the way I had my policy rule written mmapplypolicy was seeing each of those as separate files and therefore thinking it was moving 50 to 128 times as much space to the gpfs23capacity pool as it really was for those files. Marc can correct me or clarify further if necessary. 
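To make that arithmetic concrete with a made-up example: a 4 MB file (4096 KB allocated) with 100 hard links shows up as 100 separate candidates of 4096 KB each, so the planner counts roughly 400 MB toward the LIMIT(98) target even though only 4 MB of data can actually move. With SIZE(KB_ALLOCATED/NLINK) each of those candidates is reckoned at about 41 KB, and the per-link amounts once again add up to the real 4 MB. Applied to the OldStuff rule posted earlier, the amended rule presumably looks something like this (with SIZE placed ahead of the WHERE clause, the same placement Marc suggested for SHOW):

  RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98) SIZE(KB_ALLOCATED/NLINK) WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584))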
He directed me to add: SIZE(KB_ALLOCATED/NLINK) to both of my migrate rules in my policy file. I did so and kicked off another mmapplypolicy last night, which is still running. However, the prediction section now says: [I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 40050141920KB: 2051495 of 2051495 candidates; Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied gpfs23capacity 104098980256 124983549952 83.290145220% gpfs23data 168478368352 343753326592 49.011414674% system 0 0 0.000000000% (no user data) So now it?s going to move every file it can that matches my policies because it?s figured out that a lot of those are hard links ? and I don?t have enough files matching the criteria to fill the gpfs23capacity pool to the 98% limit like mmapplypolicy thought I did before. According to the log file, it?s happily chugging along migrating files, and mmdf agrees that my gpfs23capacity pool is gradually getting more full (I have it QOSed, of course): Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB) eon35Ansd 58.2T 35 No Yes 25.33T ( 44%) 68.13G ( 0%) eon35Dnsd 58.2T 35 No Yes 25.33T ( 44%) 68.49G ( 0%) ------------- -------------------- ------------------- (pool total) 116.4T 50.66T ( 44%) 136.6G ( 0%) My sincere thanks to all who took the time to respond to my questions. Of course, that goes double for Marc. We (Vanderbilt) seem to have a long tradition of finding some edge cases in GPFS going all the way back to when we originally moved off of an NFS server to GPFS (2.2, 2.3?) back in 2005. I was creating individual tarballs of each users? home directory on the NFS server, copying the tarball to one of the NSD servers, and untarring it there (don?t remember why we weren?t rsync?ing, but there was a reason). Everything was working just fine except for one user. Every time I tried to untar her home directory on GPFS it barfed part of the way thru ? turns out that until then IBM hadn?t considered that someone would want to put 6 million files in one directory. Gotta love those users! ;-) Kevin On Apr 18, 2017, at 10:31 AM, David D. Johnson > wrote: I have an observation, which may merely serve to show my ignorance: Is it significant that the words "EXTERNAL EXEC/script? are seen below? If migrating between storage pools within the cluster, I would expect the PIT engine to do the migration. When doing HSM (off cluster, tape libraries, etc) is where I would expect to need a script to actually do the work. [I] 2017-04-18 at 09:06:51.124 Policy execution. 1620263 files dispatched. [I] A total of 1620263 files have been migrated, deleted or processed by an EXTERNAL EXEC/script; 0 'skipped' files and/or errors. ? ddj Dave Johnson Brown University On Apr 18, 2017, at 11:11 AM, Marc A Kaplan > wrote: ANYONE else reading this saga? Who uses mmapplypolicy to migrate files within multi-TB file systems? Problems? Or all working as expected? ------ Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose total size is 61 Terabytes and migrating those will bring the occupancy of gpfs23capacity pool to 98% and then we're done. So now I'm wondering where this is going wrong. Is there some bug in the reckoning inside of mmapplypolicy or somewhere else in GPFS? Sure you can put in an PMR, and probably should. I'm guessing whoever picks up the PMR will end up calling or emailing me ... but maybe she can do some of the clerical work for us... While we're waiting for that... Here's what I suggest next. 
Add a clause ... SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK)) before the WHERE clause to each of your rules. Re-run the command with options '-I test -L 2' and collect the output. We're not actually going to move any data, but we're going to look at the files and file sizes that are "chosen"... You should see 1.6 million lines that look kind of like this: /yy/dat/bigC RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1) Run a script over the output to add up all the SHOW() values in the lines that contain TO POOL 'gpfs23capacity' and verify that they do indeed add up to 61TB... (The show is in KB so the SHOW numbers should add up to 61 billion). That sanity checks the policy arithmetic. Let's assume that's okay. Then the next question is whether the individual numbers are correct... Zach Giles made a suggestion... which I'll interpret as find some of the biggest of those files and check that they really are that big.... At this point, I really don't know, but I'm guessing there's some discrepances in the reported KB_ALLOCATED numbers for many of the files... and/or they are "illplaced" - the data blocks aren't all in the pool FROM POOL ... HMMMM.... I just thought about this some more and added the NLINK statistic. It would be unusual for this to be a big problem, but files that are hard linked are not recognized by mmapplypolicy as sharing storage... This has not come to my attention as a significant problem -- does the file system in question have significant GBs of hard linked files? The truth is that you're the first customer/user/admin in a long time to question/examine how mmapplypolicy does its space reckoning ... Optimistically that means it works fine for most customers... So sorry, something unusual about your installation or usage... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Apr 19 15:37:29 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 19 Apr 2017 10:37:29 -0400 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: <458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu><4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> <458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> Message-ID: Well I'm glad we followed Mr. S. Holmes dictum which I'll paraphrase... eliminate the impossible and what remains, even if it seems improbable, must hold. BTW - you may want to look at mmclone. Personally, I find the doc and terminology confusing, but mmclone was designed to efficiently store copies and near-copies of large (virtual machine) images. Uses copy-on-write strategy, similar to GPFS snapshots, but at a file by file granularity. BBTW - we fixed directories - they can now be huge (up to about 2^30 files) and automagically, efficiently grow and shrink in size. Also small directories can be stored efficiently in the inode. The last major improvement was just a few years ago. Before that they could be huge, but would never shrink. 
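For the hard-link-heavy backup/image use case discussed earlier, the mmclone flow is roughly the following sketch (file names are hypothetical, and the exact invocations should be double-checked against the mmclone man page):

  mmclone snap golden.img golden.img.base    # create a read-only clone parent from the source file
  mmclone copy golden.img.base node01.img    # make a writable clone that shares the parent's blocks
  mmclone show node01.img                    # report whether a file is a clone and what its parent is

Because clones share unchanged blocks with their parent via copy-on-write, near-identical copies consume little additional space.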
-------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Apr 19 17:18:50 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 19 Apr 2017 16:18:50 +0000 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu><4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu> <458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> Message-ID: Hey Marc, I'm having some issues where a simple ILM list policy never completes, but I have yet to open a PMR or enable additional logging. But I was wondering if there are known reasons that this would not complete, such as when there is a symbolic link that creates a loop within the directory structure or something simple like that. Do you know of any cases like this, Marc, that I should try to find in my file systems? Thanks in advance! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, April 19, 2017 9:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Well I'm glad we followed Mr. S. Holmes dictum which I'll paraphrase... eliminate the impossible and what remains, even if it seems improbable, must hold. BTW - you may want to look at mmclone. Personally, I find the doc and terminology confusing, but mmclone was designed to efficiently store copies and near-copies of large (virtual machine) images. Uses copy-on-write strategy, similar to GPFS snapshots, but at a file by file granularity. BBTW - we fixed directories - they can now be huge (up to about 2^30 files) and automagically, efficiently grow and shrink in size. Also small directories can be stored efficiently in the inode. The last major improvement was just a few years ago. Before that they could be huge, but would never shrink. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From YARD at il.ibm.com Wed Apr 19 17:23:12 2017 From: YARD at il.ibm.com (Yaron Daniel) Date: Wed, 19 Apr 2017 19:23:12 +0300 Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? In-Reply-To: References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu><4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu><458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> Message-ID: Hi Maybe the temp list file - fill the FS that they build on. Try to monitor the FS where the temp filelist is created. 
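A quick way to act on that suggestion (the file system name, policy file, and directory below are placeholders; -s names mmapplypolicy's local work directory, which defaults to /tmp):

  # watch free space where the temporary file lists are built while the scan runs
  df -h /tmp

  # or re-run the list policy pointing its temporary files at a roomier directory
  /usr/lpp/mmfs/bin/mmapplypolicy fsname -P list.policy -I test -s /some/roomier/dir -L 2

If the run then gets noticeably further before stalling, the temporary-space theory is probably the right one.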
Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Bryan Banister To: gpfsug main discussion list Date: 04/19/2017 07:19 PM Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hey Marc, I?m having some issues where a simple ILM list policy never completes, but I have yet to open a PMR or enable additional logging. But I was wondering if there are known reasons that this would not complete, such as when there is a symbolic link that creates a loop within the directory structure or something simple like that. Do you know of any cases like this, Marc, that I should try to find in my file systems? Thanks in advance! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, April 19, 2017 9:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not? Well I'm glad we followed Mr. S. Holmes dictum which I'll paraphrase... eliminate the impossible and what remains, even if it seems improbable, must hold. BTW - you may want to look at mmclone. Personally, I find the doc and terminology confusing, but mmclone was designed to efficiently store copies and near-copies of large (virtual machine) images. Uses copy-on-write strategy, similar to GPFS snapshots, but at a file by file granularity. BBTW - we fixed directories - they can now be huge (up to about 2^30 files) and automagically, efficiently grow and shrink in size. Also small directories can be stored efficiently in the inode. The last major improvement was just a few years ago. Before that they could be huge, but would never shrink. Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From makaplan at us.ibm.com Wed Apr 19 18:10:28 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 19 Apr 2017 13:10:28 -0400 Subject: [gpfsug-discuss] mmapplypolicy not terminating properly? 
In-Reply-To: References: <90BBFFED-C308-41E2-A614-A0AE5DA764CD@vanderbilt.edu><764081F7-56BD-40D4-862D-9BBBD02ED214@vanderbilt.edu><4EC20B6E-8172-492D-B2ED-017359A48D03@brown.edu><458DAA01-0766-4ACB-964C-255BAC6E7975@vanderbilt.edu> Message-ID: (Bryan B asked...) Open a PMR. The first response from me will be ... Run the mmapplypolicy command again, except with additional option `-d 017` and collect output with something equivalent to `2>&1 | tee /tmp/save-all-command-output-here-to-be-passed-along-to-IBM-service ` If you are convinced that mmapplypolicy is "looping" or "hung" - wait another 2 minutes, terminate, and then pass along the saved-all-command-output. -d 017 will dump a lot of additional diagnostics -- If you want to narrow it by baby steps we could try `-d 03` first and see if there are enough clues in that. To answer two of your questions: 1. mmapplypolicy does not follow symlinks, so no "infinite loop" possible with symlinks. 2a. loops in directory are file system bugs in GPFS, (in fact in any posixish file system), (mm)fsck! 2b. mmapplypolicy does impose a limit on total length of pathnames, so even if there is a loop in the directory, mmapplypolicy will "trim" the directory walk. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Apr 19 20:53:42 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 19 Apr 2017 19:53:42 +0000 Subject: [gpfsug-discuss] RAID config for SSD's used for data Message-ID: Hi All, We currently have what I believe is a fairly typical setup ? metadata for our GPFS filesystems is the only thing in the system pool and it?s on SSD, while data is on spinning disk (RAID 6 LUNs). Everything connected via 8 Gb FC SAN. 8 NSD servers. Roughly 1 PB usable space. Now lets just say that you have a little bit of money to spend. Your I/O demands aren?t great - in fact, they?re way on the low end ? typical (cumulative) usage is 200 - 600 MB/sec read, less than that for writes. But while GPFS has always been great and therefore you don?t need to Make GPFS Great Again, you do want to provide your users with the best possible environment. So you?re considering the purchase of a dual-controller FC storage array with 12 or so 1.8 TB SSD?s in it, with the idea being that that storage would be in its? own storage pool and that pool would be the default location for I/O for your main filesystem ? at least for smaller files. You intend to use mmapplypolicy nightly to move data to / from this pool and the spinning disk pools. Given all that ? would you configure those disks as 6 RAID 1 mirrors and have 6 different primary NSD servers or would it be feasible to configure one big RAID 6 LUN? I?m thinking the latter is not a good idea as there could only be one primary NSD server for that one LUN, but given that: 1) I have no experience with this, and 2) I have been wrong once or twice before (), I?m looking for advice. Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Apr 19 20:59:18 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 19 Apr 2017 19:59:18 +0000 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: Message-ID: Hi I'll give my opinion. Worth what you pay for. 
Do as many as you can, six in this case for the good reason you mentioned. But play with the callbacks so the migration happens on watermarks when it happens. Otherwise you might hit no space till your next policy run. The second is well documented on the redbook AFAIK Cheers -- Cheers > On 19 Apr 2017, at 22.54, Buterbaugh, Kevin L wrote: > > Hi All, > > We currently have what I believe is a fairly typical setup ? metadata for our GPFS filesystems is the only thing in the system pool and it?s on SSD, while data is on spinning disk (RAID 6 LUNs). Everything connected via 8 Gb FC SAN. 8 NSD servers. Roughly 1 PB usable space. > > Now lets just say that you have a little bit of money to spend. Your I/O demands aren?t great - in fact, they?re way on the low end ? typical (cumulative) usage is 200 - 600 MB/sec read, less than that for writes. But while GPFS has always been great and therefore you don?t need to Make GPFS Great Again, you do want to provide your users with the best possible environment. > > So you?re considering the purchase of a dual-controller FC storage array with 12 or so 1.8 TB SSD?s in it, with the idea being that that storage would be in its? own storage pool and that pool would be the default location for I/O for your main filesystem ? at least for smaller files. You intend to use mmapplypolicy nightly to move data to / from this pool and the spinning disk pools. > > Given all that ? would you configure those disks as 6 RAID 1 mirrors and have 6 different primary NSD servers or would it be feasible to configure one big RAID 6 LUN? I?m thinking the latter is not a good idea as there could only be one primary NSD server for that one LUN, but given that: 1) I have no experience with this, and 2) I have been wrong once or twice before (), I?m looking for advice. Thanks! > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Apr 19 21:05:49 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 19 Apr 2017 20:05:49 +0000 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: References: Message-ID: By having many LUNs, you get many IO queues for Linux to play with. Also the raid6 overhead can be quite significant, so it might be better to go with raid1 anyway depending on the controller... And if only gpfs had some sort of auto tier back up the pools for hot or data caching :-) Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 19 April 2017 20:53 To: gpfsug main discussion list Subject: [gpfsug-discuss] RAID config for SSD's used for data Hi All, We currently have what I believe is a fairly typical setup ? metadata for our GPFS filesystems is the only thing in the system pool and it?s on SSD, while data is on spinning disk (RAID 6 LUNs). Everything connected via 8 Gb FC SAN. 8 NSD servers. Roughly 1 PB usable space. Now lets just say that you have a little bit of money to spend. 
Your I/O demands aren?t great - in fact, they?re way on the low end ? typical (cumulative) usage is 200 - 600 MB/sec read, less than that for writes. But while GPFS has always been great and therefore you don?t need to Make GPFS Great Again, you do want to provide your users with the best possible environment. So you?re considering the purchase of a dual-controller FC storage array with 12 or so 1.8 TB SSD?s in it, with the idea being that that storage would be in its? own storage pool and that pool would be the default location for I/O for your main filesystem ? at least for smaller files. You intend to use mmapplypolicy nightly to move data to / from this pool and the spinning disk pools. Given all that ? would you configure those disks as 6 RAID 1 mirrors and have 6 different primary NSD servers or would it be feasible to configure one big RAID 6 LUN? I?m thinking the latter is not a good idea as there could only be one primary NSD server for that one LUN, but given that: 1) I have no experience with this, and 2) I have been wrong once or twice before (), I?m looking for advice. Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 From aaron.s.knister at nasa.gov Wed Apr 19 21:13:14 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 19 Apr 2017 16:13:14 -0400 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: References: Message-ID: On 4/19/17 4:05 PM, Simon Thompson (IT Research Support) wrote: > By having many LUNs, you get many IO queues for Linux to play with. Also the raid6 overhead can be quite significant, so it might be better to go with raid1 anyway depending on the controller... > > And if only gpfs had some sort of auto tier back up the pools for hot or data caching :-) You mean like HAWC but for writes larger than 64K? ;-) Or I guess "HARC" as it might be called for a read cache... -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From luis.bolinches at fi.ibm.com Wed Apr 19 21:20:20 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 19 Apr 2017 20:20:20 +0000 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: Message-ID: I assume you are making the joke of external LROC. But not sure I would use external storage for LROC, as the whole point is to have really fast storage as close to the node (L for local) as possible. Maybe those SSD that will get replaced with the fancy external storage? -- Cheers > On 19 Apr 2017, at 23.13, Aaron Knister wrote: > > > >> On 4/19/17 4:05 PM, Simon Thompson (IT Research Support) wrote: >> By having many LUNs, you get many IO queues for Linux to play with. Also the raid6 overhead can be quite significant, so it might be better to go with raid1 anyway depending on the controller... >> >> And if only gpfs had some sort of auto tier back up the pools for hot or data caching :-) > > You mean like HAWC but for writes larger than 64K? ;-) > > Or I guess "HARC" as it might be called for a read cache... > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Apr 19 21:49:56 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 19 Apr 2017 16:49:56 -0400 Subject: [gpfsug-discuss] RAID config for SSD's - potential pitfalls In-Reply-To: References: Message-ID: As I've mentioned before, RAID choices for GPFS are not so simple. Here are a couple points to consider, I'm sure there's more. And if I'm wrong, someone will please correct me - but I believe the two biggest pitfalls are: Some RAID configurations (classically 5 and 6) work best with large, full block writes. When the file system does a partial block write, RAID may have to read a full "stripe" from several devices, compute the differences and then write back the modified data to several devices. This is certainly true with RAID that is configured over several storage devices, with error correcting codes. SO, you do NOT want to put GPFS metadata (system pool!) on RAID configured with large stripes and error correction. This is the Read-Modify-Write Raid pitfall. GPFS has built-in replication features - consider using those instead of RAID replication (classically Raid-1). GPFS replication can work with storage devices that are in different racks, separated by significant physical space, and from different manufacturers. This can be more robust than RAID in a single box or single rack. Consider a fire scenario, or exploding power supply or similar physical disaster. Consider that storage devices and controllers from the same manufacturer may have the same bugs, defects, failures. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Apr 19 22:12:35 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 19 Apr 2017 21:12:35 +0000 Subject: [gpfsug-discuss] RAID config for SSD's - potential pitfalls In-Reply-To: References: Message-ID: Hi Marc, But the limitation on GPFS replication is that I can set replication separately for metadata and data, but no matter whether I have one data pool or ten data pools they all must have the same replication, correct? And believe me I *love* GPFS replication ? I would hope / imagine that I am one of the few people on this mailing list who has actually gotten to experience a ?fire scenario? ? electrical fire, chemical suppressant did it?s thing, and everything in the data center had a nice layer of soot, ash, and chemical suppressant on and in it and therefore had to be professionally cleaned. Insurance bought us enough disk space that we could (temporarily) turn on GPFS data replication and clean storage arrays one at a time! But in my current hypothetical scenario I?m stretching the budget just to get that one storage array with 12 x 1.8 TB SSD?s in it. Two are out of the question. My current metadata that I?ve got on SSDs is on RAID 1 mirrors and has GPFS replication set to 2. I thought the multiple RAID 1 mirrors approach was the way to go for SSDs for data as well, as opposed to one big RAID 6 LUN, but wanted to get the advice of those more knowledgeable than me. Thanks! Kevin On Apr 19, 2017, at 3:49 PM, Marc A Kaplan > wrote: As I've mentioned before, RAID choices for GPFS are not so simple. Here are a couple points to consider, I'm sure there's more. 
And if I'm wrong, someone will please correct me - but I believe the two biggest pitfalls are: * Some RAID configurations (classically 5 and 6) work best with large, full block writes. When the file system does a partial block write, RAID may have to read a full "stripe" from several devices, compute the differences and then write back the modified data to several devices. This is certainly true with RAID that is configured over several storage devices, with error correcting codes. SO, you do NOT want to put GPFS metadata (system pool!) on RAID configured with large stripes and error correction. This is the Read-Modify-Write Raid pitfall. * GPFS has built-in replication features - consider using those instead of RAID replication (classically Raid-1). GPFS replication can work with storage devices that are in different racks, separated by significant physical space, and from different manufacturers. This can be more robust than RAID in a single box or single rack. Consider a fire scenario, or exploding power supply or similar physical disaster. Consider that storage devices and controllers from the same manufacturer may have the same bugs, defects, failures. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chekh at stanford.edu Wed Apr 19 22:23:15 2017 From: chekh at stanford.edu (Alex Chekholko) Date: Wed, 19 Apr 2017 14:23:15 -0700 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: References: Message-ID: <4f16617c-0ae9-18ef-bfb5-206507762fd9@stanford.edu> On 04/19/2017 12:53 PM, Buterbaugh, Kevin L wrote: > > So you?re considering the purchase of a dual-controller FC storage array > with 12 or so 1.8 TB SSD?s in it, with the idea being that that storage > would be in its? own storage pool and that pool would be the default > location for I/O for your main filesystem ? at least for smaller files. > You intend to use mmapplypolicy nightly to move data to / from this > pool and the spinning disk pools. We did this and failed in interesting (but in retrospect obvious) ways. You will want to ensure that your users cannot fill your write target pool within a day. The faster the storage, the more likely that is to happen. Or else your users will get ENOSPC. You will want to ensure that your pools can handle the additional I/O from the migration in aggregate with all the user I/O. Or else your users will see worse performance from the fast pool than the slow pool while the migration is running. You will want to make sure that the write throughput of your slow pool is faster than the read throughput of your fast pool. In our case, the fast pool was undersized in capacity, and oversized in terms of performance. And overall the filesystem was oversubscribed (~100 10GbE clients, 8 x 10GbE NSD servers) So the fast pool would fill very quickly. Then I would switch the placement policy to the big slow pool and performance would drop dramatically, and then if I ran a migration it would either (depending on parameters) take up all the I/O to the slow pool (leaving none for the users), or else take forever (weeks) because the user I/O was maxing out the slow pool. 
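For reference, the QoS feature added in Spectrum Scale 4.2 gives a knob for exactly this contention between migration and user I/O -- a rough sketch only, where the file system name, pool name and the 1000 IOPS cap are made-up values to tune for your own hardware:

mmchqos gpfs0 --enable pool=slow,maintenance=1000IOPS,other=unlimited
mmlsqos gpfs0 --seconds 60    # see how much of the cap the migration is actually using
# mmapplypolicy runs in the 'maintenance' QoS class by default once QoS is enabled,
# so a nightly migration should leave most of the slow pool's IOPS to user traffic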
Things should better today with QoS stuff, but your relative pool capacities (in our case it was like 1% fast, 99% slow) and your relative pool performance (in our case, slow pool had fewer IOPS than fast pool) are still going to matter a lot. -- Alex Chekholko chekh at stanford.edu From makaplan at us.ibm.com Wed Apr 19 22:58:24 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 19 Apr 2017 17:58:24 -0400 Subject: [gpfsug-discuss] RAID config for SSD's - potential pitfalls In-Reply-To: References: Message-ID: Kevin asked: " ... data pools they all must have the same replication, correct?" Actually no! You can use policy RULE ... SET POOL 'x' REPLICATE(2) to set the replication factor when a file is created. Use mmchattr or mmapplypolicy to change the replication factor after creation. You specify the maximum data replication factor when you create the file system (1,2,3), but any given file can have replication factor set to 1 or 2 or 3. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From kums at us.ibm.com Wed Apr 19 23:03:33 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Wed, 19 Apr 2017 18:03:33 -0400 Subject: [gpfsug-discuss] RAID config for SSD's - potential pitfalls In-Reply-To: References: Message-ID: Hi, >> As I've mentioned before, RAID choices for GPFS are not so simple. Here are a couple points to consider, I'm sure there's more. And if I'm wrong, someone will please correct me - but I believe the two biggest pitfalls are: >>Some RAID configurations (classically 5 and 6) work best with large, full block writes. When the file system does a partial block write, RAID may have to read a full "stripe" from several devices, compute the differences and then write back the modified data to several devices. >>This is certainly true with RAID that is configured over several storage devices, with error correcting codes. SO, you do NOT want to put GPFS metadata (system pool!) on RAID configured with large stripes and error correction. This is the Read-Modify-Write Raid pitfall. As you pointed out, the RAID choices for GPFS may not be simple and we need to take into consideration factors such as storage subsystem configuration/capabilities such as if all drives are homogenous or there is mix of drives. If all the drives are homogeneous, then create dataAndMetadata NSDs across RAID-6 and if the storage controller supports write-cache + write-cache mirroring (WC + WM) then enable this (WC +WM) can alleviate read-modify-write for small writes (typical in metadata). If there is MIX of SSD and HDD (e.g. 15K RPM), then we need to take into consideration the aggregate IOPS of RAID-1 SSD volumes vs. RAID-6 HDDs before separating data and metadata into separate media. For example, if the storage subsystem has 2 x SSDs and ~300 x 15K RPM or NL_SAS HDDs then most likely aggregate IOPS of RAID-6 HDD volumes will be higher than RAID-1 SSD volumes. It would be recommended to also assess the I/O performance on different configuration (dataAndMetadata vs dataOnly/metadataOnly NSDs) with some application workload + production scenarios before deploying the final solution. >> GPFS has built-in replication features - consider using those instead of RAID replication (classically Raid-1). 
GPFS replication can work with storage devices that are in different racks, separated by significant physical space, and from different manufacturers. This can be more >>robust than RAID in a single box or single rack. Consider a fire scenario, or exploding power supply or similar physical disaster. Consider that storage devices and controllers from the same manufacturer may have the same bugs, defects, failures. For high-resiliency (for e.g. metadataOnly) and if there are multiple storage across different failure domains (different racks/rooms/DC etc), it will be good to enable BOTH hardware RAID-1 as well as GPFS metadata replication enabled (at the minimum, -m 2). If there is single shared storage for GPFS file-system storage and metadata is separated from data, then RAID-1 would minimize administrative overhead compared to GPFS replication in the event of drive failure (since with GPFS replication across single SSD would require mmdeldisk/mmdelnsd/mmcrnsd/mmadddisk every time disk goes faulty and needs to be replaced). Best, -Kums From: Marc A Kaplan/Watson/IBM at IBMUS To: gpfsug main discussion list Date: 04/19/2017 04:50 PM Subject: Re: [gpfsug-discuss] RAID config for SSD's - potential pitfalls Sent by: gpfsug-discuss-bounces at spectrumscale.org As I've mentioned before, RAID choices for GPFS are not so simple. Here are a couple points to consider, I'm sure there's more. And if I'm wrong, someone will please correct me - but I believe the two biggest pitfalls are: Some RAID configurations (classically 5 and 6) work best with large, full block writes. When the file system does a partial block write, RAID may have to read a full "stripe" from several devices, compute the differences and then write back the modified data to several devices. This is certainly true with RAID that is configured over several storage devices, with error correcting codes. SO, you do NOT want to put GPFS metadata (system pool!) on RAID configured with large stripes and error correction. This is the Read-Modify-Write Raid pitfall. GPFS has built-in replication features - consider using those instead of RAID replication (classically Raid-1). GPFS replication can work with storage devices that are in different racks, separated by significant physical space, and from different manufacturers. This can be more robust than RAID in a single box or single rack. Consider a fire scenario, or exploding power supply or similar physical disaster. Consider that storage devices and controllers from the same manufacturer may have the same bugs, defects, failures. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Apr 19 23:41:19 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 19 Apr 2017 18:41:19 -0400 Subject: [gpfsug-discuss] RAID config for SSD's - potential pitfalls In-Reply-To: References: Message-ID: Kums is our performance guru, so weigh that appropriately and relative to my own remarks... Nevertheless, I still think RAID-5or6 is a poor choice for GPFS metadata. The write cache will NOT mitigate the read-modify-write problem of a workload that has a random or hop-scotch access pattern of small writes. In the end you've still got to read and write several times more disk blocks than you actually set out to modify. 
Same goes for any large amount of data that will be written in a pattern of non-sequential small writes. (Define a small write as less than a full RAID stripe). For sure, non-volatile write caches are a good thing - but not a be all end all solution. Relying on RAID-1 to protect your metadata may well be easier to administer, but still GPFS replication can be more robust. Doing both - belt and suspenders is fine -- if you can afford it. Either is buying 2x storage, both is 4x. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Apr 20 00:16:08 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 19 Apr 2017 23:16:08 +0000 Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Message-ID: <3F3E9259-1601-4473-A827-7CD5418B8C58@nuance.com> I assume the counter has wrapped on some of these - would a PMR fix this? (4.2.1) [root at cnt-r01r07u15 ~]# mmfsadm vfsstats vfs statistics currently enabled started at: Fri Jan 27 16:22:02.702 2017 duration: 7091405.800 sec name calls time per call total time -------------------- -------- -------------- -------------- access 8472691 0.006672 56529.863993 close 1460175509 0.000034 49854.695358 create 2101110 0.073797 155055.263775 fsync 20 0.001214 0.024288 getattr 859449161 0.000118 101183.699413 link 2175473 0.000287 625.343799 lockctl 17326 0.000302 5.229828 lookup 200369809 0.005999 1201980.046683 map_lloff 220850355 0.000039 8561.791963 mkdir 817894 0.265793 217390.095681 mknod 3 0.000474 0.001422 open 1460169409 0.000092 134811.724068 read -412143552 0.001023 3971403.879911 write 164739329 0.000829 136616.948900 mmapRead 17108252 0.000623 10665.877349 readdir 142261835 0.000049 6999.159121 readlink 485335656 0.000004 2111.627292 readpage -648839570 0.000004 14346.195128 remove 4239806 0.022000 93277.124289 rename 350671 0.055135 19334.226490 rmdir 342019 0.008000 2736.037074 setattr 3709237 0.004573 16963.899331 symlink 160610 0.053061 8522.185175 unmap -365476297 0.000000 1735.669373 setxattr 119 0.000009 0.001042 getxattr -218316996 0.000154 628416.355002 removexattr 15 0.000003 0.000042 statfs 2624067 0.000082 214.306646 fastOpen 1456944934 0.000000 0.000000 fastClose 1515612004 0.000000 0.000000 fastLookup 77981387 0.000000 0.000000 fastRead -922882405 0.000000 0.000000 fastWrite 102606402 0.000000 0.000000 revalidate 899677 0.000000 0.000000 aio write sync 21331080 0.000061 1309.773528 Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Apr 20 01:10:51 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 19 Apr 2017 20:10:51 -0400 Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values In-Reply-To: <3F3E9259-1601-4473-A827-7CD5418B8C58@nuance.com> References: <3F3E9259-1601-4473-A827-7CD5418B8C58@nuance.com> Message-ID: Bob, I also noticed this recently. I think it may be a simple matter of a printf()-like statement in the code that handles "mmfsadm vfsstats" using an incorrect conversion specifier --- one that treats the counter as signed instead of unsigned and treats the counter as being smaller than it really is. 
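To make that concrete: once a 32-bit counter climbs past 2^31, formatting it with a signed conversion such as %d instead of %u yields a negative number with the same bit pattern. A small shell illustration -- the counter value here is back-calculated from the table above, not read from the code:

count=3882823744      # hypothetical true value of the 'read' call counter
echo "as unsigned 32-bit: $count"
echo "as signed   32-bit: $(( count >= 2**31 ? count - 2**32 : count ))"   # -412143552, matching the 'read' row above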
To help confirm that hypothesis, could you please run the following commands on the node, at the same time, so the output can be compared: # mmfsadm vfsstats # mmfsadm eventsExporter mmpmon vfss I believe the code that handles "mmfsadm eventsExporter mmpmon vfss" uses the correct printf()-like conversion specifier. So, it should so good numbers where "mmfsadm vfsstats" shows negative numbers. Regards, The Spectrum Scale (GPFS) team Eric Agar ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 04/19/2017 07:16 PM Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Sent by: gpfsug-discuss-bounces at spectrumscale.org I assume the counter has wrapped on some of these - would a PMR fix this? (4.2.1) [root at cnt-r01r07u15 ~]# mmfsadm vfsstats vfs statistics currently enabled started at: Fri Jan 27 16:22:02.702 2017 duration: 7091405.800 sec name calls time per call total time -------------------- -------- -------------- -------------- access 8472691 0.006672 56529.863993 close 1460175509 0.000034 49854.695358 create 2101110 0.073797 155055.263775 fsync 20 0.001214 0.024288 getattr 859449161 0.000118 101183.699413 link 2175473 0.000287 625.343799 lockctl 17326 0.000302 5.229828 lookup 200369809 0.005999 1201980.046683 map_lloff 220850355 0.000039 8561.791963 mkdir 817894 0.265793 217390.095681 mknod 3 0.000474 0.001422 open 1460169409 0.000092 134811.724068 read -412143552 0.001023 3971403.879911 write 164739329 0.000829 136616.948900 mmapRead 17108252 0.000623 10665.877349 readdir 142261835 0.000049 6999.159121 readlink 485335656 0.000004 2111.627292 readpage -648839570 0.000004 14346.195128 remove 4239806 0.022000 93277.124289 rename 350671 0.055135 19334.226490 rmdir 342019 0.008000 2736.037074 setattr 3709237 0.004573 16963.899331 symlink 160610 0.053061 8522.185175 unmap -365476297 0.000000 1735.669373 setxattr 119 0.000009 0.001042 getxattr -218316996 0.000154 628416.355002 removexattr 15 0.000003 0.000042 statfs 2624067 0.000082 214.306646 fastOpen 1456944934 0.000000 0.000000 fastClose 1515612004 0.000000 0.000000 fastLookup 77981387 0.000000 0.000000 fastRead -922882405 0.000000 0.000000 fastWrite 102606402 0.000000 0.000000 revalidate 899677 0.000000 0.000000 aio write sync 21331080 0.000061 1309.773528 Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Robert.Oesterlin at nuance.com Thu Apr 20 01:21:04 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 20 Apr 2017 00:21:04 +0000 Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Message-ID: Hi Eric Looks like your assumption is correct - no negative values from ?mmfsadm eventsExporter mmpmon vfss?. I don?t normally view these via ?mmfsadm?, I use the zimon stats. But, It?s a bug that should be fixed. What?s the best way to get this fixed? root at cnt-r01r07u15 ~]# mmfsadm eventsExporter mmpmon vfss _response_ begin mmpmon vfss _mmpmon::vfss_ _n_ 10.30.100.193 _nn_ cnt-r01r07u15 _rc_ 0 _t_ 1492647309 _tu_ 311964 _access_ 8472897 56529.874886 _close_ 1460223848 49854.938090 _create_ 2101927 155055.515041 _fclear_ 0 0.000000 _fsync_ 20 0.024288 _fsync_range_ 0 0.000000 _ftrunc_ 0 0.000000 _getattr_ 859626332 101183.720281 _link_ 2175473 625.343799 _lockctl_ 17326 5.229828 _lookup_ 200378610 1201985.264220 _map_lloff_ 220854519 8561.860515 _mkdir_ 817943 217390.170859 _mknod_ 3 0.001422 _open_ 1460217712 134812.649162 _read_ 3883163461 3971457.463527 _write_ 186078410 137927.496812 _mmapRead_ 17108947 10665.929860 _mmapWrite_ 0 0.000000 _aioRead_ 0 0.000000 _aioWrite_ 0 0.000000 _readdir_ 142262897 6999.189450 _readlink_ 485337171 2111.634286 _readpage_ 3646233600 14346.331414 _remove_ 4241324 93277.463798 _rename_ 350679 19334.235924 _rmdir_ 342042 2736.048976 _setacl_ 0 0.000000 _setattr_ 3709289 16963.901179 _symlink_ 161336 8522.670079 _unmap_ 3929805828 1735.740690 _writepage_ 0 0.000000 _tsfattr_ 0 0.000000 _tsfsattr_ 0 0.000000 _flock_ 0 0.000000 _setxattr_ 119 0.001042 _getxattr_ 4077218348 628418.213008 _listxattr_ 0 0.000000 _removexattr_ 15 0.000042 _encode_fh_ 0 0.000000 _decode_fh_ 0 0.000000 _get_dentry_ 0 0.000000 _get_parent_ 0 0.000000 _mount_ 0 0.000000 _statfs_ 2625497 214.309671 _sync_ 0 0.000000 _vget_ 0 0.000000 _response_ end Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Wednesday, April 19, 2017 at 7:10 PM To: gpfsug main discussion list Cc: "gpfsug-discuss-bounces at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Bob, I also noticed this recently. I think it may be a simple matter of a printf()-like statement in the code that handles "mmfsadm vfsstats" using an incorrect conversion specifier --- one that treats the counter as signed instead of unsigned and treats the counter as being smaller than it really is. To help confirm that hypothesis, could you please run the following commands on the node, at the same time, so the output can be compared: # mmfsadm vfsstats # mmfsadm eventsExporter mmpmon vfss I believe the code that handles "mmfsadm eventsExporter mmpmon vfss" uses the correct printf()-like conversion specifier. So, it should so good numbers where "mmfsadm vfsstats" shows negative numbers. Regards, The Spectrum Scale (GPFS) team Eric Agar ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 04/19/2017 07:16 PM Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I assume the counter has wrapped on some of these - would a PMR fix this? (4.2.1) [root at cnt-r01r07u15 ~]# mmfsadm vfsstats vfs statistics currently enabled started at: Fri Jan 27 16:22:02.702 2017 duration: 7091405.800 sec name calls time per call total time -------------------- -------- -------------- -------------- access 8472691 0.006672 56529.863993 close 1460175509 0.000034 49854.695358 create 2101110 0.073797 155055.263775 fsync 20 0.001214 0.024288 getattr 859449161 0.000118 101183.699413 link 2175473 0.000287 625.343799 lockctl 17326 0.000302 5.229828 lookup 200369809 0.005999 1201980.046683 map_lloff 220850355 0.000039 8561.791963 mkdir 817894 0.265793 217390.095681 mknod 3 0.000474 0.001422 open 1460169409 0.000092 134811.724068 read -412143552 0.001023 3971403.879911 write 164739329 0.000829 136616.948900 mmapRead 17108252 0.000623 10665.877349 readdir 142261835 0.000049 6999.159121 readlink 485335656 0.000004 2111.627292 readpage -648839570 0.000004 14346.195128 remove 4239806 0.022000 93277.124289 rename 350671 0.055135 19334.226490 rmdir 342019 0.008000 2736.037074 setattr 3709237 0.004573 16963.899331 symlink 160610 0.053061 8522.185175 unmap -365476297 0.000000 1735.669373 setxattr 119 0.000009 0.001042 getxattr -218316996 0.000154 628416.355002 removexattr 15 0.000003 0.000042 statfs 2624067 0.000082 214.306646 fastOpen 1456944934 0.000000 0.000000 fastClose 1515612004 0.000000 0.000000 fastLookup 77981387 0.000000 0.000000 fastRead -922882405 0.000000 0.000000 fastWrite 102606402 0.000000 0.000000 revalidate 899677 0.000000 0.000000 aio write sync 21331080 0.000061 1309.773528 Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Apr 20 02:03:16 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 19 Apr 2017 21:03:16 -0400 Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values In-Reply-To: References: Message-ID: Thanks Bob. Yes, it looks good for the hypothesis. ZIMon gets its VFSS stats from the mmpmon code that we just exercised with "mmfsadm eventsExporter mmpmon vfss"; so the ZIMon stats are also probably correct. Having said that, I agree with you that the "mmfsadm vfsstats" problem is a bug that should be fixed. If you would like to open a PMR so an APAR gets generated, it might help speed the routing of the PMR if you include in the PMR text our email exchange, and highlight Eric Agar is the GPFS developer with whom you've already discussed this issue. You could also mention that I believe I have no need for a gpfs snap. Having an APAR will help ensure the fix makes it into a PTF for the release you are using. 
If you do not want to open a PMR, I still intend to fix the problem in the development stream. Thanks again. Regards, The Spectrum Scale (GPFS) team Eric Agar ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Cc: IBM Spectrum Scale/Poughkeepsie/IBM at IBMUS Date: 04/19/2017 08:21 PM Subject: Re: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Hi Eric Looks like your assumption is correct - no negative values from ?mmfsadm eventsExporter mmpmon vfss?. I don?t normally view these via ?mmfsadm?, I use the zimon stats. But, It?s a bug that should be fixed. What?s the best way to get this fixed? root at cnt-r01r07u15 ~]# mmfsadm eventsExporter mmpmon vfss _response_ begin mmpmon vfss _mmpmon::vfss_ _n_ 10.30.100.193 _nn_ cnt-r01r07u15 _rc_ 0 _t_ 1492647309 _tu_ 311964 _access_ 8472897 56529.874886 _close_ 1460223848 49854.938090 _create_ 2101927 155055.515041 _fclear_ 0 0.000000 _fsync_ 20 0.024288 _fsync_range_ 0 0.000000 _ftrunc_ 0 0.000000 _getattr_ 859626332 101183.720281 _link_ 2175473 625.343799 _lockctl_ 17326 5.229828 _lookup_ 200378610 1201985.264220 _map_lloff_ 220854519 8561.860515 _mkdir_ 817943 217390.170859 _mknod_ 3 0.001422 _open_ 1460217712 134812.649162 _read_ 3883163461 3971457.463527 _write_ 186078410 137927.496812 _mmapRead_ 17108947 10665.929860 _mmapWrite_ 0 0.000000 _aioRead_ 0 0.000000 _aioWrite_ 0 0.000000 _readdir_ 142262897 6999.189450 _readlink_ 485337171 2111.634286 _readpage_ 3646233600 14346.331414 _remove_ 4241324 93277.463798 _rename_ 350679 19334.235924 _rmdir_ 342042 2736.048976 _setacl_ 0 0.000000 _setattr_ 3709289 16963.901179 _symlink_ 161336 8522.670079 _unmap_ 3929805828 1735.740690 _writepage_ 0 0.000000 _tsfattr_ 0 0.000000 _tsfsattr_ 0 0.000000 _flock_ 0 0.000000 _setxattr_ 119 0.001042 _getxattr_ 4077218348 628418.213008 _listxattr_ 0 0.000000 _removexattr_ 15 0.000042 _encode_fh_ 0 0.000000 _decode_fh_ 0 0.000000 _get_dentry_ 0 0.000000 _get_parent_ 0 0.000000 _mount_ 0 0.000000 _statfs_ 2625497 214.309671 _sync_ 0 0.000000 _vget_ 0 0.000000 _response_ end Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Wednesday, April 19, 2017 at 7:10 PM To: gpfsug main discussion list Cc: "gpfsug-discuss-bounces at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Bob, I also noticed this recently. I think it may be a simple matter of a printf()-like statement in the code that handles "mmfsadm vfsstats" using an incorrect conversion specifier --- one that treats the counter as signed instead of unsigned and treats the counter as being smaller than it really is. 
To help confirm that hypothesis, could you please run the following commands on the node, at the same time, so the output can be compared: # mmfsadm vfsstats # mmfsadm eventsExporter mmpmon vfss I believe the code that handles "mmfsadm eventsExporter mmpmon vfss" uses the correct printf()-like conversion specifier. So, it should so good numbers where "mmfsadm vfsstats" shows negative numbers. Regards, The Spectrum Scale (GPFS) team Eric Agar ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 04/19/2017 07:16 PM Subject: [gpfsug-discuss] mmfsadm dump vfsstats - negative values Sent by: gpfsug-discuss-bounces at spectrumscale.org I assume the counter has wrapped on some of these - would a PMR fix this? (4.2.1) [root at cnt-r01r07u15 ~]# mmfsadm vfsstats vfs statistics currently enabled started at: Fri Jan 27 16:22:02.702 2017 duration: 7091405.800 sec name calls time per call total time -------------------- -------- -------------- -------------- access 8472691 0.006672 56529.863993 close 1460175509 0.000034 49854.695358 create 2101110 0.073797 155055.263775 fsync 20 0.001214 0.024288 getattr 859449161 0.000118 101183.699413 link 2175473 0.000287 625.343799 lockctl 17326 0.000302 5.229828 lookup 200369809 0.005999 1201980.046683 map_lloff 220850355 0.000039 8561.791963 mkdir 817894 0.265793 217390.095681 mknod 3 0.000474 0.001422 open 1460169409 0.000092 134811.724068 read -412143552 0.001023 3971403.879911 write 164739329 0.000829 136616.948900 mmapRead 17108252 0.000623 10665.877349 readdir 142261835 0.000049 6999.159121 readlink 485335656 0.000004 2111.627292 readpage -648839570 0.000004 14346.195128 remove 4239806 0.022000 93277.124289 rename 350671 0.055135 19334.226490 rmdir 342019 0.008000 2736.037074 setattr 3709237 0.004573 16963.899331 symlink 160610 0.053061 8522.185175 unmap -365476297 0.000000 1735.669373 setxattr 119 0.000009 0.001042 getxattr -218316996 0.000154 628416.355002 removexattr 15 0.000003 0.000042 statfs 2624067 0.000082 214.306646 fastOpen 1456944934 0.000000 0.000000 fastClose 1515612004 0.000000 0.000000 fastLookup 77981387 0.000000 0.000000 fastRead -922882405 0.000000 0.000000 fastWrite 102606402 0.000000 0.000000 revalidate 899677 0.000000 0.000000 aio write sync 21331080 0.000061 1309.773528 Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From UWEFALKE at de.ibm.com Thu Apr 20 09:11:15 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 20 Apr 2017 10:11:15 +0200 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: References: Message-ID: Some thoughts: you give typical cumulative usage values. However, a fast pool might matter most for spikes of the traffic. Do you have spikes driving your current system to the edge? Then: using the SSD pool for writes is straightforward (placement), using it for reads will only pay off if data are either pre-fetched to the pool somehow, or read more than once before getting migrated back to the HDD pool(s). Write traffic is less than read as you wrote. RAID1 vs RAID6: RMW penalty of parity-based RAIDs was mentioned, which strikes at writes smaller than the full stripe width of your RAID - what type of write I/O do you have (or expect)? (This may also be important for choosing the quality of SSDs, with RMW in mind you will have a comparably huge amount of data written on the SSD devices if your I/O traffic consists of myriads of small IOs and you organized the SSDs in a RAID5 or RAID6) I suppose your current system is well set to provide the required aggregate throughput. Now, what kind of improvement do you expect? How are the clients connected? Would they have sufficient network bandwidth to see improvements at all? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 gpfsug-discuss-bounces at spectrumscale.org wrote on 04/19/2017 09:53:42 PM: > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 04/19/2017 09:54 PM > Subject: [gpfsug-discuss] RAID config for SSD's used for data > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Hi All, > > We currently have what I believe is a fairly typical setup ? > metadata for our GPFS filesystems is the only thing in the system > pool and it?s on SSD, while data is on spinning disk (RAID 6 LUNs). > Everything connected via 8 Gb FC SAN. 8 NSD servers. Roughly 1 PB > usable space. > > Now lets just say that you have a little bit of money to spend. > Your I/O demands aren?t great - in fact, they?re way on the low end > ? typical (cumulative) usage is 200 - 600 MB/sec read, less than > that for writes. But while GPFS has always been great and therefore > you don?t need to Make GPFS Great Again, you do want to provide your > users with the best possible environment. > > So you?re considering the purchase of a dual-controller FC storage > array with 12 or so 1.8 TB SSD?s in it, with the idea being that > that storage would be in its? own storage pool and that pool would > be the default location for I/O for your main filesystem ? at least > for smaller files. You intend to use mmapplypolicy nightly to move > data to / from this pool and the spinning disk pools. > > Given all that ? 
would you configure those disks as 6 RAID 1 mirrors > and have 6 different primary NSD servers or would it be feasible to > configure one big RAID 6 LUN? I?m thinking the latter is not a good > idea as there could only be one primary NSD server for that one LUN, > but given that: 1) I have no experience with this, and 2) I have > been wrong once or twice before (), I?m looking for advice. Thanks! > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Thu Apr 20 10:25:40 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 20 Apr 2017 10:25:40 +0100 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: <4f16617c-0ae9-18ef-bfb5-206507762fd9@stanford.edu> References: <4f16617c-0ae9-18ef-bfb5-206507762fd9@stanford.edu> Message-ID: <1492680340.4102.120.camel@buzzard.me.uk> On Wed, 2017-04-19 at 14:23 -0700, Alex Chekholko wrote: > On 04/19/2017 12:53 PM, Buterbaugh, Kevin L wrote: > > > > So you?re considering the purchase of a dual-controller FC storage array > > with 12 or so 1.8 TB SSD?s in it, with the idea being that that storage > > would be in its? own storage pool and that pool would be the default > > location for I/O for your main filesystem ? at least for smaller files. > > You intend to use mmapplypolicy nightly to move data to / from this > > pool and the spinning disk pools. > > We did this and failed in interesting (but in retrospect obvious) ways. > You will want to ensure that your users cannot fill your write target > pool within a day. The faster the storage, the more likely that is to > happen. Or else your users will get ENOSPC. Eh? Seriously you should have a fail over rule so that when your "fast" pool is filled up it starts allocating in the "slow" pool (nice good names that are descriptive and less than 8 characters including termination character). Now there are issues when you get close to very full so you need to set the fail over to as sizeable bit less than the full size, 95% is a good starting point. The pool names size is important because if the fast pool is less than eight characters and the slow is more because you called in "nearline" (which is 9 including termination character) once the files get moved they get backed up again by TSM, yeah!!! The 95% bit comes about from this. Imagine you had 12KB left in the fast pool and you go to write a file. You open the file with 0B in size and then start writing. At 12KB you run out of space in the fast pool and as the file can only be in one pool you get a ENOSPC, and the file gets canned. This then starts repeating on a regular basis. So if you start allocating at significantly less than 100%, say 95% where that 5% is larger than the largest file you expect that file works, but all subsequent files get allocated in the slow pool, till you flush the fast pool. Something like this as the last two rules in your policy should do the trick. /* by default new files to the fast disk unless full, then to slow */ RULE 'new' SET POOL 'fast' LIMIT(95) RULE 'spillover' SET POOL 'slow' However in general your fast pool needs to have sufficient capacity to take your daily churn and then some. JAB. -- Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From jonathan at buzzard.me.uk Thu Apr 20 10:32:20 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 20 Apr 2017 10:32:20 +0100 Subject: [gpfsug-discuss] RAID config for SSD's used for data In-Reply-To: References: Message-ID: <1492680740.4102.126.camel@buzzard.me.uk> On Wed, 2017-04-19 at 20:05 +0000, Simon Thompson (IT Research Support) wrote: > By having many LUNs, you get many IO queues for Linux to play with. Also the raid6 overhead can be quite significant, so it might be better to go with raid1 anyway depending on the controller... > > And if only gpfs had some sort of auto tier back up the pools for hot or data caching :-) > If you have sized the "fast" pool correctly then the "slow" pool will be spending most of it's time doing diddly squat, aka under 10 IOPS per second unless you are flushing the pool of old files to make space. I have graphs that show this. Then two things happen, if you are just reading the file then fine, probably coming from the cache or the disks are not very busy anyway so you won't notice. If you happen to *change* the file and start doing things actively with it again, then because most programs approach this by creating an entirely new file with a temporary name, then doing a rename and delete shuffle so a crash will leave you with a valid file somewhere then the changed version ends up on the fast disk by virtue of being a new file. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From p.childs at qmul.ac.uk Thu Apr 20 12:38:09 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 20 Apr 2017 11:38:09 +0000 Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories In-Reply-To: References: Message-ID: Simon, We've managed to resolve this issue by switching off quota's and switching them back on again and rebuilding the quota file. Can I check if you run quota's on your cluster. See you 2 weeks in Manchester Thanks in advance. Peter Childs Research Storage Expert ITS Research Infrastructure Queen Mary, University of London Phone: 020 7882 8393 ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (IT Research Support) Sent: Tuesday, April 11, 2017 4:55:35 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Slow to create directories We actually saw this for a while on one of our clusters which was new. But by the time I'd got round to looking deeper, it had gone, maybe we were using the NSDs more heavily, or possibly we'd upgraded. We are at 4.2.2-2, so might be worth trying to bump the version and see if it goes away. We saw it on the NSD servers directly as well, so not some client trying to talk to it, so maybe there was some buggy code? Simon On 11/04/2017, 16:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Bryan Banister" wrote: >There are so many things to look at and many tools for doing so (iostat, >htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). 
I would >recommend a review of the presentation that Yuri gave at the most recent >GPFS User Group: >https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs > >Cheers, >-Bryan > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter >Childs >Sent: Tuesday, April 11, 2017 3:58 AM >To: gpfsug main discussion list >Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories > >This is a curious issue which I'm trying to get to the bottom of. > >We currently have two Spectrum Scale file systems, both are running GPFS >4.2.1-1 some of the servers have been upgraded to 4.2.1-2. > >The older one which was upgraded from GPFS 3.5 works find create a >directory is always fast and no issue. > >The new one, which has nice new SSD for metadata and hence should be >faster. can take up to 30 seconds to create a directory but usually takes >less than a second, The longer directory creates usually happen on busy >nodes that have not used the new storage in a while. (Its new so we've >not moved much of the data over yet) But it can also happen randomly >anywhere, including from the NSD servers them selves. (times of 3-4 >seconds from the NSD servers have been seen, on a single directory create) > >We've been pointed at the network and suggested we check all network >settings, and its been suggested to build an admin network, but I'm not >sure I entirely understand why and how this would help. Its a mixed >1G/10G network with the NSD servers connected at 40G with an MTU of 9000. > >However as I say, the older filesystem is fine, and it does not matter if >the nodes are connected to the old GPFS cluster or the new one, (although >the delay is worst on the old gpfs cluster), So I'm really playing spot >the difference. and the network is not really an obvious difference. > >Its been suggested to look at a trace when it occurs but as its difficult >to recreate collecting one is difficult. > >Any ideas would be most helpful. > >Thanks > > > >Peter Childs >ITS Research Infrastructure >Queen Mary, University of London >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >________________________________ > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. 
>_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kenneth.waegeman at ugent.be Thu Apr 20 15:53:29 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Thu, 20 Apr 2017 16:53:29 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> Message-ID: <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: > I just had a similar experience from a sandisk infiniflash system > SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for > writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on > the order of 2 Gbyte/s. > > After a bit head scratching snd fumbling around I found out that > reducing maxMBpS from 10000 to 100 fixed the problem! Digging further > I found that reducing prefetchThreads from default=72 to 32 also fixed > it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. > > Could something like this be the problem on your box as well? > > > > -jf > fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister > >: > > Well, I'm somewhat scrounging for hardware. This is in our test > environment :) And yep, it's got the 2U gpu-tray in it although even > without the riser it has 2 PCIe slots onboard (excluding the on-board > dual-port mezz card) so I think it would make a fine NSD server even > without the riser. > > -Aaron > > On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > Maybe its related to interrupt handlers somehow? You drive the > load up on one socket, you push all the interrupt handling to the > other socket where the fabric card is attached? > > > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD > servers, I assume its some 2U gpu-tray riser one or something !) > > > > Simon > > ________________________________________ > > From: gpfsug-discuss-bounces at spectrumscale.org > > [gpfsug-discuss-bounces at spectrumscale.org > ] on behalf of > Aaron Knister [aaron.s.knister at nasa.gov > ] > > Sent: 17 February 2017 15:52 > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] bizarre performance behavior > > > > This is a good one. I've got an NSD server with 4x 16GB fibre > > connections coming in and 1x FDR10 and 1x QDR connection going > out to > > the clients. I was having a really hard time getting anything > resembling > > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > > reads). The back-end is a DDN SFA12K and I *know* it can do > better than > > that. 
> > > > I don't remember quite how I figured this out but simply by running > > "openssl speed -multi 16" on the nsd server to drive up the load > I saw > > an almost 4x performance jump which is pretty much goes against > every > > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated > crap to > > quadruple your i/o performance"). > > > > This feels like some type of C-states frequency scaling > shenanigans that > > I haven't quite ironed down yet. I booted the box with the following > > kernel parameters "intel_idle.max_cstate=0 > processor.max_cstate=0" which > > didn't seem to make much of a difference. I also tried setting the > > frequency governer to userspace and setting the minimum frequency to > > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I > still have > > to run something to drive up the CPU load and then performance > improves. > > > > I'm wondering if this could be an issue with the C1E state? I'm > curious > > if anyone has seen anything like this. The node is a dx360 M4 > > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Thu Apr 20 16:04:20 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Thu, 20 Apr 2017 15:04:20 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> , <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID: <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). 
We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister >: Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
> > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Apr 20 16:07:32 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 20 Apr 2017 17:07:32 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID: Hi Kennmeth, is prefetching off or on at your storage backend? Raw sequential is very different from GPFS sequential at the storage device ! GPFS does its own prefetching, the storage would never know what sectors sequential read at GPFS level maps to at storage level! Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Kenneth Waegeman To: gpfsug main discussion list Date: 04/20/2017 04:53 PM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. 
After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister : Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
> > -Aaron
> >
> > --
> > Aaron Knister
> > NASA Center for Climate Simulation (Code 606.2)
> > Goddard Space Flight Center
> > (301) 286-2776
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From marcusk at nz1.ibm.com Fri Apr 21 02:21:51 2017
From: marcusk at nz1.ibm.com (Marcus Koenig1)
Date: Fri, 21 Apr 2017 14:21:51 +1300
Subject: [gpfsug-discuss] bizarre performance behavior
In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be>
Message-ID:

Hi Kenneth,

we also had similar performance numbers in our tests. Native was far
quicker than through GPFS. When we learned though that the client tested
the performance on the FS at a big blocksize (512k) with small files - we
were able to speed it up significantly using a smaller FS blocksize
(obviously we had to recreate the FS).

So really depends on how you do your tests.

Cheers,

Marcus Koenig
Lab Services Storage & Power Specialist
IBM Australia & New Zealand Advanced Technical Skills
IBM Systems-Hardware
Mobile: +64 21 67 34 27
E-mail: marcusk at nz1.ibm.com
82 Wyndham Street
Auckland, AUK 1010
New Zealand

From: "Uwe Falke"
To: gpfsug main discussion list
Date:
04/21/2017 03:07 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Kennmeth, is prefetching off or on at your storage backend? Raw sequential is very different from GPFS sequential at the storage device ! GPFS does its own prefetching, the storage would never know what sectors sequential read at GPFS level maps to at storage level! Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Kenneth Waegeman To: gpfsug main discussion list Date: 04/20/2017 04:53 PM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister : Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) 
> > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17773863.gif Type: image/gif Size: 3720 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 17405449.jpg
Type: image/jpeg
Size: 2741 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 17997200.gif
Type: image/gif
Size: 13421 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL:

From olaf.weiser at de.ibm.com Fri Apr 21 08:25:22 2017
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Fri, 21 Apr 2017 09:25:22 +0200
Subject: Re: [gpfsug-discuss] bizarre performance behavior
In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be>
Message-ID:

An HTML attachment was scrubbed...
URL:

From kenneth.waegeman at ugent.be Fri Apr 21 10:43:25 2017
From: kenneth.waegeman at ugent.be (Kenneth Waegeman)
Date: Fri, 21 Apr 2017 11:43:25 +0200
Subject: Re: [gpfsug-discuss] bizarre performance behavior
In-Reply-To: <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov>
References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov>
Message-ID: <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be>

Hi,

We are running a test setup with 2 NSD Servers backed by 4 Dell
Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the
4 powervaults, nsd02 is primary serving LUNS of controller B.

We are testing from 2 testing machines connected to the nsds with
infiniband, verbs enabled.

When we do dd from the NSD servers, we see indeed performance going to
5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to
get the data at a decent speed. Since we can write from the clients at a
good speed, I didn't suspect the communication between clients and nsds
being the issue, especially since total performance stays the same using 1
or multiple clients.

I'll use the nsdperf tool to see if we can find anything, thanks!
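In case it is useful for comparison, the local reads on the NSD servers
were plain streaming dd reads of big files in the filesystem, roughly like
this (paths, block sizes and counts are just illustrative, not the exact
commands we ran):

  dd if=/gpfs/fs0/testdir/bigfile0 of=/dev/null bs=16M count=2048 &
  dd if=/gpfs/fs0/testdir/bigfile1 of=/dev/null bs=16M count=2048 &
  wait

with one or more streams per server: on nsd00 only for the single-server
number, and on nsd00 and nsd02 together for the combined one.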
K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > Interesting. Could you share a little more about your architecture? Is > it possible to mount the fs on an NSD server and do some dd's from the > fs on the NSD server? If that gives you decent performance perhaps try > NSDPERF next > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf > > > -Aaron > > > > > On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman > wrote: >> >> Hi, >> >> >> Having an issue that looks the same as this one: >> >> We can do sequential writes to the filesystem at 7,8 GB/s total , >> which is the expected speed for our current storage >> backend. While we have even better performance with sequential reads >> on raw storage LUNS, using GPFS we can only reach 1GB/s in total >> (each nsd server seems limited by 0,5GB/s) independent of the number >> of clients >> (1,2,4,..) or ways we tested (fio,dd). We played with blockdev >> params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as >> discussed in this thread, but nothing seems to impact this read >> performance. >> >> Any ideas? >> >> Thanks! >> >> Kenneth >> >> On 17/02/17 19:29, Jan-Frode Myklebust wrote: >>> I just had a similar experience from a sandisk infiniflash system >>> SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for >>> writes. and 250-300 Mbyte/s on sequential reads!! Random reads were >>> on the order of 2 Gbyte/s. >>> >>> After a bit head scratching snd fumbling around I found out that >>> reducing maxMBpS from 10000 to 100 fixed the problem! Digging >>> further I found that reducing prefetchThreads from default=72 to 32 >>> also fixed it, while leaving maxMBpS at 10000. Can now also read at >>> 3,2 GByte/s. >>> >>> Could something like this be the problem on your box as well? >>> >>> >>> >>> -jf >>> fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister >>> >: >>> >>> Well, I'm somewhat scrounging for hardware. This is in our test >>> environment :) And yep, it's got the 2U gpu-tray in it although even >>> without the riser it has 2 PCIe slots onboard (excluding the >>> on-board >>> dual-port mezz card) so I think it would make a fine NSD server even >>> without the riser. >>> >>> -Aaron >>> >>> On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT >>> Services) >>> wrote: >>> > Maybe its related to interrupt handlers somehow? You drive the >>> load up on one socket, you push all the interrupt handling to >>> the other socket where the fabric card is attached? >>> > >>> > Dunno ... (Though I am intrigued you use idataplex nodes as >>> NSD servers, I assume its some 2U gpu-tray riser one or something !) >>> > >>> > Simon >>> > ________________________________________ >>> > From: gpfsug-discuss-bounces at spectrumscale.org >>> >>> [gpfsug-discuss-bounces at spectrumscale.org >>> ] on behalf of >>> Aaron Knister [aaron.s.knister at nasa.gov >>> ] >>> > Sent: 17 February 2017 15:52 >>> > To: gpfsug main discussion list >>> > Subject: [gpfsug-discuss] bizarre performance behavior >>> > >>> > This is a good one. I've got an NSD server with 4x 16GB fibre >>> > connections coming in and 1x FDR10 and 1x QDR connection going >>> out to >>> > the clients. I was having a really hard time getting anything >>> resembling >>> > sensible performance out of it (4-5Gb/s writes but maybe >>> 1.2Gb/s for >>> > reads). The back-end is a DDN SFA12K and I *know* it can do >>> better than >>> > that. 
>>> > >>> > I don't remember quite how I figured this out but simply by >>> running >>> > "openssl speed -multi 16" on the nsd server to drive up the >>> load I saw >>> > an almost 4x performance jump which is pretty much goes >>> against every >>> > sysadmin fiber in me (i.e. "drive up the cpu load with >>> unrelated crap to >>> > quadruple your i/o performance"). >>> > >>> > This feels like some type of C-states frequency scaling >>> shenanigans that >>> > I haven't quite ironed down yet. I booted the box with the >>> following >>> > kernel parameters "intel_idle.max_cstate=0 >>> processor.max_cstate=0" which >>> > didn't seem to make much of a difference. I also tried setting the >>> > frequency governer to userspace and setting the minimum >>> frequency to >>> > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I >>> still have >>> > to run something to drive up the CPU load and then performance >>> improves. >>> > >>> > I'm wondering if this could be an issue with the C1E state? >>> I'm curious >>> > if anyone has seen anything like this. The node is a dx360 M4 >>> > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. >>> > >>> > -Aaron >>> > >>> > -- >>> > Aaron Knister >>> > NASA Center for Climate Simulation (Code 606.2) >>> > Goddard Space Flight Center >>> > (301) 286-2776 >>> > _______________________________________________ >>> > gpfsug-discuss mailing list >>> > gpfsug-discuss at spectrumscale.org >>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> > _______________________________________________ >>> > gpfsug-discuss mailing list >>> > gpfsug-discuss at spectrumscale.org >>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> > >>> >>> -- >>> Aaron Knister >>> NASA Center for Climate Simulation (Code 606.2) >>> Goddard Space Flight Center >>> (301) 286-2776 >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.waegeman at ugent.be Fri Apr 21 10:50:55 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Fri, 21 Apr 2017 11:50:55 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID: <2b0824a1-e1a2-8dd8-4a55-a57d7b00e09f@ugent.be> Hi, prefetching was already disabled at our storage backend, but a good thing to recheck :) thanks! On 20/04/17 17:07, Uwe Falke wrote: > Hi Kennmeth, > > is prefetching off or on at your storage backend? > Raw sequential is very different from GPFS sequential at the storage > device ! > GPFS does its own prefetching, the storage would never know what sectors > sequential read at GPFS level maps to at storage level! > > > Mit freundlichen Gr??en / Kind regards > > > Dr. 
Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Andreas Hasse, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: Kenneth Waegeman > To: gpfsug main discussion list > Date: 04/20/2017 04:53 PM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi, > > Having an issue that looks the same as this one: > We can do sequential writes to the filesystem at 7,8 GB/s total , which is > the expected speed for our current storage > backend. While we have even better performance with sequential reads on > raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd > server seems limited by 0,5GB/s) independent of the number of clients > (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, > MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in > this thread, but nothing seems to impact this read performance. > Any ideas? > Thanks! > > Kenneth > > On 17/02/17 19:29, Jan-Frode Myklebust wrote: > I just had a similar experience from a sandisk infiniflash system > SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. > and 250-300 Mbyte/s on sequential reads!! Random reads were on the order > of 2 Gbyte/s. > > After a bit head scratching snd fumbling around I found out that reducing > maxMBpS from 10000 to 100 fixed the problem! Digging further I found that > reducing prefetchThreads from default=72 to 32 also fixed it, while > leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. > > Could something like this be the problem on your box as well? > > > > -jf > fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister > : > Well, I'm somewhat scrounging for hardware. This is in our test > environment :) And yep, it's got the 2U gpu-tray in it although even > without the riser it has 2 PCIe slots onboard (excluding the on-board > dual-port mezz card) so I think it would make a fine NSD server even > without the riser. > > -Aaron > > On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: >> Maybe its related to interrupt handlers somehow? You drive the load up > on one socket, you push all the interrupt handling to the other socket > where the fabric card is attached? >> Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, > I assume its some 2U gpu-tray riser one or something !) >> Simon >> ________________________________________ >> From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ > aaron.s.knister at nasa.gov] >> Sent: 17 February 2017 15:52 >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] bizarre performance behavior >> >> This is a good one. I've got an NSD server with 4x 16GB fibre >> connections coming in and 1x FDR10 and 1x QDR connection going out to >> the clients. 
I was having a really hard time getting anything resembling >> sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for >> reads). The back-end is a DDN SFA12K and I *know* it can do better than >> that. >> >> I don't remember quite how I figured this out but simply by running >> "openssl speed -multi 16" on the nsd server to drive up the load I saw >> an almost 4x performance jump which is pretty much goes against every >> sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to >> quadruple your i/o performance"). >> >> This feels like some type of C-states frequency scaling shenanigans that >> I haven't quite ironed down yet. I booted the box with the following >> kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which >> didn't seem to make much of a difference. I also tried setting the >> frequency governer to userspace and setting the minimum frequency to >> 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have >> to run something to drive up the CPU load and then performance improves. >> >> I'm wondering if this could be an issue with the C1E state? I'm curious >> if anyone has seen anything like this. The node is a dx360 M4 >> (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. >> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kenneth.waegeman at ugent.be Fri Apr 21 10:52:58 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Fri, 21 Apr 2017 11:52:58 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> Message-ID: <94f2ca6e-cf6b-ef6a-1b27-45d7a449a379@ugent.be> Hi, Tried these settings, but sadly I'm not seeing any changes. Thanks, Kenneth On 21/04/17 09:25, Olaf Weiser wrote: > pls check > workerThreads (assuming you 're > 4.2.2) start with 128 .. increase > iteratively > pagepool at least 8 G > ignorePrefetchLunCount=yes (1) > > then you won't see a difference and GPFS is as fast or even faster .. 
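For reference, "tried these settings" means I applied them more or less as
below - take this as a sketch rather than the exact commands (syntax from
memory, and some of these may need a daemon restart rather than taking
effect immediately on our level); nsd00/nsd02 are our two NSD servers:

  mmchconfig workerThreads=128,ignorePrefetchLunCount=yes -N nsd00,nsd02
  mmchconfig pagepool=8G -N nsd00,nsd02
  mmshutdown -N nsd00,nsd02 && mmstartup -N nsd00,nsd02

plus the equivalent change on the test clients for the client-side values.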
> > > > From: "Marcus Koenig1" > To: gpfsug main discussion list > Date: 04/21/2017 03:24 AM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Hi Kennmeth, > > we also had similar performance numbers in our tests. Native was far > quicker than through GPFS. When we learned though that the client > tested the performance on the FS at a big blocksize (512k) with small > files - we were able to speed it up significantly using a smaller FS > blocksize (obviously we had to recreate the FS). > > So really depends on how you do your tests. > > *Cheers,* > * > Marcus Koenig* > Lab Services Storage & Power Specialist/ > IBM Australia & New Zealand Advanced Technical Skills/ > IBM Systems-Hardware > ------------------------------------------------------------------------ > > *Mobile:*+64 21 67 34 27* > E-mail:*_marcusk at nz1.ibm.com_ > > 82 Wyndham Street > Auckland, AUK 1010 > New Zealand > > > > > > > > > > Inactive hide details for "Uwe Falke" ---04/21/2017 03:07:48 AM---Hi > Kennmeth, is prefetching off or on at your storage backe"Uwe Falke" > ---04/21/2017 03:07:48 AM---Hi Kennmeth, is prefetching off or on at > your storage backend? > > From: "Uwe Falke" > To: gpfsug main discussion list > Date: 04/21/2017 03:07 AM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > Hi Kennmeth, > > is prefetching off or on at your storage backend? > Raw sequential is very different from GPFS sequential at the storage > device ! > GPFS does its own prefetching, the storage would never know what sectors > sequential read at GPFS level maps to at storage level! > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Andreas Hasse, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: Kenneth Waegeman > To: gpfsug main discussion list > Date: 04/20/2017 04:53 PM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi, > > Having an issue that looks the same as this one: > We can do sequential writes to the filesystem at 7,8 GB/s total , > which is > the expected speed for our current storage > backend. While we have even better performance with sequential reads on > raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd > server seems limited by 0,5GB/s) independent of the number of clients > (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, > MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in > this thread, but nothing seems to impact this read performance. > Any ideas? > Thanks! 
> > Kenneth > > On 17/02/17 19:29, Jan-Frode Myklebust wrote: > I just had a similar experience from a sandisk infiniflash system > SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. > and 250-300 Mbyte/s on sequential reads!! Random reads were on the order > of 2 Gbyte/s. > > After a bit head scratching snd fumbling around I found out that reducing > maxMBpS from 10000 to 100 fixed the problem! Digging further I found that > reducing prefetchThreads from default=72 to 32 also fixed it, while > leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. > > Could something like this be the problem on your box as well? > > > > -jf > fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister >: > Well, I'm somewhat scrounging for hardware. This is in our test > environment :) And yep, it's got the 2U gpu-tray in it although even > without the riser it has 2 PCIe slots onboard (excluding the on-board > dual-port mezz card) so I think it would make a fine NSD server even > without the riser. > > -Aaron > > On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > Maybe its related to interrupt handlers somehow? You drive the load up > on one socket, you push all the interrupt handling to the other socket > where the fabric card is attached? > > > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD > servers, > I assume its some 2U gpu-tray riser one or something !) > > > > Simon > > ________________________________________ > > From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ > aaron.s.knister at nasa.gov] > > Sent: 17 February 2017 15:52 > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] bizarre performance behavior > > > > This is a good one. I've got an NSD server with 4x 16GB fibre > > connections coming in and 1x FDR10 and 1x QDR connection going out to > > the clients. I was having a really hard time getting anything resembling > > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > > reads). The back-end is a DDN SFA12K and I *know* it can do better than > > that. > > > > I don't remember quite how I figured this out but simply by running > > "openssl speed -multi 16" on the nsd server to drive up the load I saw > > an almost 4x performance jump which is pretty much goes against every > > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > > quadruple your i/o performance"). > > > > This feels like some type of C-states frequency scaling shenanigans that > > I haven't quite ironed down yet. I booted the box with the following > > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > > didn't seem to make much of a difference. I also tried setting the > > frequency governer to userspace and setting the minimum frequency to > > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > > to run something to drive up the CPU load and then performance improves. > > > > I'm wondering if this could be an issue with the C1E state? I'm curious > > if anyone has seen anything like this. The node is a dx360 M4 > > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
> > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 3720 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 2741 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 13421 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: image/gif
Size: 105 bytes
Desc: not available
URL:

From makaplan at us.ibm.com Fri Apr 21 13:58:26 2017
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Fri, 21 Apr 2017 08:58:26 -0400
Subject: [gpfsug-discuss] bizarre performance behavior - prefetchThreads
In-Reply-To: <94f2ca6e-cf6b-ef6a-1b27-45d7a449a379@ugent.be>
References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <94f2ca6e-cf6b-ef6a-1b27-45d7a449a379@ugent.be>
Message-ID:

Seems counter-intuitive, but we have testimony that you may need to reduce
the prefetchThreads parameter. Of all the parameters, that's the one that
directly affects prefetching, so it is worth trying.

Jan-Frode Myklebust wrote: ...Digging further I found that reducing
prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS
at 10000. Can now also read at 3,2 GByte/s....

I can speculate that setting prefetchThreads too high may create a
contention situation where more threads cause overall degradation in
system performance.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 21994 bytes
Desc: not available
URL:

From aaron.s.knister at nasa.gov Fri Apr 21 14:10:49 2017
From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP])
Date: Fri, 21 Apr 2017 13:10:49 +0000
Subject: [gpfsug-discuss] bizarre performance behavior
In-Reply-To: <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be>
References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov>, <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be>
Message-ID:

Fantastic news! It might also be worth running "cpupower monitor" or
"turbostat" on your NSD servers while you're running dd tests from the
clients to see what CPU frequency your cores are actually running at.

A typical NSD server workload (especially with IB verbs and for reads) can
be pretty light on CPU, which might not prompt your CPU frequency governor
to up the frequency (which can affect throughput). If your frequency
scaling governor isn't kicking up the frequency of your CPUs, I've seen
that cause this behavior in my testing.

-Aaron

On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman wrote:
Hi,
We are running a test setup with 2 NSD Servers backed by 4 Dell
Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the
4 powervaults, nsd02 is primary serving LUNS of controller B.
We are testing from 2 testing machines connected to the nsds with
infiniband, verbs enabled.
When we do dd from the NSD servers, we see indeed performance going to
5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to
get the data at a decent speed. Since we can write from the clients at a
good speed, I didn't suspect the communication between clients and nsds
being the issue, especially since total performance stays the same using 1
or multiple clients.
I'll use the nsdperf tool to see if we can find anything, thanks!
K
On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote:
Interesting. Could you share a little more about your architecture? Is it
possible to mount the fs on an NSD server and do some dd's from the fs on
the NSD server?
If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister <aaron.s.knister at nasa.gov>: Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. 
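A quick way to check whether the cores really are dropping into deep C-states while a test runs (a sketch -- assumes the cpupower and turbostat utilities are installed):

  # which C-states the kernel currently has enabled
  cpupower idle-info
  # per-core frequency and C-state residency sampled over ~10 seconds of a dd run
  turbostat sleep 10
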
I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Apr 21 14:18:34 2017 From: david_johnson at brown.edu (David D Johnson) Date: Fri, 21 Apr 2017 09:18:34 -0400 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: <02C0BD31-E743-4F1C-91E7-20555099CBF5@brown.edu> We had some luck making the client and server IB performance consistently decent by configuring tuned with the profile "latency-performance". The key is the line /usr/libexec/tuned/pmqos-static.py cpu_dma_latency=1 which prevents cpu from going to sleep just when the next burst of IB traffic is about to arrive. -- ddj Dave Johnson Brown University CCV On Apr 21, 2017, at 9:10 AM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > > Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. > > A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. > > -Aaron > > > > > On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman wrote: >> Hi, >> We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. 
>> We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. >> When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. >> >> I'll use the nsdperf tool to see if we can find anything, >> >> thanks! >> >> K >> >> On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: >>> Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf >>> >>> -Aaron >>> >>> >>> >>> >>> On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: >>>> Hi, >>>> >>>> >>>> Having an issue that looks the same as this one: >>>> >>>> We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage >>>> backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients >>>> (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. >>>> Any ideas? >>>> >>>> Thanks! >>>> >>>> Kenneth >>>> >>>> On 17/02/17 19:29, Jan-Frode Myklebust wrote: >>>>> I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. >>>>> >>>>> After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. >>>>> >>>>> Could something like this be the problem on your box as well? >>>>> >>>>> >>>>> >>>>> -jf >>>>> fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister < aaron.s.knister at nasa.gov >: >>>>> Well, I'm somewhat scrounging for hardware. This is in our test >>>>> environment :) And yep, it's got the 2U gpu-tray in it although even >>>>> without the riser it has 2 PCIe slots onboard (excluding the on-board >>>>> dual-port mezz card) so I think it would make a fine NSD server even >>>>> without the riser. >>>>> >>>>> -Aaron >>>>> >>>>> On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) >>>>> wrote: >>>>> > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? >>>>> > >>>>> > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) 
>>>>> > >>>>> > Simon >>>>> > ________________________________________ >>>>> > From: gpfsug-discuss-bounces at spectrumscale.org [ gpfsug-discuss-bounces at spectrumscale.org ] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov ] >>>>> > Sent: 17 February 2017 15:52 >>>>> > To: gpfsug main discussion list >>>>> > Subject: [gpfsug-discuss] bizarre performance behavior >>>>> > >>>>> > This is a good one. I've got an NSD server with 4x 16GB fibre >>>>> > connections coming in and 1x FDR10 and 1x QDR connection going out to >>>>> > the clients. I was having a really hard time getting anything resembling >>>>> > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for >>>>> > reads). The back-end is a DDN SFA12K and I *know* it can do better than >>>>> > that. >>>>> > >>>>> > I don't remember quite how I figured this out but simply by running >>>>> > "openssl speed -multi 16" on the nsd server to drive up the load I saw >>>>> > an almost 4x performance jump which is pretty much goes against every >>>>> > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to >>>>> > quadruple your i/o performance"). >>>>> > >>>>> > This feels like some type of C-states frequency scaling shenanigans that >>>>> > I haven't quite ironed down yet. I booted the box with the following >>>>> > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which >>>>> > didn't seem to make much of a difference. I also tried setting the >>>>> > frequency governer to userspace and setting the minimum frequency to >>>>> > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have >>>>> > to run something to drive up the CPU load and then performance improves. >>>>> > >>>>> > I'm wondering if this could be an issue with the C1E state? I'm curious >>>>> > if anyone has seen anything like this. The node is a dx360 M4 >>>>> > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. >>>>> > >>>>> > -Aaron >>>>> > >>>>> > -- >>>>> > Aaron Knister >>>>> > NASA Center for Climate Simulation (Code 606.2) >>>>> > Goddard Space Flight Center >>>>> > (301) 286-2776 >>>>> > _______________________________________________ >>>>> > gpfsug-discuss mailing list >>>>> > gpfsug-discuss at spectrumscale.org >>>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> > _______________________________________________ >>>>> > gpfsug-discuss mailing list >>>>> > gpfsug-discuss at spectrumscale.org >>>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> > >>>>> >>>>> -- >>>>> Aaron Knister >>>>> NASA Center for Climate Simulation (Code 606.2) >>>>> Goddard Space Flight Center >>>>> (301) 286-2776 >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kums at us.ibm.com Fri Apr 21 15:01:33 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Fri, 21 Apr 2017 14:01:33 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be><67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov>, <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: Hi, Try enabling the following in the BIOS of the NSD servers (screen shots below) Turbo Mode - Enable QPI Link Frequency - Max Performance Operating Mode - Maximum Performance >>>>While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients >>We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. Also, It will be good to verify that all the GPFS nodes have Verbs RDMA started using "mmfsadm test verbs status" and that the NSD client-server communication from client to server during "dd" is actually using Verbs RDMA using "mmfsadm test verbs conn" command (on NSD client doing dd). If not, then GPFS might be using TCP/IP network over which the cluster is configured impacting performance (If this is the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and resolve). Regards, -Kums From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" To: gpfsug main discussion list Date: 04/21/2017 09:11 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. -Aaron On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman wrote: Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? 
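A sketch of the verbs checks suggested above, run on the NSD client while a dd test is active (the log path shown is the usual GPFS default):

  # is verbs RDMA started on this node?
  mmfsadm test verbs status
  # are the connections to the NSD servers actually using RDMA, and moving data?
  mmfsadm test verbs conn
  # any RDMA-related errors logged?
  grep -i verbs /var/adm/ras/mmfs.log.latest
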
If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister : Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. 
I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 61023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 85131 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 84819 bytes Desc: not available URL: From bbanister at jumptrading.com Fri Apr 21 16:01:54 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 21 Apr 2017 15:01:54 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be><67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov>, <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: <7dcbac92e19043faa7968702d852668f@jumptrading.com> I think we have a new topic and new speaker for the next UG meeting at SC! Kums presenting "Performance considerations for Spectrum Scale"!! Kums, I have to say you do have a lot to offer here... 
;o) -Bryan Disclaimer: There are some selfish reasons of me wanting to hang out with you again involved in this suggestion From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kumaran Rajaram Sent: Friday, April 21, 2017 9:02 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] bizarre performance behavior Hi, Try enabling the following in the BIOS of the NSD servers (screen shots below) * Turbo Mode - Enable * QPI Link Frequency - Max Performance * Operating Mode - Maximum Performance * >>>>While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients >>We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. Also, It will be good to verify that all the GPFS nodes have Verbs RDMA started using "mmfsadm test verbs status" and that the NSD client-server communication from client to server during "dd" is actually using Verbs RDMA using "mmfsadm test verbs conn" command (on NSD client doing dd). If not, then GPFS might be using TCP/IP network over which the cluster is configured impacting performance (If this is the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and resolve). * [cid:image001.gif at 01D2BA86.4D4B4C10] [cid:image002.gif at 01D2BA86.4D4B4C10] [cid:image003.gif at 01D2BA86.4D4B4C10] Regards, -Kums From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > To: gpfsug main discussion list > Date: 04/21/2017 09:11 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. -Aaron On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman > wrote: Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? 
If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister >: Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. 
I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 61023 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.gif Type: image/gif Size: 85131 bytes Desc: image002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 84819 bytes Desc: image003.gif URL: From g.mangeot at gmail.com Fri Apr 21 16:04:58 2017 From: g.mangeot at gmail.com (Guillaume Mangeot) Date: Fri, 21 Apr 2017 17:04:58 +0200 Subject: [gpfsug-discuss] HA on snapshot scheduling in GPFS GUI Message-ID: Hi, I'm looking for a way to get the GUI working in HA to schedule snapshots. I have 2 servers with gpfs.gui service running on them. I checked a bit with lssnaprule in /usr/lpp/mmfs/gui/cli and the file /var/lib/mmfs/gui/snapshots.json But it doesn't look to be shared between all the GUI servers. 
Is there a way to get GPFS GUI working in HA to schedule snapshots? (keeping the coherency: avoiding to trigger snapshots on both servers in the same time) Regards, Guillaume Mangeot DDN Storage -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.waegeman at ugent.be Fri Apr 21 16:33:16 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Fri, 21 Apr 2017 17:33:16 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: <41475044-c195-5561-c94a-b54ee30c7e68@ugent.be> On 21/04/17 15:10, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > Fantastic news! It might also be worth running "cpupower monitor" or > "turbostat" on your NSD servers while you're running dd tests from the > clients to see what CPU frequency your cores are actually running at. Thanks! I verified with turbostat and cpuinfo, our cpus are running in high performance mode and frequency is always at highest level. > > A typical NSD server workload (especially with IB verbs and for reads) > can be pretty light on CPU which might not prompt your CPU crew > governor to up the frequency (which can affect throughout). If your > frequency scaling governor isn't kicking up the frequency of your CPUs > I've seen that cause this behavior in my testing. > > -Aaron > > > > > On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman > wrote: >> >> Hi, >> >> We are running a test setup with 2 NSD Servers backed by 4 Dell >> Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of >> the 4 powervaults, nsd02 is primary serving LUNS of controller B. >> >> We are testing from 2 testing machines connected to the nsds with >> infiniband, verbs enabled. >> >> When we do dd from the NSD servers, we see indeed performance going >> to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is >> able to get the data at a decent speed. Since we can write from the >> clients at a good speed, I didn't suspect the communication between >> clients and nsds being the issue, especially since total performance >> stays the same using 1 or multiple clients. >> >> I'll use the nsdperf tool to see if we can find anything, >> >> thanks! >> >> K >> >> On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE >> CORP] wrote: >>> Interesting. Could you share a little more about your architecture? >>> Is it possible to mount the fs on an NSD server and do some dd's >>> from the fs on the NSD server? If that gives you decent performance >>> perhaps try NSDPERF next >>> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf >>> >>> >>> >>> -Aaron >>> >>> >>> >>> >>> On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman >>> wrote: >>>> >>>> Hi, >>>> >>>> >>>> Having an issue that looks the same as this one: >>>> >>>> We can do sequential writes to the filesystem at 7,8 GB/s total , >>>> which is the expected speed for our current storage >>>> backend. While we have even better performance with sequential >>>> reads on raw storage LUNS, using GPFS we can only reach 1GB/s in >>>> total (each nsd server seems limited by 0,5GB/s) independent of the >>>> number of clients >>>> (1,2,4,..) or ways we tested (fio,dd). 
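For reference, the kind of check meant here (a sketch, run on the NSD servers while a client dd test is going):

  # current per-core frequencies
  grep MHz /proc/cpuinfo
  # active governor and hardware frequency limits
  cpupower frequency-info
  # live per-core frequency/idle statistics
  cpupower monitor
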
We played with blockdev >>>> params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. >>>> as discussed in this thread, but nothing seems to impact this read >>>> performance. >>>> >>>> Any ideas? >>>> >>>> Thanks! >>>> >>>> Kenneth >>>> >>>> On 17/02/17 19:29, Jan-Frode Myklebust wrote: >>>>> I just had a similar experience from a sandisk infiniflash system >>>>> SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for >>>>> writes. and 250-300 Mbyte/s on sequential reads!! Random reads >>>>> were on the order of 2 Gbyte/s. >>>>> >>>>> After a bit head scratching snd fumbling around I found out that >>>>> reducing maxMBpS from 10000 to 100 fixed the problem! Digging >>>>> further I found that reducing prefetchThreads from default=72 to >>>>> 32 also fixed it, while leaving maxMBpS at 10000. Can now also >>>>> read at 3,2 GByte/s. >>>>> >>>>> Could something like this be the problem on your box as well? >>>>> >>>>> >>>>> >>>>> -jf >>>>> fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister >>>>> >: >>>>> >>>>> Well, I'm somewhat scrounging for hardware. This is in our test >>>>> environment :) And yep, it's got the 2U gpu-tray in it >>>>> although even >>>>> without the riser it has 2 PCIe slots onboard (excluding the >>>>> on-board >>>>> dual-port mezz card) so I think it would make a fine NSD >>>>> server even >>>>> without the riser. >>>>> >>>>> -Aaron >>>>> >>>>> On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT >>>>> Services) >>>>> wrote: >>>>> > Maybe its related to interrupt handlers somehow? You drive >>>>> the load up on one socket, you push all the interrupt handling >>>>> to the other socket where the fabric card is attached? >>>>> > >>>>> > Dunno ... (Though I am intrigued you use idataplex nodes as >>>>> NSD servers, I assume its some 2U gpu-tray riser one or >>>>> something !) >>>>> > >>>>> > Simon >>>>> > ________________________________________ >>>>> > From: gpfsug-discuss-bounces at spectrumscale.org >>>>> >>>>> [gpfsug-discuss-bounces at spectrumscale.org >>>>> ] on behalf >>>>> of Aaron Knister [aaron.s.knister at nasa.gov >>>>> ] >>>>> > Sent: 17 February 2017 15:52 >>>>> > To: gpfsug main discussion list >>>>> > Subject: [gpfsug-discuss] bizarre performance behavior >>>>> > >>>>> > This is a good one. I've got an NSD server with 4x 16GB fibre >>>>> > connections coming in and 1x FDR10 and 1x QDR connection >>>>> going out to >>>>> > the clients. I was having a really hard time getting >>>>> anything resembling >>>>> > sensible performance out of it (4-5Gb/s writes but maybe >>>>> 1.2Gb/s for >>>>> > reads). The back-end is a DDN SFA12K and I *know* it can do >>>>> better than >>>>> > that. >>>>> > >>>>> > I don't remember quite how I figured this out but simply by >>>>> running >>>>> > "openssl speed -multi 16" on the nsd server to drive up the >>>>> load I saw >>>>> > an almost 4x performance jump which is pretty much goes >>>>> against every >>>>> > sysadmin fiber in me (i.e. "drive up the cpu load with >>>>> unrelated crap to >>>>> > quadruple your i/o performance"). >>>>> > >>>>> > This feels like some type of C-states frequency scaling >>>>> shenanigans that >>>>> > I haven't quite ironed down yet. I booted the box with the >>>>> following >>>>> > kernel parameters "intel_idle.max_cstate=0 >>>>> processor.max_cstate=0" which >>>>> > didn't seem to make much of a difference. I also tried >>>>> setting the >>>>> > frequency governer to userspace and setting the minimum >>>>> frequency to >>>>> > 2.6ghz (it's a 2.6ghz cpu). 
None of that really matters-- I >>>>> still have >>>>> > to run something to drive up the CPU load and then >>>>> performance improves. >>>>> > >>>>> > I'm wondering if this could be an issue with the C1E state? >>>>> I'm curious >>>>> > if anyone has seen anything like this. The node is a dx360 M4 >>>>> > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. >>>>> > >>>>> > -Aaron >>>>> > >>>>> > -- >>>>> > Aaron Knister >>>>> > NASA Center for Climate Simulation (Code 606.2) >>>>> > Goddard Space Flight Center >>>>> > (301) 286-2776 >>>>> > _______________________________________________ >>>>> > gpfsug-discuss mailing list >>>>> > gpfsug-discuss at spectrumscale.org >>>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> > _______________________________________________ >>>>> > gpfsug-discuss mailing list >>>>> > gpfsug-discuss at spectrumscale.org >>>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> > >>>>> >>>>> -- >>>>> Aaron Knister >>>>> NASA Center for Climate Simulation (Code 606.2) >>>>> Goddard Space Flight Center >>>>> (301) 286-2776 >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.waegeman at ugent.be Fri Apr 21 16:42:34 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Fri, 21 Apr 2017 17:42:34 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> Message-ID: <7f7349c9-bdd3-5847-1cca-d98d221489fe@ugent.be> Hi, We already verified this on our nsds: [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --QpiSpeed QpiSpeed=maxdatarate [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --turbomode turbomode=enable [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg ?-SysProfile SysProfile=perfoptimized so sadly this is not the issue. Also the output of the verbs commands look ok, there are connections from the client to the nsds are there is data being read and writen. Thanks again! Kenneth On 21/04/17 16:01, Kumaran Rajaram wrote: > Hi, > > Try enabling the following in the BIOS of the NSD servers (screen > shots below) > > * Turbo Mode - Enable > * QPI Link Frequency - Max Performance > * Operating Mode - Maximum Performance > * > > >>>>While we have even better performance with sequential reads on > raw storage LUNS, using GPFS we can only reach 1GB/s in total > (each nsd server seems limited by 0,5GB/s) independent of the > number of clients > > >>We are testing from 2 testing machines connected to the nsds > with infiniband, verbs enabled. 
> > > Also, It will be good to verify that all the GPFS nodes have Verbs > RDMA started using "mmfsadm test verbs status" and that the NSD > client-server communication from client to server during "dd" is > actually using Verbs RDMA using "mmfsadm test verbs conn" command (on > NSD client doing dd). If not, then GPFS might be using TCP/IP network > over which the cluster is configured impacting performance (If this is > the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and > resolve). > > * > > > > > > > Regards, > -Kums > > > > > > > From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > > To: gpfsug main discussion list > Date: 04/21/2017 09:11 AM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Fantastic news! It might also be worth running "cpupower monitor" or > "turbostat" on your NSD servers while you're running dd tests from the > clients to see what CPU frequency your cores are actually running at. > > A typical NSD server workload (especially with IB verbs and for reads) > can be pretty light on CPU which might not prompt your CPU crew > governor to up the frequency (which can affect throughout). If your > frequency scaling governor isn't kicking up the frequency of your CPUs > I've seen that cause this behavior in my testing. > > -Aaron > > > > > On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman > wrote: > > Hi, > > We are running a test setup with 2 NSD Servers backed by 4 Dell > Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of > the 4 powervaults, nsd02 is primary serving LUNS of controller B. > > We are testing from 2 testing machines connected to the nsds with > infiniband, verbs enabled. > > When we do dd from the NSD servers, we see indeed performance going to > 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is > able to get the data at a decent speed. Since we can write from the > clients at a good speed, I didn't suspect the communication between > clients and nsds being the issue, especially since total performance > stays the same using 1 or multiple clients. > > I'll use the nsdperf tool to see if we can find anything, > > thanks! > > K > > On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE > CORP] wrote: > Interesting. Could you share a little more about your architecture? Is > it possible to mount the fs on an NSD server and do some dd's from the > fs on the NSD server? If that gives you decent performance perhaps try > NSDPERF next > _https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf_ > > > -Aaron > > > > > On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman > __ wrote: > > Hi, > > Having an issue that looks the same as this one: > > We can do sequential writes to the filesystem at 7,8 GB/s total , > which is the expected speed for our current storage > backend. While we have even better performance with sequential reads > on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each > nsd server seems limited by 0,5GB/s) independent of the number of clients > (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, > MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed > in this thread, but nothing seems to impact this read performance. > > Any ideas? > > Thanks! 
> > Kenneth > > On 17/02/17 19:29, Jan-Frode Myklebust wrote: > I just had a similar experience from a sandisk infiniflash system > SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for > writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on > the order of 2 Gbyte/s. > > After a bit head scratching snd fumbling around I found out that > reducing maxMBpS from 10000 to 100 fixed the problem! Digging further > I found that reducing prefetchThreads from default=72 to 32 also fixed > it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. > > Could something like this be the problem on your box as well? > > > > -jf > fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister > <_aaron.s.knister at nasa.gov_ >: > Well, I'm somewhat scrounging for hardware. This is in our test > environment :) And yep, it's got the 2U gpu-tray in it although even > without the riser it has 2 PCIe slots onboard (excluding the on-board > dual-port mezz card) so I think it would make a fine NSD server even > without the riser. > > -Aaron > > On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > Maybe its related to interrupt handlers somehow? You drive the load > up on one socket, you push all the interrupt handling to the other > socket where the fabric card is attached? > > > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD > servers, I assume its some 2U gpu-tray riser one or something !) > > > > Simon > > ________________________________________ > > From: _gpfsug-discuss-bounces at spectrumscale.org_ > [_gpfsug-discuss-bounces at spectrumscale.org_ > ] on behalf of Aaron > Knister [_aaron.s.knister at nasa.gov_ ] > > Sent: 17 February 2017 15:52 > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] bizarre performance behavior > > > > This is a good one. I've got an NSD server with 4x 16GB fibre > > connections coming in and 1x FDR10 and 1x QDR connection going out to > > the clients. I was having a really hard time getting anything resembling > > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > > reads). The back-end is a DDN SFA12K and I *know* it can do better than > > that. > > > > I don't remember quite how I figured this out but simply by running > > "openssl speed -multi 16" on the nsd server to drive up the load I saw > > an almost 4x performance jump which is pretty much goes against every > > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > > quadruple your i/o performance"). > > > > This feels like some type of C-states frequency scaling shenanigans that > > I haven't quite ironed down yet. I booted the box with the following > > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > > didn't seem to make much of a difference. I also tried setting the > > frequency governer to userspace and setting the minimum frequency to > > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > > to run something to drive up the CPU load and then performance improves. > > > > I'm wondering if this could be an issue with the C1E state? I'm curious > > if anyone has seen anything like this. The node is a dx360 M4 > > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
> > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at _spectrumscale.org_ > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at _spectrumscale.org_ > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at _spectrumscale.org_ _ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 61023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 85131 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 84819 bytes Desc: not available URL: From kums at us.ibm.com Fri Apr 21 21:27:49 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Fri, 21 Apr 2017 20:27:49 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: <7f7349c9-bdd3-5847-1cca-d98d221489fe@ugent.be> References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be><67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov><9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> <7f7349c9-bdd3-5847-1cca-d98d221489fe@ugent.be> Message-ID: Hi Kenneth, As it was mentioned earlier, it will be good to first verify the raw network performance between the NSD client and NSD server using the nsdperf tool that is built with RDMA support. g++ -O2 -DRDMA -o nsdperf -lpthread -lrt -libverbs -lrdmacm nsdperf.C In addition, since you have 2 x NSD servers it will be good to perform NSD client file-system performance test with just single NSD server (mmshutdown the other server, assuming all the NSDs have primary, server NSD server configured + Quorum will be intact when a NSD server is brought down) to see if it helps to improve the read performance + if there are variations in the file-system read bandwidth results between NSD_server#1 'active' vs. NSD_server #2 'active' (with other NSD server in GPFS "down" state). If there is significant variation, it can help to isolate the issue to particular NSD server (HW or IB issue?). 
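A minimal sketch of such an nsdperf run between one client and one NSD server (host names are placeholders, and the commands follow the usage notes at the top of nsdperf.C, so double-check them against your build):

  # start the daemon on every node taking part in the test (nsd00 and client01 here)
  ./nsdperf -s
  # then, from one node, start the control instance and type the following at its prompt:
  ./nsdperf
  server nsd00
  client client01
  rdma on
  test
  quit
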
You can issue "mmdiag --waiters" on NSD client as well as NSD servers during your dd test, to verify if there are unsual long GPFS waiters. In addition, you may issue Linux "perf top -z" command on the GPFS node to see if there is high CPU usage by any particular call/event (for e.g., If GPFS config parameter verbsRdmaMaxSendBytes has been set to low value from the default 16M, then it can cause RDMA completion threads to go CPU bound ). Please verify some performance scenarios detailed in Chapter 22 in Spectrum Scale Problem Determination Guide (link below). https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/pdf/scale_pdg.pdf?view=kc Thanks, -Kums From: Kenneth Waegeman To: gpfsug main discussion list Date: 04/21/2017 11:43 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We already verified this on our nsds: [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --QpiSpeed QpiSpeed=maxdatarate [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --turbomode turbomode=enable [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg ?-SysProfile SysProfile=perfoptimized so sadly this is not the issue. Also the output of the verbs commands look ok, there are connections from the client to the nsds are there is data being read and writen. Thanks again! Kenneth On 21/04/17 16:01, Kumaran Rajaram wrote: Hi, Try enabling the following in the BIOS of the NSD servers (screen shots below) Turbo Mode - Enable QPI Link Frequency - Max Performance Operating Mode - Maximum Performance >>>>While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients >>We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. Also, It will be good to verify that all the GPFS nodes have Verbs RDMA started using "mmfsadm test verbs status" and that the NSD client-server communication from client to server during "dd" is actually using Verbs RDMA using "mmfsadm test verbs conn" command (on NSD client doing dd). If not, then GPFS might be using TCP/IP network over which the cluster is configured impacting performance (If this is the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and resolve). Regards, -Kums From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" To: gpfsug main discussion list Date: 04/21/2017 09:11 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. -Aaron On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman wrote: Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. 
When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister : Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org[ gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. 
I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 61023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 85131 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 84819 bytes Desc: not available URL: From frank.tower at outlook.com Thu Apr 20 13:27:13 2017 From: frank.tower at outlook.com (Frank Tower) Date: Thu, 20 Apr 2017 12:27:13 +0000 Subject: [gpfsug-discuss] Protocol node recommendations Message-ID: Hi, We have here around 2PB GPFS where users access oney through GPFS client (used by an HPC cluster), but we will have to setup protocols nodes. We will have to share GPFS data to ~ 1000 users, where each users will have different access usage, meaning: - some will do large I/O (e.g: store 1TB files) - some will read/write more than 10k files in a raw - other will do only sequential read I already read the following wiki page: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node IBM Spectrum Scale Wiki - Sizing Guidance for Protocol Node www.ibm.com developerWorks wikis allow groups of people to jointly create and maintain content through contribution and collaboration. Wikis apply the wisdom of crowds to ... But I wondering if some people have recommendations regarding hardware sizing and software tuning for such situation ? Or better, if someone already such setup ? Thank you by advance, Frank. -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Sat Apr 22 05:30:29 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Sat, 22 Apr 2017 00:30:29 -0400 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: Message-ID: <52354.1492835429@turing-police.cc.vt.edu> On Thu, 20 Apr 2017 12:27:13 -0000, Frank Tower said: > - some will do large I/O (e.g: store 1TB files) > - some will read/write more than 10k files in a raw > - other will do only sequential read > But I wondering if some people have recommendations regarding hardware sizing > and software tuning for such situation ? The three most critical pieces of info are missing here: 1) Do you mean 1,000 human users, or 1,000 machines doing NFS/CIFS mounts? 2) How many of the users are likely to be active at the same time? 1,000 users, each of whom are active an hour a week is entirely different from 200 users that are each active 140 hours a week. 3) What SLA/performance target are they expecting? If they want large 1TB I/O and 100MB/sec is acceptable, that's different than if they have a business need to go at 1.2GB/sec.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From frank.tower at outlook.com Sat Apr 22 07:34:44 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sat, 22 Apr 2017 06:34:44 +0000 Subject: [gpfsug-discuss] Protocol node recommendations Message-ID: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. 
Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Sat Apr 22 09:50:11 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sat, 22 Apr 2017 08:50:11 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: Message-ID: That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower : > Hi, > > We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with > GPFS client on each node. > > We will have to open GPFS to all our users over CIFS and kerberized NFS > with ACL support for both protocol for around +1000 users > > All users have different use case and needs: > - some will do random I/O through a large set of opened files (~5k files) > - some will do large write with 500GB-1TB files > - other will arrange sequential I/O with ~10k opened files > > NFS and CIFS will share the same server, so I through to use SSD drive, at > least 128GB memory with 2 sockets. > > Regarding tuning parameters, I thought at: > > maxFilesToCache 10000 > syncIntervalStrict yes > workerThreads (8*core) > prefetchPct 40 (for now and update if needed) > > I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering > if someone could share his experience/best practice regarding hardware > sizing and/or tuning parameters. > > Thank by advance, > Frank > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank.tower at outlook.com Sat Apr 22 19:47:59 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sat, 22 Apr 2017 18:47:59 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: <52354.1492835429@turing-police.cc.vt.edu> References: , <52354.1492835429@turing-police.cc.vt.edu> Message-ID: Hi, Thank for your answer. > 1) Do you mean 1,000 human users, or 1,000 machines doing NFS/CIFS mounts? True, here the list: - 800 users that have 1 workstation through 1Gb/s ethernet and will use NFS/CIFS - 200 users that have 2 workstation through 1Gb/s ethernet, few have 10Gb/s ethernet and will use NFS/CIFS > 2) How many of the users are likely to be active at the same time? 1,000 > users, each of whom are active an hour a week is entirely different from > 200 users that are each active 140 hours a week. 
True again, around 200 users will actively use GPFS through NFS/CIFS during night and day, but we cannot control if people will use 2 workstations or more :( We will have peak during day with an average of 700 'workstations' > 3) What SLA/performance target are they expecting? If they want > large 1TB I/O and 100MB/sec is acceptable, that's different than if they > have a business need to go at 1.2GB/sec.... We just want to provide at normal throughput through an 1GB/s network. Users are aware of such situation and will mainly use HPC cluster for high speed and heavy computation. But they would like to do 'light' computation on their desktop. The main topic here is to sustain 'normal' throughput for all users during peak. Thank for your help. ________________________________ From: valdis.kletnieks at vt.edu Sent: Saturday, April 22, 2017 6:30 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Protocol node recommendations On Thu, 20 Apr 2017 12:27:13 -0000, Frank Tower said: > - some will do large I/O (e.g: store 1TB files) > - some will read/write more than 10k files in a raw > - other will do only sequential read > But I wondering if some people have recommendations regarding hardware sizing > and software tuning for such situation ? The three most critical pieces of info are missing here: 1) Do you mean 1,000 human users, or 1,000 machines doing NFS/CIFS mounts? 2) How many of the users are likely to be active at the same time? 1,000 users, each of whom are active an hour a week is entirely different from 200 users that are each active 140 hours a week. 3) What SLA/performance target are they expecting? If they want large 1TB I/O and 100MB/sec is acceptable, that's different than if they have a business need to go at 1.2GB/sec.... -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank.tower at outlook.com Sat Apr 22 20:22:23 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sat, 22 Apr 2017 19:22:23 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: , Message-ID: Hi, Thank for the recommendations. Now we deal with the situation of: - take 3 nodes with round robin DNS that handle both protocols - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and NFS services. Regarding your recommendations, 256GB memory node could be a plus if we mix both protocols for such case. Is the spreadsheet publicly available or do we need to ask IBM ? Thank for your help, Frank. ________________________________ From: Jan-Frode Myklebust Sent: Saturday, April 22, 2017 10:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower >: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. 
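As a side note on the round-robin DNS option above: the CES side of that setup is fairly small. A rough sketch, with made-up IP addresses and with the exact option spellings to be checked against the mmces man page:

# enable both protocol stacks on the existing CES nodes
mmces service enable NFS
mmces service enable SMB

# add CES IPs; a round-robin DNS name would then point at these addresses
mmces address add --ces-ip 10.10.10.11
mmces address add --ces-ip 10.10.10.12
mmces address add --ces-ip 10.10.10.13

# check how the addresses are distributed over the protocol nodes
mmces address list
mmces node list

CES moves these addresses between nodes on failure, so the DNS round robin only has to spread the initial connections.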
We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Sun Apr 23 11:07:38 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sun, 23 Apr 2017 10:07:38 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: Message-ID: The protocol sizing tool should be available from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node/version/70a4c7c0-a5c6-4dde-b391-8f91c542dd7d , but I'm getting 404 now. I think 128GB should be enough for both protocols on same nodes, and I think your 3 node suggestion is best. Better load sharing with not dedicating subset of nodes to each protocol. -jf l?r. 22. apr. 2017 kl. 21.22 skrev Frank Tower : > Hi, > > > Thank for the recommendations. > > Now we deal with the situation of: > > > - take 3 nodes with round robin DNS that handle both protocols > > - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and > NFS services. > > > Regarding your recommendations, 256GB memory node could be a plus if we > mix both protocols for such case. > > > Is the spreadsheet publicly available or do we need to ask IBM ? > > > Thank for your help, > > Frank. > > > ------------------------------ > *From:* Jan-Frode Myklebust > *Sent:* Saturday, April 22, 2017 10:50 AM > *To:* gpfsug-discuss at spectrumscale.org > *Subject:* Re: [gpfsug-discuss] Protocol node recommendations > > That's a tiny maxFilesToCache... > > I would start by implementing the settings from > /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your > protocoll nodes, and leave further tuning to when you see you have issues. > > Regarding sizing, we have a spreadsheet somewhere where you can input some > workload parameters and get an idea for how many nodes you'll need. Your > node config seems fine, but one node seems too few to serve 1000+ users. We > support max 3000 SMB connections/node, and I believe the recommendation is > 4000 NFS connections/node. > > > -jf > l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower : > >> Hi, >> >> We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with >> GPFS client on each node. 
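For what it's worth, applying the kind of values Jan-Frode mentions (a 64GB pagepool and a much larger maxFilesToCache) to just the protocol nodes is a one-liner per parameter. The sketch below is illustrative only: the maxFilesToCache figure is a placeholder, not official sizing guidance, and "cesNodes" assumes the standard CES node class is in use:

# apply protocol-node settings only to the CES nodes
# (pagepool changes typically take effect after GPFS is restarted on those nodes)
mmchconfig pagepool=64G -N cesNodes
mmchconfig maxFilesToCache=1000000 -N cesNodes

# confirm what is actually set
mmlsconfig pagepool
mmlsconfig maxFilesToCache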
>> >> We will have to open GPFS to all our users over CIFS and kerberized NFS >> with ACL support for both protocol for around +1000 users >> >> All users have different use case and needs: >> - some will do random I/O through a large set of opened files (~5k files) >> - some will do large write with 500GB-1TB files >> - other will arrange sequential I/O with ~10k opened files >> >> NFS and CIFS will share the same server, so I through to use SSD >> drive, at least 128GB memory with 2 sockets. >> >> Regarding tuning parameters, I thought at: >> >> maxFilesToCache 10000 >> syncIntervalStrict yes >> workerThreads (8*core) >> prefetchPct 40 (for now and update if needed) >> >> I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering >> if someone could share his experience/best practice regarding hardware >> sizing and/or tuning parameters. >> >> Thank by advance, >> Frank >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rreuscher at verizon.net Sun Apr 23 17:43:44 2017 From: rreuscher at verizon.net (Robert Reuscher) Date: Sun, 23 Apr 2017 11:43:44 -0500 Subject: [gpfsug-discuss] LUN expansion Message-ID: <4CBF459B-4008-4CA2-904F-1A48882F021E@verizon.net> We run GPFS on z/Linux and have been using ECKD devices for disks. We are looking at implementing some new filesystems on FCP LUNS. One of the features of a LUN is we can expand a LUN instead of adding new LUNS, where as with ECKD devices. From what I?ve found searching to see if GPFS filesystem can be expanding to see the expanded LUN, it doesn?t seem that this will work, you have to add new LUNS (or new disks) and then add them to the filesystem. Everything I?ve found is at least 2-3 old (most of it much older), and just want to check that this is still is true before we make finalize our LUN/GPFS procedures. Robert Reuscher NR5AR -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank.tower at outlook.com Sun Apr 23 22:27:50 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sun, 23 Apr 2017 21:27:50 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: , Message-ID: Hi, Nice ! didn't pay attention at the revision and the spreadsheet. If someone still have a copy somewhere it could be useful, Google didn't help :( We will follow your advise and start with 3 protocol nodes equipped with 128GB memory, 2 x 12 cores (maybe E5-2680 or E5-2670). >From what I read, NFS-Ganesha mainly depend of the hardware, Linux on a SSD should be a big plus in our case. Best, Frank ________________________________ From: Jan-Frode Myklebust Sent: Sunday, April 23, 2017 12:07:38 PM To: Frank Tower; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations The protocol sizing tool should be available from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node/version/70a4c7c0-a5c6-4dde-b391-8f91c542dd7d , but I'm getting 404 now. I think 128GB should be enough for both protocols on same nodes, and I think your 3 node suggestion is best. Better load sharing with not dedicating subset of nodes to each protocol. -jf l?r. 22. apr. 2017 kl. 21.22 skrev Frank Tower >: Hi, Thank for the recommendations. 
Now we deal with the situation of: - take 3 nodes with round robin DNS that handle both protocols - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and NFS services. Regarding your recommendations, 256GB memory node could be a plus if we mix both protocols for such case. Is the spreadsheet publicly available or do we need to ask IBM ? Thank for your help, Frank. ________________________________ From: Jan-Frode Myklebust > Sent: Saturday, April 22, 2017 10:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower >: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfadden at us.ibm.com Sun Apr 23 23:44:56 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Sun, 23 Apr 2017 22:44:56 +0000 Subject: [gpfsug-discuss] LUN expansion In-Reply-To: <4CBF459B-4008-4CA2-904F-1A48882F021E@verizon.net> References: <4CBF459B-4008-4CA2-904F-1A48882F021E@verizon.net> Message-ID: An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Apr 24 10:11:25 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 24 Apr 2017 09:11:25 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: , Message-ID: What's your SSD going to help with... will you implement it as a LROC device? Otherwise I can't see the benefit to using it to boot off. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Frank Tower Sent: 23 April 2017 22:28 To: Jan-Frode Myklebust ; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations Hi, Nice ! didn't pay attention at the revision and the spreadsheet. 
If someone still have a copy somewhere it could be useful, Google didn't help :( We will follow your advise and start with 3 protocol nodes equipped with 128GB memory, 2 x 12 cores (maybe E5-2680 or E5-2670). >From what I read, NFS-Ganesha mainly depend of the hardware, Linux on a SSD should be a big plus in our case. Best, Frank ________________________________ From: Jan-Frode Myklebust > Sent: Sunday, April 23, 2017 12:07:38 PM To: Frank Tower; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations The protocol sizing tool should be available from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node/version/70a4c7c0-a5c6-4dde-b391-8f91c542dd7d , but I'm getting 404 now. I think 128GB should be enough for both protocols on same nodes, and I think your 3 node suggestion is best. Better load sharing with not dedicating subset of nodes to each protocol. -jf l?r. 22. apr. 2017 kl. 21.22 skrev Frank Tower >: Hi, Thank for the recommendations. Now we deal with the situation of: - take 3 nodes with round robin DNS that handle both protocols - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and NFS services. Regarding your recommendations, 256GB memory node could be a plus if we mix both protocols for such case. Is the spreadsheet publicly available or do we need to ask IBM ? Thank for your help, Frank. ________________________________ From: Jan-Frode Myklebust > Sent: Saturday, April 22, 2017 10:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower >: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From service at metamodul.com Mon Apr 24 11:28:08 2017 From: service at metamodul.com (Hans-Joachim Ehlers) Date: Mon, 24 Apr 2017 12:28:08 +0200 (CEST) Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale Message-ID: <416417651.114582.1493029688959@email.1und1.de> An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Mon Apr 24 12:14:17 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Apr 2017 12:14:17 +0100 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: <416417651.114582.1493029688959@email.1und1.de> References: <416417651.114582.1493029688959@email.1und1.de> Message-ID: <1493032457.11896.20.camel@buzzard.me.uk> On Mon, 2017-04-24 at 12:28 +0200, Hans-Joachim Ehlers wrote: > @All > > > does anybody uses virtualization technologies for GPFS Server ? If yes > what kind and why have you selected your soulution. > > I think currently about using Linux on Power using 40G SR-IOV for > Network and NPIV/Dedidcated FC Adater for storage. As a plus i can > also assign only a certain amount of CPUs to GPFS. ( Lower license > cost / You pay for what you use) > > > I must admit that i am not familar how "good" KVM/ESX in respect to > direct assignment of hardware is. Thus the question to the group > For the most part GPFS is used at scale and in general all the components are redundant. As such why you would want to allocate less than a whole server into a production GPFS system in somewhat beyond me. That is you will have a bunch of NSD servers in the system and if one crashes, well the other NSD's take over. Similar for protocol nodes, and in general the total file system size is going to hundreds of TB otherwise why bother with GPFS. I guess there is currently potential value at sticking the GUI into a virtual machine to get redundancy. On the other hand if you want a test rig, then virtualization works wonders. I have put GPFS on a single Linux box, using LV's for the disks and mapping them into virtual machines under KVM. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From service at metamodul.com Mon Apr 24 13:21:09 2017 From: service at metamodul.com (service at metamodul.com) Date: Mon, 24 Apr 2017 14:21:09 +0200 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale Message-ID: Hi Jonathan todays hardware is so powerful that imho it might make sense to split a CEC into more "piece". For example the IBM S822L has up to 2x12 cores, 9 PCI3 slots ( 4?16 lans & 5?8 lan ). I think that such a server is a little bit to big ?just to be a single NSD server. Note that i use for each GPFS service a dedicated node. So if i would go for 4 NSD server, 6 protocol nodes and 2 tsm backup nodes and at least 3 test server a total of 11 server is needed. Inhm 4xS822L could handle this and a little bit more quite well. Of course blade technology could be used or 1U server. With kind regards Hajo --? Unix Systems Engineer MetaModul GmbH +49 177 4393994
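As an aside, the kind of KVM test rig Jonathan describes above (LVs on the host handed to the guests as disks) needs very little setup; a rough sketch, with made-up volume group, LV and guest names:

# carve a couple of logical volumes on the KVM host to act as NSDs
lvcreate -L 50G -n gpfs_nsd1 vg_test
lvcreate -L 50G -n gpfs_nsd2 vg_test

# hand them to a running guest as extra virtio disks
virsh attach-disk gpfs-node1 /dev/vg_test/gpfs_nsd1 vdb --persistent
virsh attach-disk gpfs-node1 /dev/vg_test/gpfs_nsd2 vdc --persistent

Inside the guest the devices show up as /dev/vdb and /dev/vdc and can be used in an NSD stanza file like any other disk; obviously this is only for a sandbox, not for anything production-like.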
-------- Original message --------
From: Jonathan Buzzard
Date: 2017.04.24 13:14 (GMT+01:00)
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale
On Mon, 2017-04-24 at 12:28 +0200, Hans-Joachim Ehlers wrote: > @All > > > does anybody uses virtualization technologies for GPFS Server ? If yes > what kind and why have you selected your soulution. > > I think currently about using Linux on Power using 40G SR-IOV for > Network and NPIV/Dedidcated FC Adater for storage. As a plus i can > also assign only a certain amount of CPUs to GPFS. ( Lower license > cost / You pay for what you use) > > > I must admit that i am not familar how "good" KVM/ESX in respect to > direct assignment of hardware is. Thus the question to the group > For the most part GPFS is used at scale and in general all the components are redundant. As such why you would want to allocate less than a whole server into a production GPFS system in somewhat beyond me. That is you will have a bunch of NSD servers in the system and if one crashes, well the other NSD's take over. Similar for protocol nodes, and in general the total file system size is going to hundreds of TB otherwise why bother with GPFS. I guess there is currently potential value at sticking the GUI into a virtual machine to get redundancy. On the other hand if you want a test rig, then virtualization works wonders. I have put GPFS on a single Linux box, using LV's for the disks and mapping them into virtual machines under KVM. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Mon Apr 24 13:42:51 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 24 Apr 2017 15:42:51 +0300 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: References: Message-ID: Hi As tastes vary, I would not partition it so much for the backend. Assuming there is little to nothing overhead on the CPU at PHYP level, which it depends. On the protocols nodes, due the CTDB keeping locks together across all nodes (SMB), you would get more performance on bigger & less number of CES nodes than more and smaller. Certainly a 822 is quite a server if we go back to previous generations but I would still keep a simple backend (NSd servers), simple CES (less number of nodes the merrier) & then on the client part go as micro partitions as you like/can as the effect on the cluster is less relevant in the case of resources starvation. But, it depends on workloads, SLA and money so I say try, establish a baseline and it fills the requirements, go for it. If not change till does. Have fun From: "service at metamodul.com" To: gpfsug main discussion list Date: 24/04/2017 15:21 Subject: Re: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Jonathan todays hardware is so powerful that imho it might make sense to split a CEC into more "piece". For example the IBM S822L has up to 2x12 cores, 9 PCI3 slots ( 4?16 lans & 5?8 lan ). I think that such a server is a little bit to big just to be a single NSD server. Note that i use for each GPFS service a dedicated node. So if i would go for 4 NSD server, 6 protocol nodes and 2 tsm backup nodes and at least 3 test server a total of 11 server is needed. Inhm 4xS822L could handle this and a little bit more quite well. Of course blade technology could be used or 1U server. 
With kind regards Hajo -- Unix Systems Engineer MetaModul GmbH +49 177 4393994 -------- Urspr?ngliche Nachricht -------- Von: Jonathan Buzzard Datum:2017.04.24 13:14 (GMT+01:00) An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale On Mon, 2017-04-24 at 12:28 +0200, Hans-Joachim Ehlers wrote: > @All > > > does anybody uses virtualization technologies for GPFS Server ? If yes > what kind and why have you selected your soulution. > > I think currently about using Linux on Power using 40G SR-IOV for > Network and NPIV/Dedidcated FC Adater for storage. As a plus i can > also assign only a certain amount of CPUs to GPFS. ( Lower license > cost / You pay for what you use) > > > I must admit that i am not familar how "good" KVM/ESX in respect to > direct assignment of hardware is. Thus the question to the group > For the most part GPFS is used at scale and in general all the components are redundant. As such why you would want to allocate less than a whole server into a production GPFS system in somewhat beyond me. That is you will have a bunch of NSD servers in the system and if one crashes, well the other NSD's take over. Similar for protocol nodes, and in general the total file system size is going to hundreds of TB otherwise why bother with GPFS. I guess there is currently potential value at sticking the GUI into a virtual machine to get redundancy. On the other hand if you want a test rig, then virtualization works wonders. I have put GPFS on a single Linux box, using LV's for the disks and mapping them into virtual machines under KVM. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Mon Apr 24 14:04:26 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Apr 2017 14:04:26 +0100 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: References: Message-ID: <1493039066.11896.30.camel@buzzard.me.uk> On Mon, 2017-04-24 at 14:21 +0200, service at metamodul.com wrote: > Hi Jonathan > todays hardware is so powerful that imho it might make sense to split > a CEC into more "piece". For example the IBM S822L has up to 2x12 > cores, 9 PCI3 slots ( 4?16 lans & 5?8 lan ). > I think that such a server is a little bit to big just to be a single > NSD server. So don't buy it for an NSD server then :-) > Note that i use for each GPFS service a dedicated node. > So if i would go for 4 NSD server, 6 protocol nodes and 2 tsm backup > nodes and at least 3 test server a total of 11 server is needed. > Inhm 4xS822L could handle this and a little bit more quite well. > I think you are missing the point somewhat. Well by several country miles and quite possibly an ocean or two to be honest. Spectrum scale is supposed to be a "scale out" solution. More storage required add more arrays. More bandwidth add more servers etc. 
If you are just going to scale it all up on a *single* server then you might as well forget GPFS and do an old school standard scale up solution. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From janfrode at tanso.net Mon Apr 24 14:14:20 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 24 Apr 2017 15:14:20 +0200 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: References: Message-ID: I agree with Luis -- why so many nodes? """ So if i would go for 4 NSD server, 6 protocol nodes and 2 tsm backup nodes and at least 3 test server a total of 11 server is needed. """ If this is your whole cluster, why not just 3x P822L/P812L running single partition per node, hosting a cluster of 3x protocol-nodes that does both direct FC for disk access, and also run backups on same nodes ? No complications, full hw performance. Then separate node for test, or separate partition on same nodes with dedicated adapters. But back to your original question. My experience is that LPAR/NPIV works great, but it's a bit annoying having to also have VIOs. Hope we'll get FC SR-IOV eventually.. Also LPAR/Dedicated-adapters naturally works fine. VMWare/RDM can be a challenge in some failure situations. It likes to pause VMs in APD or PDL situations, which will affect all VMs with access to it :-o VMs without direct disk access is trivial. -jf On Mon, Apr 24, 2017 at 2:42 PM, Luis Bolinches wrote: > Hi > > As tastes vary, I would not partition it so much for the backend. Assuming > there is little to nothing overhead on the CPU at PHYP level, which it > depends. On the protocols nodes, due the CTDB keeping locks together across > all nodes (SMB), you would get more performance on bigger & less number of > CES nodes than more and smaller. > > Certainly a 822 is quite a server if we go back to previous generations > but I would still keep a simple backend (NSd servers), simple CES (less > number of nodes the merrier) & then on the client part go as micro > partitions as you like/can as the effect on the cluster is less relevant in > the case of resources starvation. > > But, it depends on workloads, SLA and money so I say try, establish a > baseline and it fills the requirements, go for it. If not change till does. > Have fun > > > > From: "service at metamodul.com" > To: gpfsug main discussion list > Date: 24/04/2017 15:21 > Subject: Re: [gpfsug-discuss] Used virtualization technologies for > GPFS/Spectrum Scale > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Jonathan > todays hardware is so powerful that imho it might make sense to split a > CEC into more "piece". For example the IBM S822L has up to 2x12 cores, 9 > PCI3 slots ( 4?16 lans & 5?8 lan ). > I think that such a server is a little bit to big just to be a single NSD > server. > Note that i use for each GPFS service a dedicated node. > So if i would go for 4 NSD server, 6 protocol nodes and 2 tsm backup nodes > and at least 3 test server a total of 11 server is needed. > Inhm 4xS822L could handle this and a little bit more quite well. > > Of course blade technology could be used or 1U server. 
> > With kind regards > Hajo > > -- > Unix Systems Engineer > MetaModul GmbH > +49 177 4393994 <+49%20177%204393994> > > > -------- Urspr?ngliche Nachricht -------- > Von: Jonathan Buzzard > Datum:2017.04.24 13:14 (GMT+01:00) > An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Used virtualization technologies for > GPFS/Spectrum Scale > > On Mon, 2017-04-24 at 12:28 +0200, Hans-Joachim Ehlers wrote: > > @All > > > > > > does anybody uses virtualization technologies for GPFS Server ? If yes > > what kind and why have you selected your soulution. > > > > I think currently about using Linux on Power using 40G SR-IOV for > > Network and NPIV/Dedidcated FC Adater for storage. As a plus i can > > also assign only a certain amount of CPUs to GPFS. ( Lower license > > cost / You pay for what you use) > > > > > > I must admit that i am not familar how "good" KVM/ESX in respect to > > direct assignment of hardware is. Thus the question to the group > > > > For the most part GPFS is used at scale and in general all the > components are redundant. As such why you would want to allocate less > than a whole server into a production GPFS system in somewhat beyond me. > > That is you will have a bunch of NSD servers in the system and if one > crashes, well the other NSD's take over. Similar for protocol nodes, and > in general the total file system size is going to hundreds of TB > otherwise why bother with GPFS. > > I guess there is currently potential value at sticking the GUI into a > virtual machine to get redundancy. > > On the other hand if you want a test rig, then virtualization works > wonders. I have put GPFS on a single Linux box, using LV's for the disks > and mapping them into virtual machines under KVM. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______ > ________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Mon Apr 24 16:29:56 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Apr 2017 11:29:56 -0400 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: References: Message-ID: <131241.1493047796@turing-police.cc.vt.edu> On Mon, 24 Apr 2017 14:21:09 +0200, "service at metamodul.com" said: > todays hardware is so powerful that imho it might make sense to split a CEC > into more "piece". For example the IBM S822L has up to 2x12 cores, 9 PCI3 slots > ( 4?16 lans & 5?8 lan ). We look at it the other way around: Today's hardware is so powerful that you can build a cluster out of a stack of fairly low-end 1U servers (we have one cluster that's built out of Dell r630s). 
And it's more robust against hardware failures than a VM based solution - although the 822 seems to allow hot-swap of PCI cards, a dead socket or DIMM will still kill all the VMs when you go to replace it. If one 1U out of 4 goes down due to a bad DIMM (which has happened to us more often than a bad PCI card) you can just power it down and replace it.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From service at metamodul.com Mon Apr 24 17:11:25 2017 From: service at metamodul.com (Hans-Joachim Ehlers) Date: Mon, 24 Apr 2017 18:11:25 +0200 (CEST) Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: References: Message-ID: <1961501377.286669.1493050285874@email.1und1.de> > Jan-Frode Myklebust hat am 24. April 2017 um 15:14 geschrieben: > I agree with Luis -- why so many nodes? Many ? IMHO it is not that much. I do not like to have one server doing more than one task. Thus a NSD Server does only serves GPFS. A Protocol server serves either NFS or SMB but not both except IBM says it would be better to run NFS/SMB on the same node. A backup server runs also on its "own" hardware. So i would need at least 4 NSD Server since if 1 fails i am losing only 25% of my "performance" and still having a 4/5 quorum. Nice in case an Update of a NSD failed. Each protocol service requires at least 2 nodes and the backup service as well. I can only say that with that approach i never had problems. I have be running into problems each time i did not followed that apporach. But of course YMMV But keep in mind that each service might requires different GPFS configuration or even slightly different hardware. Saying so i am a fan of having many GPFS Server ( NSD, Protocol , Backup a.s.o ) and i do not understand why not to use many nodes ^_^ Cheers Hajo From jonathan at buzzard.me.uk Mon Apr 24 17:24:29 2017 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 24 Apr 2017 17:24:29 +0100 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: <131241.1493047796@turing-police.cc.vt.edu> References: <131241.1493047796@turing-police.cc.vt.edu> Message-ID: <1493051069.11896.39.camel@buzzard.me.uk> On Mon, 2017-04-24 at 11:29 -0400, valdis.kletnieks at vt.edu wrote: > On Mon, 24 Apr 2017 14:21:09 +0200, "service at metamodul.com" said: > > > todays hardware is so powerful that imho it might make sense to split a CEC > > into more "piece". For example the IBM S822L has up to 2x12 cores, 9 PCI3 slots > > ( 4?16 lans & 5?8 lan ). > > We look at it the other way around: Today's hardware is so powerful that > you can build a cluster out of a stack of fairly low-end 1U servers (we > have one cluster that's built out of Dell r630s). And it's more robust > against hardware failures than a VM based solution - although the 822 seems > to allow hot-swap of PCI cards, a dead socket or DIMM will still kill all > the VMs when you go to replace it. If one 1U out of 4 goes down due to > a bad DIMM (which has happened to us more often than a bad PCI card) you > can just power it down and replace it.... Hate to say but the 822 will happily keep trucking when the CPU (assuming it has more than one) fails and similar with the DIMM's. In fact mirrored DIMM's is reasonably common on x86 machines these days, though very few people ever use it. That said CPU failures are incredibly rare in my experience. 
The only time I have ever come across a failed CPU was on a pSeries machine and then it was only because the backup was running really slow (it was running TSM) that prompted us to look closer and see what had happened. Monitoring (Zenoss) was not setup to register the event because like when does a CPU fail and the machine keep running! I am not 100% sure on the 822 put I suspect that the DIMM's and any socketed CPU's can be hot swapped in addition to the PCI card's which I have personally done on pSeries machines. However it is a stupidly over priced solution to run GPFS, because there are better or at the very least vastly cheaper ways to get the same level of reliability. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From valdis.kletnieks at vt.edu Mon Apr 24 18:58:17 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 24 Apr 2017 13:58:17 -0400 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: <1493051069.11896.39.camel@buzzard.me.uk> References: <131241.1493047796@turing-police.cc.vt.edu> <1493051069.11896.39.camel@buzzard.me.uk> Message-ID: <7337.1493056697@turing-police.cc.vt.edu> On Mon, 24 Apr 2017 17:24:29 +0100, Jonathan Buzzard said: > Hate to say but the 822 will happily keep trucking when the CPU > (assuming it has more than one) fails and similar with the DIMM's. In How about when you go to replace the DIMM? You able to hot-swap the memory without anything losing its mind? (I know this is possible in the Z/series world, but those usually have at least 2-3 more zeros in the price tag). -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From luis.bolinches at fi.ibm.com Mon Apr 24 19:08:32 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 24 Apr 2017 21:08:32 +0300 Subject: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale In-Reply-To: <7337.1493056697@turing-police.cc.vt.edu> References: <131241.1493047796@turing-police.cc.vt.edu> <1493051069.11896.39.camel@buzzard.me.uk> <7337.1493056697@turing-police.cc.vt.edu> Message-ID: Hi 822 is an entry scale out Power machine, it has limited RAS compared with the high end ones (870/880). The 822 needs to be down for CPU / DIMM replacement: https://www.ibm.com/support/knowledgecenter/5148-21L/p8eg3/p8eg3_83x_8rx_kickoff.htm . And it is not a end user task. You can argue that, I owuld but it is the current statement and you pay for support for these kind of stuff. From: valdis.kletnieks at vt.edu To: gpfsug main discussion list Date: 24/04/2017 20:58 Subject: Re: [gpfsug-discuss] Used virtualization technologies for GPFS/Spectrum Scale Sent by: gpfsug-discuss-bounces at spectrumscale.org On Mon, 24 Apr 2017 17:24:29 +0100, Jonathan Buzzard said: > Hate to say but the 822 will happily keep trucking when the CPU > (assuming it has more than one) fails and similar with the DIMM's. In How about when you go to replace the DIMM? You able to hot-swap the memory without anything losing its mind? (I know this is possible in the Z/series world, but those usually have at least 2-3 more zeros in the price tag). [attachment "attqolcz.dat" deleted by Luis Bolinches/Finland/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank.tower at outlook.com Mon Apr 24 22:12:14 2017 From: frank.tower at outlook.com (Frank Tower) Date: Mon, 24 Apr 2017 21:12:14 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: , , Message-ID: >From what I've read from the Wiki: 'The NFS protocol performance is largely dependent on the base system performance of the protocol node hardware and network. This includes multiple factors the type and number of CPUs, the size of the main memory in the nodes, the type of disk drives used (HDD, SSD, etc.) and the disk configuration (RAID-level, replication etc.). In addition, NFS protocol performance can be impacted by the overall load of the node (such as number of clients accessing, snapshot creation/deletion and more) and administrative tasks (for example filesystem checks or online re-striping of disk arrays).' Nowadays, SSD is worst to invest. LROC could be an option in the future, but we need to quantify NFS/CIFS workload first. Are you using LROC with your GPFS installation ? Best, Frank. ________________________________ From: Sobey, Richard A Sent: Monday, April 24, 2017 11:11 AM To: gpfsug main discussion list; Jan-Frode Myklebust Subject: Re: [gpfsug-discuss] Protocol node recommendations What?s your SSD going to help with? will you implement it as a LROC device? Otherwise I can?t see the benefit to using it to boot off. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Frank Tower Sent: 23 April 2017 22:28 To: Jan-Frode Myklebust ; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations Hi, Nice ! didn't pay attention at the revision and the spreadsheet. If someone still have a copy somewhere it could be useful, Google didn't help :( We will follow your advise and start with 3 protocol nodes equipped with 128GB memory, 2 x 12 cores (maybe E5-2680 or E5-2670). >From what I read, NFS-Ganesha mainly depend of the hardware, Linux on a SSD should be a big plus in our case. Best, Frank ________________________________ From: Jan-Frode Myklebust > Sent: Sunday, April 23, 2017 12:07:38 PM To: Frank Tower; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations The protocol sizing tool should be available from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node/version/70a4c7c0-a5c6-4dde-b391-8f91c542dd7d , but I'm getting 404 now. I think 128GB should be enough for both protocols on same nodes, and I think your 3 node suggestion is best. Better load sharing with not dedicating subset of nodes to each protocol. -jf l?r. 22. apr. 2017 kl. 21.22 skrev Frank Tower >: Hi, Thank for the recommendations. Now we deal with the situation of: - take 3 nodes with round robin DNS that handle both protocols - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and NFS services. Regarding your recommendations, 256GB memory node could be a plus if we mix both protocols for such case. Is the spreadsheet publicly available or do we need to ask IBM ? Thank for your help, Frank. 
________________________________ From: Jan-Frode Myklebust > Sent: Saturday, April 22, 2017 10:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower >: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Apr 25 09:19:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 25 Apr 2017 08:19:10 +0000 Subject: [gpfsug-discuss] Protocol node recommendations In-Reply-To: References: , , Message-ID: I tried it on one node but investing in what could be up to ?5000 in SSDs when we don't know the gains isn't something I can argue. Not that LROC will hurt the environment but my users may not see any benefit. My cluster is the complete opposite of busy (relative to people saying they're seeing sustained 800MB/sec throughput), I just need it stable. Richard From: Frank Tower [mailto:frank.tower at outlook.com] Sent: 24 April 2017 22:12 To: Sobey, Richard A ; gpfsug main discussion list ; Jan-Frode Myklebust Subject: Re: [gpfsug-discuss] Protocol node recommendations >From what I've read from the Wiki: 'The NFS protocol performance is largely dependent on the base system performance of the protocol node hardware and network. This includes multiple factors the type and number of CPUs, the size of the main memory in the nodes, the type of disk drives used (HDD, SSD, etc.) and the disk configuration (RAID-level, replication etc.). In addition, NFS protocol performance can be impacted by the overall load of the node (such as number of clients accessing, snapshot creation/deletion and more) and administrative tasks (for example filesystem checks or online re-striping of disk arrays).' Nowadays, SSD is worst to invest. LROC could be an option in the future, but we need to quantify NFS/CIFS workload first. 
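[Editorial note: a rough sketch of applying the kind of protocol-node settings discussed in this thread; the node class and values are placeholders, not a recommendation. Start from the gpfsprotocoldefaults file Jan-Frode mentions and size for your own workload.]

    # apply only to the protocol nodes; most of these take effect after GPFS restarts on those nodes
    mmchconfig pagepool=64G,maxFilesToCache=1000000,workerThreads=512 -N cesNodes
    # check what a node is actually running with
    mmdiag --config | grep -E 'pagepool|maxFilesToCache|workerThreads'
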
Are you using LROC with your GPFS installation ? Best, Frank. ________________________________ From: Sobey, Richard A > Sent: Monday, April 24, 2017 11:11 AM To: gpfsug main discussion list; Jan-Frode Myklebust Subject: Re: [gpfsug-discuss] Protocol node recommendations What's your SSD going to help with... will you implement it as a LROC device? Otherwise I can't see the benefit to using it to boot off. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Frank Tower Sent: 23 April 2017 22:28 To: Jan-Frode Myklebust >; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations Hi, Nice ! didn't pay attention at the revision and the spreadsheet. If someone still have a copy somewhere it could be useful, Google didn't help :( We will follow your advise and start with 3 protocol nodes equipped with 128GB memory, 2 x 12 cores (maybe E5-2680 or E5-2670). >From what I read, NFS-Ganesha mainly depend of the hardware, Linux on a SSD should be a big plus in our case. Best, Frank ________________________________ From: Jan-Frode Myklebust > Sent: Sunday, April 23, 2017 12:07:38 PM To: Frank Tower; gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations The protocol sizing tool should be available from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node/version/70a4c7c0-a5c6-4dde-b391-8f91c542dd7d , but I'm getting 404 now. I think 128GB should be enough for both protocols on same nodes, and I think your 3 node suggestion is best. Better load sharing with not dedicating subset of nodes to each protocol. -jf l?r. 22. apr. 2017 kl. 21.22 skrev Frank Tower >: Hi, Thank for the recommendations. Now we deal with the situation of: - take 3 nodes with round robin DNS that handle both protocols - take 4 nodes, split CIFS and NFS, still use round robin DNS for CIFS and NFS services. Regarding your recommendations, 256GB memory node could be a plus if we mix both protocols for such case. Is the spreadsheet publicly available or do we need to ask IBM ? Thank for your help, Frank. ________________________________ From: Jan-Frode Myklebust > Sent: Saturday, April 22, 2017 10:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Protocol node recommendations That's a tiny maxFilesToCache... I would start by implementing the settings from /usr/lpp/mmfs/*/gpfsprotocolldefaul* plus a 64GB pagepool for your protocoll nodes, and leave further tuning to when you see you have issues. Regarding sizing, we have a spreadsheet somewhere where you can input some workload parameters and get an idea for how many nodes you'll need. Your node config seems fine, but one node seems too few to serve 1000+ users. We support max 3000 SMB connections/node, and I believe the recommendation is 4000 NFS connections/node. -jf l?r. 22. apr. 2017 kl. 08.34 skrev Frank Tower >: Hi, We have here around 2PB GPFS (4.2.2) accessed through an HPC cluster with GPFS client on each node. 
We will have to open GPFS to all our users over CIFS and kerberized NFS with ACL support for both protocol for around +1000 users All users have different use case and needs: - some will do random I/O through a large set of opened files (~5k files) - some will do large write with 500GB-1TB files - other will arrange sequential I/O with ~10k opened files NFS and CIFS will share the same server, so I through to use SSD drive, at least 128GB memory with 2 sockets. Regarding tuning parameters, I thought at: maxFilesToCache 10000 syncIntervalStrict yes workerThreads (8*core) prefetchPct 40 (for now and update if needed) I read the wiki 'Sizing Guidance for Protocol Node', but I was wondering if someone could share his experience/best practice regarding hardware sizing and/or tuning parameters. Thank by advance, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Tue Apr 25 09:23:32 2017 From: chair at spectrumscale.org (Spectrum Scale UG Chair (Simon Thompson)) Date: Tue, 25 Apr 2017 09:23:32 +0100 Subject: [gpfsug-discuss] User group meeting May 9th/10th 2017 Message-ID: The UK user group is now just 2 weeks away! Its time to register ... https://www.eventbrite.com/e/spectrum-scalegpfs-user-group-spring-2017-regi stration-32113696932 (or https://goo.gl/tRptru) Remember user group meetings are free to attend, and this year's 2 day meeting is packed full of sessions and several of the breakout sessions are cloud-focussed looking at how Spectrum Scale can be used with cloud deployments. And as usual, we have the ever popular Sven speaking with his views from the Research topics. Thanks to our sponsors Arcastream, DDN, Ellexus, Lenovo, IBM, Mellanox, OCF and Seagate for helping make this happen! We need to finalise numbers for the evening event soon, so make sure you book your place now! Simon From S.J.Thompson at bham.ac.uk Tue Apr 25 12:20:39 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 25 Apr 2017 11:20:39 +0000 Subject: [gpfsug-discuss] NFS issues Message-ID: Hi, We have recently started deploying NFS in addition our existing SMB exports on our protocol nodes. We use a RR DNS name that points to 4 VIPs for SMB services and failover seems to work fine with SMB clients. We figured we could use the same name and IPs and run Ganesha on the protocol servers, however we are seeing issues with NFS clients when IP failover occurs. In normal operation on a client, we might see several mounts from different IPs obviously due to the way the DNS RR is working, but it all works fine. In a failover situation, the IP will move to another node and some clients will carry on, others will hang IO to the mount points referred to by the IP which has moved. We can *sometimes* trigger this by manually suspending a CES node, but not always and some clients mounting from the IP moving will be fine, others won't. If we resume a node an it fails back, the clients that are hanging will usually recover fine. We can reboot a client prior to failback and it will be fine, stopping and starting the ganesha service on a protocol node will also sometimes resolve the issues. So, has anyone seen this sort of issue and any suggestions for how we could either debug more or workaround? 
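[Editorial note: one way to exercise the failover case Simon describes in a controlled way; node names are placeholders and the flag spellings are from memory, so check the mmces man page before using.]

    mmces address list               # which CES IP is on which node right now
    mmces node suspend -N ces01      # force the addresses on ces01 to move elsewhere
    mmces address list               # confirm where they landed
    # ... run client IO against one of the moved IPs and watch for the hang ...
    mmces node resume -N ces01       # fail back and see whether the hung clients recover
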
We are currently running the packages nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). At one point we were seeing it a lot, and could track it back to an underlying GPFS network issue that was causing protocol nodes to be expelled occasionally, we resolved that and the issues became less apparent, but maybe we just fixed one failure mode so see it less often. On the clients, we use -o sync,hard BTW as in the IBM docs. On a client showing the issues, we'll see in dmesg, NFS related messages like: [Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not responding, timed out Which explains the client hang on certain mount points. The symptoms feel very much like those logged in this Gluster/ganesha bug: https://bugzilla.redhat.com/show_bug.cgi?id=1354439 Thanks Simon From Mark.Bush at siriuscom.com Tue Apr 25 14:27:38 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Tue, 25 Apr 2017 13:27:38 +0000 Subject: [gpfsug-discuss] Perfmon and GUI Message-ID: <321F04D4-5F3A-443F-A598-0616642C9F96@siriuscom.com> Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. What am I missing? Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = 
"Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Apr 25 14:44:59 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 25 Apr 2017 13:44:59 +0000 Subject: [gpfsug-discuss] Perfmon and GUI In-Reply-To: <321F04D4-5F3A-443F-A598-0616642C9F96@siriuscom.com> References: <321F04D4-5F3A-443F-A598-0616642C9F96@siriuscom.com> Message-ID: I would have thought this would be fixed by now as this happened to me in 4.2.1-(0?) ? here?s what support said. Can you try? I think you?ve already got the relevant bits in your .cfg files so it should just be a case of copying the files across and restarting pmsensors and pmcollector. Again bear in mind this affected me on 4.2.1 and you?re using 4.2.3 so ymmv.. ? I spoke with development and normally these files would be copied over to /opt/IBM/zimon when using the automatic installer but since this case doesn't use the installer we have to copy them over manually. We acknowledge this should be in the docs, and the reason it is not included in pmsensors rpm is due to the fact these do not come from the zimon team. The following files can be copied over to /opt/IBM/zimon [root at node1 default]# pwd /usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default [root at node1 default]# ls CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg SMBSensors.cfg SMBStats.cfg ZIMonCollector.cfg ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 14:28 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Perfmon and GUI Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. What am I missing? 
Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = "Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From j.ouwehand at vumc.nl Tue Apr 25 14:51:22 2017 From: j.ouwehand at vumc.nl (Ouwehand, JJ) Date: Tue, 25 Apr 2017 13:51:22 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: Message-ID: <5594921EA5B3674AB44AD9276126AAF40170DD3159@sp-mx-mbx42> Hello, At first a short introduction. My name is Jaap Jan Ouwehand, I work at a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical (office, research and clinical data) business process. 
We have three large GPFS filesystems for different purposes. We also had such a situation with cNFS. A failover (IPtakeover) was technically good, only clients experienced "stale filehandles". We opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few months later, the solution appeared to be in the fsid option. An NFS filehandle is built by a combination of fsid and a hash function on the inode. After a failover, the fsid value can be different and the client has a "stale filehandle". To avoid this, the fsid value can be statically specified. See: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_nfslin.htm Maybe there is also a value in Ganesha that changes after a failover. Certainly since most sessions will be re-established after a failback. Maybe you see more debug information with tcpdump. Kind regards, ? Jaap Jan Ouwehand ICT Specialist (Storage & Linux) VUmc - ICT E: jj.ouwehand at vumc.nl W: www.vumc.com -----Oorspronkelijk bericht----- Van: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Namens Simon Thompson (IT Research Support) Verzonden: dinsdag 25 april 2017 13:21 Aan: gpfsug-discuss at spectrumscale.org Onderwerp: [gpfsug-discuss] NFS issues Hi, We have recently started deploying NFS in addition our existing SMB exports on our protocol nodes. We use a RR DNS name that points to 4 VIPs for SMB services and failover seems to work fine with SMB clients. We figured we could use the same name and IPs and run Ganesha on the protocol servers, however we are seeing issues with NFS clients when IP failover occurs. In normal operation on a client, we might see several mounts from different IPs obviously due to the way the DNS RR is working, but it all works fine. In a failover situation, the IP will move to another node and some clients will carry on, others will hang IO to the mount points referred to by the IP which has moved. We can *sometimes* trigger this by manually suspending a CES node, but not always and some clients mounting from the IP moving will be fine, others won't. If we resume a node an it fails back, the clients that are hanging will usually recover fine. We can reboot a client prior to failback and it will be fine, stopping and starting the ganesha service on a protocol node will also sometimes resolve the issues. So, has anyone seen this sort of issue and any suggestions for how we could either debug more or workaround? We are currently running the packages nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). At one point we were seeing it a lot, and could track it back to an underlying GPFS network issue that was causing protocol nodes to be expelled occasionally, we resolved that and the issues became less apparent, but maybe we just fixed one failure mode so see it less often. On the clients, we use -o sync,hard BTW as in the IBM docs. On a client showing the issues, we'll see in dmesg, NFS related messages like: [Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not responding, timed out Which explains the client hang on certain mount points. 
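[Editorial note: for the cNFS case Jaap Jan describes above, the static fsid ends up as an ordinary export option on the kernel-NFS servers. A minimal, hypothetical example (path, client range and fsid number are made up); the only requirement is that the value is identical on every server exporting that path.]

    # /etc/exports on every cNFS server, same fsid everywhere
    /gpfs/fs1/data  10.0.0.0/24(rw,sync,no_root_squash,fsid=745)
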
The symptoms feel very much like those logged in this Gluster/ganesha bug: https://bugzilla.redhat.com/show_bug.cgi?id=1354439 Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Tue Apr 25 15:06:04 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 25 Apr 2017 14:06:04 +0000 Subject: [gpfsug-discuss] NFS issues Message-ID: Hi, >From what I can see, Ganesha uses the Export_Id option in the config file (which is managed by CES) for this. I did find some reference in the Ganesha devs list that if its not set, then it would read the FSID from the GPFS file-system, either way they should surely be consistent across all the nodes. The posts I found were from someone with an IBM email address, so I guess someone in the IBM teams. I checked a couple of my protocol nodes and they use the same Export_Id consistently, though I guess that might not be the same as the FSID value. Perhaps someone from IBM could comment on if FSID is likely to the cause of my problems? Thanks Simon On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ouwehand, JJ" wrote: >Hello, > >At first a short introduction. My name is Jaap Jan Ouwehand, I work at a >Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM >Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical >(office, research and clinical data) business process. We have three >large GPFS filesystems for different purposes. > >We also had such a situation with cNFS. A failover (IPtakeover) was >technically good, only clients experienced "stale filehandles". We opened >a PMR at IBM and after testing, deliver logs, tcpdumps and a few months >later, the solution appeared to be in the fsid option. > >An NFS filehandle is built by a combination of fsid and a hash function >on the inode. After a failover, the fsid value can be different and the >client has a "stale filehandle". To avoid this, the fsid value can be >statically specified. See: > >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum. >scale.v4r22.doc/bl1adm_nfslin.htm > >Maybe there is also a value in Ganesha that changes after a failover. >Certainly since most sessions will be re-established after a failback. >Maybe you see more debug information with tcpdump. > > >Kind regards, > >Jaap Jan Ouwehand >ICT Specialist (Storage & Linux) >VUmc - ICT >E: jj.ouwehand at vumc.nl >W: www.vumc.com > > > >-----Oorspronkelijk bericht----- >Van: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] Namens Simon Thompson >(IT Research Support) >Verzonden: dinsdag 25 april 2017 13:21 >Aan: gpfsug-discuss at spectrumscale.org >Onderwerp: [gpfsug-discuss] NFS issues > >Hi, > >We have recently started deploying NFS in addition our existing SMB >exports on our protocol nodes. > >We use a RR DNS name that points to 4 VIPs for SMB services and failover >seems to work fine with SMB clients. We figured we could use the same >name and IPs and run Ganesha on the protocol servers, however we are >seeing issues with NFS clients when IP failover occurs. > >In normal operation on a client, we might see several mounts from >different IPs obviously due to the way the DNS RR is working, but it all >works fine. 
> >In a failover situation, the IP will move to another node and some >clients will carry on, others will hang IO to the mount points referred >to by the IP which has moved. We can *sometimes* trigger this by manually >suspending a CES node, but not always and some clients mounting from the >IP moving will be fine, others won't. > >If we resume a node an it fails back, the clients that are hanging will >usually recover fine. We can reboot a client prior to failback and it >will be fine, stopping and starting the ganesha service on a protocol >node will also sometimes resolve the issues. > >So, has anyone seen this sort of issue and any suggestions for how we >could either debug more or workaround? > >We are currently running the packages >nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). > >At one point we were seeing it a lot, and could track it back to an >underlying GPFS network issue that was causing protocol nodes to be >expelled occasionally, we resolved that and the issues became less >apparent, but maybe we just fixed one failure mode so see it less often. > >On the clients, we use -o sync,hard BTW as in the IBM docs. > >On a client showing the issues, we'll see in dmesg, NFS related messages >like: >[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not >responding, timed out > >Which explains the client hang on certain mount points. > >The symptoms feel very much like those logged in this Gluster/ganesha bug: >https://bugzilla.redhat.com/show_bug.cgi?id=1354439 > > >Thanks > >Simon > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Tue Apr 25 15:13:58 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Tue, 25 Apr 2017 14:13:58 +0000 Subject: [gpfsug-discuss] Perfmon and GUI Message-ID: <2A0DC44A-D9FF-428B-8B02-FC6EC504BD34@siriuscom.com> Interesting. Some files were indeed already there but it was missing a few NFSIO.cfg being the most notable to me. I?ve gone ahead and copied those to all my nodes (just three in this cluster) and restarted services. Still no luck. I?m going to restart the GUI service next to see if that makes a difference. Interestingly I can do things like mmperfmon query smb2 and that tends to work and give me real data so not sure where the breakdown is in the GUI. Mark From: "Sobey, Richard A" Reply-To: gpfsug main discussion list Date: Tuesday, April 25, 2017 at 8:44 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI I would have thought this would be fixed by now as this happened to me in 4.2.1-(0?) ? here?s what support said. Can you try? I think you?ve already got the relevant bits in your .cfg files so it should just be a case of copying the files across and restarting pmsensors and pmcollector. Again bear in mind this affected me on 4.2.1 and you?re using 4.2.3 so ymmv.. ? I spoke with development and normally these files would be copied over to /opt/IBM/zimon when using the automatic installer but since this case doesn't use the installer we have to copy them over manually. We acknowledge this should be in the docs, and the reason it is not included in pmsensors rpm is due to the fact these do not come from the zimon team. 
The following files can be copied over to /opt/IBM/zimon [root at node1 default]# pwd /usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default [root at node1 default]# ls CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg SMBSensors.cfg SMBStats.cfg ZIMonCollector.cfg ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 14:28 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Perfmon and GUI Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. What am I missing? Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = "Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. 
This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Tue Apr 25 15:29:07 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Tue, 25 Apr 2017 14:29:07 +0000 Subject: [gpfsug-discuss] Perfmon and GUI In-Reply-To: <2A0DC44A-D9FF-428B-8B02-FC6EC504BD34@siriuscom.com> References: <2A0DC44A-D9FF-428B-8B02-FC6EC504BD34@siriuscom.com> Message-ID: Update: So SMB monitoring is now working after copying all files per Richard?s recommendation (thank you sir) and restarting pmsensors, pmcollector, and gpfsfui. Sadly, NFS monitoring isn?t. It doesn?t work from the cli either though. So clearly, something is up with that part. I continue to troubleshoot. From: Mark Bush Reply-To: gpfsug main discussion list Date: Tuesday, April 25, 2017 at 9:13 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI Interesting. Some files were indeed already there but it was missing a few NFSIO.cfg being the most notable to me. I?ve gone ahead and copied those to all my nodes (just three in this cluster) and restarted services. Still no luck. I?m going to restart the GUI service next to see if that makes a difference. Interestingly I can do things like mmperfmon query smb2 and that tends to work and give me real data so not sure where the breakdown is in the GUI. Mark From: "Sobey, Richard A" Reply-To: gpfsug main discussion list Date: Tuesday, April 25, 2017 at 8:44 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI I would have thought this would be fixed by now as this happened to me in 4.2.1-(0?) ? here?s what support said. Can you try? I think you?ve already got the relevant bits in your .cfg files so it should just be a case of copying the files across and restarting pmsensors and pmcollector. Again bear in mind this affected me on 4.2.1 and you?re using 4.2.3 so ymmv.. ? I spoke with development and normally these files would be copied over to /opt/IBM/zimon when using the automatic installer but since this case doesn't use the installer we have to copy them over manually. We acknowledge this should be in the docs, and the reason it is not included in pmsensors rpm is due to the fact these do not come from the zimon team. The following files can be copied over to /opt/IBM/zimon [root at node1 default]# pwd /usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default [root at node1 default]# ls CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg SMBSensors.cfg SMBStats.cfg ZIMonCollector.cfg ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 14:28 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Perfmon and GUI Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? 
I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. What am I missing? Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = "Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Tue Apr 25 15:31:13 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 25 Apr 2017 14:31:13 +0000 Subject: [gpfsug-discuss] Perfmon and GUI In-Reply-To: References: <2A0DC44A-D9FF-428B-8B02-FC6EC504BD34@siriuscom.com> Message-ID: No worries Mark. We don?t use NFS here (yet) so I can?t help there. Glad I could help. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 15:29 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI Update: So SMB monitoring is now working after copying all files per Richard?s recommendation (thank you sir) and restarting pmsensors, pmcollector, and gpfsfui. Sadly, NFS monitoring isn?t. It doesn?t work from the cli either though. So clearly, something is up with that part. I continue to troubleshoot. From: Mark Bush > Reply-To: gpfsug main discussion list > Date: Tuesday, April 25, 2017 at 9:13 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Perfmon and GUI Interesting. Some files were indeed already there but it was missing a few NFSIO.cfg being the most notable to me. I?ve gone ahead and copied those to all my nodes (just three in this cluster) and restarted services. Still no luck. I?m going to restart the GUI service next to see if that makes a difference. Interestingly I can do things like mmperfmon query smb2 and that tends to work and give me real data so not sure where the breakdown is in the GUI. Mark From: "Sobey, Richard A" > Reply-To: gpfsug main discussion list > Date: Tuesday, April 25, 2017 at 8:44 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Perfmon and GUI I would have thought this would be fixed by now as this happened to me in 4.2.1-(0?) ? here?s what support said. Can you try? I think you?ve already got the relevant bits in your .cfg files so it should just be a case of copying the files across and restarting pmsensors and pmcollector. Again bear in mind this affected me on 4.2.1 and you?re using 4.2.3 so ymmv.. ? I spoke with development and normally these files would be copied over to /opt/IBM/zimon when using the automatic installer but since this case doesn't use the installer we have to copy them over manually. We acknowledge this should be in the docs, and the reason it is not included in pmsensors rpm is due to the fact these do not come from the zimon team. The following files can be copied over to /opt/IBM/zimon [root at node1 default]# pwd /usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default [root at node1 default]# ls CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg SMBSensors.cfg SMBStats.cfg ZIMonCollector.cfg ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 14:28 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Perfmon and GUI Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. 
What am I missing? Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = "Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Apr 25 18:04:41 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 25 Apr 2017 17:04:41 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: Message-ID: I *think* I've seen this, and that we then had open TCP connection from client to NFS server according to netstat, but these connections were not visible from netstat on NFS-server side. Unfortunately I don't remember what the fix was.. -jf tir. 25. apr. 2017 kl. 
16.06 skrev Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk>: > Hi, > > From what I can see, Ganesha uses the Export_Id option in the config file > (which is managed by CES) for this. I did find some reference in the > Ganesha devs list that if its not set, then it would read the FSID from > the GPFS file-system, either way they should surely be consistent across > all the nodes. The posts I found were from someone with an IBM email > address, so I guess someone in the IBM teams. > > I checked a couple of my protocol nodes and they use the same Export_Id > consistently, though I guess that might not be the same as the FSID value. > > Perhaps someone from IBM could comment on if FSID is likely to the cause > of my problems? > > Thanks > > Simon > > On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Ouwehand, JJ" j.ouwehand at vumc.nl> wrote: > > >Hello, > > > >At first a short introduction. My name is Jaap Jan Ouwehand, I work at a > >Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM > >Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical > >(office, research and clinical data) business process. We have three > >large GPFS filesystems for different purposes. > > > >We also had such a situation with cNFS. A failover (IPtakeover) was > >technically good, only clients experienced "stale filehandles". We opened > >a PMR at IBM and after testing, deliver logs, tcpdumps and a few months > >later, the solution appeared to be in the fsid option. > > > >An NFS filehandle is built by a combination of fsid and a hash function > >on the inode. After a failover, the fsid value can be different and the > >client has a "stale filehandle". To avoid this, the fsid value can be > >statically specified. See: > > > >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum > . > >scale.v4r22.doc/bl1adm_nfslin.htm > > > >Maybe there is also a value in Ganesha that changes after a failover. > >Certainly since most sessions will be re-established after a failback. > >Maybe you see more debug information with tcpdump. > > > > > >Kind regards, > > > >Jaap Jan Ouwehand > >ICT Specialist (Storage & Linux) > >VUmc - ICT > >E: jj.ouwehand at vumc.nl > >W: www.vumc.com > > > > > > > >-----Oorspronkelijk bericht----- > >Van: gpfsug-discuss-bounces at spectrumscale.org > >[mailto:gpfsug-discuss-bounces at spectrumscale.org] Namens Simon Thompson > >(IT Research Support) > >Verzonden: dinsdag 25 april 2017 13:21 > >Aan: gpfsug-discuss at spectrumscale.org > >Onderwerp: [gpfsug-discuss] NFS issues > > > >Hi, > > > >We have recently started deploying NFS in addition our existing SMB > >exports on our protocol nodes. > > > >We use a RR DNS name that points to 4 VIPs for SMB services and failover > >seems to work fine with SMB clients. We figured we could use the same > >name and IPs and run Ganesha on the protocol servers, however we are > >seeing issues with NFS clients when IP failover occurs. > > > >In normal operation on a client, we might see several mounts from > >different IPs obviously due to the way the DNS RR is working, but it all > >works fine. > > > >In a failover situation, the IP will move to another node and some > >clients will carry on, others will hang IO to the mount points referred > >to by the IP which has moved. We can *sometimes* trigger this by manually > >suspending a CES node, but not always and some clients mounting from the > >IP moving will be fine, others won't. 
> > > >If we resume a node an it fails back, the clients that are hanging will > >usually recover fine. We can reboot a client prior to failback and it > >will be fine, stopping and starting the ganesha service on a protocol > >node will also sometimes resolve the issues. > > > >So, has anyone seen this sort of issue and any suggestions for how we > >could either debug more or workaround? > > > >We are currently running the packages > >nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). > > > >At one point we were seeing it a lot, and could track it back to an > >underlying GPFS network issue that was causing protocol nodes to be > >expelled occasionally, we resolved that and the issues became less > >apparent, but maybe we just fixed one failure mode so see it less often. > > > >On the clients, we use -o sync,hard BTW as in the IBM docs. > > > >On a client showing the issues, we'll see in dmesg, NFS related messages > >like: > >[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not > >responding, timed out > > > >Which explains the client hang on certain mount points. > > > >The symptoms feel very much like those logged in this Gluster/ganesha bug: > >https://bugzilla.redhat.com/show_bug.cgi?id=1354439 > > > > > >Thanks > > > >Simon > > > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoang.nguyen at seagate.com Tue Apr 25 18:12:19 2017 From: hoang.nguyen at seagate.com (Hoang Nguyen) Date: Tue, 25 Apr 2017 10:12:19 -0700 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: Message-ID: I have a customer with a slightly different issue but sounds somewhat related. If you stop and stop the NFS service on a CES node or update an existing export which will restart Ganesha. Some of their NFS clients do not reconnect in a very similar fashion as you described. I haven't been able to reproduce it on my test system repeatedly but using soft NFS mounts seems to help. Seems like it happens more often to clients currently running NFS IO during the outage. But I'm interested to see what you guys uncover. Thanks, Hoang On Tue, Apr 25, 2017 at 7:06 AM, Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk> wrote: > Hi, > > From what I can see, Ganesha uses the Export_Id option in the config file > (which is managed by CES) for this. I did find some reference in the > Ganesha devs list that if its not set, then it would read the FSID from > the GPFS file-system, either way they should surely be consistent across > all the nodes. The posts I found were from someone with an IBM email > address, so I guess someone in the IBM teams. > > I checked a couple of my protocol nodes and they use the same Export_Id > consistently, though I guess that might not be the same as the FSID value. > > Perhaps someone from IBM could comment on if FSID is likely to the cause > of my problems? 
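[Editorial note: to illustrate what is being discussed, a CES-managed Ganesha export block looks roughly like the sketch below. This is from memory of the Ganesha config format, not copied from a CES system, so treat the option names as assumptions; on CES you would normally inspect and change this via mmnfs export list / mmnfs export change rather than editing files.]

    EXPORT {
        Export_Id = 101;               # must be identical on every protocol node
        Path = /gpfs/fs1/projects;
        Pseudo = /gpfs/fs1/projects;
        FSAL { Name = GPFS; }
        # Filesystem_Id = 192.168;     # explicit fsid, if it is not to be derived from GPFS
    }
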
> > Thanks > > Simon > > On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Ouwehand, JJ" j.ouwehand at vumc.nl> wrote: > > >Hello, > > > >At first a short introduction. My name is Jaap Jan Ouwehand, I work at a > >Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM > >Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical > >(office, research and clinical data) business process. We have three > >large GPFS filesystems for different purposes. > > > >We also had such a situation with cNFS. A failover (IPtakeover) was > >technically good, only clients experienced "stale filehandles". We opened > >a PMR at IBM and after testing, deliver logs, tcpdumps and a few months > >later, the solution appeared to be in the fsid option. > > > >An NFS filehandle is built by a combination of fsid and a hash function > >on the inode. After a failover, the fsid value can be different and the > >client has a "stale filehandle". To avoid this, the fsid value can be > >statically specified. See: > > > >https://urldefense.proofpoint.com/v2/url?u=https-3A__www.ibm.com_support_ > knowledgecenter_STXKQY-5F4.2.2_com.ibm.spectrum&d=DwICAg&c= > IGDlg0lD0b-nebmJJ0Kp8A&r=erT0ET1g1dsvTDYndRRTAAZ6Dneebt > G6F47PIUMDXFw&m=K3iXrW2N_HcdrGDuKmRWFjypuPLPJDIm9VosFIIsFoI&s= > PIXnA0UQbneTHMRxvUcmsvZK6z5V2XU4jR_GIVaZP5Q&e= . > >scale.v4r22.doc/bl1adm_nfslin.htm > > > >Maybe there is also a value in Ganesha that changes after a failover. > >Certainly since most sessions will be re-established after a failback. > >Maybe you see more debug information with tcpdump. > > > > > >Kind regards, > > > >Jaap Jan Ouwehand > >ICT Specialist (Storage & Linux) > >VUmc - ICT > >E: jj.ouwehand at vumc.nl > >W: www.vumc.com > > > > > > > >-----Oorspronkelijk bericht----- > >Van: gpfsug-discuss-bounces at spectrumscale.org > >[mailto:gpfsug-discuss-bounces at spectrumscale.org] Namens Simon Thompson > >(IT Research Support) > >Verzonden: dinsdag 25 april 2017 13:21 > >Aan: gpfsug-discuss at spectrumscale.org > >Onderwerp: [gpfsug-discuss] NFS issues > > > >Hi, > > > >We have recently started deploying NFS in addition our existing SMB > >exports on our protocol nodes. > > > >We use a RR DNS name that points to 4 VIPs for SMB services and failover > >seems to work fine with SMB clients. We figured we could use the same > >name and IPs and run Ganesha on the protocol servers, however we are > >seeing issues with NFS clients when IP failover occurs. > > > >In normal operation on a client, we might see several mounts from > >different IPs obviously due to the way the DNS RR is working, but it all > >works fine. > > > >In a failover situation, the IP will move to another node and some > >clients will carry on, others will hang IO to the mount points referred > >to by the IP which has moved. We can *sometimes* trigger this by manually > >suspending a CES node, but not always and some clients mounting from the > >IP moving will be fine, others won't. > > > >If we resume a node an it fails back, the clients that are hanging will > >usually recover fine. We can reboot a client prior to failback and it > >will be fine, stopping and starting the ganesha service on a protocol > >node will also sometimes resolve the issues. > > > >So, has anyone seen this sort of issue and any suggestions for how we > >could either debug more or workaround? > > > >We are currently running the packages > >nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). 
> > > >At one point we were seeing it a lot, and could track it back to an > >underlying GPFS network issue that was causing protocol nodes to be > >expelled occasionally, we resolved that and the issues became less > >apparent, but maybe we just fixed one failure mode so see it less often. > > > >On the clients, we use -o sync,hard BTW as in the IBM docs. > > > >On a client showing the issues, we'll see in dmesg, NFS related messages > >like: > >[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not > >responding, timed out > > > >Which explains the client hang on certain mount points. > > > >The symptoms feel very much like those logged in this Gluster/ganesha bug: > >https://urldefense.proofpoint.com/v2/url?u=https- > 3A__bugzilla.redhat.com_show-5Fbug.cgi-3Fid-3D1354439&d= > DwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r=erT0ET1g1dsvTDYndRRTAAZ6Dneebt > G6F47PIUMDXFw&m=K3iXrW2N_HcdrGDuKmRWFjypuPLPJDIm9VosFII > sFoI&s=KN5WKk1vLEt0Y_17nVQeDi1lK5mSQUZQ7lPtQK3FBG4&e= > > > > > >Thanks > > > >Simon > > > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_ > listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r= > erT0ET1g1dsvTDYndRRTAAZ6DneebtG6F47PIUMDXFw&m=K3iXrW2N_ > HcdrGDuKmRWFjypuPLPJDIm9VosFIIsFoI&s=rvZX6mp5gZr7h3QuwTM2EVZaG- > d1VXwSDKDhKVyQurw&e= > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_ > listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r= > erT0ET1g1dsvTDYndRRTAAZ6DneebtG6F47PIUMDXFw&m=K3iXrW2N_ > HcdrGDuKmRWFjypuPLPJDIm9VosFIIsFoI&s=rvZX6mp5gZr7h3QuwTM2EVZaG- > d1VXwSDKDhKVyQurw&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug. > org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r= > erT0ET1g1dsvTDYndRRTAAZ6DneebtG6F47PIUMDXFw&m=K3iXrW2N_ > HcdrGDuKmRWFjypuPLPJDIm9VosFIIsFoI&s=rvZX6mp5gZr7h3QuwTM2EVZaG- > d1VXwSDKDhKVyQurw&e= > -- Hoang Nguyen *? *Sr Staff Engineer Seagate Technology office: +1 (858) 751-4487 mobile: +1 (858) 284-7846 www.seagate.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Apr 25 18:30:40 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 25 Apr 2017 17:30:40 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: , Message-ID: I did some digging in the mmcesfuncs to see what happens server side on fail over. Basically the server losing the IP is supposed to terminate all sessions and the receiver server sends ACK tickles. My current supposition is that for whatever reason, the losing server isn't releasing something and the client still has hold of a connection which is mostly dead. The tickle then fails to the client from the new server. This would explain why failing the IP back to the original server usually brings the client back to life. This is only my working theory at the moment as we can't reliably reproduce this. Next time it happens we plan to grab some netstat from each side. Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the server that received the IP and see if that fixes it (i.e. the receiver server didn't tickle properly). 
(Usage extracted from mmcesfuncs which is ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd) for anyone interested. Then try and kill he sessions on the losing server to check if there is stuff still open and re-tickle the client. If we can get steps to workaround, I'll log a PMR. I suppose I could do that now, but given its non deterministic and we want to be 100% sure it's not us doing something wrong, I'm inclined to wait until we do some more testing. I agree with the suggestion that it's probably IO pending nodes that are affected, but don't have any data to back that up yet. We did try with a read workload on a client, but may we need either long IO blocked reads or writes (from the GPFS end). We also originally had soft as the default option, but saw issues then and the docs suggested hard, so we switched and also enabled sync (we figured maybe it was NFS client with uncommited writes), but neither have resolved the issues entirely. Difficult for me to say if they improved the issue though given its sporadic. Appreciate people's suggestions! Thanks Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode Myklebust [janfrode at tanso.net] Sent: 25 April 2017 18:04 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NFS issues I *think* I've seen this, and that we then had open TCP connection from client to NFS server according to netstat, but these connections were not visible from netstat on NFS-server side. Unfortunately I don't remember what the fix was.. -jf tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) >: Hi, >From what I can see, Ganesha uses the Export_Id option in the config file (which is managed by CES) for this. I did find some reference in the Ganesha devs list that if its not set, then it would read the FSID from the GPFS file-system, either way they should surely be consistent across all the nodes. The posts I found were from someone with an IBM email address, so I guess someone in the IBM teams. I checked a couple of my protocol nodes and they use the same Export_Id consistently, though I guess that might not be the same as the FSID value. Perhaps someone from IBM could comment on if FSID is likely to the cause of my problems? Thanks Simon On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ouwehand, JJ" on behalf of j.ouwehand at vumc.nl> wrote: >Hello, > >At first a short introduction. My name is Jaap Jan Ouwehand, I work at a >Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM >Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical >(office, research and clinical data) business process. We have three >large GPFS filesystems for different purposes. > >We also had such a situation with cNFS. A failover (IPtakeover) was >technically good, only clients experienced "stale filehandles". We opened >a PMR at IBM and after testing, deliver logs, tcpdumps and a few months >later, the solution appeared to be in the fsid option. > >An NFS filehandle is built by a combination of fsid and a hash function >on the inode. After a failover, the fsid value can be different and the >client has a "stale filehandle". To avoid this, the fsid value can be >statically specified. See: > >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum. 
>scale.v4r22.doc/bl1adm_nfslin.htm > >Maybe there is also a value in Ganesha that changes after a failover. >Certainly since most sessions will be re-established after a failback. >Maybe you see more debug information with tcpdump. > > >Kind regards, > >Jaap Jan Ouwehand >ICT Specialist (Storage & Linux) >VUmc - ICT >E: jj.ouwehand at vumc.nl >W: www.vumc.com > > > >-----Oorspronkelijk bericht----- >Van: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] Namens Simon Thompson >(IT Research Support) >Verzonden: dinsdag 25 april 2017 13:21 >Aan: gpfsug-discuss at spectrumscale.org >Onderwerp: [gpfsug-discuss] NFS issues > >Hi, > >We have recently started deploying NFS in addition our existing SMB >exports on our protocol nodes. > >We use a RR DNS name that points to 4 VIPs for SMB services and failover >seems to work fine with SMB clients. We figured we could use the same >name and IPs and run Ganesha on the protocol servers, however we are >seeing issues with NFS clients when IP failover occurs. > >In normal operation on a client, we might see several mounts from >different IPs obviously due to the way the DNS RR is working, but it all >works fine. > >In a failover situation, the IP will move to another node and some >clients will carry on, others will hang IO to the mount points referred >to by the IP which has moved. We can *sometimes* trigger this by manually >suspending a CES node, but not always and some clients mounting from the >IP moving will be fine, others won't. > >If we resume a node an it fails back, the clients that are hanging will >usually recover fine. We can reboot a client prior to failback and it >will be fine, stopping and starting the ganesha service on a protocol >node will also sometimes resolve the issues. > >So, has anyone seen this sort of issue and any suggestions for how we >could either debug more or workaround? > >We are currently running the packages >nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). > >At one point we were seeing it a lot, and could track it back to an >underlying GPFS network issue that was causing protocol nodes to be >expelled occasionally, we resolved that and the issues became less >apparent, but maybe we just fixed one failure mode so see it less often. > >On the clients, we use -o sync,hard BTW as in the IBM docs. > >On a client showing the issues, we'll see in dmesg, NFS related messages >like: >[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not >responding, timed out > >Which explains the client hang on certain mount points. 
> >The symptoms feel very much like those logged in this Gluster/ganesha bug: >https://bugzilla.redhat.com/show_bug.cgi?id=1354439 > > >Thanks > >Simon > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Greg.Lehmann at csiro.au Wed Apr 26 00:46:35 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Tue, 25 Apr 2017 23:46:35 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: , Message-ID: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> Are you using infiniband or Ethernet? I'm wondering if IBM have solved the gratuitous arp issue which we see with our non-protocols NFS implementation. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Wednesday, 26 April 2017 3:31 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NFS issues I did some digging in the mmcesfuncs to see what happens server side on fail over. Basically the server losing the IP is supposed to terminate all sessions and the receiver server sends ACK tickles. My current supposition is that for whatever reason, the losing server isn't releasing something and the client still has hold of a connection which is mostly dead. The tickle then fails to the client from the new server. This would explain why failing the IP back to the original server usually brings the client back to life. This is only my working theory at the moment as we can't reliably reproduce this. Next time it happens we plan to grab some netstat from each side. Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the server that received the IP and see if that fixes it (i.e. the receiver server didn't tickle properly). (Usage extracted from mmcesfuncs which is ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd) for anyone interested. Then try and kill he sessions on the losing server to check if there is stuff still open and re-tickle the client. If we can get steps to workaround, I'll log a PMR. I suppose I could do that now, but given its non deterministic and we want to be 100% sure it's not us doing something wrong, I'm inclined to wait until we do some more testing. I agree with the suggestion that it's probably IO pending nodes that are affected, but don't have any data to back that up yet. We did try with a read workload on a client, but may we need either long IO blocked reads or writes (from the GPFS end). We also originally had soft as the default option, but saw issues then and the docs suggested hard, so we switched and also enabled sync (we figured maybe it was NFS client with uncommited writes), but neither have resolved the issues entirely. Difficult for me to say if they improved the issue though given its sporadic. Appreciate people's suggestions! 
Thanks Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode Myklebust [janfrode at tanso.net] Sent: 25 April 2017 18:04 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NFS issues I *think* I've seen this, and that we then had open TCP connection from client to NFS server according to netstat, but these connections were not visible from netstat on NFS-server side. Unfortunately I don't remember what the fix was.. -jf tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) >: Hi, >From what I can see, Ganesha uses the Export_Id option in the config file (which is managed by CES) for this. I did find some reference in the Ganesha devs list that if its not set, then it would read the FSID from the GPFS file-system, either way they should surely be consistent across all the nodes. The posts I found were from someone with an IBM email address, so I guess someone in the IBM teams. I checked a couple of my protocol nodes and they use the same Export_Id consistently, though I guess that might not be the same as the FSID value. Perhaps someone from IBM could comment on if FSID is likely to the cause of my problems? Thanks Simon On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ouwehand, JJ" on behalf of j.ouwehand at vumc.nl> wrote: >Hello, > >At first a short introduction. My name is Jaap Jan Ouwehand, I work at >a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of >IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our >critical (office, research and clinical data) business process. We have >three large GPFS filesystems for different purposes. > >We also had such a situation with cNFS. A failover (IPtakeover) was >technically good, only clients experienced "stale filehandles". We >opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few >months later, the solution appeared to be in the fsid option. > >An NFS filehandle is built by a combination of fsid and a hash function >on the inode. After a failover, the fsid value can be different and the >client has a "stale filehandle". To avoid this, the fsid value can be >statically specified. See: > >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum. >scale.v4r22.doc/bl1adm_nfslin.htm > >Maybe there is also a value in Ganesha that changes after a failover. >Certainly since most sessions will be re-established after a failback. >Maybe you see more debug information with tcpdump. > > >Kind regards, > >Jaap Jan Ouwehand >ICT Specialist (Storage & Linux) >VUmc - ICT >E: jj.ouwehand at vumc.nl >W: www.vumc.com > > > >-----Oorspronkelijk bericht----- >Van: >gpfsug-discuss-bounces at spectrumscale.orgspectrumscale.org> >[mailto:gpfsug-discuss-bounces at spectrumscale.orgbounces at spectrumscale.org>] Namens Simon Thompson (IT Research Support) >Verzonden: dinsdag 25 april 2017 13:21 >Aan: >gpfsug-discuss at spectrumscale.orgg> >Onderwerp: [gpfsug-discuss] NFS issues > >Hi, > >We have recently started deploying NFS in addition our existing SMB >exports on our protocol nodes. > >We use a RR DNS name that points to 4 VIPs for SMB services and >failover seems to work fine with SMB clients. We figured we could use >the same name and IPs and run Ganesha on the protocol servers, however >we are seeing issues with NFS clients when IP failover occurs. 
> >In normal operation on a client, we might see several mounts from >different IPs obviously due to the way the DNS RR is working, but it >all works fine. > >In a failover situation, the IP will move to another node and some >clients will carry on, others will hang IO to the mount points referred >to by the IP which has moved. We can *sometimes* trigger this by >manually suspending a CES node, but not always and some clients >mounting from the IP moving will be fine, others won't. > >If we resume a node an it fails back, the clients that are hanging will >usually recover fine. We can reboot a client prior to failback and it >will be fine, stopping and starting the ganesha service on a protocol >node will also sometimes resolve the issues. > >So, has anyone seen this sort of issue and any suggestions for how we >could either debug more or workaround? > >We are currently running the packages >nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). > >At one point we were seeing it a lot, and could track it back to an >underlying GPFS network issue that was causing protocol nodes to be >expelled occasionally, we resolved that and the issues became less >apparent, but maybe we just fixed one failure mode so see it less often. > >On the clients, we use -o sync,hard BTW as in the IBM docs. > >On a client showing the issues, we'll see in dmesg, NFS related >messages >like: >[Wed Apr 12 16:59:53 2017] nfs: server >MYNFSSERVER.bham.ac.uk not responding, >timed out > >Which explains the client hang on certain mount points. > >The symptoms feel very much like those logged in this Gluster/ganesha bug: >https://bugzilla.redhat.com/show_bug.cgi?id=1354439 > > >Thanks > >Simon > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Wed Apr 26 14:26:08 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Wed, 26 Apr 2017 13:26:08 +0000 Subject: [gpfsug-discuss] Perfmon and GUI In-Reply-To: References: <2A0DC44A-D9FF-428B-8B02-FC6EC504BD34@siriuscom.com> Message-ID: My saga has come to an end. Turns out to get perf stats for NFS you need the gpfs.pm-ganesha package - duh. I typically do manual installs of scale so I just missed this one as it was buried in /usr/lpp/mmfs/4.2.3.0/zimon_rpms/rhel7. Anyway, package installed and now I get NFS stats in the gui and from cli. From: "Sobey, Richard A" Reply-To: gpfsug main discussion list Date: Tuesday, April 25, 2017 at 9:31 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI No worries Mark. We don?t use NFS here (yet) so I can?t help there. Glad I could help. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 15:29 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfmon and GUI Update: So SMB monitoring is now working after copying all files per Richard?s recommendation (thank you sir) and restarting pmsensors, pmcollector, and gpfsfui. Sadly, NFS monitoring isn?t. It doesn?t work from the cli either though. So clearly, something is up with that part. I continue to troubleshoot. From: Mark Bush > Reply-To: gpfsug main discussion list > Date: Tuesday, April 25, 2017 at 9:13 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Perfmon and GUI Interesting. Some files were indeed already there but it was missing a few NFSIO.cfg being the most notable to me. I?ve gone ahead and copied those to all my nodes (just three in this cluster) and restarted services. Still no luck. I?m going to restart the GUI service next to see if that makes a difference. Interestingly I can do things like mmperfmon query smb2 and that tends to work and give me real data so not sure where the breakdown is in the GUI. Mark From: "Sobey, Richard A" > Reply-To: gpfsug main discussion list > Date: Tuesday, April 25, 2017 at 8:44 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Perfmon and GUI I would have thought this would be fixed by now as this happened to me in 4.2.1-(0?) ? here?s what support said. Can you try? I think you?ve already got the relevant bits in your .cfg files so it should just be a case of copying the files across and restarting pmsensors and pmcollector. Again bear in mind this affected me on 4.2.1 and you?re using 4.2.3 so ymmv.. ? I spoke with development and normally these files would be copied over to /opt/IBM/zimon when using the automatic installer but since this case doesn't use the installer we have to copy them over manually. We acknowledge this should be in the docs, and the reason it is not included in pmsensors rpm is due to the fact these do not come from the zimon team. The following files can be copied over to /opt/IBM/zimon [root at node1 default]# pwd /usr/lpp/mmfs/4.2.1.0/installer/cookbooks/zimon_on_gpfs/files/default [root at node1 default]# ls CTDBDBStats.cfg CTDBStats.cfg NFSIO.cfg SMBGlobalStats.cfg SMBSensors.cfg SMBStats.cfg ZIMonCollector.cfg ? Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 25 April 2017 14:28 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Perfmon and GUI Anyone know why in the GUI when I go to look at things like nodes and select a protocol node and then pick NFS or SMB why it has the boxes where a graph is supposed to be and it has a Red circled X and says ?Performance collector did not return any data?? I?ve added the things from the link into my protocol Nodes /opt/IBM/zimon/ZIMonSensors.cfg file https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_configuringthePMT.htm Also restarted both pmsensors and pmcollector on the nodes. What am I missing? 
Here?s my ZIMonSensors.cfg file [root at n3 zimon]# cat ZIMonSensors.cfg cephMon = "/opt/IBM/zimon/CephMonProxy" cephRados = "/opt/IBM/zimon/CephRadosProxy" colCandidates = "n1" colRedundancy = 1 collectors = { host = "n1" port = "4739" } config = "/opt/IBM/zimon/ZIMonSensors.cfg" ctdbstat = "" daemonize = T hostname = "" ipfixinterface = "0.0.0.0" logfile = "/var/log/zimon/ZIMonSensors.log" loglevel = "info" mmcmd = "/opt/IBM/zimon/MMCmdProxy" mmdfcmd = "/opt/IBM/zimon/MMDFProxy" mmpmon = "/opt/IBM/zimon/MmpmonSockProxy" piddir = "/var/run" release = "4.2.3-0" sensors = { name = "CPU" period = 1 }, { name = "Load" period = 1 }, { name = "Memory" period = 1 }, { name = "Network" period = 1 }, { name = "Netstat" period = 10 }, { name = "Diskstat" period = 0 }, { name = "DiskFree" period = 600 }, { name = "GPFSDisk" period = 0 }, { name = "GPFSFilesystem" period = 1 }, { name = "GPFSNSDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSPoolIO" period = 0 }, { name = "GPFSVFS" period = 1 }, { name = "GPFSIOC" period = 0 }, { name = "GPFSVIO" period = 0 }, { name = "GPFSPDDisk" period = 0 restrict = "nsdNodes" }, { name = "GPFSvFLUSH" period = 0 }, { name = "GPFSNode" period = 1 }, { name = "GPFSNodeAPI" period = 1 }, { name = "GPFSFilesystemAPI" period = 1 }, { name = "GPFSLROC" period = 0 }, { name = "GPFSCHMS" period = 0 }, { name = "GPFSAFM" period = 0 }, { name = "GPFSAFMFS" period = 0 }, { name = "GPFSAFMFSET" period = 0 }, { name = "GPFSRPCS" period = 10 }, { name = "GPFSWaiters" period = 10 }, { name = "GPFSFilesetQuota" period = 3600 }, { name = "GPFSDiskCap" period = 0 }, { name = "GPFSFileset" period = 0 restrict = "n1" }, { name = "GPFSPool" period = 0 restrict = "n1" }, { name = "Infiniband" period = 0 }, { name = "CTDBDBStats" period = 1 type = "Generic" }, { name = "CTDBStats" period = 1 type = "Generic" }, { name = "NFSIO" period = 1 type = "Generic" }, { name = "SMBGlobalStats" period = 1 type = "Generic" }, { name = "SMBStats" period = 1 type = "Generic" } smbstat = "" This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Apr 26 15:20:30 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 26 Apr 2017 14:20:30 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> References: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> Message-ID: Nope, the clients are all L3 connected, so not an arp issue. Two things we have observed: 1. It triggers when one of the CES IPs moves and quickly moves back again. 
The move occurs because the NFS server goes into grace: 2017-04-25 20:36:49 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60 2017-04-25 20:36:49 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server recovery event 2 nodeid -1 ip 2017-04-25 20:36:49 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4 recovery release ip 2017-04-25 20:36:49 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE 2017-04-25 20:37:42 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60 2017-04-25 20:37:44 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60 2017-04-25 20:37:44 : epoch 00040183 : : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server recovery event 4 nodeid 2 ip We can't see in any of the logs WHY ganesha is going into grace. Any suggestions on how to debug this further? (I.e. If we can stop the grace issues, we can solve the problem mostly). 2. Our clients are using LDAP which is bound to the CES IPs. If we shutdown nslcd on the client we can get the client to recover once all the TIME_WAIT connections have gone. Maybe this was a bad choice on our side to bind to the CES IPs - we figured it would handily move the IPs for us, but I guess the mmcesfuncs isn't aware of this and so doesn't kill the connections to the IP as it goes away. So two approaches we are going to try. Reconfigure the nslcd on a couple of clients and see if they still show up the issues when fail-over occurs. Second is to work out why the NFS servers are going into grace in the first place. Simon On 26/04/2017, 00:46, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Greg.Lehmann at csiro.au" wrote: >Are you using infiniband or Ethernet? I'm wondering if IBM have solved >the gratuitous arp issue which we see with our non-protocols NFS >implementation. > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >Thompson (IT Research Support) >Sent: Wednesday, 26 April 2017 3:31 AM >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] NFS issues > >I did some digging in the mmcesfuncs to see what happens server side on >fail over. > >Basically the server losing the IP is supposed to terminate all sessions >and the receiver server sends ACK tickles. > >My current supposition is that for whatever reason, the losing server >isn't releasing something and the client still has hold of a connection >which is mostly dead. The tickle then fails to the client from the new >server. > >This would explain why failing the IP back to the original server usually >brings the client back to life. > >This is only my working theory at the moment as we can't reliably >reproduce this. Next time it happens we plan to grab some netstat from >each side. > >Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the >server that received the IP and see if that fixes it (i.e. the receiver >server didn't tickle properly). (Usage extracted from mmcesfuncs which is >ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd) >for anyone interested. > >Then try and kill he sessions on the losing server to check if there is >stuff still open and re-tickle the client. 
> >If we can get steps to workaround, I'll log a PMR. I suppose I could do >that now, but given its non deterministic and we want to be 100% sure >it's not us doing something wrong, I'm inclined to wait until we do some >more testing. > >I agree with the suggestion that it's probably IO pending nodes that are >affected, but don't have any data to back that up yet. We did try with a >read workload on a client, but may we need either long IO blocked reads >or writes (from the GPFS end). > >We also originally had soft as the default option, but saw issues then >and the docs suggested hard, so we switched and also enabled sync (we >figured maybe it was NFS client with uncommited writes), but neither have >resolved the issues entirely. Difficult for me to say if they improved >the issue though given its sporadic. > >Appreciate people's suggestions! > >Thanks > >Simon >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org >[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode >Myklebust [janfrode at tanso.net] >Sent: 25 April 2017 18:04 >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] NFS issues > >I *think* I've seen this, and that we then had open TCP connection from >client to NFS server according to netstat, but these connections were not >visible from netstat on NFS-server side. > >Unfortunately I don't remember what the fix was.. > > > > -jf > >tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) >>: >Hi, > >From what I can see, Ganesha uses the Export_Id option in the config file >(which is managed by CES) for this. I did find some reference in the >Ganesha devs list that if its not set, then it would read the FSID from >the GPFS file-system, either way they should surely be consistent across >all the nodes. The posts I found were from someone with an IBM email >address, so I guess someone in the IBM teams. > >I checked a couple of my protocol nodes and they use the same Export_Id >consistently, though I guess that might not be the same as the FSID value. > >Perhaps someone from IBM could comment on if FSID is likely to the cause >of my problems? > >Thanks > >Simon > >On 25/04/2017, 14:51, >"gpfsug-discuss-bounces at spectrumscale.orgectrumscale.org> on behalf of Ouwehand, JJ" >ectrumscale.org> on behalf of >j.ouwehand at vumc.nl> wrote: > >>Hello, >> >>At first a short introduction. My name is Jaap Jan Ouwehand, I work at >>a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of >>IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our >>critical (office, research and clinical data) business process. We have >>three large GPFS filesystems for different purposes. >> >>We also had such a situation with cNFS. A failover (IPtakeover) was >>technically good, only clients experienced "stale filehandles". We >>opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few >>months later, the solution appeared to be in the fsid option. >> >>An NFS filehandle is built by a combination of fsid and a hash function >>on the inode. After a failover, the fsid value can be different and the >>client has a "stale filehandle". To avoid this, the fsid value can be >>statically specified. See: >> >>https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum >>. >>scale.v4r22.doc/bl1adm_nfslin.htm >> >>Maybe there is also a value in Ganesha that changes after a failover. >>Certainly since most sessions will be re-established after a failback. 
>>Maybe you see more debug information with tcpdump. >> >> >>Kind regards, >> >>Jaap Jan Ouwehand >>ICT Specialist (Storage & Linux) >>VUmc - ICT >>E: jj.ouwehand at vumc.nl >>W: www.vumc.com >> >> >> >>-----Oorspronkelijk bericht----- >>Van: >>gpfsug-discuss-bounces at spectrumscale.org>spectrumscale.org> >>[mailto:gpfsug-discuss-bounces at spectrumscale.org>bounces at spectrumscale.org>] Namens Simon Thompson (IT Research Support) >>Verzonden: dinsdag 25 april 2017 13:21 >>Aan: >>gpfsug-discuss at spectrumscale.org>g> >>Onderwerp: [gpfsug-discuss] NFS issues >> >>Hi, >> >>We have recently started deploying NFS in addition our existing SMB >>exports on our protocol nodes. >> >>We use a RR DNS name that points to 4 VIPs for SMB services and >>failover seems to work fine with SMB clients. We figured we could use >>the same name and IPs and run Ganesha on the protocol servers, however >>we are seeing issues with NFS clients when IP failover occurs. >> >>In normal operation on a client, we might see several mounts from >>different IPs obviously due to the way the DNS RR is working, but it >>all works fine. >> >>In a failover situation, the IP will move to another node and some >>clients will carry on, others will hang IO to the mount points referred >>to by the IP which has moved. We can *sometimes* trigger this by >>manually suspending a CES node, but not always and some clients >>mounting from the IP moving will be fine, others won't. >> >>If we resume a node an it fails back, the clients that are hanging will >>usually recover fine. We can reboot a client prior to failback and it >>will be fine, stopping and starting the ganesha service on a protocol >>node will also sometimes resolve the issues. >> >>So, has anyone seen this sort of issue and any suggestions for how we >>could either debug more or workaround? >> >>We are currently running the packages >>nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). >> >>At one point we were seeing it a lot, and could track it back to an >>underlying GPFS network issue that was causing protocol nodes to be >>expelled occasionally, we resolved that and the issues became less >>apparent, but maybe we just fixed one failure mode so see it less often. >> >>On the clients, we use -o sync,hard BTW as in the IBM docs. >> >>On a client showing the issues, we'll see in dmesg, NFS related >>messages >>like: >>[Wed Apr 12 16:59:53 2017] nfs: server >>MYNFSSERVER.bham.ac.uk not responding, >>timed out >> >>Which explains the client hang on certain mount points. 
>> >>The symptoms feel very much like those logged in this Gluster/ganesha >>bug: >>https://bugzilla.redhat.com/show_bug.cgi?id=1354439 >> >> >>Thanks >> >>Simon >> >>_______________________________________________ >>gpfsug-discuss mailing list >>gpfsug-discuss at spectrumscale.org >>http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>_______________________________________________ >>gpfsug-discuss mailing list >>gpfsug-discuss at spectrumscale.org >>http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From janfrode at tanso.net Wed Apr 26 15:27:03 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 26 Apr 2017 14:27:03 +0000 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> Message-ID: Would it help to lower the grace time? mmnfs configuration change LEASE_LIFETIME=10 mmnfs configuration change GRACE_PERIOD=10 -jf ons. 26. apr. 2017 kl. 16.20 skrev Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk>: > Nope, the clients are all L3 connected, so not an arp issue. > > Two things we have observed: > > 1. It triggers when one of the CES IPs moves and quickly moves back again. > The move occurs because the NFS server goes into grace: > > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 2 nodeid -1 ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4 > recovery release ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE > 2017-04-25 20:37:42 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 4 nodeid 2 ip > > > > We can't see in any of the logs WHY ganesha is going into grace. Any > suggestions on how to debug this further? (I.e. If we can stop the grace > issues, we can solve the problem mostly). > > > 2. Our clients are using LDAP which is bound to the CES IPs. If we > shutdown nslcd on the client we can get the client to recover once all the > TIME_WAIT connections have gone. Maybe this was a bad choice on our side > to bind to the CES IPs - we figured it would handily move the IPs for us, > but I guess the mmcesfuncs isn't aware of this and so doesn't kill the > connections to the IP as it goes away. > > > So two approaches we are going to try. Reconfigure the nslcd on a couple > of clients and see if they still show up the issues when fail-over occurs. 
> Second is to work out why the NFS servers are going into grace in the > first place. > > Simon > > On 26/04/2017, 00:46, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Greg.Lehmann at csiro.au" behalf of Greg.Lehmann at csiro.au> wrote: > > >Are you using infiniband or Ethernet? I'm wondering if IBM have solved > >the gratuitous arp issue which we see with our non-protocols NFS > >implementation. > > > >-----Original Message----- > >From: gpfsug-discuss-bounces at spectrumscale.org > >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon > >Thompson (IT Research Support) > >Sent: Wednesday, 26 April 2017 3:31 AM > >To: gpfsug main discussion list > >Subject: Re: [gpfsug-discuss] NFS issues > > > >I did some digging in the mmcesfuncs to see what happens server side on > >fail over. > > > >Basically the server losing the IP is supposed to terminate all sessions > >and the receiver server sends ACK tickles. > > > >My current supposition is that for whatever reason, the losing server > >isn't releasing something and the client still has hold of a connection > >which is mostly dead. The tickle then fails to the client from the new > >server. > > > >This would explain why failing the IP back to the original server usually > >brings the client back to life. > > > >This is only my working theory at the moment as we can't reliably > >reproduce this. Next time it happens we plan to grab some netstat from > >each side. > > > >Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the > >server that received the IP and see if that fixes it (i.e. the receiver > >server didn't tickle properly). (Usage extracted from mmcesfuncs which is > >ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd) > >for anyone interested. > > > >Then try and kill he sessions on the losing server to check if there is > >stuff still open and re-tickle the client. > > > >If we can get steps to workaround, I'll log a PMR. I suppose I could do > >that now, but given its non deterministic and we want to be 100% sure > >it's not us doing something wrong, I'm inclined to wait until we do some > >more testing. > > > >I agree with the suggestion that it's probably IO pending nodes that are > >affected, but don't have any data to back that up yet. We did try with a > >read workload on a client, but may we need either long IO blocked reads > >or writes (from the GPFS end). > > > >We also originally had soft as the default option, but saw issues then > >and the docs suggested hard, so we switched and also enabled sync (we > >figured maybe it was NFS client with uncommited writes), but neither have > >resolved the issues entirely. Difficult for me to say if they improved > >the issue though given its sporadic. > > > >Appreciate people's suggestions! > > > >Thanks > > > >Simon > >________________________________________ > >From: gpfsug-discuss-bounces at spectrumscale.org > >[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode > >Myklebust [janfrode at tanso.net] > >Sent: 25 April 2017 18:04 > >To: gpfsug main discussion list > >Subject: Re: [gpfsug-discuss] NFS issues > > > >I *think* I've seen this, and that we then had open TCP connection from > >client to NFS server according to netstat, but these connections were not > >visible from netstat on NFS-server side. > > > >Unfortunately I don't remember what the fix was.. > > > > > > > > -jf > > > >tir. 25. apr. 2017 kl. 
16.06 skrev Simon Thompson (IT Research Support) > >>: > >Hi, > > > >From what I can see, Ganesha uses the Export_Id option in the config file > >(which is managed by CES) for this. I did find some reference in the > >Ganesha devs list that if its not set, then it would read the FSID from > >the GPFS file-system, either way they should surely be consistent across > >all the nodes. The posts I found were from someone with an IBM email > >address, so I guess someone in the IBM teams. > > > >I checked a couple of my protocol nodes and they use the same Export_Id > >consistently, though I guess that might not be the same as the FSID value. > > > >Perhaps someone from IBM could comment on if FSID is likely to the cause > >of my problems? > > > >Thanks > > > >Simon > > > >On 25/04/2017, 14:51, > >"gpfsug-discuss-bounces at spectrumscale.org gpfsug-discuss-bounces at sp > >ectrumscale.org> on behalf of Ouwehand, JJ" > > gpfsug-discuss-bounces at sp > >ectrumscale.org> on behalf of > >j.ouwehand at vumc.nl> wrote: > > > >>Hello, > >> > >>At first a short introduction. My name is Jaap Jan Ouwehand, I work at > >>a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of > >>IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our > >>critical (office, research and clinical data) business process. We have > >>three large GPFS filesystems for different purposes. > >> > >>We also had such a situation with cNFS. A failover (IPtakeover) was > >>technically good, only clients experienced "stale filehandles". We > >>opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few > >>months later, the solution appeared to be in the fsid option. > >> > >>An NFS filehandle is built by a combination of fsid and a hash function > >>on the inode. After a failover, the fsid value can be different and the > >>client has a "stale filehandle". To avoid this, the fsid value can be > >>statically specified. See: > >> > >> > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum > >>. > >>scale.v4r22.doc/bl1adm_nfslin.htm > >> > >>Maybe there is also a value in Ganesha that changes after a failover. > >>Certainly since most sessions will be re-established after a failback. > >>Maybe you see more debug information with tcpdump. > >> > >> > >>Kind regards, > >> > >>Jaap Jan Ouwehand > >>ICT Specialist (Storage & Linux) > >>VUmc - ICT > >>E: jj.ouwehand at vumc.nl > >>W: www.vumc.com > >> > >> > >> > >>-----Oorspronkelijk bericht----- > >>Van: > >>gpfsug-discuss-bounces at spectrumscale.org >>spectrumscale.org> > >>[mailto:gpfsug-discuss-bounces at spectrumscale.org >>bounces at spectrumscale.org>] Namens Simon Thompson (IT Research Support) > >>Verzonden: dinsdag 25 april 2017 13:21 > >>Aan: > >>gpfsug-discuss at spectrumscale.org >>g> > >>Onderwerp: [gpfsug-discuss] NFS issues > >> > >>Hi, > >> > >>We have recently started deploying NFS in addition our existing SMB > >>exports on our protocol nodes. > >> > >>We use a RR DNS name that points to 4 VIPs for SMB services and > >>failover seems to work fine with SMB clients. We figured we could use > >>the same name and IPs and run Ganesha on the protocol servers, however > >>we are seeing issues with NFS clients when IP failover occurs. > >> > >>In normal operation on a client, we might see several mounts from > >>different IPs obviously due to the way the DNS RR is working, but it > >>all works fine. 
> >> > >>In a failover situation, the IP will move to another node and some > >>clients will carry on, others will hang IO to the mount points referred > >>to by the IP which has moved. We can *sometimes* trigger this by > >>manually suspending a CES node, but not always and some clients > >>mounting from the IP moving will be fine, others won't. > >> > >>If we resume a node an it fails back, the clients that are hanging will > >>usually recover fine. We can reboot a client prior to failback and it > >>will be fine, stopping and starting the ganesha service on a protocol > >>node will also sometimes resolve the issues. > >> > >>So, has anyone seen this sort of issue and any suggestions for how we > >>could either debug more or workaround? > >> > >>We are currently running the packages > >>nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). > >> > >>At one point we were seeing it a lot, and could track it back to an > >>underlying GPFS network issue that was causing protocol nodes to be > >>expelled occasionally, we resolved that and the issues became less > >>apparent, but maybe we just fixed one failure mode so see it less often. > >> > >>On the clients, we use -o sync,hard BTW as in the IBM docs. > >> > >>On a client showing the issues, we'll see in dmesg, NFS related > >>messages > >>like: > >>[Wed Apr 12 16:59:53 2017] nfs: server > >>MYNFSSERVER.bham.ac.uk not responding, > >>timed out > >> > >>Which explains the client hang on certain mount points. > >> > >>The symptoms feel very much like those logged in this Gluster/ganesha > >>bug: > >>https://bugzilla.redhat.com/show_bug.cgi?id=1354439 > >> > >> > >>Thanks > >> > >>Simon > >> > >>_______________________________________________ > >>gpfsug-discuss mailing list > >>gpfsug-discuss at spectrumscale.org > >>http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >>_______________________________________________ > >>gpfsug-discuss mailing list > >>gpfsug-discuss at spectrumscale.org > >>http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >_______________________________________________ > >gpfsug-discuss mailing list > >gpfsug-discuss at spectrumscale.org > >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From peserocka at gmail.com Wed Apr 26 18:53:51 2017 From: peserocka at gmail.com (Peter Serocka) Date: Wed, 26 Apr 2017 19:53:51 +0200 Subject: [gpfsug-discuss] NFS issues In-Reply-To: References: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> Message-ID: > On 2017 Apr 26 Wed, at 16:20, Simon Thompson (IT Research Support) wrote: > > Nope, the clients are all L3 connected, so not an arp issue. ...not on the client, but the server-facing L3 switch still need to manage its ARP table, and might miss the IP moving to a new MAC. Cisco switches have a default ARP cache timeout of 4 hours, fwiw. Can your network team provide you the ARP status from the switch when you see a fail-over being stuck? ? 
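If it helps, a rough sketch of what to compare - the CES IP here is made up and the exact switch syntax will vary by vendor:

    # Cisco-style switch CLI: which MAC does the CES IP currently resolve to?
    show ip arp 10.10.0.15
    # on a Linux host in the same segment as the CES nodes:
    ip neigh show | grep 10.10.0.15
    # and which CES node should be holding that IP right now:
    mmces address list
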
Peter > > Two things we have observed: > > 1. It triggers when one of the CES IPs moves and quickly moves back again. > The move occurs because the NFS server goes into grace: > > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 2 nodeid -1 ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4 > recovery release ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE > 2017-04-25 20:37:42 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 4 nodeid 2 ip > > > > We can't see in any of the logs WHY ganesha is going into grace. Any > suggestions on how to debug this further? (I.e. If we can stop the grace > issues, we can solve the problem mostly). > > > 2. Our clients are using LDAP which is bound to the CES IPs. If we > shutdown nslcd on the client we can get the client to recover once all the > TIME_WAIT connections have gone. Maybe this was a bad choice on our side > to bind to the CES IPs - we figured it would handily move the IPs for us, > but I guess the mmcesfuncs isn't aware of this and so doesn't kill the > connections to the IP as it goes away. > > > So two approaches we are going to try. Reconfigure the nslcd on a couple > of clients and see if they still show up the issues when fail-over occurs. > Second is to work out why the NFS servers are going into grace in the > first place. > > Simon > > On 26/04/2017, 00:46, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Greg.Lehmann at csiro.au" behalf of Greg.Lehmann at csiro.au> wrote: > >> Are you using infiniband or Ethernet? I'm wondering if IBM have solved >> the gratuitous arp issue which we see with our non-protocols NFS >> implementation. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >> Thompson (IT Research Support) >> Sent: Wednesday, 26 April 2017 3:31 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] NFS issues >> >> I did some digging in the mmcesfuncs to see what happens server side on >> fail over. >> >> Basically the server losing the IP is supposed to terminate all sessions >> and the receiver server sends ACK tickles. >> >> My current supposition is that for whatever reason, the losing server >> isn't releasing something and the client still has hold of a connection >> which is mostly dead. The tickle then fails to the client from the new >> server. >> >> This would explain why failing the IP back to the original server usually >> brings the client back to life. >> >> This is only my working theory at the moment as we can't reliably >> reproduce this. Next time it happens we plan to grab some netstat from >> each side. >> >> Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the >> server that received the IP and see if that fixes it (i.e. 
the receiver >> server didn't tickle properly). (Usage extracted from mmcesfuncs which is >> ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd) >> for anyone interested. >> >> Then try and kill he sessions on the losing server to check if there is >> stuff still open and re-tickle the client. >> >> If we can get steps to workaround, I'll log a PMR. I suppose I could do >> that now, but given its non deterministic and we want to be 100% sure >> it's not us doing something wrong, I'm inclined to wait until we do some >> more testing. >> >> I agree with the suggestion that it's probably IO pending nodes that are >> affected, but don't have any data to back that up yet. We did try with a >> read workload on a client, but may we need either long IO blocked reads >> or writes (from the GPFS end). >> >> We also originally had soft as the default option, but saw issues then >> and the docs suggested hard, so we switched and also enabled sync (we >> figured maybe it was NFS client with uncommited writes), but neither have >> resolved the issues entirely. Difficult for me to say if they improved >> the issue though given its sporadic. >> >> Appreciate people's suggestions! >> >> Thanks >> >> Simon >> ________________________________________ >> From: gpfsug-discuss-bounces at spectrumscale.org >> [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode >> Myklebust [janfrode at tanso.net] >> Sent: 25 April 2017 18:04 >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] NFS issues >> >> I *think* I've seen this, and that we then had open TCP connection from >> client to NFS server according to netstat, but these connections were not >> visible from netstat on NFS-server side. >> >> Unfortunately I don't remember what the fix was.. >> >> >> >> -jf >> >> tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) >> >: >> Hi, >> >> From what I can see, Ganesha uses the Export_Id option in the config file >> (which is managed by CES) for this. I did find some reference in the >> Ganesha devs list that if its not set, then it would read the FSID from >> the GPFS file-system, either way they should surely be consistent across >> all the nodes. The posts I found were from someone with an IBM email >> address, so I guess someone in the IBM teams. >> >> I checked a couple of my protocol nodes and they use the same Export_Id >> consistently, though I guess that might not be the same as the FSID value. >> >> Perhaps someone from IBM could comment on if FSID is likely to the cause >> of my problems? >> >> Thanks >> >> Simon >> >> On 25/04/2017, 14:51, >> "gpfsug-discuss-bounces at spectrumscale.org> ectrumscale.org> on behalf of Ouwehand, JJ" >> > ectrumscale.org> on behalf of >> j.ouwehand at vumc.nl> wrote: >> >>> Hello, >>> >>> At first a short introduction. My name is Jaap Jan Ouwehand, I work at >>> a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of >>> IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our >>> critical (office, research and clinical data) business process. We have >>> three large GPFS filesystems for different purposes. >>> >>> We also had such a situation with cNFS. A failover (IPtakeover) was >>> technically good, only clients experienced "stale filehandles". We >>> opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few >>> months later, the solution appeared to be in the fsid option. >>> >>> An NFS filehandle is built by a combination of fsid and a hash function >>> on the inode. 
After a failover, the fsid value can be different and the >>> client has a "stale filehandle". To avoid this, the fsid value can be >>> statically specified. See: >>> >>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum >>> . >>> scale.v4r22.doc/bl1adm_nfslin.htm >>> >>> Maybe there is also a value in Ganesha that changes after a failover. >>> Certainly since most sessions will be re-established after a failback. >>> Maybe you see more debug information with tcpdump. >>> >>> >>> Kind regards, >>> >>> Jaap Jan Ouwehand >>> ICT Specialist (Storage & Linux) >>> VUmc - ICT >>> E: jj.ouwehand at vumc.nl >>> W: www.vumc.com >>> >>> >>> >>> -----Oorspronkelijk bericht----- >>> Van: >>> gpfsug-discuss-bounces at spectrumscale.org>> spectrumscale.org> >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org>> bounces at spectrumscale.org>] Namens Simon Thompson (IT Research Support) >>> Verzonden: dinsdag 25 april 2017 13:21 >>> Aan: >>> gpfsug-discuss at spectrumscale.org>> g> >>> Onderwerp: [gpfsug-discuss] NFS issues >>> >>> Hi, >>> >>> We have recently started deploying NFS in addition our existing SMB >>> exports on our protocol nodes. >>> >>> We use a RR DNS name that points to 4 VIPs for SMB services and >>> failover seems to work fine with SMB clients. We figured we could use >>> the same name and IPs and run Ganesha on the protocol servers, however >>> we are seeing issues with NFS clients when IP failover occurs. >>> >>> In normal operation on a client, we might see several mounts from >>> different IPs obviously due to the way the DNS RR is working, but it >>> all works fine. >>> >>> In a failover situation, the IP will move to another node and some >>> clients will carry on, others will hang IO to the mount points referred >>> to by the IP which has moved. We can *sometimes* trigger this by >>> manually suspending a CES node, but not always and some clients >>> mounting from the IP moving will be fine, others won't. >>> >>> If we resume a node an it fails back, the clients that are hanging will >>> usually recover fine. We can reboot a client prior to failback and it >>> will be fine, stopping and starting the ganesha service on a protocol >>> node will also sometimes resolve the issues. >>> >>> So, has anyone seen this sort of issue and any suggestions for how we >>> could either debug more or workaround? >>> >>> We are currently running the packages >>> nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). >>> >>> At one point we were seeing it a lot, and could track it back to an >>> underlying GPFS network issue that was causing protocol nodes to be >>> expelled occasionally, we resolved that and the issues became less >>> apparent, but maybe we just fixed one failure mode so see it less often. >>> >>> On the clients, we use -o sync,hard BTW as in the IBM docs. >>> >>> On a client showing the issues, we'll see in dmesg, NFS related >>> messages >>> like: >>> [Wed Apr 12 16:59:53 2017] nfs: server >>> MYNFSSERVER.bham.ac.uk not responding, >>> timed out >>> >>> Which explains the client hang on certain mount points. 
>>> >>> The symptoms feel very much like those logged in this Gluster/ganesha >>> bug: >>> https://bugzilla.redhat.com/show_bug.cgi?id=1354439 >>> >>> >>> Thanks >>> >>> Simon >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From S.J.Thompson at bham.ac.uk Wed Apr 26 19:00:06 2017
From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support))
Date: Wed, 26 Apr 2017 18:00:06 +0000
Subject: [gpfsug-discuss] NFS issues
In-Reply-To: References: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> , Message-ID:

We have no issues with SMB clients accessing over L3, so I'm pretty sure it's not ARP. And some of the boxes on the other side of the L3 gateway don't see the issues. We don't use Cisco kit.

I posted in a different update that we think it's related to connections to other ports on the same IP which get left open when the IP quickly gets moved away and back again.

Simon
________________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Peter Serocka [peserocka at gmail.com]
Sent: 26 April 2017 18:53
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] NFS issues

> On 2017 Apr 26 Wed, at 16:20, Simon Thompson (IT Research Support) wrote:
>
> Nope, the clients are all L3 connected, so not an arp issue.

...not on the client, but the server-facing L3 switch still needs to manage its ARP table, and might miss the IP moving to a new MAC. Cisco switches have a default ARP cache timeout of 4 hours, fwiw.

Can your network team provide you with the ARP status from the switch when you see a fail-over getting stuck?

-- Peter

>
> Two things we have observed:
>
> 1. It triggers when one of the CES IPs moves and quickly moves back again.
> The move occurs because the NFS server goes into grace: > > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 2 nodeid -1 ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4 > recovery release ip > 2017-04-25 20:36:49 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE > 2017-04-25 20:37:42 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN > GRACE, duration 60 > 2017-04-25 20:37:44 : epoch 00040183 : : > ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server > recovery event 4 nodeid 2 ip > > > > We can't see in any of the logs WHY ganesha is going into grace. Any > suggestions on how to debug this further? (I.e. If we can stop the grace > issues, we can solve the problem mostly). > > > 2. Our clients are using LDAP which is bound to the CES IPs. If we > shutdown nslcd on the client we can get the client to recover once all the > TIME_WAIT connections have gone. Maybe this was a bad choice on our side > to bind to the CES IPs - we figured it would handily move the IPs for us, > but I guess the mmcesfuncs isn't aware of this and so doesn't kill the > connections to the IP as it goes away. > > > So two approaches we are going to try. Reconfigure the nslcd on a couple > of clients and see if they still show up the issues when fail-over occurs. > Second is to work out why the NFS servers are going into grace in the > first place. > > Simon > > On 26/04/2017, 00:46, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Greg.Lehmann at csiro.au" behalf of Greg.Lehmann at csiro.au> wrote: > >> Are you using infiniband or Ethernet? I'm wondering if IBM have solved >> the gratuitous arp issue which we see with our non-protocols NFS >> implementation. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >> Thompson (IT Research Support) >> Sent: Wednesday, 26 April 2017 3:31 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] NFS issues >> >> I did some digging in the mmcesfuncs to see what happens server side on >> fail over. >> >> Basically the server losing the IP is supposed to terminate all sessions >> and the receiver server sends ACK tickles. >> >> My current supposition is that for whatever reason, the losing server >> isn't releasing something and the client still has hold of a connection >> which is mostly dead. The tickle then fails to the client from the new >> server. >> >> This would explain why failing the IP back to the original server usually >> brings the client back to life. >> >> This is only my working theory at the moment as we can't reliably >> reproduce this. Next time it happens we plan to grab some netstat from >> each side. >> >> Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the >> server that received the IP and see if that fixes it (i.e. the receiver >> server didn't tickle properly). (Usage extracted from mmcesfuncs which is >> ksh of course). ... 
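(For concreteness, a minimal sketch of the lingering-session check and re-tickle described above. Every address, port and node name below is a placeholder rather than a value from this thread, and the mmcmi tcpack syntax is simply the usage quoted from mmcesfuncs, not anything from public documentation:)

    # Run on the CES node that has just RECEIVED the floating IP.
    CES_IP=192.0.2.10      # placeholder: the CES address that moved
    CLIENT_IP=192.0.2.50   # placeholder: a client whose mounts are hanging

    # TCP sessions to the CES IP that survived the failover; anything still
    # ESTABLISHED or CLOSE_WAIT here is a candidate stuck session.
    ss -tn | grep "$CES_IP"

    # Compare with the node that LOST the IP, where half-dead sessions may
    # still be held open (and, per Jan-Frode, may not even be visible).
    ssh old-ces-node "ss -tn | grep $CES_IP"

    # Re-tickle the client from the receiving node; both arguments are
    # IP:port pairs, 2049 being the NFS port and the client port taken
    # from the ss output above (left commented out here on purpose).
    # mmcmi tcpack ${CES_IP}:2049 ${CLIENT_IP}:54321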
CesIPPort is colon separated IP:portnumber (of NFSd) >> for anyone interested. >> >> Then try and kill he sessions on the losing server to check if there is >> stuff still open and re-tickle the client. >> >> If we can get steps to workaround, I'll log a PMR. I suppose I could do >> that now, but given its non deterministic and we want to be 100% sure >> it's not us doing something wrong, I'm inclined to wait until we do some >> more testing. >> >> I agree with the suggestion that it's probably IO pending nodes that are >> affected, but don't have any data to back that up yet. We did try with a >> read workload on a client, but may we need either long IO blocked reads >> or writes (from the GPFS end). >> >> We also originally had soft as the default option, but saw issues then >> and the docs suggested hard, so we switched and also enabled sync (we >> figured maybe it was NFS client with uncommited writes), but neither have >> resolved the issues entirely. Difficult for me to say if they improved >> the issue though given its sporadic. >> >> Appreciate people's suggestions! >> >> Thanks >> >> Simon >> ________________________________________ >> From: gpfsug-discuss-bounces at spectrumscale.org >> [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode >> Myklebust [janfrode at tanso.net] >> Sent: 25 April 2017 18:04 >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] NFS issues >> >> I *think* I've seen this, and that we then had open TCP connection from >> client to NFS server according to netstat, but these connections were not >> visible from netstat on NFS-server side. >> >> Unfortunately I don't remember what the fix was.. >> >> >> >> -jf >> >> tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) >> >: >> Hi, >> >> From what I can see, Ganesha uses the Export_Id option in the config file >> (which is managed by CES) for this. I did find some reference in the >> Ganesha devs list that if its not set, then it would read the FSID from >> the GPFS file-system, either way they should surely be consistent across >> all the nodes. The posts I found were from someone with an IBM email >> address, so I guess someone in the IBM teams. >> >> I checked a couple of my protocol nodes and they use the same Export_Id >> consistently, though I guess that might not be the same as the FSID value. >> >> Perhaps someone from IBM could comment on if FSID is likely to the cause >> of my problems? >> >> Thanks >> >> Simon >> >> On 25/04/2017, 14:51, >> "gpfsug-discuss-bounces at spectrumscale.org> ectrumscale.org> on behalf of Ouwehand, JJ" >> > ectrumscale.org> on behalf of >> j.ouwehand at vumc.nl> wrote: >> >>> Hello, >>> >>> At first a short introduction. My name is Jaap Jan Ouwehand, I work at >>> a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of >>> IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our >>> critical (office, research and clinical data) business process. We have >>> three large GPFS filesystems for different purposes. >>> >>> We also had such a situation with cNFS. A failover (IPtakeover) was >>> technically good, only clients experienced "stale filehandles". We >>> opened a PMR at IBM and after testing, deliver logs, tcpdumps and a few >>> months later, the solution appeared to be in the fsid option. >>> >>> An NFS filehandle is built by a combination of fsid and a hash function >>> on the inode. After a failover, the fsid value can be different and the >>> client has a "stale filehandle". 
To avoid this, the fsid value can be >>> statically specified. See: >>> >>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum >>> . >>> scale.v4r22.doc/bl1adm_nfslin.htm >>> >>> Maybe there is also a value in Ganesha that changes after a failover. >>> Certainly since most sessions will be re-established after a failback. >>> Maybe you see more debug information with tcpdump. >>> >>> >>> Kind regards, >>> >>> Jaap Jan Ouwehand >>> ICT Specialist (Storage & Linux) >>> VUmc - ICT >>> E: jj.ouwehand at vumc.nl >>> W: www.vumc.com >>> >>> >>> >>> -----Oorspronkelijk bericht----- >>> Van: >>> gpfsug-discuss-bounces at spectrumscale.org>> spectrumscale.org> >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org>> bounces at spectrumscale.org>] Namens Simon Thompson (IT Research Support) >>> Verzonden: dinsdag 25 april 2017 13:21 >>> Aan: >>> gpfsug-discuss at spectrumscale.org>> g> >>> Onderwerp: [gpfsug-discuss] NFS issues >>> >>> Hi, >>> >>> We have recently started deploying NFS in addition our existing SMB >>> exports on our protocol nodes. >>> >>> We use a RR DNS name that points to 4 VIPs for SMB services and >>> failover seems to work fine with SMB clients. We figured we could use >>> the same name and IPs and run Ganesha on the protocol servers, however >>> we are seeing issues with NFS clients when IP failover occurs. >>> >>> In normal operation on a client, we might see several mounts from >>> different IPs obviously due to the way the DNS RR is working, but it >>> all works fine. >>> >>> In a failover situation, the IP will move to another node and some >>> clients will carry on, others will hang IO to the mount points referred >>> to by the IP which has moved. We can *sometimes* trigger this by >>> manually suspending a CES node, but not always and some clients >>> mounting from the IP moving will be fine, others won't. >>> >>> If we resume a node an it fails back, the clients that are hanging will >>> usually recover fine. We can reboot a client prior to failback and it >>> will be fine, stopping and starting the ganesha service on a protocol >>> node will also sometimes resolve the issues. >>> >>> So, has anyone seen this sort of issue and any suggestions for how we >>> could either debug more or workaround? >>> >>> We are currently running the packages >>> nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones). >>> >>> At one point we were seeing it a lot, and could track it back to an >>> underlying GPFS network issue that was causing protocol nodes to be >>> expelled occasionally, we resolved that and the issues became less >>> apparent, but maybe we just fixed one failure mode so see it less often. >>> >>> On the clients, we use -o sync,hard BTW as in the IBM docs. >>> >>> On a client showing the issues, we'll see in dmesg, NFS related >>> messages >>> like: >>> [Wed Apr 12 16:59:53 2017] nfs: server >>> MYNFSSERVER.bham.ac.uk not responding, >>> timed out >>> >>> Which explains the client hang on certain mount points. 
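(To make the static-fsid suggestion quoted above concrete, here is a hedged sketch for the kernel-NFS/cNFS case that the IBM link describes. The export path and fsid value are made up for illustration:)

    # /etc/exports on every NFS server in the failover group
    /gpfs/fs1  *(rw,sync,no_root_squash,fsid=745)

    # With the same fsid pinned on all servers, the filehandles handed out
    # before and after an IP takeover match, so clients do not see ESTALE.

For CES/Ganesha the export definitions are generated by mmnfs, so any equivalent setting (Ganesha's EXPORT block has a Filesystem_Id parameter, if memory serves) would need to go through the CES tooling rather than a hand edit of the config files.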
>>> >>> The symptoms feel very much like those logged in this Gluster/ganesha >>> bug: >>> https://bugzilla.redhat.com/show_bug.cgi?id=1354439 >>> >>> >>> Thanks >>> >>> Simon >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From valdis.kletnieks at vt.edu Thu Apr 27 00:44:44 2017
From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu)
Date: Wed, 26 Apr 2017 19:44:44 -0400
Subject: [gpfsug-discuss] NFS issues
In-Reply-To: References: <540d5b070cc8438ebe73df14a1ab619b@exch1-cdc.nexus.csiro.au> Message-ID: <52226.1493250284@turing-police.cc.vt.edu>

On Wed, 26 Apr 2017 14:20:30 -0000, "Simon Thompson (IT Research Support)" said:

> We can't see in any of the logs WHY ganesha is going into grace. Any
> suggestions on how to debug this further? (I.e. If we can stop the grace
> issues, we can solve the problem mostly).

After over 3 decades of experience with 'exportfs' being totally safe to run in real time with both userspace and kernel NFSD implementations, it came as quite a surprise when we did 'mmnfs export change --nfsadd='... and it bounced the NFS server on all 4 protocol nodes. At the same time.

Fortunately for us, the set of client nodes only changes once every 2-3 months.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL:

From secretary at gpfsug.org Thu Apr 27 09:29:41 2017
From: secretary at gpfsug.org (Secretary GPFS UG)
Date: Thu, 27 Apr 2017 09:29:41 +0100
Subject: [gpfsug-discuss] Meet other Spectrum Scale users in May
Message-ID: <1f483faa9cb61dcdc80afb187e908745@webmail.gpfsug.org>

Dear Members,

Please join us and other Spectrum Scale users for 2 days of great talks and networking!

WHEN: 9-10th May 2017
WHERE: Macdonald Manchester Hotel & Spa, Manchester, UK (right by Manchester Piccadilly train station)
WHO? The event is free to attend, is open to members from all industries and welcomes users with a little or a lot of experience using Spectrum Scale.

The SSUG brings together the Spectrum Scale User Community, including Spectrum Scale developers and architects, to share knowledge, experiences and future plans. Topics include transparent cloud tiering, AFM, automation and security best practices, Docker and HDFS support, problem determination, and an update on Elastic Storage Server (ESS).
Our popular forum includes interactive problem solving, a best practices discussion and networking. We're very excited to welcome back Doris Conti, the Director for Spectrum Scale (GPFS) and HPC SW Product Development at IBM.

The May meeting is sponsored by IBM, DDN, Lenovo, Mellanox, Seagate, Arcastream, Ellexus, and OCF. It is an excellent opportunity to learn more and get your questions answered.

Register your place today at the Eventbrite page https://goo.gl/tRptru [1]

We hope to see you there!

--
Claire O'Toole
Spectrum Scale/GPFS User Group Secretary
+44 (0)7508 033896
www.spectrumscaleug.org

Links:
------
[1] https://goo.gl/tRptru

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From robert at strubi.ox.ac.uk Thu Apr 27 12:46:09 2017
From: robert at strubi.ox.ac.uk (Robert Esnouf)
Date: Thu, 27 Apr 2017 12:46:09 +0100 (BST)
Subject: [gpfsug-discuss] Two high-performance research computing posts in Oxford University Medical Sciences
Message-ID: <201704271146.061978@mail.strubi.ox.ac.uk>

Dear All,

I hope that it is allowed to put job postings on this discussion list... sorry if I've broken a rule, but it does mention Spectrum Scale! I'd like to advertise the availability of two exciting and challenging new opportunities to work in research computing/high-performance computing at Oxford University within the Nuffield Department of Medicine.

The first is a Grade 8 position to expand the current Research Computing Core team at the Wellcome Trust Centre for Human Genetics. The Core now runs a cluster of ~3800 high-memory compute cores, a further ~700 cores outside the cluster, a (growing) smattering of GPU-enabled and KNL nodes, 4PB of high-performance Spectrum Scale (GPFS) storage and about 4PB of lower grade (mostly XFS) storage. The facility has an FDR InfiniBand fabric providing access to storage at up to 20GB/s and supporting MPI workloads. We mainly support the statistical genetics work of the Centre and other departments around Oxford, the work of the sequencing and bioinformatics cores and electron microscopy, but the workload is varied and interesting! Further significant update and expansion of this facility will occur during 2017 and beyond, which means that we are expanding the team.

http://www.well.ox.ac.uk/home
http://www.well.ox.ac.uk/research-8
https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.display_form?p_company=10&p_internal_external=E&p_display_in_irish=N&p_process_type=&p_applicant_no=&p_form_profile_detail=&p_display_apply_ind=Y&p_refresh_search=Y&p_recruitment_id=126748

The second is a Grade 9 post at the newly opened Big Data Institute next door to the WTCHG - to work with me to establish a brand new Research Computing facility. The Big Data Institute Building has 32 shiny new racks ready to be filled with up to 320kW of IT load - and we won't stop there! The current plans envisage a virtualized infrastructure for secure access, a high-performance cluster supporting traditional workloads and containers, high-performance filesystem storage, a hyperconverged infrastructure (supporting OpenStack, project VMs, containers and distributed computing platforms such as Apache Spark), a significant GPU-based artificial intelligence/deep learning platform and a large, multisite object store for managing research data in the long term.
https://www.bdi.ox.ac.uk/ https://www.ndm.ox.ac.uk/current-job-vacancies/vacancy/128486-BDI-Research-Computing-Manager https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.display_form?p_company=10&p_internal_external=E&p_display_in_irish=N&p_process_type=&p_applicant_no=&p_form_profile_detail=&p_display_apply_ind=Y&p_refresh_search=Y&p_recruitment_id=128486 It is expected that the Wellcome Trust Centre and Big Data Institute facilities will develop independently for now, but in a complementary and supportive fashion given the overlap in science and technology that is likely to exist. The Research Computing support teams will therefore work extremely closely together to address the challenges facing computing in the medical sciences. If either (or both) of these vacancies seem interesting then please feel free to contact the Head of the Research Computing Core at the WTCHG (me) or the Director of Research Computing at the BDI (me). Deadline for the WTCHG post is 31st May and for the BDI post is 24th May. Please feel free to circulate this email to anyone who might be interested and apologies for any cross postings! Regards, Robert -- Dr Robert Esnouf University Research Lecturer, Director of Research Computing BDI, Head of Research Computing Core WTCHG, NDM Research Computing Strategy Officer Main office: Room 10/028, Wellcome Trust Centre for Human Genetics, Old Road Campus, Roosevelt Drive, Oxford OX3 7BN, UK Emails: robert at strubi.ox.ac.uk / robert at well.ox.ac.uk / robert.esnouf at bdi.ox.ac.uk Tel: (+44) - 1865 - 287783