From jonathan.buzzard at strath.ac.uk Mon Mar 1 07:58:43 2021
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Mon, 1 Mar 2021 07:58:43 +0000
Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's
In-Reply-To:
References:
Message-ID: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk>

On 28/02/2021 09:31, Jan-Frode Myklebust wrote:
>
> I've tried benchmarking many vs. few vdisks per RG, and never could see
> any performance difference.

That's encouraging.

>
> Usually we create 1 vdisk per enclosure per RG, thinking this will
> allow us to grow with same size vdisks when adding additional enclosures
> in the future.
>
> Don't think mmvdisk can be told to create multiple vdisks per RG
> directly, so you have to manually create multiple vdisk sets each with
> the appropriate size.
>

Thing is, back in the day (GPFS v2.x/v3.x) there were strict warnings
that you needed a minimum of six NSD's for optimal performance. I have
sat in presentations where IBM employees have said so. What we were told
back then is that GPFS needs a minimum number of NSD's in order to be
able to spread the I/O's out, so if an NSD is being pounded for reads
and a write comes in, it can direct it to a less busy NSD.

Now I can imagine that in an ESS/DSS-G, where everything is scattered to
the winds under the hood, this is no longer relevant. But some notes to
that effect would be nice for us old timers, if that is the case, to put
our minds to rest.

JAB.

--
Jonathan A. Buzzard                   Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From Achim.Rehor at de.ibm.com Mon Mar 1 08:16:43 2021
From: Achim.Rehor at de.ibm.com (Achim Rehor)
Date: Mon, 1 Mar 2021 09:16:43 +0100
Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's
In-Reply-To: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk>
References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk>
Message-ID:

The reason for having multiple NSDs in legacy NSD (non-GNR) handling is
the increased parallelism, which gives you 'more spindles' and thus more
performance. In GNR the drives are used in parallel anyway through the
GNR striping. Therefore, you are using all drives of an ESS/GSS/DSS model
under the hood in the vdisks anyway.

The only reason for having more NSDs is for using them for different
filesystems.

Mit freundlichen Grüßen / Kind regards

Achim Rehor

IBM EMEA ESS/Spectrum Scale Support

gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43:

> From: Jonathan Buzzard
> To: gpfsug-discuss at spectrumscale.org
> Date: 01/03/2021 08:58
> Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>
> On 28/02/2021 09:31, Jan-Frode Myklebust wrote:
> >
> > I've tried benchmarking many vs. few vdisks per RG, and never could see
> > any performance difference.
>
> That's encouraging.
>
> >
> > Usually we create 1 vdisk per enclosure per RG, thinking this will
> > allow us to grow with same size vdisks when adding additional enclosures
> > in the future.
> >
> > Don't think mmvdisk can be told to create multiple vdisks per RG
> > directly, so you have to manually create multiple vdisk sets each with
> > the appropriate size.
> >
>
> Thing is, back in the day (GPFS v2.x/v3.x) there were strict warnings
> that you needed a minimum of six NSD's for optimal performance. I have
> sat in presentations where IBM employees have said so.
What we where > told back then is that GPFS needs a minimum number of NSD's in order to > be able to spread the I/O's out. So if an NSD is being pounded for reads > and a write comes in it. can direct it to a less busy NSD. > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > the winds under the hood this is no longer relevant. But some notes to > the effect for us old timers would be nice if that is the case to put > our minds to rest. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > From S.J.Thompson at bham.ac.uk Mon Mar 1 09:06:07 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 1 Mar 2021 09:06:07 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: Or for hedging your bets about how you might want to use it in future. We are never quite sure if we want to do something different in the future with some of the storage, sure that might mean we want to steal some space from a file-system, but that is perfectly valid. And we have done this, both in temporary transient states (data migration between systems), or permanently (found we needed something on a separate file-system) So yes whilst there might be no performance impact on doing this, we still do. I vaguely recall some of the old reasoning was around IO queues in the OS, i.e. if you had 6 NSDs vs 16 NSDs attached to the NSD server, you have 16 IO queues passing to multipath, which can help keep the data pipes full. I suspect there was some optimal number of NSDs for different storage controllers, but I don't know if anyone ever benchmarked that. Simon ?On 01/03/2021, 08:16, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Achim.Rehor at de.ibm.com" wrote: The reason for having multiple NSDs in legacy NSD (non-GNR) handling is the increased parallelism, that gives you 'more spindles' and thus more performance. In GNR the drives are used in parallel anyway through the GNRstriping. Therfore, you are using all drives of a ESS/GSS/DSS model under the hood in the vdisks anyway. The only reason for having more NSDs is for using them for different filesystems. Mit freundlichen Gr??en / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Date: 01/03/2021 08:58 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could see > > any performance difference. > > That's encouraging. > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > allow us to grow with same size vdisks when adding additional enclosures > > in the future. 
> > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > directly, so you have to manually create multiple vdisk sets each with > > the apropriate size. > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > that you needed a minimum of six NSD's for optimal performance. I have > sat in presentations where IBM employees have said so. What we where > told back then is that GPFS needs a minimum number of NSD's in order to > be able to spread the I/O's out. So if an NSD is being pounded for reads > and a write comes in it. can direct it to a less busy NSD. > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > the winds under the hood this is no longer relevant. But some notes to > the effect for us old timers would be nice if that is the case to put > our minds to rest. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From luis.bolinches at fi.ibm.com Mon Mar 1 09:08:20 2021 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 1 Mar 2021 09:08:20 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Mar 1 09:34:26 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 09:34:26 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Mon Mar 1 09:46:06 2021 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Mon, 1 Mar 2021 10:46:06 +0100 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: Correct, there was. The OS is dealing with pdisks, while GPFS is striping over Vdisks/NSDs. For GNR there is a differetnt queuing setup in GPFS, than there was for NSDs. See "mmfsadm dump nsd" and check for NsdQueueTraditional versus NsdQueueGNR And yes, i was too strict, with "> The only reason for having more NSDs is for using them for different > filesystems." there are other management reasons to run with a reasonable number of vdisks, just not performance reasons. Mit freundlichen Gruessen / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 10:06:07: > From: Simon Thompson > To: gpfsug main discussion list > Date: 01/03/2021 10:06 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Or for hedging your bets about how you might want to use it in future. > > We are never quite sure if we want to do something different in the > future with some of the storage, sure that might mean we want to > steal some space from a file-system, but that is perfectly valid. 
> And we have done this, both in temporary transient states (data > migration between systems), or permanently (found we needed > something on a separate file-system) > > So yes whilst there might be no performance impact on doing this, westill do. > > I vaguely recall some of the old reasoning was around IO queues in > the OS, i.e. if you had 6 NSDs vs 16 NSDs attached to the NSD > server, you have 16 IO queues passing to multipath, which can help > keep the data pipes full. I suspect there was some optimal number of > NSDs for different storage controllers, but I don't know if anyone > ever benchmarked that. > > Simon > > On 01/03/2021, 08:16, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Achim.Rehor at de.ibm.com" bounces at spectrumscale.org on behalf of Achim.Rehor at de.ibm.com> wrote: > > The reason for having multiple NSDs in legacy NSD (non-GNR) handling is > the increased parallelism, that gives you 'more spindles' and thus more > performance. > In GNR the drives are used in parallel anyway through the GNRstriping. > Therfore, you are using all drives of a ESS/GSS/DSS model under the hood > in the vdisks anyway. > > The only reason for having more NSDs is for using them for different > filesystems. > > > Mit freundlichen Gr??en / Kind regards > > Achim Rehor > > IBM EMEA ESS/Spectrum Scale Support > > > > > > > > > > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > > > From: Jonathan Buzzard > > To: gpfsug-discuss at spectrumscale.org > > Date: 01/03/2021 08:58 > > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of > NSD's > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could > see > > > any performance difference. > > > > That's encouraging. > > > > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > > allow us to grow with same size vdisks when adding additional > enclosures > > > in the future. > > > > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > > directly, so you have to manually create multiple vdisk setseach with > > > > the apropriate size. > > > > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > > that you needed a minimum of six NSD's for optimal performance. I have > > sat in presentations where IBM employees have said so. What we where > > told back then is that GPFS needs a minimum number of NSD's inorder to > > be able to spread the I/O's out. So if an NSD is being poundedfor reads > > > and a write comes in it. can direct it to a less busy NSD. > > > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > > the winds under the hood this is no longer relevant. But some notes to > > the effect for us old timers would be nice if that is the case to put > > our minds to rest. > > > > > > JAB. > > > > -- > > Jonathan A. Buzzard Tel: +44141-5483420 > > HPC System Administrator, ARCHIE-WeSt. > > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url? 
> > > > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=gU9xf_Z6rrdOa4- > WKodSyFPbnGGbAGC_LK7hgYPB3yQ&s=L_VtTqSwQbqfIR5VVmn6mYxmidgnH37osHrFPX0E-Ck&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=gU9xf_Z6rrdOa4- > WKodSyFPbnGGbAGC_LK7hgYPB3yQ&s=L_VtTqSwQbqfIR5VVmn6mYxmidgnH37osHrFPX0E-Ck&e= > From jonathan.buzzard at strath.ac.uk Mon Mar 1 11:45:45 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 11:45:45 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: On 01/03/2021 09:08, Luis Bolinches wrote: > Hi > > There other reasons to have more than 1. It is management of those. When > you have to add or remove NSDs of a FS having more than 1 makes it > possible to empty some space and manage those in and out. Manually but > possible. If you have one big NSD or even 1 per enclosure it might > difficult or even not possible depending the number of enclosures and FS > utilization. > > Starting some ESS version (not DSS, cant comment on that) that I do not > recall but in the last 6 months, we have change the default (for those > that use the default) to 4 NSDs per enclosure for ESS 5000. There is no > impact on performance either way on ESS, we tested it. But management of > those on the long run should be easier. Question how does one create a none default number of vdisks per enclosure then? I tried creating a stanza file and then doing mmcrvdisk but it was not happy, presumably because of the "new style" recovery group management mmcrvdisk: [E] This command is not supported by recovery groups under management of mmvdisk. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Mon Mar 1 11:53:32 2021 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 1 Mar 2021 11:53:32 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: Message-ID: Jonathan, You need to create vdisk sets which will create multiple vdisks, you can then assign vdisk sets to your filesystem. (Assigning multiple vdisks at a time) Things to watch - free space calculations are more complex as it?s building multiple vdisks under the cover using multiple raid parameters Also it?s worth assuming a 10% reserve or approx - drive per disk shelf for rebuild space Mmvdisk vdisk set ... insert parameters https://www.ibm.com/support/knowledgecenter/mk/SSYSP8_5.3.2/com.ibm.spectrum.scale.raid.v5r02.adm.doc/bl8adm_mmvdisk.htm Sent from my iPhone > On 1 Mar 2021, at 21:45, Jonathan Buzzard wrote: > > ?On 01/03/2021 09:08, Luis Bolinches wrote: >> Hi >> >> There other reasons to have more than 1. It is management of those. 
When >> you have to add or remove NSDs of a FS having more than 1 makes it >> possible to empty some space and manage those in and out. Manually but >> possible. If you have one big NSD or even 1 per enclosure it might >> difficult or even not possible depending the number of enclosures and FS >> utilization. >> >> Starting some ESS version (not DSS, cant comment on that) that I do not >> recall but in the last 6 months, we have change the default (for those >> that use the default) to 4 NSDs per enclosure for ESS 5000. There is no >> impact on performance either way on ESS, we tested it. But management of >> those on the long run should be easier. > Question how does one create a none default number of vdisks per > enclosure then? > > I tried creating a stanza file and then doing mmcrvdisk but it was not > happy, presumably because of the "new style" recovery group management > > mmcrvdisk: [E] This command is not supported by recovery groups under > management of mmvdisk. > > > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=STXkGEO2XATS_s2pRCAAh2wXtuUgwVcx1XjUX7ELNdk&m=9HlRHByoByQcM0mY0elL-l4DgA6MzHkAGzE70Rl2p2E&s=eWRfWGpdZB-PZ_InCCjgmdQOCy6rgWj9Oi3TGGA38yY&e= > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scl at virginia.edu Mon Mar 1 12:31:37 2021 From: scl at virginia.edu (Losen, Stephen C (scl)) Date: Mon, 1 Mar 2021 12:31:37 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl Message-ID: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Hi folks, Experimenting with POSIX ACLs on GPFS 4.2 and noticed that the Linux command setfacl clears "c" permissions that were set with mmputacl. So if I have this: ... group:group1:rwxc mask::rwxc ... and I modify a different entry with: setfacl -m group:group2:r-x dirname then the "c" permissions above get cleared and I end up with ... group:group1:rwx- mask::rwx- ... I discovered that chmod does not clear the "c" mode. Is there any filesystem option to change this behavior to leave "c" modes in place? Steve Losen Research Computing University of Virginia scl at virginia.edu 434-924-0640 From olaf.weiser at de.ibm.com Mon Mar 1 12:45:44 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 12:45:44 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Mar 1 12:58:44 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 1 Mar 2021 12:58:44 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 13:14:38 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 13:14:38 +0000 Subject: [gpfsug-discuss] Using setfacl vs. 
mmputacl In-Reply-To: References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: On 01/03/2021 12:45, Olaf Weiser wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Hallo Stephen, > behavior ... or better to say ... predicted behavior for chmod and ACLs > .. is not an easy thing or only? , if? you stay in either POSIX world or > NFSv4 world > to be POSIX compliant, a chmod overwrites ACLs One might argue that the general rubbishness of the mmputacl cammand, and if a mmsetfacl command (or similar) existed it would negate messing with Linux utilities to change ACL's on GPFS file systems Only been bringing it up for over a decade now ;-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From olaf.weiser at de.ibm.com Mon Mar 1 15:18:59 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 15:18:59 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From laurence at qsplace.co.uk Mon Mar 1 08:59:35 2021 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Mon, 01 Mar 2021 08:59:35 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: <6F478E88-E350-46BF-9993-82C21ADD2262@qsplace.co.uk> Like Jan, I did some benchmarking a few years ago when the default recommended RG's dropped to 1 per DA to meet rebuild requirements. I couldn't see any discernable difference. As Achim has also mentioned, I just use vdisks for creating additional filesystems. Unless there is going to be a lot of shuffling of space or future filesystem builds, then I divide the RG's into say 10 vdisks to give some flexibility and granularity There is also a flag iirc that changes the gpfs magic to consider multiple under lying disks, when I find it again........ Which can provide increased performance on traditional RAID builds. -- Lauz On 1 March 2021 08:16:43 GMT, Achim Rehor wrote: >The reason for having multiple NSDs in legacy NSD (non-GNR) handling is > >the increased parallelism, that gives you 'more spindles' and thus more > >performance. >In GNR the drives are used in parallel anyway through the GNRstriping. >Therfore, you are using all drives of a ESS/GSS/DSS model under the >hood >in the vdisks anyway. > >The only reason for having more NSDs is for using them for different >filesystems. > > >Mit freundlichen Gr??en / Kind regards > >Achim Rehor > >IBM EMEA ESS/Spectrum Scale Support > > > > > > > > > > > > >gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > >> From: Jonathan Buzzard >> To: gpfsug-discuss at spectrumscale.org >> Date: 01/03/2021 08:58 >> Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of >NSD's >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> On 28/02/2021 09:31, Jan-Frode Myklebust wrote: >> > >> > I?ve tried benchmarking many vs. few vdisks per RG, and never could > >see >> > any performance difference. >> >> That's encouraging. >> >> > >> > Usually we create 1 vdisk per enclosure per RG, thinking this >will >> > allow us to grow with same size vdisks when adding additional >enclosures >> > in the future. 
>> > >> > Don?t think mmvdisk can be told to create multiple vdisks per RG >> > directly, so you have to manually create multiple vdisk sets each >with > >> > the apropriate size. >> > >> >> Thing is back in the day so GPFS v2.x/v3.x there where strict >warnings >> that you needed a minimum of six NSD's for optimal performance. I >have >> sat in presentations where IBM employees have said so. What we where >> told back then is that GPFS needs a minimum number of NSD's in order >to >> be able to spread the I/O's out. So if an NSD is being pounded for >reads > >> and a write comes in it. can direct it to a less busy NSD. >> >> Now I can imagine that in a ESS/DSS-G that as it's being scattered to > >> the winds under the hood this is no longer relevant. But some notes >to >> the effect for us old timers would be nice if that is the case to put > >> our minds to rest. >> >> >> JAB. >> >> -- >> Jonathan A. Buzzard Tel: +44141-5483420 >> HPC System Administrator, ARCHIE-WeSt. >> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://urldefense.proofpoint.com/v2/url? >> >u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- >> siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- >> M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- >> IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= >> > > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 16:50:31 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 16:50:31 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk> On 01/03/2021 15:18, Olaf Weiser wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > JAB, > yes-this is in argument ;-) ... and personally I like the idea of having > smth like setfacl also for GPFS ..? for years... > *but* it would not take away the generic challenge , what to do, if > there are competing standards / definitions to meet > at least that is most likely just one reason, why there's no tool yet > there are several hits on RFE page for "ACL".. some of them could be > also addressed with a (mm)setfacl tool > but I was not able to find a request for a tool itself > (I quickly? searched? public but? not found it there, maybe there is > already one in private...) > So - dependent on how important this item for others? is? ... its time > to fire an RFE ?!? ... Well when I asked I was told by an IBM representative that it was by design there was no proper way to set ACLs directly from Linux. The expectation was that you would do this over NFSv4 or Samba. So filing an RFE would be pointless under those conditions and I have never bothered as a result. This was pre 2012 so IBM's outlook might have changed in the meantime. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG

From olaf.weiser at de.ibm.com Mon Mar 1 17:57:11 2021
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Mon, 1 Mar 2021 17:57:11 +0000
Subject: [gpfsug-discuss] Using setfacl vs. mmputacl
In-Reply-To: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk>
References: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk>, <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu>
Message-ID:

An HTML attachment was scrubbed...
URL:

From A.Wolf-Reber at de.ibm.com Tue Mar 2 09:36:48 2021
From: A.Wolf-Reber at de.ibm.com (Alexander Wolf)
Date: Tue, 2 Mar 2021 09:36:48 +0000
Subject: [gpfsug-discuss] Using setfacl vs. mmputacl
In-Reply-To:
References: , <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk>, <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu>
Message-ID:

An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.16146770920000.png  Type: image/png  Size: 1134 bytes  Desc: not available  URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.16146770920001.png  Type: image/png  Size: 1134 bytes  Desc: not available  URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.16146770920002.png  Type: image/png  Size: 1172 bytes  Desc: not available  URL:

From russell at nordquist.info Tue Mar 2 19:31:24 2021
From: russell at nordquist.info (Russell Nordquist)
Date: Tue, 2 Mar 2021 14:31:24 -0500
Subject: [gpfsug-discuss] Self service creation of filesets
Message-ID:

Hi all

We are trying to use filesets quite a bit, but it's a hassle that only
the admins can create them. To the users it's just a directory, so it
slows things down. Has anyone deployed a self-service model for creating
filesets? Maybe using the API? This feels like shared pain that someone
has already worked on...

thanks
Russell

From anacreo at gmail.com Tue Mar 2 20:58:29 2021
From: anacreo at gmail.com (Alec)
Date: Tue, 2 Mar 2021 12:58:29 -0800
Subject: [gpfsug-discuss] Self service creation of filesets
In-Reply-To:
References:
Message-ID:

This does feel like another situation where I may use a custom attribute
and a periodic script to do the fileset creation. Honestly I would want
the change management around fileset creation. But I could see a few
custom attributes on a newly created user dir... Like maybe just setting
user.quota=10TB... Then have a policy that discovers these and does the
work of creating the fileset, setting the quotas, migrating data to the
fileset, and then mounting the fileset over the original directory.

Honestly that sounds so nice I may have to implement this... Lol. Like I
could see doing something like discovering directories that have
user.archive=true and automatically gzipping large files within.

Would be nice if the GPFS policy engine could have an IF_ANCESTOR_ATTRIBUTE=.

Alec

On Tue, Mar 2, 2021, 11:40 AM Russell Nordquist wrote:

> Hi all
>
> We are trying to use filesets quite a bit, but it's a hassle that only the
> admins can create them. To the users it's just a directory so it slows
> things down. Has anyone deployed a self service model for creating
> filesets? Maybe using the API? This feels like shared pain that someone has
> already worked on...
>
> thanks
> Russell
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
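The attribute-driven idea above maps fairly directly onto a LIST rule plus
a small wrapper around mmcrfileset/mmlinkfileset/mmsetquota. A rough,
untested sketch follows; the user.quota attribute, the fs1 device and the
list-file paths are illustrative choices, not a documented convention:

  /* newprojects.pol - directories tagged with a user.quota attribute */
  RULE EXTERNAL LIST 'newprojects' EXEC ''
  RULE 'tagged' LIST 'newprojects' DIRECTORIES_PLUS
       SHOW(XATTR('user.quota'))
       WHERE XATTR('user.quota') IS NOT NULL

  # run e.g.:  mmapplypolicy /gpfs/fs1 -P newprojects.pol -I defer -f /tmp/newproj
  # list lines look roughly like "<inode> <gen> <snap> <quota> -- /path";
  # adjust the field handling to what your mmapplypolicy version emits
  FS=fs1
  while read -r line; do
      quota=$(echo "$line" | awk '{print $4}')        # value of user.quota
      dir=$(echo "$line" | sed 's/.* -- //')          # the tagged directory
      name=$(basename "$dir")
      mmlsfileset "$FS" "$name" >/dev/null 2>&1 && continue   # already provisioned
      mmcrfileset "$FS" "$name" &&
        mmlinkfileset "$FS" "$name" -J "${dir}.fileset" &&    # junction must not already exist
        mmsetquota "${FS}:${name}" --block "${quota}:${quota}"
      # migrating the data into the fileset and swapping it over the original
      # directory is the manual step Alec mentions above
  done < /tmp/newproj.list.newprojects

DIRECTORIES_PLUS is there because, by default, policy rules only consider
regular files.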
URL: From S.J.Thompson at bham.ac.uk Tue Mar 2 22:38:17 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 2 Mar 2021 22:38:17 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: Message-ID: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> Not quite user self-service .... But we have some web tooling for project registration that pushes sanitised messages onto a redis (rq) backed message bus which then does "stuff". For example create and populate groups in AD and LDAP. Create and link a fileset, set quota etc ... Our consumer code is all built to be tolerant to running it a second time safely and has quite a bit of internal locking to prevent multiple instances running at the same time (though we have multiple consumer entities to handle fail-over). The fault tolerant thing is quite important as create a fileset can fail for a number of reasons (e.g. restripefs running), so we can always just requeue the requests again. Simon ?On 02/03/2021, 19:40, "gpfsug-discuss-bounces at spectrumscale.org on behalf of russell at nordquist.info" wrote: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. thanks Russell _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ckerner at illinois.edu Tue Mar 2 22:59:01 2021 From: ckerner at illinois.edu (Kerner, Chad A) Date: Tue, 2 Mar 2021 22:59:01 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> References: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> Message-ID: <52196DB3-E8D3-47F7-92F6-3A123B46F615@illinois.edu> We have a similar process. One of our customers has a web app that their managers use to provision spaces. That web app drops a json file into a specific location and a cron job kicks off a python script every so often to process the files and provision the space(fileset creation, link, quota, owner, group, perms, etc). Failures are queued and a jira ticket opened. Successes update the database for the web app. They are not requiring instant processing, so we process hourly on the back end side of things. Chad -- Chad Kerner, Senior Storage Engineer Storage Enabling Technologies National Center for Supercomputing Applications University of Illinois, Urbana-Champaign ?On 3/2/21, 4:38 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson" wrote: Not quite user self-service .... But we have some web tooling for project registration that pushes sanitised messages onto a redis (rq) backed message bus which then does "stuff". For example create and populate groups in AD and LDAP. Create and link a fileset, set quota etc ... Our consumer code is all built to be tolerant to running it a second time safely and has quite a bit of internal locking to prevent multiple instances running at the same time (though we have multiple consumer entities to handle fail-over). The fault tolerant thing is quite important as create a fileset can fail for a number of reasons (e.g. restripefs running), so we can always just requeue the requests again. 
Simon On 02/03/2021, 19:40, "gpfsug-discuss-bounces at spectrumscale.org on behalf of russell at nordquist.info" wrote: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. thanks Russell _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!DZ3fjg!uQVokpQk0pPyjpae7a_Aui1wGk3k7xJzIxzX1DBNfOyNOfzZeJFUjVOqN3OVEyVqdw$ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!DZ3fjg!uQVokpQk0pPyjpae7a_Aui1wGk3k7xJzIxzX1DBNfOyNOfzZeJFUjVOqN3OVEyVqdw$ From tortay at cc.in2p3.fr Wed Mar 3 08:06:37 2021 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Wed, 3 Mar 2021 09:06:37 +0100 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: Message-ID: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> On 02/03/2021 20:31, Russell Nordquist wrote: > Hi all > > We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. > Hello, We have a quota management delegation (CLI) tool that allows "power-users" (PI and such) to create and remove filesets and manage users quotas for the groups/projects they're heading. Like someone else said, from their point of view they're just directories, so they create a "directory with quotas". In our experience, "directories with quotas" are the most convenient way for end-users to understand and use quotas. This is a tool written in C, about 13 years ago, using the GPFS API (and a few calls to GPFS commands where there is no API or it's lacking). Delegation authorization (identifying "power-users") is external to the tool. Permissions & ACLs are also set on the junction when a fileset is created so that it's both immediately usable ("instant processing") and accessible to "power-users" (for space management purposes). There are extra features for staff to allow higher-level operations (e.g. create an independent fileset for a group/project, change the group/project quotas, etc.) The dated looking user documentation is https://ccspsmon.in2p3.fr/spsquota.html Both the tool and the documentation have a few site-specific things, so it's not open-source (and it has become a "legacy" tool in need of a rewrite/refactoring). Lo?c. -- | Lo?c Tortay - IN2P3 Computing Centre | From russell at nordquist.info Wed Mar 3 17:14:37 2021 From: russell at nordquist.info (Russell Nordquist) Date: Wed, 3 Mar 2021 12:14:37 -0500 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> References: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> Message-ID: Sounds like I am not the only one that needs this. The REST API has everything needed to do this, but the problem is we can?t restrict the GUI role account to just the commands they need. They need ?storage administrator? 
access which means the could also make/delete filesystems. I guess you could use sudo and wrap the CLI, but I am told that?s old fashioned :) Too bad we can?t make a API role with specific POST commands tied to it. I am surprised there is no RFE for that yet. The closest I see is http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=148244 am I missing something. What I would want is to be able to grant the the following calls + maybe a few more. The related REST API calls. https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_apiv2postfilesystemfilesets.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_apiv2postfilesystemfilesetlink.htm Russell > On Mar 3, 2021, at 3:06 AM, Loic Tortay wrote: > > On 02/03/2021 20:31, Russell Nordquist wrote: >> Hi all >> We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. > Hello, > We have a quota management delegation (CLI) tool that allows "power-users" (PI and such) to create and remove filesets and manage users quotas for the groups/projects they're heading. > > Like someone else said, from their point of view they're just directories, so they create a "directory with quotas". > In our experience, "directories with quotas" are the most convenient way for end-users to understand and use quotas. > > This is a tool written in C, about 13 years ago, using the GPFS API (and a few calls to GPFS commands where there is no API or it's lacking). > > Delegation authorization (identifying "power-users") is external to the tool. > > Permissions & ACLs are also set on the junction when a fileset is created so that it's both immediately usable ("instant processing") and accessible to "power-users" (for space management purposes). > > There are extra features for staff to allow higher-level operations (e.g. create an independent fileset for a group/project, change the group/project quotas, etc.) > > The dated looking user documentation is https://ccspsmon.in2p3.fr/spsquota.html > > Both the tool and the documentation have a few site-specific things, so it's not open-source (and it has become a "legacy" tool in need of a rewrite/refactoring). > > > Lo?c. > -- > | Lo?c Tortay - IN2P3 Computing Centre | > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.horton at icr.ac.uk Thu Mar 4 09:51:45 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Thu, 4 Mar 2021 09:51:45 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> Message-ID: <566f81f3bfd243f1b0258562b627e4e1b6869863.camel@icr.ac.uk> On Wed, 2021-03-03 at 12:14 -0500, Russell Nordquist wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. Sounds like I am not the only one that needs this. The REST API has everything needed to do this, but the problem is we can?t restrict the GUI role account to just the commands they need. They need ?storage administrator? access which means the could also make/delete filesystems. 
I guess you could use sudo and wrap the CLI, but I am told that?s old fashioned :) Too bad we can?t make a API role with specific POST commands tied to it. I am surprised there is no RFE for that yet. The closest I see is http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=148244 am I missing something. That reminds me... We use a Python wrapper around the REST API to monitor usage against fileset quotas etc. In principle this will also set quotas (and create filesets) but it means giving it storage administrator access. It would be nice if the GUI had sufficiently fine grained permissions that you could set quotas without being able to delete the filesystem. Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 10:04:22 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 10:04:22 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's Message-ID: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> I am seeing that whenever I try and restore a file with an ACL I get the a ANS1589W error in /var/log/dsmerror.log ANS1589W Unable to write extended attributes for ****** due to errno: 13, reason: Permission denied But bizarrely the ACL is actually restored. At least as far as I can tell. This is the 8.1.11-0 TSM client with GPFS version 5.0.5-1 against a 8.1.10-0 TSM server. Running on RHEL 7.7 to match the DSS-G 2.7b install. The backup node makes the third quorum node for the cluster being as that it runs genuine RHEL (unlike all the compute nodes which are running CentOS). Googling I can't find any references to this being fixed in a later version of the GPFS software, though being on RHEL7 and it's derivatives I am stuck on 5.0.5 Surely root has permissions to write the extended attributes for anyone? It would seem perverse if you have to be the owner of a file to restore the ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From stockf at us.ibm.com Fri Mar 5 12:15:38 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 5 Mar 2021 12:15:38 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 13:07:56 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 13:07:56 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: On 05/03/2021 12:15, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Have you checked to see if Spectrum Protect (TSM) has addressed this > problem.? There recently was an issue with Protect and how it used the > GPFS API for ACLs.? If I recall Protect was not properly handling a > return code.? I do not know if it is relevant to your problem but? it > seemed worth mentioning. As far as I am aware 8.1.11.0 is the most recent version of the Spectrum Protect/TSM client. There is nothing newer showing on the IBM FTP site ftp://ftp.software.ibm.com/storage/tivoli-storage-management/maintenance/client/v8r1/Linux/LinuxX86/BA/ Checking on fix central also seems to show that 8.1.11.0 is the latest version, and the only fix over 8.1.10.0 is a security update to do with the client web user interface. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Renar.Grunenberg at huk-coburg.de Fri Mar 5 18:06:43 2021 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 5 Mar 2021 18:06:43 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> Hallo All, thge mentioned problem with protect was this: https://www.ibm.com/support/pages/node/6415985?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Thomas Sehn, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= -----Urspr?ngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Jonathan Buzzard Gesendet: Freitag, 5. 
M?rz 2021 14:08 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] TSM errors restoring files with ACL's On 05/03/2021 12:15, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Have you checked to see if Spectrum Protect (TSM) has addressed this > problem. There recently was an issue with Protect and how it used the > GPFS API for ACLs. If I recall Protect was not properly handling a > return code. I do not know if it is relevant to your problem but it > seemed worth mentioning. As far as I am aware 8.1.11.0 is the most recent version of the Spectrum Protect/TSM client. There is nothing newer showing on the IBM FTP site ftp://ftp.software.ibm.com/storage/tivoli-storage-management/maintenance/client/v8r1/Linux/LinuxX86/BA/ Checking on fix central also seems to show that 8.1.11.0 is the latest version, and the only fix over 8.1.10.0 is a security update to do with the client web user interface. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stockf at us.ibm.com Fri Mar 5 19:12:47 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 5 Mar 2021 19:12:47 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> References: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de>, <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 20:31:54 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 20:31:54 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: <696e96cc-da52-a24f-d53e-6510407e51e7@strath.ac.uk> On 05/03/2021 19:12, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > I was referring to this flash, > https://www.ibm.com/support/pages/node/6381354?myns=swgtiv&mynp=OCSSEQVQ&mync=E&cm_sp=swgtiv-_-OCSSEQVQ-_-E > > > Spectrum Protect 8.1.11 client has the fix so this should not be an > issue for Jonathan.? Probably best to open a help case against Spectrum > Protect and begin the investigation there. > Also the fix is to stop an unchanged file with an ACL from being backed up again, but only one more time. I suspect we where hit with that issue, but given we only have ~90GB of files with ACL's on them I would not have noticed. That is significantly less than the normal daily churn. This however is an issue with the *restore*. Everything looks to get restored correctly even the ACL's. At the end of the restore all looks good given the headline report from dsmc. However there are ANS1589W warnings in dsmerror.log and dsmc exits with an error code of 8 rather than zero. Will open a case against Spectrum Protect on Monday. I am pretty confident the warnings are false. The current plan is to do carefully curated hand restores of the three afflicted users when the rest of the restore if finished to double check the ACL's are the only issue. 
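One low-tech way of doing that double check is to dump the ACL of
everything under the restored trees and diff it against the same dump
taken from a known-good copy, if one exists. A sketch only - the paths
are examples and there is nothing Spectrum Protect specific in it:

  mkdir -p /tmp/acl-dump
  for tree in /gpfs/users/user1 /gpfs/users/user2 /gpfs/users/user3; do
      # one mmgetacl record per file, prefixed with the file name
      find "$tree" -print0 | while IFS= read -r -d '' f; do
          printf '### %s\n' "$f"
          mmgetacl "$f" 2>/dev/null
      done > /tmp/acl-dump/$(basename "$tree").txt
  done
  # diff -u /tmp/acl-dump/user1.txt <baseline dump of the same tree>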
Quite how the Spectrum Protect team have missed this bug is beyond me. Do
they not have some unit tests to check this stuff before pushing out
updates? I know in the past it worked, though that was many years ago now.
However, I restored many TB of data from backup with ACL's on them.

JAB.

--
Jonathan A. Buzzard                   Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From Robert.Oesterlin at nuance.com Mon Mar 8 14:49:59 2021
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Mon, 8 Mar 2021 14:49:59 +0000
Subject: [gpfsug-discuss] Policy scan of symbolic links with contents?
Message-ID: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com>

Looking to craft a policy scan that pulls out symbolic links to a
particular destination. For instance:

file1.py -> /fs1/patha/pathb/file1.py (I want to include these)
file2.py -> /fs2/patha/pathb/file2.py (exclude these)

The easy way would be to pull out all sym-links and just grep for the
ones I want, but was hoping for a more elegant solution?

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stockf at us.ibm.com Mon Mar 8 15:29:42 2021
From: stockf at us.ibm.com (Frederick Stock)
Date: Mon, 8 Mar 2021 15:29:42 +0000
Subject: [gpfsug-discuss] Policy scan of symbolic links with contents?
In-Reply-To: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com>
References: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From Robert.Oesterlin at nuance.com Mon Mar 8 15:34:21 2021
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Mon, 8 Mar 2021 15:34:21 +0000
Subject: [gpfsug-discuss] Policy scan of symbolic links with contents?
Message-ID:

Well - the case here is that the file system has, let's say, 100M files.
Some percentage of these are sym-links to a location that's not in this
file system. I want a report of all these off-file-system links. However,
not all of the sym-links off file system are of interest, just some of
them. I can't say for sure where in the file system they are (and I don't
care).
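One way to get that report is to let the policy engine list every symlink
and do the filtering on the link target afterwards, since (as comes up
below) the policy engine only ever sees the link's own path, not where it
points. A rough, untested sketch, assuming GNU readlink and taking the 'L'
flag and the /fs1 prefix from the rules quoted later in this thread:

  /* all-symlinks.pol */
  RULE EXTERNAL LIST 'symlinks' EXEC ''
  RULE 'links' LIST 'symlinks' DIRECTORIES_PLUS
       WHERE MISC_ATTRIBUTES LIKE '%L%'

  # mmapplypolicy /fs1 -P all-symlinks.pol -I defer -f /tmp/links
  sed 's/.* -- //' /tmp/links.list.symlinks | while IFS= read -r l; do
      t=$(readlink -m -- "$l")                   # resolved target, need not exist
      case "$t" in
          /fs1/*) printf '%s -> %s\n' "$l" "$t" ;;   # targets of interest
      esac
  done

Adjust the case pattern to whichever target locations actually matter;
DIRECTORIES_PLUS is there because, by default, only regular files are
considered by the rules.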
Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p_1XEUyoJ7-VJxF_w8h9gJh8_Wj0Pey73LCLLoxodpw&m=i6m1zVXf4peZo0yo02IiRaQ_pUX95MN3wU53M0NiWcI&s=z-ibh2kAPHbehAsrGavNIg2AJdXmHkpUwy5YhZfUbpc&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Mar 8 16:07:48 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 8 Mar 2021 16:07:48 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 8 20:45:05 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Mar 2021 20:45:05 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: On 08/03/2021 16:07, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Presumably the only feature that would help here is if policy could > determine that the end location pointed to by a symbolic link is within > the current file system.? I am not aware of any such feature or > attribute which policy could check so I think all you can do is run > policy to find the symbolic links and then check each link to see if it > points into the same file system.? You might find the mmfind command > useful for this purpose.? I expect it would eliminate the need to create > a policy to find the symbolic links. > Unless you are using bind mounts if the symbolic link points outside the mount point of the file system it is not within the current file system. So noting that you can write very SQL like statements something like the following should in theory do it RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' Note the above is not checked in any way shape or form for working. Even if you do have bind mounts of other GPFS file systems you just need a more complicated WHERE statement. When doing policy engine stuff I find having that section of the GPFS manual printed out and bound, along with an SQL book for reference is very helpful. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Mon Mar 8 21:00:04 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Mar 2021 21:00:04 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: On 08/03/2021 20:45, Jonathan Buzzard wrote: [SNIP] > So noting that you can write very SQL like statements something like the > following should in theory do it > > RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND > SUBSTR(PATH_NAME,0,4)='/fs1/' > > Note the above is not checked in any way shape or form for working. Even > if you do have bind mounts of other GPFS file systems you just need a > more complicated WHERE statement. Duh, of course as soon as I sent it, I realized there is a missing SHOW RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' You could replace the SUBSTR with a REGEX if you prefer JAB. -- Jonathan A. 
Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From ulmer at ulmer.org Mon Mar 8 22:33:38 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 8 Mar 2021 17:33:38 -0500 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> Does that check the target of the symlink, or the path to the link itself? I think the OP was checking the target (or I misunderstood). -- Stephen Ulmer Sent from a mobile device; please excuse auto-correct silliness. > On Mar 8, 2021, at 3:34 PM, Jonathan Buzzard wrote: > > ?On 08/03/2021 20:45, Jonathan Buzzard wrote: > > [SNIP] > >> So noting that you can write very SQL like statements something like the >> following should in theory do it >> RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND >> SUBSTR(PATH_NAME,0,4)='/fs1/' >> Note the above is not checked in any way shape or form for working. Even >> if you do have bind mounts of other GPFS file systems you just need a >> more complicated WHERE statement. > > Duh, of course as soon as I sent it, I realized there is a missing SHOW > > RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' > > You could replace the SUBSTR with a REGEX if you prefer > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Tue Mar 9 12:25:56 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 9 Mar 2021 12:25:56 +0000 Subject: [gpfsug-discuss] [EXTERNAL] Re: Policy scan of symbolic links with contents? In-Reply-To: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> References: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> Message-ID: <3B0AD02E-335F-4540-B109-EC5301C3188A@nuance.com> RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' In this case PATH_NAME is the path within the GPFS file system, not the target of the link, correct? That's not what I want. I want the path of the *link target*. Bob Oesterlin Sr Principal Storage Engineer, Nuance ?On 3/8/21, 4:41 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Stephen Ulmer" wrote: CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ---------------------------------------------------------------------- Does that check the target of the symlink, or the path to the link itself? I think the OP was checking the target (or I misunderstood). From bill.burke.860 at gmail.com Wed Mar 10 02:19:02 2021 From: bill.burke.860 at gmail.com (William Burke) Date: Tue, 9 Mar 2021 21:19:02 -0500 Subject: [gpfsug-discuss] Backing up GPFS with Rsync Message-ID: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. 
Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. -- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Mar 10 02:21:54 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 10 Mar 2021 02:21:54 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: Message-ID: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. > > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From anacreo at gmail.com Wed Mar 10 02:59:18 2021 From: anacreo at gmail.com (Alec) Date: Tue, 9 Mar 2021 18:59:18 -0800 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: You would definitely be able to search by inode creation date and find the files you want... our 1.25m file filesystem takes about 47 seconds to query... One thing I would worry about though is inode deletion and inter-fileset file moves. The SQL based engine wouldn't be able to identify those changes and so you'd not be able to replicate deletes and such. Alternatively.... 
I have a script that runs in about 4 minutes and it pulls all the data out of the backup indexes, and compares the pre-built hourly file index on our system and identifies files that don't exist in the backup, so I have a daily backup validation... I filter the file list using ksh's printf date manipulation to filter out files that are less than 2 days old, to reduce the noise. A modification to this could simply compare a daily file index with the previous day's index, and send rsync a list of files (existing or deleted) based on just a delta of the two indexes (sort|diff), then you could properly account for all the changes. If you don't care about file modifications just produce both lists based on creation time instead of modification time. The mmfind command or GPFS policy engine should be able to produce a full file list/index very rapidly. In another thread there was a conversation with ACL's... I don't think our backup system backs up ACL's so I just have GPFS produce a list of all ACL applied objects on the daily, and have a script that just makes a null delimited backup file of every single ACL on our file system... and have a script to apply the ACL's as a "restore". It's a pretty simple thing to write-up and keeping 90 day history on this lets me compare the ACL evolution on a file very easily. Alec MVH Most Victorious Hunting (Why should Scandinavians own this cool sign off) On Tue, Mar 9, 2021 at 6:22 PM Ryan Novosielski wrote: > Yup, you want to use the policy engine: > > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m > reluctant to provide examples as I?m actually suspicious that we don?t have > it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > > On Mar 9, 2021, at 9:19 PM, William Burke > wrote: > > > > I would like to know what files were modified/created/deleted (only for > the current day) on the GPFS's file system so that I could rsync ONLY those > files to a predetermined external location. I am running GPFS 4.2.3.9 > > > > Is there a way to access the GPFS's metadata directly so that I do not > have to traverse the filesystem looking for these files? If i use the rsync > tool it will scan the file system which is 400+ million files. Obviously > this will be problematic to complete a scan in a day, if it would ever > complete single-threaded. There are tools or scripts that run multithreaded > rsync but it's still a brute force attempt. and it would be nice to know > where the delta of files that have changed. > > > > I began looking at Spectrum Scale Data Management (DM) API but I am not > sure if this is the best approach to looking at the GPFS metadata - inodes, > modify times, creation times, etc. 
> > > > > > > > -- > > > > Best Regards, > > > > William Burke (he/him) > > Lead HPC Engineer > > Advance Research Computing > > 860.255.8832 m | LinkedIn > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Mar 10 15:15:58 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 10 Mar 2021 15:15:58 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: <641ea714-579b-1d74-4b86-d0e0b2e8e9c3@strath.ac.uk> On 10/03/2021 02:59, Alec wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > You would definitely be able to search by inode creation date and find > the files you want... our 1.25m file filesystem takes about 47 seconds > to query...? One thing I would worry about though is inode deletion and > inter-fileset file moves.? ?The SQL based engine wouldn't be able to > identify those changes and so you'd not be able to replicate deletes and > such. > This is the problem with rsync "backups", you need to run it with --delete otherwise any restore will "upset" your users as they find large numbers of file they had deleted unhelpfully "restored" > Alternatively.... > I have a script that runs in about 4 minutes and it pulls all the data > out of the backup indexes, and compares the pre-built hourly file index > on our system and identifies files that don't exist in the backup, so I > have a daily backup validation...? I filter the file list using > ksh's?printf date manipulation to filter out files that are less than 2 > days old, to reduce the noise.? A modification to this could simply > compare a daily file index with the previous day's index, and send rsync > a list of files (existing or deleted) based on just a delta of the two > indexes (sort|diff), then you could properly account for all the > changes.? If you don't care about file modifications just produce both > lists based on creation time instead of modification time.? The mmfind > command or GPFS policy engine should be able to produce a full file > list/index very rapidly. > My view would be somewhere along the lines of this is a lot of work and if you have the space to rsync your GPFS file system to, presumably with a server attached to said storage then for under 500 PVU of Spectrum Protect licensing you can have a fully supported client/server Spectrum Protect/TSM backup solution and just use mmbackup. You need to play the game and use older hardware ;-) I use an ancient pimped out Dell PowerEdge R300 as my TSM client node. Why this old, well it has a dual core Xeon E3113 for only 100 PVU. Anything newer would be quad core and 70 PVU per core which would cost an additional ~$1000 in licensing. If it breaks down they are under $100 on eBay. It's never skipped a beat and I have just finished a complete planned restore of our DSS-G using it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From S.J.Thompson at bham.ac.uk Wed Mar 10 19:09:13 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 10 Mar 2021 19:09:13 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: I was looking for the original source for this, but it was on dev works ... which is now dead. But you can use something like: tsbuhelper clustermigdiff \ $migratePath/.mmmigrateCfg/mmmigrate.list.v${prevFileCount}.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.latest.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.changed.v${fileCount}.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.deleted.v${fileCount}.filelist "mmmigrate.list.latest.filelist" would be the output of a policyscan of your files today "mmmigrate.list.v${prevFileCount}.filelist" is yesterday's policyscan This then generates the changed and deleted list of files for you. tsbuhelper is what is used internally in mmbackup, though is not very documented... We actually used something along these lines to support migrating between file-systems (generate daily diffs and sync those). The policy scan uses: RULE EXTERNAL LIST 'latest.filelist' EXEC '' \ RULE 'FilesToMigrate' LIST 'latest.filelist' DIRECTORIES_PLUS \ SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || \ VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || \ ' ' || (CASE WHEN XATTR('dmapi.IBMObj') IS NOT NULL THEN 'migrat' \ WHEN XATTR('dmapi.IBMPMig') IS NOT NULL THEN 'premig' \ ELSE 'resdnt' END )) \ WHERE \ ( \ NOT \ ( (PATH_NAME LIKE '/%/.mmbackup%') OR \ (PATH_NAME LIKE '/%/.mmmigrate%') OR \ (PATH_NAME LIKE '/%/.afm%') OR \ (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR \ (PATH_NAME LIKE '/%/.mmLockDir/%') OR \ (MODE LIKE 's%') \ ) \ ) \ AND \ (MISC_ATTRIBUTES LIKE '%u%') \ AND (NOT (PATH_NAME LIKE '/%/.TsmCacheDir' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.TsmCacheDir/%')) \ AND (NOT (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.SpaceMan/%')) On our file-system, both the scan and diff took a long time (hours), but hundreds of millions of files. This comes with no warranty ... We don't use this for backup, Spectrum Protect and mmbackup are our friends ... Simon ?On 10/03/2021, 02:22, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski" wrote: Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? 
If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. > > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From enrico.tagliavini at fmi.ch Thu Mar 11 09:22:46 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 09:22:46 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync References: <8d58f5c6c8ee4f44a5e09c4f9e3a6dac@ex2013mbx2.fmi.ch> Message-ID: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org? On Behalf Of Ryan Novosielski > Sent: Wednesday, March 10, 2021 3:22 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync > > Yup, you want to use the policy engine: > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we > don?t have it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > > > \\UTGERS,?? |---------------------------*O*--------------------------- > > > _// the State |???????? Ryan Novosielski - novosirj at rutgers.edu > > > \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > > ?\\??? 
of NJ | Office of Advanced Research Computing - MSB C630, Newark > ???? `' > > > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > > > ?I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I > > could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If > > i use the rsync tool it will scan the file system which is 400+ million files.? Obviously this will be problematic to complete a > > scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a > > brute force attempt. and it would be nice to know where the delta of files that have changed. > > > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS > > metadata - inodes, modify times, creation times, etc. > > > > > > > > -- > > > > Best Regards, > > > > William Burke (he/him) > > Lead HPC Engineer > > Advance Research Computing > > 860.255.8832 m | LinkedIn > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ulmer at ulmer.org Thu Mar 11 13:17:30 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 11 Mar 2021 08:17:30 -0500 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> References: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> Message-ID: <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen > On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: > > ?Hello William, > > I've got your email forwarded my another user and I decided to subscribe to give you my two cents. > > I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is > easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example > if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. > > DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me > enough not to go that route. > > What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just > build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which > the ctime changes in the last couple of days (to update metadata info). > > Good luck. > Kind regards. 
> > -- > > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > -------- Forwarded Message -------- >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski >> Sent: Wednesday, March 10, 2021 3:22 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync >> >> Yup, you want to use the policy engine: >> >> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm >> >> Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we >> don?t have it quite right and are passing far too much stuff to rsync). >> >> -- >> #BlackLivesMatter >> ____ >>>> \\UTGERS, |---------------------------*O*--------------------------- >>>> _// the State | Ryan Novosielski - novosirj at rutgers.edu >>>> \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus >>>> \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark >> `' >> >>>> On Mar 9, 2021, at 9:19 PM, William Burke wrote: >>> >>> I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I >>> could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 >>> >>> Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If >>> i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a >>> scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a >>> brute force attempt. and it would be nice to know where the delta of files that have changed. >>> >>> I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS >>> metadata - inodes, modify times, creation times, etc. >>> >>> >>> >>> -- >>> >>> Best Regards, >>> >>> William Burke (he/him) >>> Lead HPC Engineer >>> Advance Research Computing >>> 860.255.8832 m | LinkedIn >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From enrico.tagliavini at fmi.ch Thu Mar 11 13:24:47 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 13:24:47 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> References: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> Message-ID: Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. 
Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Wednesday, March 10, 2021 3:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ \\UTGERS, |---------------------------*O*--------------------------- _// the State | Ryan Novosielski - novosirj at rutgers.edu \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Mar 9, 2021, at 9:19 PM, William Burke wrote: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? 
If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. -- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Mar 11 13:47:44 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 11 Mar 2021 08:47:44 -0500 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen > On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: > > ? > Hello Stephen, > > actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. > > The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. > > Kind regards. > > -- > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > >> On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: >> I?m going to ask what may be a dumb question: >> >> Given that you have GPFS on both ends, what made you decide to NOT use AFM? 
>> >> -- >> Stephen >> >> >>> On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: >>> >>> ?Hello William, >>> >>> I've got your email forwarded my another user and I decided to subscribe to give you my two cents. >>> >>> I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is >>> easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example >>> if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. >>> >>> DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me >>> enough not to go that route. >>> >>> What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just >>> build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which >>> the ctime changes in the last couple of days (to update metadata info). >>> >>> Good luck. >>> Kind regards. >>> >>> -- >>> >>> Enrico Tagliavini >>> Systems / Software Engineer >>> >>> enrico.tagliavini at fmi.ch >>> >>> Friedrich Miescher Institute for Biomedical Research >>> Infomatics >>> >>> Maulbeerstrasse 66 >>> 4058 Basel >>> Switzerland >>> >>> >>> >>> >>> -------- Forwarded Message -------- >>>> >>>> -----Original Message----- >>>> From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski >>>> Sent: Wednesday, March 10, 2021 3:22 AM >>>> To: gpfsug main discussion list >>>> Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync >>>> >>>> Yup, you want to use the policy engine: >>>> >>>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm >>>> >>>> Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we >>>> don?t have it quite right and are passing far too much stuff to rsync). >>>> >>>> -- >>>> #BlackLivesMatter >>>> ____ >>>>>> \\UTGERS, |---------------------------*O*--------------------------- >>>>>> _// the State | Ryan Novosielski - novosirj at rutgers.edu >>>>>> \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus >>>>>> \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark >>>> `' >>>> >>>>>> On Mar 9, 2021, at 9:19 PM, William Burke wrote: >>>>> >>>>> I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I >>>>> could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 >>>>> >>>>> Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If >>>>> i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a >>>>> scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a >>>>> brute force attempt. and it would be nice to know where the delta of files that have changed. >>>>> >>>>> I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS >>>>> metadata - inodes, modify times, creation times, etc. 
>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> Best Regards, >>>>> >>>>> William Burke (he/him) >>>>> Lead HPC Engineer >>>>> Advance Research Computing >>>>> 860.255.8832 m | LinkedIn >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Mar 11 14:20:05 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 11 Mar 2021 14:20:05 +0000 Subject: [gpfsug-discuss] Synchronization/Restore of file systems Message-ID: As promised last year I having just completed a storage upgrade, I have sanitized my scripts and put them up on Github for other people to have a look at the methodology I use in these sorts of scenarios. This time the upgrade involved pulling out all the existing disks and fitting large ones then restoring from backup, rather than synchronizing to a new system, but the principles are the same. Bear in mind the code is written in Perl because it's history is ancient now and with few opportunities to test it in anger, rewriting it in the latest fashionable scripting language is unappealing. https://github.com/digitalcabbage/syncrestore JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From enrico.tagliavini at fmi.ch Thu Mar 11 14:24:43 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 14:24:43 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: We evaluated AFM multiple times. The first time was in 2017 with Spectrum Scale 4.2 . When we switched to Spectrum Scale 5 not long ago we also re-evaluated AFM. The horror stories about data loss are becoming more rare with modern setups, especially in the non DR case scenario. However AFM is still a very complicated tool, way to complicated if what you are looking for is a "simple" rsync style backup (but faster). The 3000+ pages of documentation for GPFS do not help our small team and many of those pages are dedicated to just AFM. The performance problem is also still a real issue with modern versions as far as I was told. We can have a quite erratic data turnover in our setup, tied to very big scientific instruments capable of generating many TB of data per hour. Having good performance is important. I used the same tool we use for backups also to migrate the data from the old storage to the new storage (and from GPFS 4 to GPFS 5), and I managed to reach speeds of 17 - 19 GB / s data transfer (when hitting big files that is) using only two servers equipped with Infiniband EDR. 
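The core of it is nothing more than two policy-generated path lists and a diff feeding rsync. A rough, untested sketch (file system, host and path names made up, list-file parsing simplified):

   # run the same scan on the production and on the backup cluster
   cat > /tmp/list-all.pol <<'EOF'
   RULE EXTERNAL LIST 'all' EXEC ''
   RULE 'everything' LIST 'all' DIRECTORIES_PLUS
   EOF
   mmapplypolicy fs1 -P /tmp/list-all.pol -I defer -f /tmp/scan

   # keep only the path column (lines normally end in " -- /path"), sorted
   awk -F' -- ' '{print $2}' /tmp/scan.list.all | sort > prod.paths
   # backup.paths is the same list from the backup cluster, with its
   # mount prefix rewritten so the paths are comparable
   comm -23 prod.paths backup.paths > copy.paths      # exists only on production
   comm -13 prod.paths backup.paths > delete.paths    # exists only on the backup

   # a second rule catches recent content/metadata changes, e.g.
   #   WHERE (CURRENT_TIMESTAMP - CHANGE_TIME) < INTERVAL '2' DAYS
   # and those paths get appended to copy.paths as well

   rsync -a --files-from=copy.paths / backup-host:/backup/
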
I made a simple script to parallelize rsync to make it faster: https://github.com/fmi-basel/splitrsync . Combined with another program using the policy engine to generate the file list to avoid the painful crawling. As I said we are a small team, so we have to be efficient. Developing that tool costed me time, but the ROI is there as I can use the same tool with non GPFS powered storage system, and we had many occasions where this was the case, for example when moving data from old system to be decommissioned to the GPFS storage. And I would like to finally mention another hot topic: who says we will be on GPFS forever? The recent licensing change would probably destroy our small IT budget and we would not be able to afford Spectrum Scale any longer. We might be forced to switch to a cheaper solution. At least this way we can carry some of the code we wrote with us. With AFM we would have to start from scratch. Originally we were not really planning to move as we didn't expect this change in licensing with the associated increased cost. But now, this turns out to be a small time saver if we indeed have to switch. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:47 -0500, Stephen Ulmer wrote: Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sadaniel at us.ibm.com Thu Mar 11 16:08:11 2021 From: sadaniel at us.ibm.com (Steven Daniels) Date: Thu, 11 Mar 2021 09:08:11 -0700 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. 
We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. I'll leave it to Venkat and others on the development team to share more details about improvements. Steven A. Daniels Cross-brand Client Architect Senior Certified IT Specialist National Programs Fax and Voice: 3038101229 sadaniel at us.ibm.com http://www.ibm.com From: Stephen Ulmer To: gpfsug main discussion list Cc: bill.burke.860 at gmail.com Date: 03/11/2021 06:47 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync Sent by: gpfsug-discuss-bounces at spectrumscale.org Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. 
We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Wednesday, March 10, 2021 3:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ \\UTGERS, |---------------------------*O*--------------------------- _// the State | Ryan Novosielski - novosirj at rutgers.edu \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Mar 9, 2021, at 9:19 PM, William Burke wrote: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. 
-- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=6mf8yZ-lDnfsy3mVONFq1RV1ypXT67SthQnq3D6Ym4Q&m=hSVvvIGpqQhKt_u_TKHdjoXyU-z7P14pCBQ5pA7MMFA&s=g2hkl0Raj7QbLvqRZfDk6nska0crl4Peh4kd8YwiO6k&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A816397.jpg Type: image/jpeg Size: 4919 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From novosirj at rutgers.edu Thu Mar 11 16:28:57 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 11 Mar 2021 16:28:57 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: Message-ID: <1298DFDD-9701-4FE4-9B06-1541455E0F52@rutgers.edu> Agreed. Since 5.0.4.1 on the client side (we do rely on it for home directories that are geographically distributed), we have effectively not had any more problems. Our server side are all 5.0.3.2-3. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 11, 2021, at 11:08 AM, Steven Daniels wrote: > > Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. > > I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. > > The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. > > I'll leave it to Venkat and others on the development team to share more details about improvements. > > > Steven A. Daniels > Cross-brand Client Architect > Senior Certified IT Specialist > National Programs > Fax and Voice: 3038101229 > sadaniel at us.ibm.com > http://www.ibm.com > <1A816397.jpg> > > Stephen Ulmer ---03/11/2021 06:47:59 AM---Thank you! 
Would you mind letting me know in what era you made your evaluation? I?m not suggesting y > > From: Stephen Ulmer > To: gpfsug main discussion list > Cc: bill.burke.860 at gmail.com > Date: 03/11/2021 06:47 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Thank you! Would you mind letting me know in what era you made your evaluation? > > I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. > > Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. > > Your original post was very thoughtful, and I appreciate your time. > > -- > Stephen > > On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: > > ? > Hello Stephen, > > actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. > > The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. > > Kind regards. > > -- > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > > On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: > I?m going to ask what may be a dumb question: > > Given that you have GPFS on both ends, what made you decide to NOT use AFM? > > -- > Stephen > > > On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: > > ?Hello William, > > I've got your email forwarded my another user and I decided to subscribe to give you my two cents. > > I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is > easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example > if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. > > DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me > enough not to go that route. > > What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just > build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which > the ctime changes in the last couple of days (to update metadata info). > > Good luck. > Kind regards. 
> > -- > > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > -------- Forwarded Message -------- > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski > Sent: Wednesday, March 10, 2021 3:22 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync > > Yup, you want to use the policy engine: > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we > don?t have it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > \\UTGERS, |---------------------------*O*--------------------------- > _// the State | Ryan Novosielski - novosirj at rutgers.edu > \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I > could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If > i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a > scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a > brute force attempt. and it would be nice to know where the delta of files that have changed. > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS > metadata - inodes, modify times, creation times, etc. 
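To make the policy-engine suggestion concrete: the usual pattern is an external LIST rule handed to mmapplypolicy, which drives a parallel metadata scan rather than a serial walk of the directory tree. The sketch below is illustrative only -- the device name fs1, the file names and the one-day window are invented, and the prefix.list.changed output file assumes mmapplypolicy's deferred-list behaviour:

# changed_today.pol
RULE EXTERNAL LIST 'changed' EXEC ''
RULE 'recent' LIST 'changed'
     WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) <= 1
        OR (DAYS(CURRENT_TIMESTAMP) - DAYS(CHANGE_TIME)) <= 1

mmapplypolicy fs1 -P changed_today.pol -f /tmp/daily -I defer
# /tmp/daily.list.changed now holds the candidate files for rsync

As warned earlier in the thread, a scan like this still misses deletions and renamed parent directories, so it complements rather than replaces an occasional full reconciliation.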
> > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=6mf8yZ-lDnfsy3mVONFq1RV1ypXT67SthQnq3D6Ym4Q&m=hSVvvIGpqQhKt_u_TKHdjoXyU-z7P14pCBQ5pA7MMFA&s=g2hkl0Raj7QbLvqRZfDk6nska0crl4Peh4kd8YwiO6k&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From honwai.leong at sydney.edu.au Thu Mar 11 22:28:57 2021 From: honwai.leong at sydney.edu.au (Honwai Leong) Date: Thu, 11 Mar 2021 22:28:57 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync Message-ID: This paper might provide some ideas, not the best solution but works fine https://github.com/HPCSYSPROS/Workshop20/blob/master/Parallelized_data_replication_of_multi-petabyte_storage_systems/ws_hpcsysp103s1-file1.pdf It is a two-part workflow to replicate files from production to DR site. It leverages on snapshot ID to determine which files have been updated/modified after a snapshot was taken. It doesn't take care of deletion of files moved from one directory to another, so it uses dsync to take care of that part. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: Friday, March 12, 2021 3:08 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 110, Issue 20 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
Re: Fwd: FW: Backing up GPFS with Rsync (Steven Daniels) ---------------------------------------------------------------------- Message: 1 Date: Thu, 11 Mar 2021 09:08:11 -0700 From: "Steven Daniels" To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org, bill.burke.860 at gmail.com Subject: Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync Message-ID: Content-Type: text/plain; charset="utf-8" Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. I'll leave it to Venkat and others on the development team to share more details about improvements. Steven A. Daniels Cross-brand Client Architect Senior Certified IT Specialist National Programs Fax and Voice: 3038101229 sadaniel at us.ibm.com https://protect-au.mimecast.com/s/ZnryCr81nyt88D8ZkuztwY-?domain=ibm.com From: Stephen Ulmer To: gpfsug main discussion list Cc: bill.burke.860 at gmail.com Date: 03/11/2021 06:47 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync Sent by: gpfsug-discuss-bounces at spectrumscale.org Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. 
I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Wednesday, March 10, 2021 3:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync Yup, you want to use the policy engine: https://protect-au.mimecast.com/s/5FXFCvl1rKi77y78YhzCNU5?domain=ibm.com Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ \\UTGERS, |---------------------------*O*--------------------------- _// the State | Ryan Novosielski - novosirj at rutgers.edu \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Mar 9, 2021, at 9:19 PM, William Burke wrote: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. 
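On the rsync side, a list generated by a policy scan can be fed in directly instead of letting rsync walk the 400+ million files itself. This is only a sketch -- the base path /gpfs/fs1, the destination backuphost:/backup/fs1/ and the eight parallel streams are assumptions, not anything tested in this thread:

# paths.txt holds one path per line, relative to /gpfs/fs1
split -n l/8 /tmp/paths.txt /tmp/chunk.     # split into 8 chunks (GNU split)
for f in /tmp/chunk.*; do
    rsync -a --files-from="$f" /gpfs/fs1/ backuphost:/backup/fs1/ &
done
wait

--files-from turns on --relative, so the directory structure is recreated under the destination; for file names containing newlines or other odd characters a NUL-separated list plus --from0 is the safer variant.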
-- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/uNqKCwV1vMfGGRGxqcKIIVS?domain=urldefense.proofpoint.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A816397.jpg Type: image/jpeg Size: 4919 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org End of gpfsug-discuss Digest, Vol 110, Issue 20 *********************************************** From juergen.hannappel at desy.de Mon Mar 15 16:20:51 2021 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Mon, 15 Mar 2021 17:20:51 +0100 (CET) Subject: [gpfsug-discuss] Detecting open files Message-ID: <1985303510.24419797.1615825251660.JavaMail.zimbra@desy.de> Hi, when unlinking filesets that sometimes fails because some open files on that fileset still exist. Is there a way to find which files are open, and from which node? Without running a mmdsh -N all lsof on serveral (big) remote clusters, that is. -- Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1711 bytes Desc: S/MIME Cryptographic Signature URL: From Robert.Oesterlin at nuance.com Wed Mar 17 11:59:57 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 11:59:57 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: Anyone run into this error from the GUI task ?FILESYSTEM_MOUNT? or ideas on how to fix it? Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg1_tools','ems1-hs','RO','2021-03-17 07:55:14.051000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg1_tools) already exists. 
Call getNextException to see other errors in the batch.,Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg5_tools','ems1-hs','RO','2021-03-17 07:55:15.686000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg5_tools) already exists. Call getNextException to see other errors in the batch. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 14:18:56 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 14:18:56 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898090.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898091.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898092.png Type: image/png Size: 1172 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Wed Mar 17 14:30:36 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 14:30:36 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: Can you give me details on how to do this? I tried this: [root at ess1ems ~]# su postgres -c 'psql -d postgres -c "delete from fscc.filesystem_mounts"' could not change directory to "/root" psql: FATAL: Peer authentication failed for user "postgres" Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Alexander Wolf Reply-To: gpfsug main discussion list Date: Wednesday, March 17, 2021 at 9:19 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ This is strange, the Java code should only try to insert rows that are not already there. If it was just the insert for the duplicate row we could ignore it. But this is a batch insert failing and therefore the FILESYSTEM_MOUNTS table does not get updated anymore. A quick fix is to launch the psql client and do a "delete from fscc.filesystem_mounts" to clear the table and run the FILESYSTEM_MOUNT task afterwards to repopulate it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 15:09:51 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 15:09:51 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.16159683898093.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898094.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898095.png Type: image/png Size: 1172 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Wed Mar 17 15:33:54 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 15:33:54 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> The command completed, and I re-ran the FILESYSTEM_MOUNT, but it failed the same way. [root at ess1ems ~]# psql postgres postgres -c "delete from fscc.filesystem_mounts" DELETE 20 /usr/lpp/mmfs/gui/cli/runtask FILESYSTEM_MOUNT -debug 10:32 AM Operation Failed 10:32 AM Error: debug: locale=en_US debug: Running 'mmlsmount 'fs1' -Y ' on node localhost debug: Running 'mmlsmount 'fs2' -Y ' on node localhost debug: Running 'mmlsmount 'fs3' -Y ' on node localhost debug: Running 'mmlsmount 'fs4' -Y ' on node localhost debug: Running 'mmlsmount 'nrg1_tools' -Y ' on node localhost debug: Running 'mmlsmount 'nrg5_tools' -Y ' on node localhost err: java.sql.BatchUpdateException: Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg1_tools','ems1-hs','RO','2021-03-17 11:32:38.830000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg1_tools) already exists. Call getNextException to see other errors in the batch. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Alexander Wolf Reply-To: gpfsug main discussion list Date: Wednesday, March 17, 2021 at 10:10 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ I think psql postgres postgres -c "delete from fscc.filesystem_mounts"' ran as root should do the trick. Mit freundlichen Gr??en / Kind regards [cid:image001.png at 01D71B19.07732D00] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1135 bytes Desc: image001.png URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 17:05:11 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 17:05:11 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> References: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898096.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.16159683898097.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image001.png at 01D71B19.07732D00.png Type: image/png Size: 1135 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898098.png Type: image/png Size: 1172 bytes Desc: not available URL: From robert.horton at icr.ac.uk Thu Mar 18 15:47:07 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Thu, 18 Mar 2021 15:47:07 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Message-ID: Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Mar 19 06:32:00 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 19 Mar 2021 12:02:00 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: Robert, What is the scale version ? This issue may be related to these alerts. https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? 
_ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? __audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.horton at icr.ac.uk Fri Mar 19 09:42:22 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Fri, 19 Mar 2021 09:42:22 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> Hi Venkat, Thanks for getting back to me. On the cache side we're running 5.0.4-3 on the nsd servers and 5.0.5-2 everywhere else, including gateway nodes. The home cluster is 4.2.3-22 - unfortunately we're stuck on 4.x due to the licensing but we're in the process of replacing that system. The actual AFM seems to be behaving fine though so I'm not sure that's our issue. I guess our next job is to see if we can reproduce it in a non-AFM fileset. Rob On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. Robert, What is the scale version ? This issue may be related to these alerts. 
https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk| W www.icr.ac.uk| Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Mar 19 09:50:04 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 19 Mar 2021 15:20:04 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> References: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> Message-ID: Hi Robert, So you might have started seeing problem after upgrading the gateway nodes to 5.0.5.2. Upgrading gateway nodes at cache cluster to 5.0.5.6 would resolve this problem. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/19/2021 03:13 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Venkat, Thanks for getting back to me. On the cache side we're running 5.0.4-3 on the nsd servers and 5.0.5-2 everywhere else, including gateway nodes. The home cluster is 4.2.3-22 - unfortunately we're stuck on 4.x due to the licensing but we're in the process of replacing that system. The actual AFM seems to be behaving fine though so I'm not sure that's our issue. I guess our next job is to see if we can reproduce it in a non-AFM fileset. Rob On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. 
Robert, What is the scale version ? This issue may be related to these alerts. https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk| W www.icr.ac.uk| Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=KgYs-kXBKE5JoAaGYRiU9iIxNkJSZeicxpSTmL39_B8&s=6FodZ_EQ8VAOE_xoEkfoUzmJpaiF7bgbERvA9avLZfg&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Mar 22 09:32:10 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 10:32:10 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly Message-ID: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Hello, we usually create filesets for project dirs and homes. Unfortunately we have discovered that this convention has been ignored for some dirs and their data no resides in the root fileset. We would like to move the data to independent filesets. Is there a way to do this without having to schedule a downtime for the dirs in question? I mean, is there a way to transparently move data to an independent fileset at the same path? Kind regards, Ulrich Sibiller -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From janfrode at tanso.net Mon Mar 22 09:54:28 2021 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 22 Mar 2021 10:54:28 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: No ? all copying between filesets require full data copy. No simple rename. This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. -jf man. 22. mar. 2021 kl. 10:39 skrev Ulrich Sibiller < u.sibiller at science-computing.de>: > Hello, > > we usually create filesets for project dirs and homes. > > Unfortunately we have discovered that this convention has been ignored for > some dirs and their data > no resides in the root fileset. We would like to move the data to > independent filesets. > > Is there a way to do this without having to schedule a downtime for the > dirs in question? > > I mean, is there a way to transparently move data to an independent > fileset at the same path? > > > Kind regards, > > Ulrich Sibiller > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Mar 22 12:24:59 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 22 Mar 2021 12:24:59 +0000 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: You could maybe create the new file-set, link in a different place, copy the data ? Then at somepoint, unlink and relink and resync. Still some user access, but you are potentially reducing the time to do the copy. Simon From: on behalf of "janfrode at tanso.net" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 22 March 2021 at 09:54 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Move data to fileset seamlessly No ? all copying between filesets require full data copy. No simple rename. This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. -jf man. 22. mar. 2021 kl. 10:39 skrev Ulrich Sibiller >: Hello, we usually create filesets for project dirs and homes. Unfortunately we have discovered that this convention has been ignored for some dirs and their data no resides in the root fileset. We would like to move the data to independent filesets. Is there a way to do this without having to schedule a downtime for the dirs in question? 
I mean, is there a way to transparently move data to an independent fileset at the same path? Kind regards, Ulrich Sibiller -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Mar 22 13:20:46 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 14:20:46 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: On 22.03.21 13:24, Simon Thompson wrote: > You could maybe create the new file-set, link in a different place, copy the data ? > > Then at somepoint, unlink and relink and resync. Still some user access, but you are potentially > reducing the time to do the copy. Yes, but this does not help if a file is open all the time, e.g. during a long-running job. Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From u.sibiller at science-computing.de Mon Mar 22 13:41:39 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 14:41:39 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: <6f626186-cb7a-46d5-781c-8f3a21b7e270@science-computing.de> On 22.03.21 10:54, Jan-Frode Myklebust wrote: > No ? all copying between filesets require full data copy. No simple rename. > > This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. Yes, your are right. So please vote here: http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=149429 Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From robert.horton at icr.ac.uk Tue Mar 23 19:02:05 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Tue, 23 Mar 2021 19:02:05 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: Hi, Sorry for the delay... On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: > ... > 1. 
Stop all AFM filesets at cache using "mmafmctl device stop -j > fileset" command. > 2. Perform rolling upgrade parallely at both cache and home clusters > a. All nodes on home cluster to 5.0.5.6 > b. All gateway nodes in cache cluster to 5.0.5.6 > 3. At home cluster, for each fileset target path, repeat below steps > a. Remove .afmctl file > mmafmlocal rm /.afm/.afmctl > b. Enable AFM At point 3 I'm getting: # mmafmlocal rm /.afm/.afmctl /bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted afmconfig disable is the same. Any idea what the issue is? Thanks, Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. From vpuvvada at in.ibm.com Wed Mar 24 02:36:31 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Wed, 24 Mar 2021 08:06:31 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: ># mmafmlocal rm /.afm/.afmctl >/bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted This step is only required if home cluster is on 5.0.5.2/5.0.5.3. You can ignore this issue, and restart AFM filesets at cache. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/24/2021 12:33 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Sorry for the delay... On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: > ... > 1. Stop all AFM filesets at cache using "mmafmctl device stop -j > fileset" command. > 2. Perform rolling upgrade parallely at both cache and home clusters > a. All nodes on home cluster to 5.0.5.6 > b. All gateway nodes in cache cluster to 5.0.5.6 > 3. At home cluster, for each fileset target path, repeat below steps > a. Remove .afmctl file > mmafmlocal rm /.afm/.afmctl > b. Enable AFM At point 3 I'm getting: # mmafmlocal rm /.afm/.afmctl /bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted afmconfig disable is the same. Any idea what the issue is? Thanks, Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. 
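For anyone retracing the same recovery, the stop and start Venkat refers to are issued per fileset on the cache cluster; a minimal sketch with invented names (device fs1, fileset projects):

mmafmctl fs1 stop -j projects       # quiesce the fileset before upgrading the gateway nodes
# ... upgrade gateway nodes to 5.0.5.6 ...
mmafmctl fs1 start -j projects
mmafmctl fs1 getstate -j projects   # check that the fileset leaves the Stopped state

mmafmctl getstate also reports the queue length, which is a quick way to confirm replication resumes after the restart.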
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=OLf3tBvTItpLRieM34xb8Xd69tBYbwTDYAecT0D_B7k&s=FCJEEoTWGIoM4eY4SMzE55qskwhAnxC_noZu7fJHoqw&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From prasad.surampudi at theatsgroup.com Wed Mar 24 14:32:30 2021 From: prasad.surampudi at theatsgroup.com (Prasad Surampudi) Date: Wed, 24 Mar 2021 14:32:30 +0000 Subject: [gpfsug-discuss] mmrepquota is not reporting root fileset statistics for some filesystems Message-ID: Recently while checking fileset quotas in a ESS cluster, we noticed that the mmrepquota command is not reporting the root fileset quota and inode details for some filesystems. Does anyone else also saw this issue? Please see the output below. The root fileset shows up for 'prod' filesystem and does not show up for 'prod-private'. I could not figure out why it does not show up for prod-private. Any ideas? /usr/lpp/mmfs/bin/mmrepquota -j prod-private Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace xFIN root FILESET 12028144 0 0 0 none | 4524237 0 0 0 none /usr/lpp/mmfs/bin/mmrepquota -j prod Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace root root FILESET 7106656 0 0 1273643728 none | 7 0 0 400 none xxx_tick root FILESET 0 0 0 0 none | 1 0 0 0 none -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Mar 25 16:33:48 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 25 Mar 2021 11:33:48 -0500 Subject: [gpfsug-discuss] mmrepquota is not reporting root fileset statistics for some filesystems In-Reply-To: References: Message-ID: Prasad, This is unexpected. Please open a PMR so that data can be collected and looked at. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Prasad Surampudi To: "gpfsug-discuss at spectrumscale.org" Date: 03/24/2021 10:32 AM Subject: [EXTERNAL] [gpfsug-discuss] mmrepquota is not reporting root fileset statistics for some filesystems Sent by: gpfsug-discuss-bounces at spectrumscale.org Recently while checking fileset quotas in a ESS cluster, we noticed that the mmrepquota command is not reporting the root fileset quota and inode details for some filesystems. Does anyone else also saw this issue? Please see the output below. The root fileset shows up for 'prod' filesystem and does not show up for 'prod-private'. I could not figure out why it does not show up for prod-private. Any ideas? 
/usr/lpp/mmfs/bin/mmrepquota -j prod-private Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace xFIN root FILESET 12028144 0 0 0 none | 4524237 0 0 0 none /usr/lpp/mmfs/bin/mmrepquota -j prod Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace root root FILESET 7106656 0 0 1273643728 none | 7 0 0 400 none xxx_tick root FILESET 0 0 0 0 none | 1 0 0 0 none _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oluwasijibomi.saula at ndsu.edu Mon Mar 29 19:38:00 2021 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Mon, 29 Mar 2021 18:38:00 +0000 Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Hello Folks, So we are experiencing a mind-boggling issue where just a couple of nodes in our cluster, at GPFS boot up, get hung so badly that the node must be power reset. These AMD client nodes are diskless in nature and have at least 256G of memory. We have other AMD nodes that are working just fine in a separate GPFS cluster albeit on RHEL7. Just before GPFS (or related processes) seize up the node, the following lines of /var/mmfs/gen/mmfslog are noted: 2021-03-29_12:47:37.343-0500: [N] mmfsd ready 2021-03-29_12:47:37.426-0500: mmcommon mmfsup invoked. Parameters: 10.12.50.47 10.12.50.242 all 2021-03-29_12:47:37.587-0500: mounting /dev/mmfs1 2021-03-29_12:47:37.590-0500: [I] Command: mount mmfs1 2021-03-29_12:47:37.859-0500: [N] Connecting to 10.12.50.243 tier1-sn-02.pixstor 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01.pixstor) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.866-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.867-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.868-0500: [I] Connected to 10.12.50.243 tier1-sn-02 There have been hunches that this might be a network issue, however, other nodes connected to the IB network switch are mounting the filesystem without incident. I'm inclined to believe there's a GPFS/OS-specific setting that might be causing these crashes especially when we note that disabling the automount on the client node doesn't result in the node hanging. However, once we issue mmmount, we see the node seize up shortly... Please let me know if you have any thoughts on where to look for root-causes as I and a few fellows are stuck here ? Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olaf.weiser at de.ibm.com Tue Mar 30 07:06:54 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 30 Mar 2021 06:06:54 +0000 Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Tue Mar 30 19:24:00 2021 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Tue, 30 Mar 2021 18:24:00 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 110, Issue 34 In-Reply-To: References: Message-ID: Hey Olaf, We'll investigate as suggested. I'm hopeful the journald logs would provide some additional insight. As for OFED versions, we use the same Mellanox version across the cluster and haven't seen any issues with working nodes that mount the filesystem. We also have a PMR open with IBM but we'll send a follow-up if we discover something more for group discussion. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Tuesday, March 30, 2021 1:07 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 110, Issue 34 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Filesystem mount attempt hangs GPFS client node (Saula, Oluwasijibomi) 2. Re: Filesystem mount attempt hangs GPFS client node (Olaf Weiser) ---------------------------------------------------------------------- Message: 1 Date: Mon, 29 Mar 2021 18:38:00 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Content-Type: text/plain; charset="utf-8" Hello Folks, So we are experiencing a mind-boggling issue where just a couple of nodes in our cluster, at GPFS boot up, get hung so badly that the node must be power reset. These AMD client nodes are diskless in nature and have at least 256G of memory. We have other AMD nodes that are working just fine in a separate GPFS cluster albeit on RHEL7. Just before GPFS (or related processes) seize up the node, the following lines of /var/mmfs/gen/mmfslog are noted: 2021-03-29_12:47:37.343-0500: [N] mmfsd ready 2021-03-29_12:47:37.426-0500: mmcommon mmfsup invoked. 
Parameters: 10.12.50.47 10.12.50.242 all 2021-03-29_12:47:37.587-0500: mounting /dev/mmfs1 2021-03-29_12:47:37.590-0500: [I] Command: mount mmfs1 2021-03-29_12:47:37.859-0500: [N] Connecting to 10.12.50.243 tier1-sn-02.pixstor 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01.pixstor) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.866-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.867-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.868-0500: [I] Connected to 10.12.50.243 tier1-sn-02 There have been hunches that this might be a network issue, however, other nodes connected to the IB network switch are mounting the filesystem without incident. I'm inclined to believe there's a GPFS/OS-specific setting that might be causing these crashes especially when we note that disabling the automount on the client node doesn't result in the node hanging. However, once we issue mmmount, we see the node seize up shortly... Please let me know if you have any thoughts on where to look for root-causes as I and a few fellows are stuck here ? Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Tue, 30 Mar 2021 06:06:54 +0000 From: "Olaf Weiser" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 110, Issue 34 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 07:58:43 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 07:58:43 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > I?ve tried benchmarking many vs. few vdisks per RG, and never could see > any performance difference. That's encouraging. > > Usually we create 1 vdisk per enclosure per RG, ? thinking this will > allow us to grow with same size vdisks when adding additional enclosures > in the future. > > Don?t think mmvdisk can be told to create multiple vdisks per RG > directly, so you have to manually create multiple vdisk sets each with > the apropriate size. > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings that you needed a minimum of six NSD's for optimal performance. I have sat in presentations where IBM employees have said so. What we where told back then is that GPFS needs a minimum number of NSD's in order to be able to spread the I/O's out. So if an NSD is being pounded for reads and a write comes in it. 
can direct it to a less busy NSD. Now I can imagine that in a ESS/DSS-G that as it's being scattered to the winds under the hood this is no longer relevant. But some notes to the effect for us old timers would be nice if that is the case to put our minds to rest. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Achim.Rehor at de.ibm.com Mon Mar 1 08:16:43 2021 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Mon, 1 Mar 2021 09:16:43 +0100 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: The reason for having multiple NSDs in legacy NSD (non-GNR) handling is the increased parallelism, that gives you 'more spindles' and thus more performance. In GNR the drives are used in parallel anyway through the GNRstriping. Therfore, you are using all drives of a ESS/GSS/DSS model under the hood in the vdisks anyway. The only reason for having more NSDs is for using them for different filesystems. Mit freundlichen Gr??en / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Date: 01/03/2021 08:58 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could see > > any performance difference. > > That's encouraging. > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > allow us to grow with same size vdisks when adding additional enclosures > > in the future. > > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > directly, so you have to manually create multiple vdisk sets each with > > the apropriate size. > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > that you needed a minimum of six NSD's for optimal performance. I have > sat in presentations where IBM employees have said so. What we where > told back then is that GPFS needs a minimum number of NSD's in order to > be able to spread the I/O's out. So if an NSD is being pounded for reads > and a write comes in it. can direct it to a less busy NSD. > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > the winds under the hood this is no longer relevant. But some notes to > the effect for us old timers would be nice if that is the case to put > our minds to rest. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? 
> u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > From S.J.Thompson at bham.ac.uk Mon Mar 1 09:06:07 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 1 Mar 2021 09:06:07 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: Or for hedging your bets about how you might want to use it in future. We are never quite sure if we want to do something different in the future with some of the storage, sure that might mean we want to steal some space from a file-system, but that is perfectly valid. And we have done this, both in temporary transient states (data migration between systems), or permanently (found we needed something on a separate file-system) So yes whilst there might be no performance impact on doing this, we still do. I vaguely recall some of the old reasoning was around IO queues in the OS, i.e. if you had 6 NSDs vs 16 NSDs attached to the NSD server, you have 16 IO queues passing to multipath, which can help keep the data pipes full. I suspect there was some optimal number of NSDs for different storage controllers, but I don't know if anyone ever benchmarked that. Simon ?On 01/03/2021, 08:16, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Achim.Rehor at de.ibm.com" wrote: The reason for having multiple NSDs in legacy NSD (non-GNR) handling is the increased parallelism, that gives you 'more spindles' and thus more performance. In GNR the drives are used in parallel anyway through the GNRstriping. Therfore, you are using all drives of a ESS/GSS/DSS model under the hood in the vdisks anyway. The only reason for having more NSDs is for using them for different filesystems. Mit freundlichen Gr??en / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Date: 01/03/2021 08:58 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could see > > any performance difference. > > That's encouraging. > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > allow us to grow with same size vdisks when adding additional enclosures > > in the future. > > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > directly, so you have to manually create multiple vdisk sets each with > > the apropriate size. > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > that you needed a minimum of six NSD's for optimal performance. I have > sat in presentations where IBM employees have said so. What we where > told back then is that GPFS needs a minimum number of NSD's in order to > be able to spread the I/O's out. So if an NSD is being pounded for reads > and a write comes in it. can direct it to a less busy NSD. > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > the winds under the hood this is no longer relevant. But some notes to > the effect for us old timers would be nice if that is the case to put > our minds to rest. > > > JAB. > > -- > Jonathan A. 
Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From luis.bolinches at fi.ibm.com Mon Mar 1 09:08:20 2021 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 1 Mar 2021 09:08:20 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Mar 1 09:34:26 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 09:34:26 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Mon Mar 1 09:46:06 2021 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Mon, 1 Mar 2021 10:46:06 +0100 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: Correct, there was. The OS is dealing with pdisks, while GPFS is striping over Vdisks/NSDs. For GNR there is a differetnt queuing setup in GPFS, than there was for NSDs. See "mmfsadm dump nsd" and check for NsdQueueTraditional versus NsdQueueGNR And yes, i was too strict, with "> The only reason for having more NSDs is for using them for different > filesystems." there are other management reasons to run with a reasonable number of vdisks, just not performance reasons. Mit freundlichen Gruessen / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 10:06:07: > From: Simon Thompson > To: gpfsug main discussion list > Date: 01/03/2021 10:06 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Or for hedging your bets about how you might want to use it in future. > > We are never quite sure if we want to do something different in the > future with some of the storage, sure that might mean we want to > steal some space from a file-system, but that is perfectly valid. > And we have done this, both in temporary transient states (data > migration between systems), or permanently (found we needed > something on a separate file-system) > > So yes whilst there might be no performance impact on doing this, westill do. > > I vaguely recall some of the old reasoning was around IO queues in > the OS, i.e. if you had 6 NSDs vs 16 NSDs attached to the NSD > server, you have 16 IO queues passing to multipath, which can help > keep the data pipes full. I suspect there was some optimal number of > NSDs for different storage controllers, but I don't know if anyone > ever benchmarked that. 
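As an aside, Achim's pointer to "mmfsadm dump nsd" is easy to try on an NSD/IO server. A rough sketch only (mmfsadm is an unsupported debugging command, so treat the output, and these grep patterns taken from Achim's NsdQueueTraditional/NsdQueueGNR hint, as informational rather than authoritative):

    # which queue model the daemon is using on this server
    mmfsadm dump nsd | grep -iE 'NsdQueue(Traditional|GNR)'

    # eyeball the per-queue counters and depths
    mmfsadm dump nsd | grep -i queue | head -40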
> > Simon > > On 01/03/2021, 08:16, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Achim.Rehor at de.ibm.com" bounces at spectrumscale.org on behalf of Achim.Rehor at de.ibm.com> wrote: > > The reason for having multiple NSDs in legacy NSD (non-GNR) handling is > the increased parallelism, that gives you 'more spindles' and thus more > performance. > In GNR the drives are used in parallel anyway through the GNRstriping. > Therfore, you are using all drives of a ESS/GSS/DSS model under the hood > in the vdisks anyway. > > The only reason for having more NSDs is for using them for different > filesystems. > > > Mit freundlichen Gr??en / Kind regards > > Achim Rehor > > IBM EMEA ESS/Spectrum Scale Support > > > > > > > > > > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > > > From: Jonathan Buzzard > > To: gpfsug-discuss at spectrumscale.org > > Date: 01/03/2021 08:58 > > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of > NSD's > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could > see > > > any performance difference. > > > > That's encouraging. > > > > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > > allow us to grow with same size vdisks when adding additional > enclosures > > > in the future. > > > > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > > directly, so you have to manually create multiple vdisk setseach with > > > > the apropriate size. > > > > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > > that you needed a minimum of six NSD's for optimal performance. I have > > sat in presentations where IBM employees have said so. What we where > > told back then is that GPFS needs a minimum number of NSD's inorder to > > be able to spread the I/O's out. So if an NSD is being poundedfor reads > > > and a write comes in it. can direct it to a less busy NSD. > > > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > > the winds under the hood this is no longer relevant. But some notes to > > the effect for us old timers would be nice if that is the case to put > > our minds to rest. > > > > > > JAB. > > > > -- > > Jonathan A. Buzzard Tel: +44141-5483420 > > HPC System Administrator, ARCHIE-WeSt. > > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url? > > > > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=gU9xf_Z6rrdOa4- > WKodSyFPbnGGbAGC_LK7hgYPB3yQ&s=L_VtTqSwQbqfIR5VVmn6mYxmidgnH37osHrFPX0E-Ck&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? 
> u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=gU9xf_Z6rrdOa4- > WKodSyFPbnGGbAGC_LK7hgYPB3yQ&s=L_VtTqSwQbqfIR5VVmn6mYxmidgnH37osHrFPX0E-Ck&e= > From jonathan.buzzard at strath.ac.uk Mon Mar 1 11:45:45 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 11:45:45 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: On 01/03/2021 09:08, Luis Bolinches wrote: > Hi > > There other reasons to have more than 1. It is management of those. When > you have to add or remove NSDs of a FS having more than 1 makes it > possible to empty some space and manage those in and out. Manually but > possible. If you have one big NSD or even 1 per enclosure it might > difficult or even not possible depending the number of enclosures and FS > utilization. > > Starting some ESS version (not DSS, cant comment on that) that I do not > recall but in the last 6 months, we have change the default (for those > that use the default) to 4 NSDs per enclosure for ESS 5000. There is no > impact on performance either way on ESS, we tested it. But management of > those on the long run should be easier. Question how does one create a none default number of vdisks per enclosure then? I tried creating a stanza file and then doing mmcrvdisk but it was not happy, presumably because of the "new style" recovery group management mmcrvdisk: [E] This command is not supported by recovery groups under management of mmvdisk. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Mon Mar 1 11:53:32 2021 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 1 Mar 2021 11:53:32 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: Message-ID: Jonathan, You need to create vdisk sets which will create multiple vdisks, you can then assign vdisk sets to your filesystem. (Assigning multiple vdisks at a time) Things to watch - free space calculations are more complex as it?s building multiple vdisks under the cover using multiple raid parameters Also it?s worth assuming a 10% reserve or approx - drive per disk shelf for rebuild space Mmvdisk vdisk set ... insert parameters https://www.ibm.com/support/knowledgecenter/mk/SSYSP8_5.3.2/com.ibm.spectrum.scale.raid.v5r02.adm.doc/bl8adm_mmvdisk.htm Sent from my iPhone > On 1 Mar 2021, at 21:45, Jonathan Buzzard wrote: > > ?On 01/03/2021 09:08, Luis Bolinches wrote: >> Hi >> >> There other reasons to have more than 1. It is management of those. When >> you have to add or remove NSDs of a FS having more than 1 makes it >> possible to empty some space and manage those in and out. Manually but >> possible. If you have one big NSD or even 1 per enclosure it might >> difficult or even not possible depending the number of enclosures and FS >> utilization. >> >> Starting some ESS version (not DSS, cant comment on that) that I do not >> recall but in the last 6 months, we have change the default (for those >> that use the default) to 4 NSDs per enclosure for ESS 5000. There is no >> impact on performance either way on ESS, we tested it. But management of >> those on the long run should be easier. > Question how does one create a none default number of vdisks per > enclosure then? 
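Picking up Andrew's pointer to vdisk sets, a hedged sketch of what that flow looks like (the option names follow the mmvdisk documentation linked above but should be double-checked against your DSS/ESS level; rg_1/rg_2, vs_a and fs1 are placeholders):

    # define a vdisk set spanning both recovery groups; a smaller --set-size,
    # defined and created several times over under different set names, is one
    # way to end up with more than the default number of vdisks/NSDs per RG
    mmvdisk vdiskset define --vdisk-set vs_a \
            --recovery-group rg_1,rg_2 \
            --code 8+2p --block-size 8M --set-size 25%

    mmvdisk vdiskset create --vdisk-set vs_a          # instantiate the vdisks/NSDs

    # then either build a new filesystem from it or grow an existing one
    mmvdisk filesystem create --file-system fs1 --vdisk-set vs_a
    # mmvdisk filesystem add  --file-system fs1 --vdisk-set vs_a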
> > I tried creating a stanza file and then doing mmcrvdisk but it was not > happy, presumably because of the "new style" recovery group management > > mmcrvdisk: [E] This command is not supported by recovery groups under > management of mmvdisk. > > > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=STXkGEO2XATS_s2pRCAAh2wXtuUgwVcx1XjUX7ELNdk&m=9HlRHByoByQcM0mY0elL-l4DgA6MzHkAGzE70Rl2p2E&s=eWRfWGpdZB-PZ_InCCjgmdQOCy6rgWj9Oi3TGGA38yY&e= > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scl at virginia.edu Mon Mar 1 12:31:37 2021 From: scl at virginia.edu (Losen, Stephen C (scl)) Date: Mon, 1 Mar 2021 12:31:37 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl Message-ID: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Hi folks, Experimenting with POSIX ACLs on GPFS 4.2 and noticed that the Linux command setfacl clears "c" permissions that were set with mmputacl. So if I have this: ... group:group1:rwxc mask::rwxc ... and I modify a different entry with: setfacl -m group:group2:r-x dirname then the "c" permissions above get cleared and I end up with ... group:group1:rwx- mask::rwx- ... I discovered that chmod does not clear the "c" mode. Is there any filesystem option to change this behavior to leave "c" modes in place? Steve Losen Research Computing University of Virginia scl at virginia.edu 434-924-0640 From olaf.weiser at de.ibm.com Mon Mar 1 12:45:44 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 12:45:44 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Mar 1 12:58:44 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 1 Mar 2021 12:58:44 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 13:14:38 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 13:14:38 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: On 01/03/2021 12:45, Olaf Weiser wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Hallo Stephen, > behavior ... or better to say ... predicted behavior for chmod and ACLs > .. is not an easy thing or only? , if? you stay in either POSIX world or > NFSv4 world > to be POSIX compliant, a chmod overwrites ACLs One might argue that the general rubbishness of the mmputacl cammand, and if a mmsetfacl command (or similar) existed it would negate messing with Linux utilities to change ACL's on GPFS file systems Only been bringing it up for over a decade now ;-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From olaf.weiser at de.ibm.com Mon Mar 1 15:18:59 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 15:18:59 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From laurence at qsplace.co.uk Mon Mar 1 08:59:35 2021 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Mon, 01 Mar 2021 08:59:35 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: <6F478E88-E350-46BF-9993-82C21ADD2262@qsplace.co.uk> Like Jan, I did some benchmarking a few years ago when the default recommended RG's dropped to 1 per DA to meet rebuild requirements. I couldn't see any discernable difference. As Achim has also mentioned, I just use vdisks for creating additional filesystems. Unless there is going to be a lot of shuffling of space or future filesystem builds, then I divide the RG's into say 10 vdisks to give some flexibility and granularity There is also a flag iirc that changes the gpfs magic to consider multiple under lying disks, when I find it again........ Which can provide increased performance on traditional RAID builds. -- Lauz On 1 March 2021 08:16:43 GMT, Achim Rehor wrote: >The reason for having multiple NSDs in legacy NSD (non-GNR) handling is > >the increased parallelism, that gives you 'more spindles' and thus more > >performance. >In GNR the drives are used in parallel anyway through the GNRstriping. >Therfore, you are using all drives of a ESS/GSS/DSS model under the >hood >in the vdisks anyway. > >The only reason for having more NSDs is for using them for different >filesystems. > > >Mit freundlichen Gr??en / Kind regards > >Achim Rehor > >IBM EMEA ESS/Spectrum Scale Support > > > > > > > > > > > > >gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > >> From: Jonathan Buzzard >> To: gpfsug-discuss at spectrumscale.org >> Date: 01/03/2021 08:58 >> Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of >NSD's >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> On 28/02/2021 09:31, Jan-Frode Myklebust wrote: >> > >> > I?ve tried benchmarking many vs. few vdisks per RG, and never could > >see >> > any performance difference. >> >> That's encouraging. >> >> > >> > Usually we create 1 vdisk per enclosure per RG, thinking this >will >> > allow us to grow with same size vdisks when adding additional >enclosures >> > in the future. >> > >> > Don?t think mmvdisk can be told to create multiple vdisks per RG >> > directly, so you have to manually create multiple vdisk sets each >with > >> > the apropriate size. >> > >> >> Thing is back in the day so GPFS v2.x/v3.x there where strict >warnings >> that you needed a minimum of six NSD's for optimal performance. I >have >> sat in presentations where IBM employees have said so. What we where >> told back then is that GPFS needs a minimum number of NSD's in order >to >> be able to spread the I/O's out. So if an NSD is being pounded for >reads > >> and a write comes in it. can direct it to a less busy NSD. >> >> Now I can imagine that in a ESS/DSS-G that as it's being scattered to > >> the winds under the hood this is no longer relevant. But some notes >to >> the effect for us old timers would be nice if that is the case to put > >> our minds to rest. >> >> >> JAB. >> >> -- >> Jonathan A. 
Buzzard Tel: +44141-5483420 >> HPC System Administrator, ARCHIE-WeSt. >> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://urldefense.proofpoint.com/v2/url? >> >u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- >> siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- >> M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- >> IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= >> > > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 16:50:31 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 16:50:31 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk> On 01/03/2021 15:18, Olaf Weiser wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > JAB, > yes-this is in argument ;-) ... and personally I like the idea of having > smth like setfacl also for GPFS ..? for years... > *but* it would not take away the generic challenge , what to do, if > there are competing standards / definitions to meet > at least that is most likely just one reason, why there's no tool yet > there are several hits on RFE page for "ACL".. some of them could be > also addressed with a (mm)setfacl tool > but I was not able to find a request for a tool itself > (I quickly? searched? public but? not found it there, maybe there is > already one in private...) > So - dependent on how important this item for others? is? ... its time > to fire an RFE ?!? ... Well when I asked I was told by an IBM representative that it was by design there was no proper way to set ACLs directly from Linux. The expectation was that you would do this over NFSv4 or Samba. So filing an RFE would be pointless under those conditions and I have never bothered as a result. This was pre 2012 so IBM's outlook might have changed in the meantime. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From olaf.weiser at de.ibm.com Mon Mar 1 17:57:11 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 17:57:11 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk> References: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk>, <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Tue Mar 2 09:36:48 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Tue, 2 Mar 2021 09:36:48 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk>, <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
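Coming back to Stephen's original question about the 'c' permission, one workaround is to make ACL changes with the GPFS tools rather than setfacl, since they understand the GPFS-specific control bit. A small sketch, with the directory name as a placeholder and behaviour to be verified on your own level:

    mmgetacl -o /tmp/acl.$$ /gpfs/fs1/somedir    # dump the current ACL, 'c' entries included
    # edit /tmp/acl.$$ and add the new entry, e.g.:  group:group2:r-x-
    mmputacl -i /tmp/acl.$$ /gpfs/fs1/somedir    # apply the edited ACL back
    mmgetacl /gpfs/fs1/somedir                   # confirm group1 still shows rwxc

mmeditacl should do the same get/edit/put round trip in a single step.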
Name: Image.16146770920000.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16146770920001.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16146770920002.png Type: image/png Size: 1172 bytes Desc: not available URL: From russell at nordquist.info Tue Mar 2 19:31:24 2021 From: russell at nordquist.info (Russell Nordquist) Date: Tue, 2 Mar 2021 14:31:24 -0500 Subject: [gpfsug-discuss] Self service creation of filesets Message-ID: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. thanks Russell From anacreo at gmail.com Tue Mar 2 20:58:29 2021 From: anacreo at gmail.com (Alec) Date: Tue, 2 Mar 2021 12:58:29 -0800 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: Message-ID: This does feel like another situation where I may use a custom attribute and a periodic script to do the fileset creation. Honestly I would want the change management around fileset creation. But I could see a few custom attributes on a newly created user dir... Like maybe just setting user.quota=10TB... Then have a policy that discovers these does the work of creating the fileset, setting the quotas, migrating data to the fileset, and then mounting the fileset over the original directory. Honestly that sounds so nice I may have to implement this... Lol. Like I could see doing something like discovering directories that have user.archive=true and automatically gzipping large files within. Would be nice if GPFS policy engine could have a IF_ANCESTOR_ATTRIBUTE=. Alec On Tue, Mar 2, 2021, 11:40 AM Russell Nordquist wrote: > Hi all > > We are trying to use filesets quite a bit, but it?s a hassle that only the > admins can create them. To the users it?s just a directory so it slows > things down. Has anyone deployed a self service model for creating > filesets? Maybe using the API? This feels like shared pain that someone has > already worked on?. > > thanks > Russell > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Mar 2 22:38:17 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 2 Mar 2021 22:38:17 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: Message-ID: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> Not quite user self-service .... But we have some web tooling for project registration that pushes sanitised messages onto a redis (rq) backed message bus which then does "stuff". For example create and populate groups in AD and LDAP. Create and link a fileset, set quota etc ... Our consumer code is all built to be tolerant to running it a second time safely and has quite a bit of internal locking to prevent multiple instances running at the same time (though we have multiple consumer entities to handle fail-over). The fault tolerant thing is quite important as create a fileset can fail for a number of reasons (e.g. 
restripefs running), so we can always just requeue the requests again. Simon ?On 02/03/2021, 19:40, "gpfsug-discuss-bounces at spectrumscale.org on behalf of russell at nordquist.info" wrote: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. thanks Russell _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ckerner at illinois.edu Tue Mar 2 22:59:01 2021 From: ckerner at illinois.edu (Kerner, Chad A) Date: Tue, 2 Mar 2021 22:59:01 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> References: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> Message-ID: <52196DB3-E8D3-47F7-92F6-3A123B46F615@illinois.edu> We have a similar process. One of our customers has a web app that their managers use to provision spaces. That web app drops a json file into a specific location and a cron job kicks off a python script every so often to process the files and provision the space(fileset creation, link, quota, owner, group, perms, etc). Failures are queued and a jira ticket opened. Successes update the database for the web app. They are not requiring instant processing, so we process hourly on the back end side of things. Chad -- Chad Kerner, Senior Storage Engineer Storage Enabling Technologies National Center for Supercomputing Applications University of Illinois, Urbana-Champaign ?On 3/2/21, 4:38 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson" wrote: Not quite user self-service .... But we have some web tooling for project registration that pushes sanitised messages onto a redis (rq) backed message bus which then does "stuff". For example create and populate groups in AD and LDAP. Create and link a fileset, set quota etc ... Our consumer code is all built to be tolerant to running it a second time safely and has quite a bit of internal locking to prevent multiple instances running at the same time (though we have multiple consumer entities to handle fail-over). The fault tolerant thing is quite important as create a fileset can fail for a number of reasons (e.g. restripefs running), so we can always just requeue the requests again. Simon On 02/03/2021, 19:40, "gpfsug-discuss-bounces at spectrumscale.org on behalf of russell at nordquist.info" wrote: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. 
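For comparison, the core provisioning step such a worker ends up running is only a handful of commands. A bare-bones sketch (fs1, projX, the junction path, owner, group and quota values are all placeholders, and the idempotency/locking Simon describes is deliberately left out):

    fs=fs1; fset=projX; owner=alice; group=projx_grp

    mmcrfileset   "$fs" "$fset" --inode-space new              # independent fileset
    mmlinkfileset "$fs" "$fset" -J "/gpfs/$fs/projects/$fset"  # the "directory" users see
    mmsetquota    "$fs:$fset" --block 10T:10T --files 5M:5M    # block and inode quotas
    chown "$owner:$group" "/gpfs/$fs/projects/$fset"
    chmod 2770            "/gpfs/$fs/projects/$fset"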
thanks Russell _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!DZ3fjg!uQVokpQk0pPyjpae7a_Aui1wGk3k7xJzIxzX1DBNfOyNOfzZeJFUjVOqN3OVEyVqdw$ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!DZ3fjg!uQVokpQk0pPyjpae7a_Aui1wGk3k7xJzIxzX1DBNfOyNOfzZeJFUjVOqN3OVEyVqdw$ From tortay at cc.in2p3.fr Wed Mar 3 08:06:37 2021 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Wed, 3 Mar 2021 09:06:37 +0100 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: Message-ID: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> On 02/03/2021 20:31, Russell Nordquist wrote: > Hi all > > We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. > Hello, We have a quota management delegation (CLI) tool that allows "power-users" (PI and such) to create and remove filesets and manage users quotas for the groups/projects they're heading. Like someone else said, from their point of view they're just directories, so they create a "directory with quotas". In our experience, "directories with quotas" are the most convenient way for end-users to understand and use quotas. This is a tool written in C, about 13 years ago, using the GPFS API (and a few calls to GPFS commands where there is no API or it's lacking). Delegation authorization (identifying "power-users") is external to the tool. Permissions & ACLs are also set on the junction when a fileset is created so that it's both immediately usable ("instant processing") and accessible to "power-users" (for space management purposes). There are extra features for staff to allow higher-level operations (e.g. create an independent fileset for a group/project, change the group/project quotas, etc.) The dated looking user documentation is https://ccspsmon.in2p3.fr/spsquota.html Both the tool and the documentation have a few site-specific things, so it's not open-source (and it has become a "legacy" tool in need of a rewrite/refactoring). Lo?c. -- | Lo?c Tortay - IN2P3 Computing Centre | From russell at nordquist.info Wed Mar 3 17:14:37 2021 From: russell at nordquist.info (Russell Nordquist) Date: Wed, 3 Mar 2021 12:14:37 -0500 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> References: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> Message-ID: Sounds like I am not the only one that needs this. The REST API has everything needed to do this, but the problem is we can?t restrict the GUI role account to just the commands they need. They need ?storage administrator? access which means the could also make/delete filesystems. I guess you could use sudo and wrap the CLI, but I am told that?s old fashioned :) Too bad we can?t make a API role with specific POST commands tied to it. I am surprised there is no RFE for that yet. The closest I see is http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=148244 am I missing something. What I would want is to be able to grant the the following calls + maybe a few more. 
The related REST API calls. https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_apiv2postfilesystemfilesets.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_apiv2postfilesystemfilesetlink.htm Russell > On Mar 3, 2021, at 3:06 AM, Loic Tortay wrote: > > On 02/03/2021 20:31, Russell Nordquist wrote: >> Hi all >> We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. > Hello, > We have a quota management delegation (CLI) tool that allows "power-users" (PI and such) to create and remove filesets and manage users quotas for the groups/projects they're heading. > > Like someone else said, from their point of view they're just directories, so they create a "directory with quotas". > In our experience, "directories with quotas" are the most convenient way for end-users to understand and use quotas. > > This is a tool written in C, about 13 years ago, using the GPFS API (and a few calls to GPFS commands where there is no API or it's lacking). > > Delegation authorization (identifying "power-users") is external to the tool. > > Permissions & ACLs are also set on the junction when a fileset is created so that it's both immediately usable ("instant processing") and accessible to "power-users" (for space management purposes). > > There are extra features for staff to allow higher-level operations (e.g. create an independent fileset for a group/project, change the group/project quotas, etc.) > > The dated looking user documentation is https://ccspsmon.in2p3.fr/spsquota.html > > Both the tool and the documentation have a few site-specific things, so it's not open-source (and it has become a "legacy" tool in need of a rewrite/refactoring). > > > Lo?c. > -- > | Lo?c Tortay - IN2P3 Computing Centre | > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.horton at icr.ac.uk Thu Mar 4 09:51:45 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Thu, 4 Mar 2021 09:51:45 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> Message-ID: <566f81f3bfd243f1b0258562b627e4e1b6869863.camel@icr.ac.uk> On Wed, 2021-03-03 at 12:14 -0500, Russell Nordquist wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. Sounds like I am not the only one that needs this. The REST API has everything needed to do this, but the problem is we can?t restrict the GUI role account to just the commands they need. They need ?storage administrator? access which means the could also make/delete filesystems. I guess you could use sudo and wrap the CLI, but I am told that?s old fashioned :) Too bad we can?t make a API role with specific POST commands tied to it. I am surprised there is no RFE for that yet. The closest I see is http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=148244 am I missing something. That reminds me... We use a Python wrapper around the REST API to monitor usage against fileset quotas etc. In principle this will also set quotas (and create filesets) but it means giving it storage administrator access. 
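For reference, a hedged curl sketch against those two endpoints plus a quota read (host, credentials, filesystem and fileset names are placeholders, and the JSON field names are from memory; take the exact request schema from the KC pages above for your release):

    GUI="https://gui-node:443/scalemgmt/v2"
    CRED="svc_account:secret"    # ideally a least-privilege account, which is exactly the gap discussed here

    # create a fileset (POST .../filesystems/{fs}/filesets)
    curl -k -u "$CRED" -X POST "$GUI/filesystems/fs1/filesets" \
         -H 'Content-Type: application/json' \
         -d '{"filesetName": "projX", "inodeSpace": "new"}'

    # link it (POST .../filesystems/{fs}/filesets/{fileset}/link)
    curl -k -u "$CRED" -X POST "$GUI/filesystems/fs1/filesets/projX/link" \
         -H 'Content-Type: application/json' \
         -d '{"path": "/gpfs/fs1/projects/projX"}'

    # read fileset quota/usage, as in the monitoring wrapper mentioned above
    curl -k -u "$CRED" "$GUI/filesystems/fs1/quotas"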
It would be nice if the GUI had sufficiently fine grained permissions that you could set quotas without being able to delete the filesystem. Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 10:04:22 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 10:04:22 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's Message-ID: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> I am seeing that whenever I try and restore a file with an ACL I get the a ANS1589W error in /var/log/dsmerror.log ANS1589W Unable to write extended attributes for ****** due to errno: 13, reason: Permission denied But bizarrely the ACL is actually restored. At least as far as I can tell. This is the 8.1.11-0 TSM client with GPFS version 5.0.5-1 against a 8.1.10-0 TSM server. Running on RHEL 7.7 to match the DSS-G 2.7b install. The backup node makes the third quorum node for the cluster being as that it runs genuine RHEL (unlike all the compute nodes which are running CentOS). Googling I can't find any references to this being fixed in a later version of the GPFS software, though being on RHEL7 and it's derivatives I am stuck on 5.0.5 Surely root has permissions to write the extended attributes for anyone? It would seem perverse if you have to be the owner of a file to restore the ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From stockf at us.ibm.com Fri Mar 5 12:15:38 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 5 Mar 2021 12:15:38 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 13:07:56 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 13:07:56 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: On 05/03/2021 12:15, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Have you checked to see if Spectrum Protect (TSM) has addressed this > problem.? There recently was an issue with Protect and how it used the > GPFS API for ACLs.? If I recall Protect was not properly handling a > return code.? I do not know if it is relevant to your problem but? it > seemed worth mentioning. As far as I am aware 8.1.11.0 is the most recent version of the Spectrum Protect/TSM client. 
There is nothing newer showing on the IBM FTP site ftp://ftp.software.ibm.com/storage/tivoli-storage-management/maintenance/client/v8r1/Linux/LinuxX86/BA/ Checking on fix central also seems to show that 8.1.11.0 is the latest version, and the only fix over 8.1.10.0 is a security update to do with the client web user interface. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Renar.Grunenberg at huk-coburg.de Fri Mar 5 18:06:43 2021 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 5 Mar 2021 18:06:43 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> Hallo All, thge mentioned problem with protect was this: https://www.ibm.com/support/pages/node/6415985?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Thomas Sehn, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= -----Urspr?ngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Jonathan Buzzard Gesendet: Freitag, 5. M?rz 2021 14:08 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] TSM errors restoring files with ACL's On 05/03/2021 12:15, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Have you checked to see if Spectrum Protect (TSM) has addressed this > problem. There recently was an issue with Protect and how it used the > GPFS API for ACLs. If I recall Protect was not properly handling a > return code. I do not know if it is relevant to your problem but it > seemed worth mentioning. As far as I am aware 8.1.11.0 is the most recent version of the Spectrum Protect/TSM client. 
There is nothing newer showing on the IBM FTP site ftp://ftp.software.ibm.com/storage/tivoli-storage-management/maintenance/client/v8r1/Linux/LinuxX86/BA/ Checking on fix central also seems to show that 8.1.11.0 is the latest version, and the only fix over 8.1.10.0 is a security update to do with the client web user interface. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stockf at us.ibm.com Fri Mar 5 19:12:47 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 5 Mar 2021 19:12:47 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> References: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de>, <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 20:31:54 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 20:31:54 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: <696e96cc-da52-a24f-d53e-6510407e51e7@strath.ac.uk> On 05/03/2021 19:12, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > I was referring to this flash, > https://www.ibm.com/support/pages/node/6381354?myns=swgtiv&mynp=OCSSEQVQ&mync=E&cm_sp=swgtiv-_-OCSSEQVQ-_-E > > > Spectrum Protect 8.1.11 client has the fix so this should not be an > issue for Jonathan.? Probably best to open a help case against Spectrum > Protect and begin the investigation there. > Also the fix is to stop an unchanged file with an ACL from being backed up again, but only one more time. I suspect we where hit with that issue, but given we only have ~90GB of files with ACL's on them I would not have noticed. That is significantly less than the normal daily churn. This however is an issue with the *restore*. Everything looks to get restored correctly even the ACL's. At the end of the restore all looks good given the headline report from dsmc. However there are ANS1589W warnings in dsmerror.log and dsmc exits with an error code of 8 rather than zero. Will open a case against Spectrum Protect on Monday. I am pretty confident the warnings are false. The current plan is to do carefully curated hand restores of the three afflicted users when the rest of the restore if finished to double check the ACL's are the only issue. Quite how the Spectrum Protect team have missed this bug is beyond me. Do they not have some unit tests to check this stuff before pushing out updates. I know in the past it worked, though that was many years ago now. However I restored many TB of data from backup with ACL's on them. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Robert.Oesterlin at nuance.com Mon Mar 8 14:49:59 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 8 Mar 2021 14:49:59 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? 
Message-ID: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com> Looking to craft a policy scan that pulls out symbolic links to a particular destination. For instance: file1.py -> /fs1/patha/pathb/file1.py (I want to include these) file2.py -> /fs2/patha/pathb/file2.py (exclude these) The easy way would be to pull out all sym-links and just grep for the ones I want but was hoping for a more elegant solution? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Mar 8 15:29:42 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 8 Mar 2021 15:29:42 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com> References: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Mar 8 15:34:21 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 8 Mar 2021 15:34:21 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? Message-ID: Well - the case here is that the file system has, let?s say, 100M files. Some percentage of these are sym-links to a location that?s not in this file system. I want a report of all these off file system links. However, not all of the sym-links off file system are of interest, just some of them. I can?t say for sure where in the file system they are (and I don?t care). Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Frederick Stock Reply-To: gpfsug main discussion list Date: Monday, March 8, 2021 at 9:29 AM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Policy scan of symbolic links with contents? CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ Could you use the PATHNAME LIKE statement to limit the location to the files of interest? Fred _______________________________________________________ Fred Stock | Spectrum Scale Development Advocacy | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: "Oesterlin, Robert" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Policy scan of symbolic links with contents? Date: Mon, Mar 8, 2021 10:12 AM Looking to craft a policy scan that pulls out symbolic links to a particular destination. For instance: file1.py -> /fs1/patha/pathb/file1.py (I want to include these) file2.py -> /fs2/patha/pathb/file2.py (exclude these) The easy way would be to pull out all sym-links and just grep for the ones I want but was hoping for a more elegant solution? Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p_1XEUyoJ7-VJxF_w8h9gJh8_Wj0Pey73LCLLoxodpw&m=i6m1zVXf4peZo0yo02IiRaQ_pUX95MN3wU53M0NiWcI&s=z-ibh2kAPHbehAsrGavNIg2AJdXmHkpUwy5YhZfUbpc&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stockf at us.ibm.com Mon Mar 8 16:07:48 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 8 Mar 2021 16:07:48 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 8 20:45:05 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Mar 2021 20:45:05 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: On 08/03/2021 16:07, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Presumably the only feature that would help here is if policy could > determine that the end location pointed to by a symbolic link is within > the current file system.? I am not aware of any such feature or > attribute which policy could check so I think all you can do is run > policy to find the symbolic links and then check each link to see if it > points into the same file system.? You might find the mmfind command > useful for this purpose.? I expect it would eliminate the need to create > a policy to find the symbolic links. > Unless you are using bind mounts if the symbolic link points outside the mount point of the file system it is not within the current file system. So noting that you can write very SQL like statements something like the following should in theory do it RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' Note the above is not checked in any way shape or form for working. Even if you do have bind mounts of other GPFS file systems you just need a more complicated WHERE statement. When doing policy engine stuff I find having that section of the GPFS manual printed out and bound, along with an SQL book for reference is very helpful. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Mon Mar 8 21:00:04 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Mar 2021 21:00:04 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: On 08/03/2021 20:45, Jonathan Buzzard wrote: [SNIP] > So noting that you can write very SQL like statements something like the > following should in theory do it > > RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND > SUBSTR(PATH_NAME,0,4)='/fs1/' > > Note the above is not checked in any way shape or form for working. Even > if you do have bind mounts of other GPFS file systems you just need a > more complicated WHERE statement. Duh, of course as soon as I sent it, I realized there is a missing SHOW RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' You could replace the SUBSTR with a REGEX if you prefer JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From ulmer at ulmer.org Mon Mar 8 22:33:38 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 8 Mar 2021 17:33:38 -0500 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> Does that check the target of the symlink, or the path to the link itself? 
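A note on the question above: the SUBSTR and REGEX tests in the rules earlier in the thread match PATH_NAME, which is where the link itself lives in the file system, not where it points. One way to select links by target is to have the policy engine (or the mmfind command Fred mentions) list every symlink, then post-filter that list with readlink. A minimal, untested sketch that reads one link path per line on stdin; the /fs1/ prefix is only an example:

# Keep only the symlinks whose target starts with a given prefix.
PREFIX=${1:-/fs1/}

while IFS= read -r link; do
    target=$(readlink -- "$link") || continue      # skip unreadable entries
    # use readlink -f instead if relative targets need to be canonicalised
    case "$target" in
        "$PREFIX"*) printf '%s -> %s\n' "$link" "$target" ;;
    esac
done

Because the input is just the links the policy scan produced, the extra readlink pass only touches those files, not the full 100M-file namespace.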
I think the OP was checking the target (or I misunderstood). -- Stephen Ulmer Sent from a mobile device; please excuse auto-correct silliness. > On Mar 8, 2021, at 3:34 PM, Jonathan Buzzard wrote: > > ?On 08/03/2021 20:45, Jonathan Buzzard wrote: > > [SNIP] > >> So noting that you can write very SQL like statements something like the >> following should in theory do it >> RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND >> SUBSTR(PATH_NAME,0,4)='/fs1/' >> Note the above is not checked in any way shape or form for working. Even >> if you do have bind mounts of other GPFS file systems you just need a >> more complicated WHERE statement. > > Duh, of course as soon as I sent it, I realized there is a missing SHOW > > RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' > > You could replace the SUBSTR with a REGEX if you prefer > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Tue Mar 9 12:25:56 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 9 Mar 2021 12:25:56 +0000 Subject: [gpfsug-discuss] [EXTERNAL] Re: Policy scan of symbolic links with contents? In-Reply-To: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> References: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> Message-ID: <3B0AD02E-335F-4540-B109-EC5301C3188A@nuance.com> RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' In this case PATH_NAME is the path within the GPFS file system, not the target of the link, correct? That's not what I want. I want the path of the *link target*. Bob Oesterlin Sr Principal Storage Engineer, Nuance ?On 3/8/21, 4:41 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Stephen Ulmer" wrote: CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ---------------------------------------------------------------------- Does that check the target of the symlink, or the path to the link itself? I think the OP was checking the target (or I misunderstood). From bill.burke.860 at gmail.com Wed Mar 10 02:19:02 2021 From: bill.burke.860 at gmail.com (William Burke) Date: Tue, 9 Mar 2021 21:19:02 -0500 Subject: [gpfsug-discuss] Backing up GPFS with Rsync Message-ID: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. 
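For reference, the replies that follow converge on the policy engine rather than the DM API: produce a full path list once a day with a LIST rule, compare today's list with yesterday's to get the new and deleted sets, and hand the changed paths to rsync. A rough, untested sketch of that cycle; the mount point, working directory, list name and destination are all assumptions:

# Rough sketch of the daily policy-scan + diff cycle described in the replies.
FSPATH=/gpfs/fs1                    # file system mount point (assumption)
WORK=/var/tmp/gpfs-delta
TODAY="$WORK/paths.$(date +%Y%m%d)"
YESTERDAY="$WORK/paths.$(date -d yesterday +%Y%m%d)"
mkdir -p "$WORK"

# 1. Policy that lists every path (narrow it with WHERE clauses as needed).
cat > "$WORK/list.pol" <<'EOF'
RULE EXTERNAL LIST 'all' EXEC ''
RULE 'every' LIST 'all' WHERE TRUE
EOF

# 2. With -I defer the candidate records land in $WORK/scan.list.all
#    instead of being passed to an EXEC script.
mmapplypolicy "$FSPATH" -P "$WORK/list.pol" -I defer -f "$WORK/scan"

# 3. Keep the path portion of each record (it follows the ' -- ' separator)
#    and sort it so the lists can be compared.
sed 's/^.* -- //' "$WORK/scan.list.all" | sort > "$TODAY"

# 4. comm: lines only in today's list are new, lines only in yesterday's
#    were deleted. Files modified in place need an extra mtime test.
if [ -f "$YESTERDAY" ]; then
    comm -13 "$YESTERDAY" "$TODAY" > "$WORK/new.paths"
    comm -23 "$YESTERDAY" "$TODAY" > "$WORK/deleted.paths"
    rsync -a --files-from="$WORK/new.paths" / backuphost:/backup/   # destination is an assumption
fi

The tsbuhelper note further down the thread does the same two-list comparison in a single (largely undocumented) command, and Enrico's message describes running the scan on both the source and the backup cluster.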
-- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Mar 10 02:21:54 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 10 Mar 2021 02:21:54 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: Message-ID: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. > > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From anacreo at gmail.com Wed Mar 10 02:59:18 2021 From: anacreo at gmail.com (Alec) Date: Tue, 9 Mar 2021 18:59:18 -0800 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: You would definitely be able to search by inode creation date and find the files you want... our 1.25m file filesystem takes about 47 seconds to query... One thing I would worry about though is inode deletion and inter-fileset file moves. The SQL based engine wouldn't be able to identify those changes and so you'd not be able to replicate deletes and such. Alternatively.... I have a script that runs in about 4 minutes and it pulls all the data out of the backup indexes, and compares the pre-built hourly file index on our system and identifies files that don't exist in the backup, so I have a daily backup validation... I filter the file list using ksh's printf date manipulation to filter out files that are less than 2 days old, to reduce the noise. 
A modification to this could simply compare a daily file index with the previous day's index, and send rsync a list of files (existing or deleted) based on just a delta of the two indexes (sort|diff), then you could properly account for all the changes. If you don't care about file modifications just produce both lists based on creation time instead of modification time. The mmfind command or GPFS policy engine should be able to produce a full file list/index very rapidly. In another thread there was a conversation with ACL's... I don't think our backup system backs up ACL's so I just have GPFS produce a list of all ACL applied objects on the daily, and have a script that just makes a null delimited backup file of every single ACL on our file system... and have a script to apply the ACL's as a "restore". It's a pretty simple thing to write-up and keeping 90 day history on this lets me compare the ACL evolution on a file very easily. Alec MVH Most Victorious Hunting (Why should Scandinavians own this cool sign off) On Tue, Mar 9, 2021 at 6:22 PM Ryan Novosielski wrote: > Yup, you want to use the policy engine: > > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m > reluctant to provide examples as I?m actually suspicious that we don?t have > it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > > On Mar 9, 2021, at 9:19 PM, William Burke > wrote: > > > > I would like to know what files were modified/created/deleted (only for > the current day) on the GPFS's file system so that I could rsync ONLY those > files to a predetermined external location. I am running GPFS 4.2.3.9 > > > > Is there a way to access the GPFS's metadata directly so that I do not > have to traverse the filesystem looking for these files? If i use the rsync > tool it will scan the file system which is 400+ million files. Obviously > this will be problematic to complete a scan in a day, if it would ever > complete single-threaded. There are tools or scripts that run multithreaded > rsync but it's still a brute force attempt. and it would be nice to know > where the delta of files that have changed. > > > > I began looking at Spectrum Scale Data Management (DM) API but I am not > sure if this is the best approach to looking at the GPFS metadata - inodes, > modify times, creation times, etc. > > > > > > > > -- > > > > Best Regards, > > > > William Burke (he/him) > > Lead HPC Engineer > > Advance Research Computing > > 860.255.8832 m | LinkedIn > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Wed Mar 10 15:15:58 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 10 Mar 2021 15:15:58 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: <641ea714-579b-1d74-4b86-d0e0b2e8e9c3@strath.ac.uk> On 10/03/2021 02:59, Alec wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > You would definitely be able to search by inode creation date and find > the files you want... our 1.25m file filesystem takes about 47 seconds > to query...? One thing I would worry about though is inode deletion and > inter-fileset file moves.? ?The SQL based engine wouldn't be able to > identify those changes and so you'd not be able to replicate deletes and > such. > This is the problem with rsync "backups", you need to run it with --delete otherwise any restore will "upset" your users as they find large numbers of file they had deleted unhelpfully "restored" > Alternatively.... > I have a script that runs in about 4 minutes and it pulls all the data > out of the backup indexes, and compares the pre-built hourly file index > on our system and identifies files that don't exist in the backup, so I > have a daily backup validation...? I filter the file list using > ksh's?printf date manipulation to filter out files that are less than 2 > days old, to reduce the noise.? A modification to this could simply > compare a daily file index with the previous day's index, and send rsync > a list of files (existing or deleted) based on just a delta of the two > indexes (sort|diff), then you could properly account for all the > changes.? If you don't care about file modifications just produce both > lists based on creation time instead of modification time.? The mmfind > command or GPFS policy engine should be able to produce a full file > list/index very rapidly. > My view would be somewhere along the lines of this is a lot of work and if you have the space to rsync your GPFS file system to, presumably with a server attached to said storage then for under 500 PVU of Spectrum Protect licensing you can have a fully supported client/server Spectrum Protect/TSM backup solution and just use mmbackup. You need to play the game and use older hardware ;-) I use an ancient pimped out Dell PowerEdge R300 as my TSM client node. Why this old, well it has a dual core Xeon E3113 for only 100 PVU. Anything newer would be quad core and 70 PVU per core which would cost an additional ~$1000 in licensing. If it breaks down they are under $100 on eBay. It's never skipped a beat and I have just finished a complete planned restore of our DSS-G using it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Wed Mar 10 19:09:13 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 10 Mar 2021 19:09:13 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: I was looking for the original source for this, but it was on dev works ... which is now dead. 
But you can use something like: tsbuhelper clustermigdiff \ $migratePath/.mmmigrateCfg/mmmigrate.list.v${prevFileCount}.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.latest.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.changed.v${fileCount}.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.deleted.v${fileCount}.filelist "mmmigrate.list.latest.filelist" would be the output of a policyscan of your files today "mmmigrate.list.v${prevFileCount}.filelist" is yesterday's policyscan This then generates the changed and deleted list of files for you. tsbuhelper is what is used internally in mmbackup, though is not very documented... We actually used something along these lines to support migrating between file-systems (generate daily diffs and sync those). The policy scan uses: RULE EXTERNAL LIST 'latest.filelist' EXEC '' \ RULE 'FilesToMigrate' LIST 'latest.filelist' DIRECTORIES_PLUS \ SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || \ VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || \ ' ' || (CASE WHEN XATTR('dmapi.IBMObj') IS NOT NULL THEN 'migrat' \ WHEN XATTR('dmapi.IBMPMig') IS NOT NULL THEN 'premig' \ ELSE 'resdnt' END )) \ WHERE \ ( \ NOT \ ( (PATH_NAME LIKE '/%/.mmbackup%') OR \ (PATH_NAME LIKE '/%/.mmmigrate%') OR \ (PATH_NAME LIKE '/%/.afm%') OR \ (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR \ (PATH_NAME LIKE '/%/.mmLockDir/%') OR \ (MODE LIKE 's%') \ ) \ ) \ AND \ (MISC_ATTRIBUTES LIKE '%u%') \ AND (NOT (PATH_NAME LIKE '/%/.TsmCacheDir' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.TsmCacheDir/%')) \ AND (NOT (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.SpaceMan/%')) On our file-system, both the scan and diff took a long time (hours), but hundreds of millions of files. This comes with no warranty ... We don't use this for backup, Spectrum Protect and mmbackup are our friends ... Simon ?On 10/03/2021, 02:22, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski" wrote: Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. 
> > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. > > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From enrico.tagliavini at fmi.ch Thu Mar 11 09:22:46 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 09:22:46 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync References: <8d58f5c6c8ee4f44a5e09c4f9e3a6dac@ex2013mbx2.fmi.ch> Message-ID: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org? On Behalf Of Ryan Novosielski > Sent: Wednesday, March 10, 2021 3:22 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync > > Yup, you want to use the policy engine: > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we > don?t have it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > > > \\UTGERS,?? |---------------------------*O*--------------------------- > > > _// the State |???????? Ryan Novosielski - novosirj at rutgers.edu > > > \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > > ?\\??? of NJ | Office of Advanced Research Computing - MSB C630, Newark > ???? `' > > > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > > > ?I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I > > could rsync ONLY those files to a predetermined external location. 
I am running GPFS 4.2.3.9 > > > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If > > i use the rsync tool it will scan the file system which is 400+ million files.? Obviously this will be problematic to complete a > > scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a > > brute force attempt. and it would be nice to know where the delta of files that have changed. > > > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS > > metadata - inodes, modify times, creation times, etc. > > > > > > > > -- > > > > Best Regards, > > > > William Burke (he/him) > > Lead HPC Engineer > > Advance Research Computing > > 860.255.8832 m | LinkedIn > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ulmer at ulmer.org Thu Mar 11 13:17:30 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 11 Mar 2021 08:17:30 -0500 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> References: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> Message-ID: <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen > On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: > > ?Hello William, > > I've got your email forwarded my another user and I decided to subscribe to give you my two cents. > > I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is > easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example > if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. > > DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me > enough not to go that route. > > What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just > build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which > the ctime changes in the last couple of days (to update metadata info). > > Good luck. > Kind regards. 
> > -- > > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > -------- Forwarded Message -------- >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski >> Sent: Wednesday, March 10, 2021 3:22 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync >> >> Yup, you want to use the policy engine: >> >> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm >> >> Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we >> don?t have it quite right and are passing far too much stuff to rsync). >> >> -- >> #BlackLivesMatter >> ____ >>>> \\UTGERS, |---------------------------*O*--------------------------- >>>> _// the State | Ryan Novosielski - novosirj at rutgers.edu >>>> \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus >>>> \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark >> `' >> >>>> On Mar 9, 2021, at 9:19 PM, William Burke wrote: >>> >>> I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I >>> could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 >>> >>> Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If >>> i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a >>> scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a >>> brute force attempt. and it would be nice to know where the delta of files that have changed. >>> >>> I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS >>> metadata - inodes, modify times, creation times, etc. >>> >>> >>> >>> -- >>> >>> Best Regards, >>> >>> William Burke (he/him) >>> Lead HPC Engineer >>> Advance Research Computing >>> 860.255.8832 m | LinkedIn >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From enrico.tagliavini at fmi.ch Thu Mar 11 13:24:47 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 13:24:47 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> References: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> Message-ID: Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. 
Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Wednesday, March 10, 2021 3:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ \\UTGERS, |---------------------------*O*--------------------------- _// the State | Ryan Novosielski - novosirj at rutgers.edu \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Mar 9, 2021, at 9:19 PM, William Burke wrote: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? 
If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. -- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Mar 11 13:47:44 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 11 Mar 2021 08:47:44 -0500 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen > On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: > > ? > Hello Stephen, > > actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. > > The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. > > Kind regards. > > -- > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > >> On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: >> I?m going to ask what may be a dumb question: >> >> Given that you have GPFS on both ends, what made you decide to NOT use AFM? 
>> >> -- >> Stephen >> >> >>> On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: >>> >>> ?Hello William, >>> >>> I've got your email forwarded my another user and I decided to subscribe to give you my two cents. >>> >>> I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is >>> easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example >>> if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. >>> >>> DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me >>> enough not to go that route. >>> >>> What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just >>> build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which >>> the ctime changes in the last couple of days (to update metadata info). >>> >>> Good luck. >>> Kind regards. >>> >>> -- >>> >>> Enrico Tagliavini >>> Systems / Software Engineer >>> >>> enrico.tagliavini at fmi.ch >>> >>> Friedrich Miescher Institute for Biomedical Research >>> Infomatics >>> >>> Maulbeerstrasse 66 >>> 4058 Basel >>> Switzerland >>> >>> >>> >>> >>> -------- Forwarded Message -------- >>>> >>>> -----Original Message----- >>>> From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski >>>> Sent: Wednesday, March 10, 2021 3:22 AM >>>> To: gpfsug main discussion list >>>> Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync >>>> >>>> Yup, you want to use the policy engine: >>>> >>>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm >>>> >>>> Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we >>>> don?t have it quite right and are passing far too much stuff to rsync). >>>> >>>> -- >>>> #BlackLivesMatter >>>> ____ >>>>>> \\UTGERS, |---------------------------*O*--------------------------- >>>>>> _// the State | Ryan Novosielski - novosirj at rutgers.edu >>>>>> \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus >>>>>> \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark >>>> `' >>>> >>>>>> On Mar 9, 2021, at 9:19 PM, William Burke wrote: >>>>> >>>>> I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I >>>>> could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 >>>>> >>>>> Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If >>>>> i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a >>>>> scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a >>>>> brute force attempt. and it would be nice to know where the delta of files that have changed. >>>>> >>>>> I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS >>>>> metadata - inodes, modify times, creation times, etc. 
>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> Best Regards, >>>>> >>>>> William Burke (he/him) >>>>> Lead HPC Engineer >>>>> Advance Research Computing >>>>> 860.255.8832 m | LinkedIn >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Mar 11 14:20:05 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 11 Mar 2021 14:20:05 +0000 Subject: [gpfsug-discuss] Synchronization/Restore of file systems Message-ID: As promised last year I having just completed a storage upgrade, I have sanitized my scripts and put them up on Github for other people to have a look at the methodology I use in these sorts of scenarios. This time the upgrade involved pulling out all the existing disks and fitting large ones then restoring from backup, rather than synchronizing to a new system, but the principles are the same. Bear in mind the code is written in Perl because it's history is ancient now and with few opportunities to test it in anger, rewriting it in the latest fashionable scripting language is unappealing. https://github.com/digitalcabbage/syncrestore JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From enrico.tagliavini at fmi.ch Thu Mar 11 14:24:43 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 14:24:43 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: We evaluated AFM multiple times. The first time was in 2017 with Spectrum Scale 4.2 . When we switched to Spectrum Scale 5 not long ago we also re-evaluated AFM. The horror stories about data loss are becoming more rare with modern setups, especially in the non DR case scenario. However AFM is still a very complicated tool, way to complicated if what you are looking for is a "simple" rsync style backup (but faster). The 3000+ pages of documentation for GPFS do not help our small team and many of those pages are dedicated to just AFM. The performance problem is also still a real issue with modern versions as far as I was told. We can have a quite erratic data turnover in our setup, tied to very big scientific instruments capable of generating many TB of data per hour. Having good performance is important. I used the same tool we use for backups also to migrate the data from the old storage to the new storage (and from GPFS 4 to GPFS 5), and I managed to reach speeds of 17 - 19 GB / s data transfer (when hitting big files that is) using only two servers equipped with Infiniband EDR. 
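The bare-bones version of that parallel transfer (splitrsync, mentioned next, is a fuller implementation) is simply to cut the policy-generated path list into chunks and run one rsync per chunk. A minimal, untested sketch; the list, destination and process count are assumptions:

# Fan a path list out to N parallel rsync processes.
FILELIST=/var/tmp/changed.paths     # one path per line, e.g. from a policy scan
DEST=backuphost:/backup/fs1         # assumption
N=8

tmp=$(mktemp -d)
split -n l/$N "$FILELIST" "$tmp/chunk."    # GNU split: N chunks, never mid-line

for chunk in "$tmp"/chunk.*; do
    # --files-from stops rsync crawling the whole tree; entries are taken
    # relative to the source argument "/".
    rsync -a --files-from="$chunk" / "$DEST" &
done
wait
rm -rf "$tmp"

Note this only pushes the named files; propagating deletions needs a separate step, such as the deleted-paths list from a two-scan comparison.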
I made a simple script to parallelize rsync to make it faster: https://github.com/fmi-basel/splitrsync . Combined with another program using the policy engine to generate the file list to avoid the painful crawling. As I said we are a small team, so we have to be efficient. Developing that tool costed me time, but the ROI is there as I can use the same tool with non GPFS powered storage system, and we had many occasions where this was the case, for example when moving data from old system to be decommissioned to the GPFS storage. And I would like to finally mention another hot topic: who says we will be on GPFS forever? The recent licensing change would probably destroy our small IT budget and we would not be able to afford Spectrum Scale any longer. We might be forced to switch to a cheaper solution. At least this way we can carry some of the code we wrote with us. With AFM we would have to start from scratch. Originally we were not really planning to move as we didn't expect this change in licensing with the associated increased cost. But now, this turns out to be a small time saver if we indeed have to switch. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:47 -0500, Stephen Ulmer wrote: Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sadaniel at us.ibm.com Thu Mar 11 16:08:11 2021 From: sadaniel at us.ibm.com (Steven Daniels) Date: Thu, 11 Mar 2021 09:08:11 -0700 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. 
We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. I'll leave it to Venkat and others on the development team to share more details about improvements. Steven A. Daniels Cross-brand Client Architect Senior Certified IT Specialist National Programs Fax and Voice: 3038101229 sadaniel at us.ibm.com http://www.ibm.com From: Stephen Ulmer To: gpfsug main discussion list Cc: bill.burke.860 at gmail.com Date: 03/11/2021 06:47 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync Sent by: gpfsug-discuss-bounces at spectrumscale.org Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. 
We have about 250 million files and this is surprisingly fast. On top of that I add all the files for which the ctime changed in the last couple of days (to update metadata info).

Good luck.
Kind regards.

--
Enrico Tagliavini
Systems / Software Engineer
enrico.tagliavini at fmi.ch

Friedrich Miescher Institute for Biomedical Research
Informatics
Maulbeerstrasse 66
4058 Basel
Switzerland

-------- Forwarded Message --------

-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski
Sent: Wednesday, March 10, 2021 3:22 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync

Yup, you want to use the policy engine:

https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm

Something in here ought to help. We do something like this (but I'm reluctant to provide examples as I'm actually suspicious that we don't have it quite right and are passing far too much stuff to rsync).

--
#BlackLivesMatter
____
\\UTGERS, |---------------------------*O*---------------------------
_// the State | Ryan Novosielski - novosirj at rutgers.edu
\\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
\\ of NJ | Office of Advanced Research Computing - MSB C630, Newark
`'

On Mar 9, 2021, at 9:19 PM, William Burke wrote:

I would like to know what files were modified/created/deleted (only for the current day) on the GPFS file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9.

Is there a way to access GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If I use the rsync tool it will scan the file system, which is 400+ million files. Obviously that will be problematic to complete in a day, if it would ever complete single-threaded at all. There are tools and scripts that run multithreaded rsync, but that is still a brute-force approach, and it would be nice to know the delta of files that have changed.

I began looking at the Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc.
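For the narrower case above (only what was modified or created in the last day), a rough sketch along the same lines. The policy attributes and mmapplypolicy options are as available on 4.2.x and later to the best of my knowledge; the split/xargs part is only a crude stand-in for the parallel rsync wrappers mentioned in this thread; and, as warned earlier, renamed directory trees and deletions will not show up in such a list:

    /* /tmp/daily.pol: files modified or created within the last day */
    RULE 'ext-daily' EXTERNAL LIST 'daily' EXEC ''
    RULE 'new-or-modified' LIST 'daily'
         WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) <= 1
            OR (DAYS(CURRENT_TIMESTAMP) - DAYS(CREATION_TIME)) <= 1

    mmapplypolicy /gpfs/fs1 -P /tmp/daily.pol -I defer -f /tmp/daily

    # trim the list lines to bare paths (check the exact format on your
    # release), split into 8 chunks and run 8 rsyncs in parallel
    sed 's|^.* -- /|/|' /tmp/daily.list.daily > /tmp/paths
    split -d -n l/8 /tmp/paths /tmp/paths.
    ls /tmp/paths.0* | xargs -P 8 -I{} \
        rsync -a --files-from={} / backuphost:/gpfs/backupfs/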
-- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=6mf8yZ-lDnfsy3mVONFq1RV1ypXT67SthQnq3D6Ym4Q&m=hSVvvIGpqQhKt_u_TKHdjoXyU-z7P14pCBQ5pA7MMFA&s=g2hkl0Raj7QbLvqRZfDk6nska0crl4Peh4kd8YwiO6k&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A816397.jpg Type: image/jpeg Size: 4919 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From novosirj at rutgers.edu Thu Mar 11 16:28:57 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 11 Mar 2021 16:28:57 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: Message-ID: <1298DFDD-9701-4FE4-9B06-1541455E0F52@rutgers.edu> Agreed. Since 5.0.4.1 on the client side (we do rely on it for home directories that are geographically distributed), we have effectively not had any more problems. Our server side are all 5.0.3.2-3. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 11, 2021, at 11:08 AM, Steven Daniels wrote: > > Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. > > I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. > > The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. > > I'll leave it to Venkat and others on the development team to share more details about improvements. > > > Steven A. Daniels > Cross-brand Client Architect > Senior Certified IT Specialist > National Programs > Fax and Voice: 3038101229 > sadaniel at us.ibm.com > http://www.ibm.com > <1A816397.jpg> > > Stephen Ulmer ---03/11/2021 06:47:59 AM---Thank you! 
Would you mind letting me know in what era you made your evaluation? I?m not suggesting y > > From: Stephen Ulmer > To: gpfsug main discussion list > Cc: bill.burke.860 at gmail.com > Date: 03/11/2021 06:47 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Thank you! Would you mind letting me know in what era you made your evaluation? > > I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. > > Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. > > Your original post was very thoughtful, and I appreciate your time. > > -- > Stephen > > On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: > > ? > Hello Stephen, > > actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. > > The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. > > Kind regards. > > -- > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > > On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: > I?m going to ask what may be a dumb question: > > Given that you have GPFS on both ends, what made you decide to NOT use AFM? > > -- > Stephen > > > On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: > > ?Hello William, > > I've got your email forwarded my another user and I decided to subscribe to give you my two cents. > > I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is > easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example > if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. > > DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me > enough not to go that route. > > What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just > build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which > the ctime changes in the last couple of days (to update metadata info). > > Good luck. > Kind regards. 
> > -- > > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > -------- Forwarded Message -------- > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski > Sent: Wednesday, March 10, 2021 3:22 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync > > Yup, you want to use the policy engine: > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we > don?t have it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > \\UTGERS, |---------------------------*O*--------------------------- > _// the State | Ryan Novosielski - novosirj at rutgers.edu > \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I > could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If > i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a > scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a > brute force attempt. and it would be nice to know where the delta of files that have changed. > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS > metadata - inodes, modify times, creation times, etc. 
> > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=6mf8yZ-lDnfsy3mVONFq1RV1ypXT67SthQnq3D6Ym4Q&m=hSVvvIGpqQhKt_u_TKHdjoXyU-z7P14pCBQ5pA7MMFA&s=g2hkl0Raj7QbLvqRZfDk6nska0crl4Peh4kd8YwiO6k&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From honwai.leong at sydney.edu.au Thu Mar 11 22:28:57 2021 From: honwai.leong at sydney.edu.au (Honwai Leong) Date: Thu, 11 Mar 2021 22:28:57 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync Message-ID: This paper might provide some ideas, not the best solution but works fine https://github.com/HPCSYSPROS/Workshop20/blob/master/Parallelized_data_replication_of_multi-petabyte_storage_systems/ws_hpcsysp103s1-file1.pdf It is a two-part workflow to replicate files from production to DR site. It leverages on snapshot ID to determine which files have been updated/modified after a snapshot was taken. It doesn't take care of deletion of files moved from one directory to another, so it uses dsync to take care of that part. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: Friday, March 12, 2021 3:08 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 110, Issue 20 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
Re: Fwd: FW: Backing up GPFS with Rsync (Steven Daniels)
------------------------------

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org

End of gpfsug-discuss Digest, Vol 110, Issue 20
***********************************************

From juergen.hannappel at desy.de Mon Mar 15 16:20:51 2021
From: juergen.hannappel at desy.de (Hannappel, Juergen)
Date: Mon, 15 Mar 2021 17:20:51 +0100 (CET)
Subject: [gpfsug-discuss] Detecting open files
Message-ID: <1985303510.24419797.1615825251660.JavaMail.zimbra@desy.de>

Hi,
when unlinking filesets, that sometimes fails because some open files still exist on the fileset. Is there a way to find which files are open, and from which node? Without running an 'mmdsh -N all lsof' on several (big) remote clusters, that is.

--
Dr. Jürgen Hannappel DESY/IT Tel. : +49 40 8998-4616
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1711 bytes
Desc: S/MIME Cryptographic Signature
URL:

From Robert.Oesterlin at nuance.com Wed Mar 17 11:59:57 2021
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Wed, 17 Mar 2021 11:59:57 +0000
Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint
Message-ID:

Anyone run into this error from the GUI task 'FILESYSTEM_MOUNT', or ideas on how to fix it?

Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg1_tools','ems1-hs','RO','2021-03-17 07:55:14.051000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg1_tools) already exists.
Call getNextException to see other errors in the batch.,Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg5_tools','ems1-hs','RO','2021-03-17 07:55:15.686000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg5_tools) already exists. Call getNextException to see other errors in the batch. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 14:18:56 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 14:18:56 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898090.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898091.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898092.png Type: image/png Size: 1172 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Wed Mar 17 14:30:36 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 14:30:36 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: Can you give me details on how to do this? I tried this: [root at ess1ems ~]# su postgres -c 'psql -d postgres -c "delete from fscc.filesystem_mounts"' could not change directory to "/root" psql: FATAL: Peer authentication failed for user "postgres" Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Alexander Wolf Reply-To: gpfsug main discussion list Date: Wednesday, March 17, 2021 at 9:19 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ This is strange, the Java code should only try to insert rows that are not already there. If it was just the insert for the duplicate row we could ignore it. But this is a batch insert failing and therefore the FILESYSTEM_MOUNTS table does not get updated anymore. A quick fix is to launch the psql client and do a "delete from fscc.filesystem_mounts" to clear the table and run the FILESYSTEM_MOUNT task afterwards to repopulate it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 15:09:51 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 15:09:51 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.16159683898093.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898094.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898095.png Type: image/png Size: 1172 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Wed Mar 17 15:33:54 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 15:33:54 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> The command completed, and I re-ran the FILESYSTEM_MOUNT, but it failed the same way. [root at ess1ems ~]# psql postgres postgres -c "delete from fscc.filesystem_mounts" DELETE 20 /usr/lpp/mmfs/gui/cli/runtask FILESYSTEM_MOUNT -debug 10:32 AM Operation Failed 10:32 AM Error: debug: locale=en_US debug: Running 'mmlsmount 'fs1' -Y ' on node localhost debug: Running 'mmlsmount 'fs2' -Y ' on node localhost debug: Running 'mmlsmount 'fs3' -Y ' on node localhost debug: Running 'mmlsmount 'fs4' -Y ' on node localhost debug: Running 'mmlsmount 'nrg1_tools' -Y ' on node localhost debug: Running 'mmlsmount 'nrg5_tools' -Y ' on node localhost err: java.sql.BatchUpdateException: Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg1_tools','ems1-hs','RO','2021-03-17 11:32:38.830000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg1_tools) already exists. Call getNextException to see other errors in the batch. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Alexander Wolf Reply-To: gpfsug main discussion list Date: Wednesday, March 17, 2021 at 10:10 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ I think psql postgres postgres -c "delete from fscc.filesystem_mounts"' ran as root should do the trick. Mit freundlichen Gr??en / Kind regards [cid:image001.png at 01D71B19.07732D00] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1135 bytes Desc: image001.png URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 17:05:11 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 17:05:11 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> References: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898096.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.16159683898097.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image001.png at 01D71B19.07732D00.png Type: image/png Size: 1135 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898098.png Type: image/png Size: 1172 bytes Desc: not available URL: From robert.horton at icr.ac.uk Thu Mar 18 15:47:07 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Thu, 18 Mar 2021 15:47:07 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Message-ID: Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Mar 19 06:32:00 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 19 Mar 2021 12:02:00 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: Robert, What is the scale version ? This issue may be related to these alerts. https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? 
_ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? __audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.horton at icr.ac.uk Fri Mar 19 09:42:22 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Fri, 19 Mar 2021 09:42:22 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> Hi Venkat, Thanks for getting back to me. On the cache side we're running 5.0.4-3 on the nsd servers and 5.0.5-2 everywhere else, including gateway nodes. The home cluster is 4.2.3-22 - unfortunately we're stuck on 4.x due to the licensing but we're in the process of replacing that system. The actual AFM seems to be behaving fine though so I'm not sure that's our issue. I guess our next job is to see if we can reproduce it in a non-AFM fileset. Rob On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. Robert, What is the scale version ? This issue may be related to these alerts. 
https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk| W www.icr.ac.uk| Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Mar 19 09:50:04 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 19 Mar 2021 15:20:04 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> References: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> Message-ID: Hi Robert, So you might have started seeing problem after upgrading the gateway nodes to 5.0.5.2. Upgrading gateway nodes at cache cluster to 5.0.5.6 would resolve this problem. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/19/2021 03:13 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Venkat, Thanks for getting back to me. On the cache side we're running 5.0.4-3 on the nsd servers and 5.0.5-2 everywhere else, including gateway nodes. The home cluster is 4.2.3-22 - unfortunately we're stuck on 4.x due to the licensing but we're in the process of replacing that system. The actual AFM seems to be behaving fine though so I'm not sure that's our issue. I guess our next job is to see if we can reproduce it in a non-AFM fileset. Rob On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. 
Robert, What is the scale version ? This issue may be related to these alerts. https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x65/0xca

--
Robert Horton | Research Data Storage Lead
The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB
T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London
Facebook: www.facebook.com/theinstituteofcancerresearch

The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
Robert Horton | Research Data Storage Lead
The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB
T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London
Facebook: www.facebook.com/theinstituteofcancerresearch

The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From u.sibiller at science-computing.de Mon Mar 22 09:32:10 2021
From: u.sibiller at science-computing.de (Ulrich Sibiller)
Date: Mon, 22 Mar 2021 10:32:10 +0100
Subject: [gpfsug-discuss] Move data to fileset seamlessly
Message-ID: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de>

Hello,

we usually create filesets for project dirs and homes.

Unfortunately we have discovered that this convention has been ignored for some dirs, and their data now resides in the root fileset. We would like to move that data to independent filesets.

Is there a way to do this without having to schedule a downtime for the dirs in question?

I mean, is there a way to transparently move data to an independent fileset at the same path?

Kind regards,

Ulrich Sibiller
--
Science + Computing AG
Vorstandsvorsitzender/Chairman of the board of management:
Dr.
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From janfrode at tanso.net Mon Mar 22 09:54:28 2021 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 22 Mar 2021 10:54:28 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: No ? all copying between filesets require full data copy. No simple rename. This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. -jf man. 22. mar. 2021 kl. 10:39 skrev Ulrich Sibiller < u.sibiller at science-computing.de>: > Hello, > > we usually create filesets for project dirs and homes. > > Unfortunately we have discovered that this convention has been ignored for > some dirs and their data > no resides in the root fileset. We would like to move the data to > independent filesets. > > Is there a way to do this without having to schedule a downtime for the > dirs in question? > > I mean, is there a way to transparently move data to an independent > fileset at the same path? > > > Kind regards, > > Ulrich Sibiller > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Mar 22 12:24:59 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 22 Mar 2021 12:24:59 +0000 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: You could maybe create the new file-set, link in a different place, copy the data ? Then at somepoint, unlink and relink and resync. Still some user access, but you are potentially reducing the time to do the copy. Simon From: on behalf of "janfrode at tanso.net" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 22 March 2021 at 09:54 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Move data to fileset seamlessly No ? all copying between filesets require full data copy. No simple rename. This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. -jf man. 22. mar. 2021 kl. 10:39 skrev Ulrich Sibiller >: Hello, we usually create filesets for project dirs and homes. Unfortunately we have discovered that this convention has been ignored for some dirs and their data no resides in the root fileset. We would like to move the data to independent filesets. Is there a way to do this without having to schedule a downtime for the dirs in question? 
I mean, is there a way to transparently move data to an independent fileset at the same path? Kind regards, Ulrich Sibiller -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Mar 22 13:20:46 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 14:20:46 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: On 22.03.21 13:24, Simon Thompson wrote: > You could maybe create the new file-set, link in a different place, copy the data ? > > Then at somepoint, unlink and relink and resync. Still some user access, but you are potentially > reducing the time to do the copy. Yes, but this does not help if a file is open all the time, e.g. during a long-running job. Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From u.sibiller at science-computing.de Mon Mar 22 13:41:39 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 14:41:39 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: <6f626186-cb7a-46d5-781c-8f3a21b7e270@science-computing.de> On 22.03.21 10:54, Jan-Frode Myklebust wrote: > No ? all copying between filesets require full data copy. No simple rename. > > This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. Yes, your are right. So please vote here: http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=149429 Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From robert.horton at icr.ac.uk Tue Mar 23 19:02:05 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Tue, 23 Mar 2021 19:02:05 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: Hi, Sorry for the delay... On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: > ... > 1. 
Stop all AFM filesets at cache using "mmafmctl device stop -j > fileset" command. > 2. Perform rolling upgrade parallely at both cache and home clusters > a. All nodes on home cluster to 5.0.5.6 > b. All gateway nodes in cache cluster to 5.0.5.6 > 3. At home cluster, for each fileset target path, repeat below steps > a. Remove .afmctl file > mmafmlocal rm /.afm/.afmctl > b. Enable AFM At point 3 I'm getting: # mmafmlocal rm /.afm/.afmctl /bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted afmconfig disable is the same. Any idea what the issue is? Thanks, Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. From vpuvvada at in.ibm.com Wed Mar 24 02:36:31 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Wed, 24 Mar 2021 08:06:31 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: ># mmafmlocal rm /.afm/.afmctl >/bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted This step is only required if home cluster is on 5.0.5.2/5.0.5.3. You can ignore this issue, and restart AFM filesets at cache. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/24/2021 12:33 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Sorry for the delay... On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: > ... > 1. Stop all AFM filesets at cache using "mmafmctl device stop -j > fileset" command. > 2. Perform rolling upgrade parallely at both cache and home clusters > a. All nodes on home cluster to 5.0.5.6 > b. All gateway nodes in cache cluster to 5.0.5.6 > 3. At home cluster, for each fileset target path, repeat below steps > a. Remove .afmctl file > mmafmlocal rm /.afm/.afmctl > b. Enable AFM At point 3 I'm getting: # mmafmlocal rm /.afm/.afmctl /bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted afmconfig disable is the same. Any idea what the issue is? Thanks, Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. 
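For completeness, a minimal sketch of the stop/upgrade/start cycle being discussed here, with the device name fs1 and the fileset name projcache as placeholders only:

  # before the upgrade: stop the AFM fileset at the cache
  mmafmctl fs1 stop -j projcache

  # ... upgrade home cluster and cache gateway nodes to 5.0.5.6 ...

  # after the upgrade: start the fileset again and check it goes back to Active
  mmafmctl fs1 start -j projcache
  mmafmctl fs1 getstate -j projcache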
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From prasad.surampudi at theatsgroup.com  Wed Mar 24 14:32:30 2021
From: prasad.surampudi at theatsgroup.com (Prasad Surampudi)
Date: Wed, 24 Mar 2021 14:32:30 +0000
Subject: [gpfsug-discuss] mmrepquota is not reporting root fileset statistics for some filesystems
Message-ID:

Recently while checking fileset quotas in an ESS cluster, we noticed that the mmrepquota command is not reporting the root fileset quota and inode details for some filesystems. Has anyone else seen this issue? Please see the output below. The root fileset shows up for the 'prod' filesystem and does not show up for 'prod-private'. I could not figure out why it does not show up for prod-private. Any ideas?
/usr/lpp/mmfs/bin/mmrepquota -j prod-private
                             Block Limits                                  |     File Limits
Name      fileset  type            KB    quota    limit    in_doubt  grace |     files  quota  limit  in_doubt  grace
xFIN      root     FILESET   12028144        0        0           0   none |   4524237      0      0         0   none

/usr/lpp/mmfs/bin/mmrepquota -j prod
                             Block Limits                                  |     File Limits
Name      fileset  type            KB    quota    limit    in_doubt  grace |     files  quota  limit  in_doubt  grace
root      root     FILESET    7106656        0        0  1273643728   none |         7      0      0       400   none
xxx_tick  root     FILESET          0        0        0           0   none |         1      0      0         0   none

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From scale at us.ibm.com  Thu Mar 25 16:33:48 2021
From: scale at us.ibm.com (IBM Spectrum Scale)
Date: Thu, 25 Mar 2021 11:33:48 -0500
Subject: Re: [gpfsug-discuss] mmrepquota is not reporting root fileset statistics for some filesystems
In-Reply-To:
References:
Message-ID:

Prasad,

This is unexpected. Please open a PMR so that data can be collected and looked at.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.

From: Prasad Surampudi
To: "gpfsug-discuss at spectrumscale.org"
Date: 03/24/2021 10:32 AM
Subject: [EXTERNAL] [gpfsug-discuss] mmrepquota is not reporting root fileset statistics for some filesystems
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Recently while checking fileset quotas in a ESS cluster, we noticed that the mmrepquota command is not reporting the root fileset quota and inode details for some filesystems. Does anyone else also saw this issue? Please see the output below. The root fileset shows up for 'prod' filesystem and does not show up for 'prod-private'. I could not figure out why it does not show up for prod-private. Any ideas?
URL: From olaf.weiser at de.ibm.com Tue Mar 30 07:06:54 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 30 Mar 2021 06:06:54 +0000 Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Tue Mar 30 19:24:00 2021 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Tue, 30 Mar 2021 18:24:00 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 110, Issue 34 In-Reply-To: References: Message-ID: Hey Olaf, We'll investigate as suggested. I'm hopeful the journald logs would provide some additional insight. As for OFED versions, we use the same Mellanox version across the cluster and haven't seen any issues with working nodes that mount the filesystem. We also have a PMR open with IBM but we'll send a follow-up if we discover something more for group discussion. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Tuesday, March 30, 2021 1:07 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 110, Issue 34 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Filesystem mount attempt hangs GPFS client node (Saula, Oluwasijibomi) 2. Re: Filesystem mount attempt hangs GPFS client node (Olaf Weiser) ---------------------------------------------------------------------- Message: 1 Date: Mon, 29 Mar 2021 18:38:00 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Content-Type: text/plain; charset="utf-8" Hello Folks, So we are experiencing a mind-boggling issue where just a couple of nodes in our cluster, at GPFS boot up, get hung so badly that the node must be power reset. These AMD client nodes are diskless in nature and have at least 256G of memory. We have other AMD nodes that are working just fine in a separate GPFS cluster albeit on RHEL7. Just before GPFS (or related processes) seize up the node, the following lines of /var/mmfs/gen/mmfslog are noted: 2021-03-29_12:47:37.343-0500: [N] mmfsd ready 2021-03-29_12:47:37.426-0500: mmcommon mmfsup invoked. 
Parameters: 10.12.50.47 10.12.50.242 all 2021-03-29_12:47:37.587-0500: mounting /dev/mmfs1 2021-03-29_12:47:37.590-0500: [I] Command: mount mmfs1 2021-03-29_12:47:37.859-0500: [N] Connecting to 10.12.50.243 tier1-sn-02.pixstor 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01.pixstor) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.866-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.867-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.868-0500: [I] Connected to 10.12.50.243 tier1-sn-02 There have been hunches that this might be a network issue, however, other nodes connected to the IB network switch are mounting the filesystem without incident. I'm inclined to believe there's a GPFS/OS-specific setting that might be causing these crashes especially when we note that disabling the automount on the client node doesn't result in the node hanging. However, once we issue mmmount, we see the node seize up shortly... Please let me know if you have any thoughts on where to look for root-causes as I and a few fellows are stuck here ? Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Tue, 30 Mar 2021 06:06:54 +0000 From: "Olaf Weiser" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 110, Issue 34 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 07:58:43 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 07:58:43 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > I?ve tried benchmarking many vs. few vdisks per RG, and never could see > any performance difference. That's encouraging. > > Usually we create 1 vdisk per enclosure per RG, ? thinking this will > allow us to grow with same size vdisks when adding additional enclosures > in the future. > > Don?t think mmvdisk can be told to create multiple vdisks per RG > directly, so you have to manually create multiple vdisk sets each with > the apropriate size. > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings that you needed a minimum of six NSD's for optimal performance. I have sat in presentations where IBM employees have said so. What we where told back then is that GPFS needs a minimum number of NSD's in order to be able to spread the I/O's out. So if an NSD is being pounded for reads and a write comes in it. 
can direct it to a less busy NSD. Now I can imagine that in a ESS/DSS-G that as it's being scattered to the winds under the hood this is no longer relevant. But some notes to the effect for us old timers would be nice if that is the case to put our minds to rest. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Achim.Rehor at de.ibm.com Mon Mar 1 08:16:43 2021 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Mon, 1 Mar 2021 09:16:43 +0100 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: The reason for having multiple NSDs in legacy NSD (non-GNR) handling is the increased parallelism, that gives you 'more spindles' and thus more performance. In GNR the drives are used in parallel anyway through the GNRstriping. Therfore, you are using all drives of a ESS/GSS/DSS model under the hood in the vdisks anyway. The only reason for having more NSDs is for using them for different filesystems. Mit freundlichen Gr??en / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Date: 01/03/2021 08:58 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could see > > any performance difference. > > That's encouraging. > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > allow us to grow with same size vdisks when adding additional enclosures > > in the future. > > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > directly, so you have to manually create multiple vdisk sets each with > > the apropriate size. > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > that you needed a minimum of six NSD's for optimal performance. I have > sat in presentations where IBM employees have said so. What we where > told back then is that GPFS needs a minimum number of NSD's in order to > be able to spread the I/O's out. So if an NSD is being pounded for reads > and a write comes in it. can direct it to a less busy NSD. > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > the winds under the hood this is no longer relevant. But some notes to > the effect for us old timers would be nice if that is the case to put > our minds to rest. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? 
> u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > From S.J.Thompson at bham.ac.uk Mon Mar 1 09:06:07 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 1 Mar 2021 09:06:07 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: Or for hedging your bets about how you might want to use it in future. We are never quite sure if we want to do something different in the future with some of the storage, sure that might mean we want to steal some space from a file-system, but that is perfectly valid. And we have done this, both in temporary transient states (data migration between systems), or permanently (found we needed something on a separate file-system) So yes whilst there might be no performance impact on doing this, we still do. I vaguely recall some of the old reasoning was around IO queues in the OS, i.e. if you had 6 NSDs vs 16 NSDs attached to the NSD server, you have 16 IO queues passing to multipath, which can help keep the data pipes full. I suspect there was some optimal number of NSDs for different storage controllers, but I don't know if anyone ever benchmarked that. Simon ?On 01/03/2021, 08:16, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Achim.Rehor at de.ibm.com" wrote: The reason for having multiple NSDs in legacy NSD (non-GNR) handling is the increased parallelism, that gives you 'more spindles' and thus more performance. In GNR the drives are used in parallel anyway through the GNRstriping. Therfore, you are using all drives of a ESS/GSS/DSS model under the hood in the vdisks anyway. The only reason for having more NSDs is for using them for different filesystems. Mit freundlichen Gr??en / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Date: 01/03/2021 08:58 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could see > > any performance difference. > > That's encouraging. > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > allow us to grow with same size vdisks when adding additional enclosures > > in the future. > > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > directly, so you have to manually create multiple vdisk sets each with > > the apropriate size. > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > that you needed a minimum of six NSD's for optimal performance. I have > sat in presentations where IBM employees have said so. What we where > told back then is that GPFS needs a minimum number of NSD's in order to > be able to spread the I/O's out. So if an NSD is being pounded for reads > and a write comes in it. can direct it to a less busy NSD. > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > the winds under the hood this is no longer relevant. But some notes to > the effect for us old timers would be nice if that is the case to put > our minds to rest. > > > JAB. > > -- > Jonathan A. 
Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From luis.bolinches at fi.ibm.com Mon Mar 1 09:08:20 2021 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 1 Mar 2021 09:08:20 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Mar 1 09:34:26 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 09:34:26 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Mon Mar 1 09:46:06 2021 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Mon, 1 Mar 2021 10:46:06 +0100 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: Correct, there was. The OS is dealing with pdisks, while GPFS is striping over Vdisks/NSDs. For GNR there is a differetnt queuing setup in GPFS, than there was for NSDs. See "mmfsadm dump nsd" and check for NsdQueueTraditional versus NsdQueueGNR And yes, i was too strict, with "> The only reason for having more NSDs is for using them for different > filesystems." there are other management reasons to run with a reasonable number of vdisks, just not performance reasons. Mit freundlichen Gruessen / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 10:06:07: > From: Simon Thompson > To: gpfsug main discussion list > Date: 01/03/2021 10:06 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Or for hedging your bets about how you might want to use it in future. > > We are never quite sure if we want to do something different in the > future with some of the storage, sure that might mean we want to > steal some space from a file-system, but that is perfectly valid. > And we have done this, both in temporary transient states (data > migration between systems), or permanently (found we needed > something on a separate file-system) > > So yes whilst there might be no performance impact on doing this, westill do. > > I vaguely recall some of the old reasoning was around IO queues in > the OS, i.e. if you had 6 NSDs vs 16 NSDs attached to the NSD > server, you have 16 IO queues passing to multipath, which can help > keep the data pipes full. I suspect there was some optimal number of > NSDs for different storage controllers, but I don't know if anyone > ever benchmarked that. 
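The queue handling mentioned above can be eyeballed on an NSD server or ESS/DSS IO node with something like the following. mmfsadm is an unsupported debug tool, so treat this as read-only poking, and the exact strings vary by release:

  # dump the NSD queue information and look for the GNR vs. traditional queues
  mmfsadm dump nsd > /tmp/nsd.dump
  grep -i queue /tmp/nsd.dump | head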
> > Simon > > On 01/03/2021, 08:16, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Achim.Rehor at de.ibm.com" bounces at spectrumscale.org on behalf of Achim.Rehor at de.ibm.com> wrote: > > The reason for having multiple NSDs in legacy NSD (non-GNR) handling is > the increased parallelism, that gives you 'more spindles' and thus more > performance. > In GNR the drives are used in parallel anyway through the GNRstriping. > Therfore, you are using all drives of a ESS/GSS/DSS model under the hood > in the vdisks anyway. > > The only reason for having more NSDs is for using them for different > filesystems. > > > Mit freundlichen Gr??en / Kind regards > > Achim Rehor > > IBM EMEA ESS/Spectrum Scale Support > > > > > > > > > > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > > > From: Jonathan Buzzard > > To: gpfsug-discuss at spectrumscale.org > > Date: 01/03/2021 08:58 > > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of > NSD's > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could > see > > > any performance difference. > > > > That's encouraging. > > > > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > > allow us to grow with same size vdisks when adding additional > enclosures > > > in the future. > > > > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > > directly, so you have to manually create multiple vdisk setseach with > > > > the apropriate size. > > > > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > > that you needed a minimum of six NSD's for optimal performance. I have > > sat in presentations where IBM employees have said so. What we where > > told back then is that GPFS needs a minimum number of NSD's inorder to > > be able to spread the I/O's out. So if an NSD is being poundedfor reads > > > and a write comes in it. can direct it to a less busy NSD. > > > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > > the winds under the hood this is no longer relevant. But some notes to > > the effect for us old timers would be nice if that is the case to put > > our minds to rest. > > > > > > JAB. > > > > -- > > Jonathan A. Buzzard Tel: +44141-5483420 > > HPC System Administrator, ARCHIE-WeSt. > > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url? > > > > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=gU9xf_Z6rrdOa4- > WKodSyFPbnGGbAGC_LK7hgYPB3yQ&s=L_VtTqSwQbqfIR5VVmn6mYxmidgnH37osHrFPX0E-Ck&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? 
> u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=gU9xf_Z6rrdOa4- > WKodSyFPbnGGbAGC_LK7hgYPB3yQ&s=L_VtTqSwQbqfIR5VVmn6mYxmidgnH37osHrFPX0E-Ck&e= > From jonathan.buzzard at strath.ac.uk Mon Mar 1 11:45:45 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 11:45:45 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: On 01/03/2021 09:08, Luis Bolinches wrote: > Hi > > There other reasons to have more than 1. It is management of those. When > you have to add or remove NSDs of a FS having more than 1 makes it > possible to empty some space and manage those in and out. Manually but > possible. If you have one big NSD or even 1 per enclosure it might > difficult or even not possible depending the number of enclosures and FS > utilization. > > Starting some ESS version (not DSS, cant comment on that) that I do not > recall but in the last 6 months, we have change the default (for those > that use the default) to 4 NSDs per enclosure for ESS 5000. There is no > impact on performance either way on ESS, we tested it. But management of > those on the long run should be easier. Question how does one create a none default number of vdisks per enclosure then? I tried creating a stanza file and then doing mmcrvdisk but it was not happy, presumably because of the "new style" recovery group management mmcrvdisk: [E] This command is not supported by recovery groups under management of mmvdisk. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Mon Mar 1 11:53:32 2021 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 1 Mar 2021 11:53:32 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: Message-ID: Jonathan, You need to create vdisk sets which will create multiple vdisks, you can then assign vdisk sets to your filesystem. (Assigning multiple vdisks at a time) Things to watch - free space calculations are more complex as it?s building multiple vdisks under the cover using multiple raid parameters Also it?s worth assuming a 10% reserve or approx - drive per disk shelf for rebuild space Mmvdisk vdisk set ... insert parameters https://www.ibm.com/support/knowledgecenter/mk/SSYSP8_5.3.2/com.ibm.spectrum.scale.raid.v5r02.adm.doc/bl8adm_mmvdisk.htm Sent from my iPhone > On 1 Mar 2021, at 21:45, Jonathan Buzzard wrote: > > ?On 01/03/2021 09:08, Luis Bolinches wrote: >> Hi >> >> There other reasons to have more than 1. It is management of those. When >> you have to add or remove NSDs of a FS having more than 1 makes it >> possible to empty some space and manage those in and out. Manually but >> possible. If you have one big NSD or even 1 per enclosure it might >> difficult or even not possible depending the number of enclosures and FS >> utilization. >> >> Starting some ESS version (not DSS, cant comment on that) that I do not >> recall but in the last 6 months, we have change the default (for those >> that use the default) to 4 NSDs per enclosure for ESS 5000. There is no >> impact on performance either way on ESS, we tested it. But management of >> those on the long run should be easier. > Question how does one create a none default number of vdisks per > enclosure then? 
> > I tried creating a stanza file and then doing mmcrvdisk but it was not > happy, presumably because of the "new style" recovery group management > > mmcrvdisk: [E] This command is not supported by recovery groups under > management of mmvdisk. > > > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=STXkGEO2XATS_s2pRCAAh2wXtuUgwVcx1XjUX7ELNdk&m=9HlRHByoByQcM0mY0elL-l4DgA6MzHkAGzE70Rl2p2E&s=eWRfWGpdZB-PZ_InCCjgmdQOCy6rgWj9Oi3TGGA38yY&e= > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scl at virginia.edu Mon Mar 1 12:31:37 2021 From: scl at virginia.edu (Losen, Stephen C (scl)) Date: Mon, 1 Mar 2021 12:31:37 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl Message-ID: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Hi folks, Experimenting with POSIX ACLs on GPFS 4.2 and noticed that the Linux command setfacl clears "c" permissions that were set with mmputacl. So if I have this: ... group:group1:rwxc mask::rwxc ... and I modify a different entry with: setfacl -m group:group2:r-x dirname then the "c" permissions above get cleared and I end up with ... group:group1:rwx- mask::rwx- ... I discovered that chmod does not clear the "c" mode. Is there any filesystem option to change this behavior to leave "c" modes in place? Steve Losen Research Computing University of Virginia scl at virginia.edu 434-924-0640 From olaf.weiser at de.ibm.com Mon Mar 1 12:45:44 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 12:45:44 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Mar 1 12:58:44 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 1 Mar 2021 12:58:44 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 13:14:38 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 13:14:38 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: On 01/03/2021 12:45, Olaf Weiser wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Hallo Stephen, > behavior ... or better to say ... predicted behavior for chmod and ACLs > .. is not an easy thing or only? , if? you stay in either POSIX world or > NFSv4 world > to be POSIX compliant, a chmod overwrites ACLs One might argue that the general rubbishness of the mmputacl cammand, and if a mmsetfacl command (or similar) existed it would negate messing with Linux utilities to change ACL's on GPFS file systems Only been bringing it up for over a decade now ;-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From olaf.weiser at de.ibm.com Mon Mar 1 15:18:59 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 15:18:59 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From laurence at qsplace.co.uk Mon Mar 1 08:59:35 2021 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Mon, 01 Mar 2021 08:59:35 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: <6F478E88-E350-46BF-9993-82C21ADD2262@qsplace.co.uk> Like Jan, I did some benchmarking a few years ago when the default recommended RG's dropped to 1 per DA to meet rebuild requirements. I couldn't see any discernable difference. As Achim has also mentioned, I just use vdisks for creating additional filesystems. Unless there is going to be a lot of shuffling of space or future filesystem builds, then I divide the RG's into say 10 vdisks to give some flexibility and granularity There is also a flag iirc that changes the gpfs magic to consider multiple under lying disks, when I find it again........ Which can provide increased performance on traditional RAID builds. -- Lauz On 1 March 2021 08:16:43 GMT, Achim Rehor wrote: >The reason for having multiple NSDs in legacy NSD (non-GNR) handling is > >the increased parallelism, that gives you 'more spindles' and thus more > >performance. >In GNR the drives are used in parallel anyway through the GNRstriping. >Therfore, you are using all drives of a ESS/GSS/DSS model under the >hood >in the vdisks anyway. > >The only reason for having more NSDs is for using them for different >filesystems. > > >Mit freundlichen Gr??en / Kind regards > >Achim Rehor > >IBM EMEA ESS/Spectrum Scale Support > > > > > > > > > > > > >gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > >> From: Jonathan Buzzard >> To: gpfsug-discuss at spectrumscale.org >> Date: 01/03/2021 08:58 >> Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of >NSD's >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> On 28/02/2021 09:31, Jan-Frode Myklebust wrote: >> > >> > I?ve tried benchmarking many vs. few vdisks per RG, and never could > >see >> > any performance difference. >> >> That's encouraging. >> >> > >> > Usually we create 1 vdisk per enclosure per RG, thinking this >will >> > allow us to grow with same size vdisks when adding additional >enclosures >> > in the future. >> > >> > Don?t think mmvdisk can be told to create multiple vdisks per RG >> > directly, so you have to manually create multiple vdisk sets each >with > >> > the apropriate size. >> > >> >> Thing is back in the day so GPFS v2.x/v3.x there where strict >warnings >> that you needed a minimum of six NSD's for optimal performance. I >have >> sat in presentations where IBM employees have said so. What we where >> told back then is that GPFS needs a minimum number of NSD's in order >to >> be able to spread the I/O's out. So if an NSD is being pounded for >reads > >> and a write comes in it. can direct it to a less busy NSD. >> >> Now I can imagine that in a ESS/DSS-G that as it's being scattered to > >> the winds under the hood this is no longer relevant. But some notes >to >> the effect for us old timers would be nice if that is the case to put > >> our minds to rest. >> >> >> JAB. >> >> -- >> Jonathan A. 
Buzzard Tel: +44141-5483420 >> HPC System Administrator, ARCHIE-WeSt. >> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://urldefense.proofpoint.com/v2/url? >> >u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- >> siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- >> M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- >> IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= >> > > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 16:50:31 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 16:50:31 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk> On 01/03/2021 15:18, Olaf Weiser wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > JAB, > yes-this is in argument ;-) ... and personally I like the idea of having > smth like setfacl also for GPFS ..? for years... > *but* it would not take away the generic challenge , what to do, if > there are competing standards / definitions to meet > at least that is most likely just one reason, why there's no tool yet > there are several hits on RFE page for "ACL".. some of them could be > also addressed with a (mm)setfacl tool > but I was not able to find a request for a tool itself > (I quickly? searched? public but? not found it there, maybe there is > already one in private...) > So - dependent on how important this item for others? is? ... its time > to fire an RFE ?!? ... Well when I asked I was told by an IBM representative that it was by design there was no proper way to set ACLs directly from Linux. The expectation was that you would do this over NFSv4 or Samba. So filing an RFE would be pointless under those conditions and I have never bothered as a result. This was pre 2012 so IBM's outlook might have changed in the meantime. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From olaf.weiser at de.ibm.com Mon Mar 1 17:57:11 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 17:57:11 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk> References: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk>, <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Tue Mar 2 09:36:48 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Tue, 2 Mar 2021 09:36:48 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk>, <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.16146770920000.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16146770920001.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16146770920002.png Type: image/png Size: 1172 bytes Desc: not available URL: From russell at nordquist.info Tue Mar 2 19:31:24 2021 From: russell at nordquist.info (Russell Nordquist) Date: Tue, 2 Mar 2021 14:31:24 -0500 Subject: [gpfsug-discuss] Self service creation of filesets Message-ID: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. thanks Russell From anacreo at gmail.com Tue Mar 2 20:58:29 2021 From: anacreo at gmail.com (Alec) Date: Tue, 2 Mar 2021 12:58:29 -0800 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: Message-ID: This does feel like another situation where I may use a custom attribute and a periodic script to do the fileset creation. Honestly I would want the change management around fileset creation. But I could see a few custom attributes on a newly created user dir... Like maybe just setting user.quota=10TB... Then have a policy that discovers these does the work of creating the fileset, setting the quotas, migrating data to the fileset, and then mounting the fileset over the original directory. Honestly that sounds so nice I may have to implement this... Lol. Like I could see doing something like discovering directories that have user.archive=true and automatically gzipping large files within. Would be nice if GPFS policy engine could have a IF_ANCESTOR_ATTRIBUTE=. Alec On Tue, Mar 2, 2021, 11:40 AM Russell Nordquist wrote: > Hi all > > We are trying to use filesets quite a bit, but it?s a hassle that only the > admins can create them. To the users it?s just a directory so it slows > things down. Has anyone deployed a self service model for creating > filesets? Maybe using the API? This feels like shared pain that someone has > already worked on?. > > thanks > Russell > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Mar 2 22:38:17 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 2 Mar 2021 22:38:17 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: Message-ID: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> Not quite user self-service .... But we have some web tooling for project registration that pushes sanitised messages onto a redis (rq) backed message bus which then does "stuff". For example create and populate groups in AD and LDAP. Create and link a fileset, set quota etc ... Our consumer code is all built to be tolerant to running it a second time safely and has quite a bit of internal locking to prevent multiple instances running at the same time (though we have multiple consumer entities to handle fail-over). The fault tolerant thing is quite important as create a fileset can fail for a number of reasons (e.g. 
restripefs running), so we can always just requeue the requests again. Simon ?On 02/03/2021, 19:40, "gpfsug-discuss-bounces at spectrumscale.org on behalf of russell at nordquist.info" wrote: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. thanks Russell _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ckerner at illinois.edu Tue Mar 2 22:59:01 2021 From: ckerner at illinois.edu (Kerner, Chad A) Date: Tue, 2 Mar 2021 22:59:01 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> References: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> Message-ID: <52196DB3-E8D3-47F7-92F6-3A123B46F615@illinois.edu> We have a similar process. One of our customers has a web app that their managers use to provision spaces. That web app drops a json file into a specific location and a cron job kicks off a python script every so often to process the files and provision the space(fileset creation, link, quota, owner, group, perms, etc). Failures are queued and a jira ticket opened. Successes update the database for the web app. They are not requiring instant processing, so we process hourly on the back end side of things. Chad -- Chad Kerner, Senior Storage Engineer Storage Enabling Technologies National Center for Supercomputing Applications University of Illinois, Urbana-Champaign ?On 3/2/21, 4:38 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson" wrote: Not quite user self-service .... But we have some web tooling for project registration that pushes sanitised messages onto a redis (rq) backed message bus which then does "stuff". For example create and populate groups in AD and LDAP. Create and link a fileset, set quota etc ... Our consumer code is all built to be tolerant to running it a second time safely and has quite a bit of internal locking to prevent multiple instances running at the same time (though we have multiple consumer entities to handle fail-over). The fault tolerant thing is quite important as create a fileset can fail for a number of reasons (e.g. restripefs running), so we can always just requeue the requests again. Simon On 02/03/2021, 19:40, "gpfsug-discuss-bounces at spectrumscale.org on behalf of russell at nordquist.info" wrote: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. 
thanks Russell
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From tortay at cc.in2p3.fr  Wed Mar 3 08:06:37 2021
From: tortay at cc.in2p3.fr (Loic Tortay)
Date: Wed, 3 Mar 2021 09:06:37 +0100
Subject: [gpfsug-discuss] Self service creation of filesets
In-Reply-To:
References:
Message-ID: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr>

On 02/03/2021 20:31, Russell Nordquist wrote:
> Hi all
>
> We are trying to use filesets quite a bit, but it's a hassle that only the admins can create them. To the users it's just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on...
>
Hello,
We have a quota management delegation (CLI) tool that allows "power-users" (PI and such) to create and remove filesets and manage user quotas for the groups/projects they're heading.

Like someone else said, from their point of view they're just directories, so they create a "directory with quotas". In our experience, "directories with quotas" are the most convenient way for end-users to understand and use quotas.

This is a tool written in C, about 13 years ago, using the GPFS API (and a few calls to GPFS commands where there is no API or it's lacking).

Delegation authorization (identifying "power-users") is external to the tool.

Permissions & ACLs are also set on the junction when a fileset is created so that it's both immediately usable ("instant processing") and accessible to "power-users" (for space management purposes).

There are extra features for staff to allow higher-level operations (e.g. create an independent fileset for a group/project, change the group/project quotas, etc.)

The dated-looking user documentation is https://ccspsmon.in2p3.fr/spsquota.html

Both the tool and the documentation have a few site-specific things, so it's not open-source (and it has become a "legacy" tool in need of a rewrite/refactoring).


Loïc.
--
| Loïc Tortay - IN2P3 Computing Centre |

From russell at nordquist.info  Wed Mar 3 17:14:37 2021
From: russell at nordquist.info (Russell Nordquist)
Date: Wed, 3 Mar 2021 12:14:37 -0500
Subject: [gpfsug-discuss] Self service creation of filesets
In-Reply-To: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr>
References: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr>
Message-ID:

Sounds like I am not the only one that needs this. The REST API has everything needed to do this, but the problem is we can't restrict the GUI role account to just the commands they need. They need "storage administrator" access, which means they could also make/delete filesystems. I guess you could use sudo and wrap the CLI, but I am told that's old fashioned :)

Too bad we can't make an API role with specific POST commands tied to it. I am surprised there is no RFE for that yet. The closest I see is http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=148244 - am I missing something?

What I would want is to be able to grant the following calls + maybe a few more.
The related REST API calls. https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_apiv2postfilesystemfilesets.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_apiv2postfilesystemfilesetlink.htm Russell > On Mar 3, 2021, at 3:06 AM, Loic Tortay wrote: > > On 02/03/2021 20:31, Russell Nordquist wrote: >> Hi all >> We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. > Hello, > We have a quota management delegation (CLI) tool that allows "power-users" (PI and such) to create and remove filesets and manage users quotas for the groups/projects they're heading. > > Like someone else said, from their point of view they're just directories, so they create a "directory with quotas". > In our experience, "directories with quotas" are the most convenient way for end-users to understand and use quotas. > > This is a tool written in C, about 13 years ago, using the GPFS API (and a few calls to GPFS commands where there is no API or it's lacking). > > Delegation authorization (identifying "power-users") is external to the tool. > > Permissions & ACLs are also set on the junction when a fileset is created so that it's both immediately usable ("instant processing") and accessible to "power-users" (for space management purposes). > > There are extra features for staff to allow higher-level operations (e.g. create an independent fileset for a group/project, change the group/project quotas, etc.) > > The dated looking user documentation is https://ccspsmon.in2p3.fr/spsquota.html > > Both the tool and the documentation have a few site-specific things, so it's not open-source (and it has become a "legacy" tool in need of a rewrite/refactoring). > > > Lo?c. > -- > | Lo?c Tortay - IN2P3 Computing Centre | > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.horton at icr.ac.uk Thu Mar 4 09:51:45 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Thu, 4 Mar 2021 09:51:45 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> Message-ID: <566f81f3bfd243f1b0258562b627e4e1b6869863.camel@icr.ac.uk> On Wed, 2021-03-03 at 12:14 -0500, Russell Nordquist wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. Sounds like I am not the only one that needs this. The REST API has everything needed to do this, but the problem is we can?t restrict the GUI role account to just the commands they need. They need ?storage administrator? access which means the could also make/delete filesystems. I guess you could use sudo and wrap the CLI, but I am told that?s old fashioned :) Too bad we can?t make a API role with specific POST commands tied to it. I am surprised there is no RFE for that yet. The closest I see is http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=148244 am I missing something. That reminds me... We use a Python wrapper around the REST API to monitor usage against fileset quotas etc. In principle this will also set quotas (and create filesets) but it means giving it storage administrator access. 
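For reference, a bare-bones sketch of calling the fileset-creation endpoint linked earlier in this thread might look like the following. The GUI host, service account and request-body field names are assumptions to check against the Knowledge Center pages for your release, and (as noted) the account currently needs a fairly broad GUI role:

#!/usr/bin/env python3
# Bare-bones sketch of driving the POST fileset endpoint through the Scale
# REST API. Host, credentials and body field names are assumptions.
import requests
import urllib3

urllib3.disable_warnings()                        # GUI certificate is often self-signed

API = "https://gui-node.example.com:443/scalemgmt/v2"
AUTH = ("svc-provision", "secret")                # a dedicated GUI service account
FS = "fs1"

def create_fileset(name, junction):
    body = {"filesetName": name, "path": junction}
    r = requests.post(f"{API}/filesystems/{FS}/filesets",
                      json=body, auth=AUTH, verify=False)
    r.raise_for_status()
    # The API queues an asynchronous job; production code should poll the
    # returned job until it completes before relying on the fileset.
    return r.json()

print(create_fileset("projx", "/gpfs/fs1/projects/projx"))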
It would be nice if the GUI had sufficiently fine grained permissions that you could set quotas without being able to delete the filesystem. Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 10:04:22 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 10:04:22 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's Message-ID: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> I am seeing that whenever I try and restore a file with an ACL I get the a ANS1589W error in /var/log/dsmerror.log ANS1589W Unable to write extended attributes for ****** due to errno: 13, reason: Permission denied But bizarrely the ACL is actually restored. At least as far as I can tell. This is the 8.1.11-0 TSM client with GPFS version 5.0.5-1 against a 8.1.10-0 TSM server. Running on RHEL 7.7 to match the DSS-G 2.7b install. The backup node makes the third quorum node for the cluster being as that it runs genuine RHEL (unlike all the compute nodes which are running CentOS). Googling I can't find any references to this being fixed in a later version of the GPFS software, though being on RHEL7 and it's derivatives I am stuck on 5.0.5 Surely root has permissions to write the extended attributes for anyone? It would seem perverse if you have to be the owner of a file to restore the ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From stockf at us.ibm.com Fri Mar 5 12:15:38 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 5 Mar 2021 12:15:38 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 13:07:56 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 13:07:56 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: On 05/03/2021 12:15, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Have you checked to see if Spectrum Protect (TSM) has addressed this > problem.? There recently was an issue with Protect and how it used the > GPFS API for ACLs.? If I recall Protect was not properly handling a > return code.? I do not know if it is relevant to your problem but? it > seemed worth mentioning. As far as I am aware 8.1.11.0 is the most recent version of the Spectrum Protect/TSM client. 
There is nothing newer showing on the IBM FTP site ftp://ftp.software.ibm.com/storage/tivoli-storage-management/maintenance/client/v8r1/Linux/LinuxX86/BA/ Checking on fix central also seems to show that 8.1.11.0 is the latest version, and the only fix over 8.1.10.0 is a security update to do with the client web user interface. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Renar.Grunenberg at huk-coburg.de Fri Mar 5 18:06:43 2021 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 5 Mar 2021 18:06:43 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> Hallo All, thge mentioned problem with protect was this: https://www.ibm.com/support/pages/node/6415985?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Thomas Sehn, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= -----Urspr?ngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Jonathan Buzzard Gesendet: Freitag, 5. M?rz 2021 14:08 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] TSM errors restoring files with ACL's On 05/03/2021 12:15, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Have you checked to see if Spectrum Protect (TSM) has addressed this > problem. There recently was an issue with Protect and how it used the > GPFS API for ACLs. If I recall Protect was not properly handling a > return code. I do not know if it is relevant to your problem but it > seemed worth mentioning. As far as I am aware 8.1.11.0 is the most recent version of the Spectrum Protect/TSM client. 
There is nothing newer showing on the IBM FTP site ftp://ftp.software.ibm.com/storage/tivoli-storage-management/maintenance/client/v8r1/Linux/LinuxX86/BA/ Checking on fix central also seems to show that 8.1.11.0 is the latest version, and the only fix over 8.1.10.0 is a security update to do with the client web user interface. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stockf at us.ibm.com Fri Mar 5 19:12:47 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 5 Mar 2021 19:12:47 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> References: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de>, <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 20:31:54 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 20:31:54 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: <696e96cc-da52-a24f-d53e-6510407e51e7@strath.ac.uk> On 05/03/2021 19:12, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > I was referring to this flash, > https://www.ibm.com/support/pages/node/6381354?myns=swgtiv&mynp=OCSSEQVQ&mync=E&cm_sp=swgtiv-_-OCSSEQVQ-_-E > > > Spectrum Protect 8.1.11 client has the fix so this should not be an > issue for Jonathan.? Probably best to open a help case against Spectrum > Protect and begin the investigation there. > Also the fix is to stop an unchanged file with an ACL from being backed up again, but only one more time. I suspect we where hit with that issue, but given we only have ~90GB of files with ACL's on them I would not have noticed. That is significantly less than the normal daily churn. This however is an issue with the *restore*. Everything looks to get restored correctly even the ACL's. At the end of the restore all looks good given the headline report from dsmc. However there are ANS1589W warnings in dsmerror.log and dsmc exits with an error code of 8 rather than zero. Will open a case against Spectrum Protect on Monday. I am pretty confident the warnings are false. The current plan is to do carefully curated hand restores of the three afflicted users when the rest of the restore if finished to double check the ACL's are the only issue. Quite how the Spectrum Protect team have missed this bug is beyond me. Do they not have some unit tests to check this stuff before pushing out updates. I know in the past it worked, though that was many years ago now. However I restored many TB of data from backup with ACL's on them. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Robert.Oesterlin at nuance.com Mon Mar 8 14:49:59 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 8 Mar 2021 14:49:59 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? 
Message-ID: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com> Looking to craft a policy scan that pulls out symbolic links to a particular destination. For instance: file1.py -> /fs1/patha/pathb/file1.py (I want to include these) file2.py -> /fs2/patha/pathb/file2.py (exclude these) The easy way would be to pull out all sym-links and just grep for the ones I want but was hoping for a more elegant solution? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Mar 8 15:29:42 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 8 Mar 2021 15:29:42 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com> References: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Mar 8 15:34:21 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 8 Mar 2021 15:34:21 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? Message-ID: Well - the case here is that the file system has, let?s say, 100M files. Some percentage of these are sym-links to a location that?s not in this file system. I want a report of all these off file system links. However, not all of the sym-links off file system are of interest, just some of them. I can?t say for sure where in the file system they are (and I don?t care). Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Frederick Stock Reply-To: gpfsug main discussion list Date: Monday, March 8, 2021 at 9:29 AM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Policy scan of symbolic links with contents? CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ Could you use the PATHNAME LIKE statement to limit the location to the files of interest? Fred _______________________________________________________ Fred Stock | Spectrum Scale Development Advocacy | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: "Oesterlin, Robert" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Policy scan of symbolic links with contents? Date: Mon, Mar 8, 2021 10:12 AM Looking to craft a policy scan that pulls out symbolic links to a particular destination. For instance: file1.py -> /fs1/patha/pathb/file1.py (I want to include these) file2.py -> /fs2/patha/pathb/file2.py (exclude these) The easy way would be to pull out all sym-links and just grep for the ones I want but was hoping for a more elegant solution? Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p_1XEUyoJ7-VJxF_w8h9gJh8_Wj0Pey73LCLLoxodpw&m=i6m1zVXf4peZo0yo02IiRaQ_pUX95MN3wU53M0NiWcI&s=z-ibh2kAPHbehAsrGavNIg2AJdXmHkpUwy5YhZfUbpc&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stockf at us.ibm.com Mon Mar 8 16:07:48 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 8 Mar 2021 16:07:48 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 8 20:45:05 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Mar 2021 20:45:05 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: On 08/03/2021 16:07, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Presumably the only feature that would help here is if policy could > determine that the end location pointed to by a symbolic link is within > the current file system.? I am not aware of any such feature or > attribute which policy could check so I think all you can do is run > policy to find the symbolic links and then check each link to see if it > points into the same file system.? You might find the mmfind command > useful for this purpose.? I expect it would eliminate the need to create > a policy to find the symbolic links. > Unless you are using bind mounts if the symbolic link points outside the mount point of the file system it is not within the current file system. So noting that you can write very SQL like statements something like the following should in theory do it RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' Note the above is not checked in any way shape or form for working. Even if you do have bind mounts of other GPFS file systems you just need a more complicated WHERE statement. When doing policy engine stuff I find having that section of the GPFS manual printed out and bound, along with an SQL book for reference is very helpful. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Mon Mar 8 21:00:04 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Mar 2021 21:00:04 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: On 08/03/2021 20:45, Jonathan Buzzard wrote: [SNIP] > So noting that you can write very SQL like statements something like the > following should in theory do it > > RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND > SUBSTR(PATH_NAME,0,4)='/fs1/' > > Note the above is not checked in any way shape or form for working. Even > if you do have bind mounts of other GPFS file systems you just need a > more complicated WHERE statement. Duh, of course as soon as I sent it, I realized there is a missing SHOW RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' You could replace the SUBSTR with a REGEX if you prefer JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From ulmer at ulmer.org Mon Mar 8 22:33:38 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 8 Mar 2021 17:33:38 -0500 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> Does that check the target of the symlink, or the path to the link itself? 
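One note on the rule above: PATH_NAME is the path of the link itself, not of its target. If the target is what matters, post-processing the policy-generated list is one option; a sketch, assuming the usual " -- "-delimited external-list layout written by mmapplypolicy:

#!/usr/bin/env python3
# Sketch: keep only those symlinks from a policy-generated list whose *target*
# is under a given prefix. Adjust the parsing if your SHOW() clause differs.
import os
import sys

WANTED_PREFIX = "/fs1/"                       # keep links pointing into this file system

with open(sys.argv[1]) as listfile:           # list file produced with -f / -I defer
    for line in listfile:
        path = line.rstrip("\n").split(" -- ", 1)[-1]
        try:
            target = os.readlink(path)
        except OSError:
            continue                          # gone, or not a symlink any more
        if not target.startswith("/"):        # resolve relative targets
            target = os.path.normpath(os.path.join(os.path.dirname(path), target))
        if target.startswith(WANTED_PREFIX):
            print(f"{path} -> {target}")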
I think the OP was checking the target (or I misunderstood). -- Stephen Ulmer Sent from a mobile device; please excuse auto-correct silliness. > On Mar 8, 2021, at 3:34 PM, Jonathan Buzzard wrote: > > ?On 08/03/2021 20:45, Jonathan Buzzard wrote: > > [SNIP] > >> So noting that you can write very SQL like statements something like the >> following should in theory do it >> RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND >> SUBSTR(PATH_NAME,0,4)='/fs1/' >> Note the above is not checked in any way shape or form for working. Even >> if you do have bind mounts of other GPFS file systems you just need a >> more complicated WHERE statement. > > Duh, of course as soon as I sent it, I realized there is a missing SHOW > > RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' > > You could replace the SUBSTR with a REGEX if you prefer > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Tue Mar 9 12:25:56 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 9 Mar 2021 12:25:56 +0000 Subject: [gpfsug-discuss] [EXTERNAL] Re: Policy scan of symbolic links with contents? In-Reply-To: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> References: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> Message-ID: <3B0AD02E-335F-4540-B109-EC5301C3188A@nuance.com> RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' In this case PATH_NAME is the path within the GPFS file system, not the target of the link, correct? That's not what I want. I want the path of the *link target*. Bob Oesterlin Sr Principal Storage Engineer, Nuance ?On 3/8/21, 4:41 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Stephen Ulmer" wrote: CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ---------------------------------------------------------------------- Does that check the target of the symlink, or the path to the link itself? I think the OP was checking the target (or I misunderstood). From bill.burke.860 at gmail.com Wed Mar 10 02:19:02 2021 From: bill.burke.860 at gmail.com (William Burke) Date: Tue, 9 Mar 2021 21:19:02 -0500 Subject: [gpfsug-discuss] Backing up GPFS with Rsync Message-ID: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. 
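A rough sketch of the policy-engine route suggested in the replies that follow: let mmapplypolicy build the candidate list (no user-space crawl of 400M files), then hand the paths to rsync. The device name, scratch paths and list-file naming here are assumptions to confirm on a small fileset first:

#!/usr/bin/env python3
# Rough sketch: policy scan for files changed in the last day, written out as
# a path list that rsync can consume via --files-from.
import os
import subprocess
import tempfile

DEVICE = "fs1"
POLICY = """
RULE EXTERNAL LIST 'changed' EXEC ''
RULE 'lastday' LIST 'changed'
  WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '1' DAYS
"""

with tempfile.TemporaryDirectory() as tmp:
    polfile = os.path.join(tmp, "changed.pol")
    with open(polfile, "w") as f:
        f.write(POLICY)
    # -I defer only writes the list files, it does not execute anything
    subprocess.run(["mmapplypolicy", DEVICE, "-P", polfile,
                    "-f", os.path.join(tmp, "cand"), "-I", "defer"], check=True)
    # Each record ends in " -- <path>"; strip the bookkeeping columns for rsync
    with open(os.path.join(tmp, "cand.list.changed")) as src, \
         open("/tmp/changed.paths", "w") as dst:
        for line in src:
            dst.write(line.rstrip("\n").split(" -- ", 1)[-1] + "\n")

# The path list can then drive rsync, e.g.:
#   rsync -a --files-from=/tmp/changed.paths / backuphost:/backup/fs1/
# Note this alone will not propagate deletes or renames -- see the follow-ups.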
-- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Mar 10 02:21:54 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 10 Mar 2021 02:21:54 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: Message-ID: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. > > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From anacreo at gmail.com Wed Mar 10 02:59:18 2021 From: anacreo at gmail.com (Alec) Date: Tue, 9 Mar 2021 18:59:18 -0800 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: You would definitely be able to search by inode creation date and find the files you want... our 1.25m file filesystem takes about 47 seconds to query... One thing I would worry about though is inode deletion and inter-fileset file moves. The SQL based engine wouldn't be able to identify those changes and so you'd not be able to replicate deletes and such. Alternatively.... I have a script that runs in about 4 minutes and it pulls all the data out of the backup indexes, and compares the pre-built hourly file index on our system and identifies files that don't exist in the backup, so I have a daily backup validation... I filter the file list using ksh's printf date manipulation to filter out files that are less than 2 days old, to reduce the noise. 
A modification to this could simply compare a daily file index with the previous day's index, and send rsync a list of files (existing or deleted) based on just a delta of the two indexes (sort|diff), then you could properly account for all the changes. If you don't care about file modifications just produce both lists based on creation time instead of modification time. The mmfind command or GPFS policy engine should be able to produce a full file list/index very rapidly. In another thread there was a conversation with ACL's... I don't think our backup system backs up ACL's so I just have GPFS produce a list of all ACL applied objects on the daily, and have a script that just makes a null delimited backup file of every single ACL on our file system... and have a script to apply the ACL's as a "restore". It's a pretty simple thing to write-up and keeping 90 day history on this lets me compare the ACL evolution on a file very easily. Alec MVH Most Victorious Hunting (Why should Scandinavians own this cool sign off) On Tue, Mar 9, 2021 at 6:22 PM Ryan Novosielski wrote: > Yup, you want to use the policy engine: > > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m > reluctant to provide examples as I?m actually suspicious that we don?t have > it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > > On Mar 9, 2021, at 9:19 PM, William Burke > wrote: > > > > I would like to know what files were modified/created/deleted (only for > the current day) on the GPFS's file system so that I could rsync ONLY those > files to a predetermined external location. I am running GPFS 4.2.3.9 > > > > Is there a way to access the GPFS's metadata directly so that I do not > have to traverse the filesystem looking for these files? If i use the rsync > tool it will scan the file system which is 400+ million files. Obviously > this will be problematic to complete a scan in a day, if it would ever > complete single-threaded. There are tools or scripts that run multithreaded > rsync but it's still a brute force attempt. and it would be nice to know > where the delta of files that have changed. > > > > I began looking at Spectrum Scale Data Management (DM) API but I am not > sure if this is the best approach to looking at the GPFS metadata - inodes, > modify times, creation times, etc. > > > > > > > > -- > > > > Best Regards, > > > > William Burke (he/him) > > Lead HPC Engineer > > Advance Research Computing > > 860.255.8832 m | LinkedIn > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Wed Mar 10 15:15:58 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 10 Mar 2021 15:15:58 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: <641ea714-579b-1d74-4b86-d0e0b2e8e9c3@strath.ac.uk> On 10/03/2021 02:59, Alec wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > You would definitely be able to search by inode creation date and find > the files you want... our 1.25m file filesystem takes about 47 seconds > to query...? One thing I would worry about though is inode deletion and > inter-fileset file moves.? ?The SQL based engine wouldn't be able to > identify those changes and so you'd not be able to replicate deletes and > such. > This is the problem with rsync "backups", you need to run it with --delete otherwise any restore will "upset" your users as they find large numbers of file they had deleted unhelpfully "restored" > Alternatively.... > I have a script that runs in about 4 minutes and it pulls all the data > out of the backup indexes, and compares the pre-built hourly file index > on our system and identifies files that don't exist in the backup, so I > have a daily backup validation...? I filter the file list using > ksh's?printf date manipulation to filter out files that are less than 2 > days old, to reduce the noise.? A modification to this could simply > compare a daily file index with the previous day's index, and send rsync > a list of files (existing or deleted) based on just a delta of the two > indexes (sort|diff), then you could properly account for all the > changes.? If you don't care about file modifications just produce both > lists based on creation time instead of modification time.? The mmfind > command or GPFS policy engine should be able to produce a full file > list/index very rapidly. > My view would be somewhere along the lines of this is a lot of work and if you have the space to rsync your GPFS file system to, presumably with a server attached to said storage then for under 500 PVU of Spectrum Protect licensing you can have a fully supported client/server Spectrum Protect/TSM backup solution and just use mmbackup. You need to play the game and use older hardware ;-) I use an ancient pimped out Dell PowerEdge R300 as my TSM client node. Why this old, well it has a dual core Xeon E3113 for only 100 PVU. Anything newer would be quad core and 70 PVU per core which would cost an additional ~$1000 in licensing. If it breaks down they are under $100 on eBay. It's never skipped a beat and I have just finished a complete planned restore of our DSS-G using it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Wed Mar 10 19:09:13 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 10 Mar 2021 19:09:13 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: I was looking for the original source for this, but it was on dev works ... which is now dead. 
But you can use something like: tsbuhelper clustermigdiff \ $migratePath/.mmmigrateCfg/mmmigrate.list.v${prevFileCount}.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.latest.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.changed.v${fileCount}.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.deleted.v${fileCount}.filelist "mmmigrate.list.latest.filelist" would be the output of a policyscan of your files today "mmmigrate.list.v${prevFileCount}.filelist" is yesterday's policyscan This then generates the changed and deleted list of files for you. tsbuhelper is what is used internally in mmbackup, though is not very documented... We actually used something along these lines to support migrating between file-systems (generate daily diffs and sync those). The policy scan uses: RULE EXTERNAL LIST 'latest.filelist' EXEC '' \ RULE 'FilesToMigrate' LIST 'latest.filelist' DIRECTORIES_PLUS \ SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || \ VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || \ ' ' || (CASE WHEN XATTR('dmapi.IBMObj') IS NOT NULL THEN 'migrat' \ WHEN XATTR('dmapi.IBMPMig') IS NOT NULL THEN 'premig' \ ELSE 'resdnt' END )) \ WHERE \ ( \ NOT \ ( (PATH_NAME LIKE '/%/.mmbackup%') OR \ (PATH_NAME LIKE '/%/.mmmigrate%') OR \ (PATH_NAME LIKE '/%/.afm%') OR \ (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR \ (PATH_NAME LIKE '/%/.mmLockDir/%') OR \ (MODE LIKE 's%') \ ) \ ) \ AND \ (MISC_ATTRIBUTES LIKE '%u%') \ AND (NOT (PATH_NAME LIKE '/%/.TsmCacheDir' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.TsmCacheDir/%')) \ AND (NOT (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.SpaceMan/%')) On our file-system, both the scan and diff took a long time (hours), but hundreds of millions of files. This comes with no warranty ... We don't use this for backup, Spectrum Protect and mmbackup are our friends ... Simon ?On 10/03/2021, 02:22, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski" wrote: Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. 
> > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. > > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From enrico.tagliavini at fmi.ch Thu Mar 11 09:22:46 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 09:22:46 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync References: <8d58f5c6c8ee4f44a5e09c4f9e3a6dac@ex2013mbx2.fmi.ch> Message-ID: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org? On Behalf Of Ryan Novosielski > Sent: Wednesday, March 10, 2021 3:22 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync > > Yup, you want to use the policy engine: > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we > don?t have it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > > > \\UTGERS,?? |---------------------------*O*--------------------------- > > > _// the State |???????? Ryan Novosielski - novosirj at rutgers.edu > > > \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > > ?\\??? of NJ | Office of Advanced Research Computing - MSB C630, Newark > ???? `' > > > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > > > ?I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I > > could rsync ONLY those files to a predetermined external location. 
I am running GPFS 4.2.3.9 > > > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If > > i use the rsync tool it will scan the file system which is 400+ million files.? Obviously this will be problematic to complete a > > scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a > > brute force attempt. and it would be nice to know where the delta of files that have changed. > > > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS > > metadata - inodes, modify times, creation times, etc. > > > > > > > > -- > > > > Best Regards, > > > > William Burke (he/him) > > Lead HPC Engineer > > Advance Research Computing > > 860.255.8832 m | LinkedIn > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ulmer at ulmer.org Thu Mar 11 13:17:30 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 11 Mar 2021 08:17:30 -0500 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> References: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> Message-ID: <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen > On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: > > ?Hello William, > > I've got your email forwarded my another user and I decided to subscribe to give you my two cents. > > I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is > easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example > if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. > > DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me > enough not to go that route. > > What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just > build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which > the ctime changes in the last couple of days (to update metadata info). > > Good luck. > Kind regards. 
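A bare sketch of that "scan both days and build up the differences" idea follows. The list locations and one-path-per-line format are assumptions, and in-place modifications still need the ctime/mtime-based rule mentioned above:

#!/usr/bin/env python3
# Bare sketch: compare today's policy scan against yesterday's so that deletes
# and renames are propagated as well as new files.
def load(listfile):
    with open(listfile) as f:
        return {line.rstrip("\n") for line in f if line.strip()}

yesterday = load("/gpfs/admin/scan/yesterday.paths")
today = load("/gpfs/admin/scan/today.paths")

with open("/gpfs/admin/scan/to_sync.paths", "w") as out:
    for path in sorted(today - yesterday):        # new or moved-in paths
        out.write(path + "\n")

with open("/gpfs/admin/scan/to_delete.paths", "w") as out:
    for path in sorted(yesterday - today):        # gone since yesterday
        out.write(path + "\n")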
> > -- > > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > -------- Forwarded Message -------- >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski >> Sent: Wednesday, March 10, 2021 3:22 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync >> >> Yup, you want to use the policy engine: >> >> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm >> >> Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we >> don?t have it quite right and are passing far too much stuff to rsync). >> >> -- >> #BlackLivesMatter >> ____ >>>> \\UTGERS, |---------------------------*O*--------------------------- >>>> _// the State | Ryan Novosielski - novosirj at rutgers.edu >>>> \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus >>>> \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark >> `' >> >>>> On Mar 9, 2021, at 9:19 PM, William Burke wrote: >>> >>> I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I >>> could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 >>> >>> Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If >>> i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a >>> scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a >>> brute force attempt. and it would be nice to know where the delta of files that have changed. >>> >>> I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS >>> metadata - inodes, modify times, creation times, etc. >>> >>> >>> >>> -- >>> >>> Best Regards, >>> >>> William Burke (he/him) >>> Lead HPC Engineer >>> Advance Research Computing >>> 860.255.8832 m | LinkedIn >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From enrico.tagliavini at fmi.ch Thu Mar 11 13:24:47 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 13:24:47 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> References: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> Message-ID: Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. 
Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Wednesday, March 10, 2021 3:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ \\UTGERS, |---------------------------*O*--------------------------- _// the State | Ryan Novosielski - novosirj at rutgers.edu \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Mar 9, 2021, at 9:19 PM, William Burke wrote: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? 
If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. -- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Mar 11 13:47:44 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 11 Mar 2021 08:47:44 -0500 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen > On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: > > ? > Hello Stephen, > > actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. > > The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. > > Kind regards. > > -- > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > >> On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: >> I?m going to ask what may be a dumb question: >> >> Given that you have GPFS on both ends, what made you decide to NOT use AFM? 
>> >> -- >> Stephen >> >> >>> On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: >>> >>> ?Hello William, >>> >>> I've got your email forwarded my another user and I decided to subscribe to give you my two cents. >>> >>> I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is >>> easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example >>> if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. >>> >>> DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me >>> enough not to go that route. >>> >>> What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just >>> build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which >>> the ctime changes in the last couple of days (to update metadata info). >>> >>> Good luck. >>> Kind regards. >>> >>> -- >>> >>> Enrico Tagliavini >>> Systems / Software Engineer >>> >>> enrico.tagliavini at fmi.ch >>> >>> Friedrich Miescher Institute for Biomedical Research >>> Infomatics >>> >>> Maulbeerstrasse 66 >>> 4058 Basel >>> Switzerland >>> >>> >>> >>> >>> -------- Forwarded Message -------- >>>> >>>> -----Original Message----- >>>> From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski >>>> Sent: Wednesday, March 10, 2021 3:22 AM >>>> To: gpfsug main discussion list >>>> Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync >>>> >>>> Yup, you want to use the policy engine: >>>> >>>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm >>>> >>>> Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we >>>> don?t have it quite right and are passing far too much stuff to rsync). >>>> >>>> -- >>>> #BlackLivesMatter >>>> ____ >>>>>> \\UTGERS, |---------------------------*O*--------------------------- >>>>>> _// the State | Ryan Novosielski - novosirj at rutgers.edu >>>>>> \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus >>>>>> \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark >>>> `' >>>> >>>>>> On Mar 9, 2021, at 9:19 PM, William Burke wrote: >>>>> >>>>> I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I >>>>> could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 >>>>> >>>>> Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If >>>>> i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a >>>>> scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a >>>>> brute force attempt. and it would be nice to know where the delta of files that have changed. >>>>> >>>>> I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS >>>>> metadata - inodes, modify times, creation times, etc. 
>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> Best Regards, >>>>> >>>>> William Burke (he/him) >>>>> Lead HPC Engineer >>>>> Advance Research Computing >>>>> 860.255.8832 m | LinkedIn >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Mar 11 14:20:05 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 11 Mar 2021 14:20:05 +0000 Subject: [gpfsug-discuss] Synchronization/Restore of file systems Message-ID: As promised last year I having just completed a storage upgrade, I have sanitized my scripts and put them up on Github for other people to have a look at the methodology I use in these sorts of scenarios. This time the upgrade involved pulling out all the existing disks and fitting large ones then restoring from backup, rather than synchronizing to a new system, but the principles are the same. Bear in mind the code is written in Perl because it's history is ancient now and with few opportunities to test it in anger, rewriting it in the latest fashionable scripting language is unappealing. https://github.com/digitalcabbage/syncrestore JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From enrico.tagliavini at fmi.ch Thu Mar 11 14:24:43 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 14:24:43 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: We evaluated AFM multiple times. The first time was in 2017 with Spectrum Scale 4.2 . When we switched to Spectrum Scale 5 not long ago we also re-evaluated AFM. The horror stories about data loss are becoming more rare with modern setups, especially in the non DR case scenario. However AFM is still a very complicated tool, way to complicated if what you are looking for is a "simple" rsync style backup (but faster). The 3000+ pages of documentation for GPFS do not help our small team and many of those pages are dedicated to just AFM. The performance problem is also still a real issue with modern versions as far as I was told. We can have a quite erratic data turnover in our setup, tied to very big scientific instruments capable of generating many TB of data per hour. Having good performance is important. I used the same tool we use for backups also to migrate the data from the old storage to the new storage (and from GPFS 4 to GPFS 5), and I managed to reach speeds of 17 - 19 GB / s data transfer (when hitting big files that is) using only two servers equipped with Infiniband EDR. 
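(For anyone who has not wired this up before, the general shape of driving rsync from a pre-built path list in parallel streams is roughly the following. The chunk count, the list file /tmp/changed.paths, the one-path-per-line format and the target "backuphost" are all invented for the example; Enrico's splitrsync tool, linked just below, is a much more complete implementation of the same idea.)

  # toy version only: /tmp/changed.paths holds one absolute path per line
  split -n l/8 /tmp/changed.paths /tmp/chunk.
  for c in /tmp/chunk.*; do
      rsync -a --files-from="$c" / backuphost:/backup/ &
  done
  wait
  # no retries or error handling, and deletions/renames on the source are
  # not propagated, which is the caveat raised earlier in the thread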
I made a simple script to parallelize rsync to make it faster: https://github.com/fmi-basel/splitrsync . Combined with another program using the policy engine to generate the file list to avoid the painful crawling. As I said we are a small team, so we have to be efficient. Developing that tool costed me time, but the ROI is there as I can use the same tool with non GPFS powered storage system, and we had many occasions where this was the case, for example when moving data from old system to be decommissioned to the GPFS storage. And I would like to finally mention another hot topic: who says we will be on GPFS forever? The recent licensing change would probably destroy our small IT budget and we would not be able to afford Spectrum Scale any longer. We might be forced to switch to a cheaper solution. At least this way we can carry some of the code we wrote with us. With AFM we would have to start from scratch. Originally we were not really planning to move as we didn't expect this change in licensing with the associated increased cost. But now, this turns out to be a small time saver if we indeed have to switch. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:47 -0500, Stephen Ulmer wrote: Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sadaniel at us.ibm.com Thu Mar 11 16:08:11 2021 From: sadaniel at us.ibm.com (Steven Daniels) Date: Thu, 11 Mar 2021 09:08:11 -0700 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. 
We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. I'll leave it to Venkat and others on the development team to share more details about improvements. Steven A. Daniels Cross-brand Client Architect Senior Certified IT Specialist National Programs Fax and Voice: 3038101229 sadaniel at us.ibm.com http://www.ibm.com From: Stephen Ulmer To: gpfsug main discussion list Cc: bill.burke.860 at gmail.com Date: 03/11/2021 06:47 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync Sent by: gpfsug-discuss-bounces at spectrumscale.org Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. 
We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Wednesday, March 10, 2021 3:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ \\UTGERS, |---------------------------*O*--------------------------- _// the State | Ryan Novosielski - novosirj at rutgers.edu \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Mar 9, 2021, at 9:19 PM, William Burke wrote: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. 
-- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=6mf8yZ-lDnfsy3mVONFq1RV1ypXT67SthQnq3D6Ym4Q&m=hSVvvIGpqQhKt_u_TKHdjoXyU-z7P14pCBQ5pA7MMFA&s=g2hkl0Raj7QbLvqRZfDk6nska0crl4Peh4kd8YwiO6k&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A816397.jpg Type: image/jpeg Size: 4919 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From novosirj at rutgers.edu Thu Mar 11 16:28:57 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 11 Mar 2021 16:28:57 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: Message-ID: <1298DFDD-9701-4FE4-9B06-1541455E0F52@rutgers.edu> Agreed. Since 5.0.4.1 on the client side (we do rely on it for home directories that are geographically distributed), we have effectively not had any more problems. Our server side are all 5.0.3.2-3. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 11, 2021, at 11:08 AM, Steven Daniels wrote: > > Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. > > I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. > > The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. > > I'll leave it to Venkat and others on the development team to share more details about improvements. > > > Steven A. Daniels > Cross-brand Client Architect > Senior Certified IT Specialist > National Programs > Fax and Voice: 3038101229 > sadaniel at us.ibm.com > http://www.ibm.com > <1A816397.jpg> > > Stephen Ulmer ---03/11/2021 06:47:59 AM---Thank you! 
Would you mind letting me know in what era you made your evaluation? I?m not suggesting y > > From: Stephen Ulmer > To: gpfsug main discussion list > Cc: bill.burke.860 at gmail.com > Date: 03/11/2021 06:47 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Thank you! Would you mind letting me know in what era you made your evaluation? > > I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. > > Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. > > Your original post was very thoughtful, and I appreciate your time. > > -- > Stephen > > On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: > > ? > Hello Stephen, > > actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. > > The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. > > Kind regards. > > -- > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > > On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: > I?m going to ask what may be a dumb question: > > Given that you have GPFS on both ends, what made you decide to NOT use AFM? > > -- > Stephen > > > On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: > > ?Hello William, > > I've got your email forwarded my another user and I decided to subscribe to give you my two cents. > > I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is > easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example > if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. > > DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me > enough not to go that route. > > What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just > build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which > the ctime changes in the last couple of days (to update metadata info). > > Good luck. > Kind regards. 
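(A bare-bones illustration of "building up the differences" between the two listings could look like the following. The two input files are hypothetical and assumed to hold one path per line as produced on the main and backup clusters; content changes are picked up by the extra mtime/ctime rule mentioned above rather than by this comparison.)

  sort /tmp/main.paths   > /tmp/main.sorted
  sort /tmp/backup.paths > /tmp/backup.sorted
  # paths present on main but not yet on backup: need to be copied
  comm -23 /tmp/main.sorted /tmp/backup.sorted > /tmp/to_copy
  # paths only on backup: deleted or renamed on main, prune them there
  comm -13 /tmp/main.sorted /tmp/backup.sorted > /tmp/to_delete

The expensive directory walk is avoided on both sides, which is what makes this workable at the 250 million file scale mentioned above.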
> > -- > > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > -------- Forwarded Message -------- > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski > Sent: Wednesday, March 10, 2021 3:22 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync > > Yup, you want to use the policy engine: > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we > don?t have it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > \\UTGERS, |---------------------------*O*--------------------------- > _// the State | Ryan Novosielski - novosirj at rutgers.edu > \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I > could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If > i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a > scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a > brute force attempt. and it would be nice to know where the delta of files that have changed. > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS > metadata - inodes, modify times, creation times, etc. 
> > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=6mf8yZ-lDnfsy3mVONFq1RV1ypXT67SthQnq3D6Ym4Q&m=hSVvvIGpqQhKt_u_TKHdjoXyU-z7P14pCBQ5pA7MMFA&s=g2hkl0Raj7QbLvqRZfDk6nska0crl4Peh4kd8YwiO6k&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From honwai.leong at sydney.edu.au Thu Mar 11 22:28:57 2021 From: honwai.leong at sydney.edu.au (Honwai Leong) Date: Thu, 11 Mar 2021 22:28:57 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync Message-ID: This paper might provide some ideas, not the best solution but works fine https://github.com/HPCSYSPROS/Workshop20/blob/master/Parallelized_data_replication_of_multi-petabyte_storage_systems/ws_hpcsysp103s1-file1.pdf It is a two-part workflow to replicate files from production to DR site. It leverages on snapshot ID to determine which files have been updated/modified after a snapshot was taken. It doesn't take care of deletion of files moved from one directory to another, so it uses dsync to take care of that part. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: Friday, March 12, 2021 3:08 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 110, Issue 20 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
Re: Fwd: FW: Backing up GPFS with Rsync (Steven Daniels) ---------------------------------------------------------------------- Message: 1 Date: Thu, 11 Mar 2021 09:08:11 -0700 From: "Steven Daniels" To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org, bill.burke.860 at gmail.com Subject: Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync Message-ID: Content-Type: text/plain; charset="utf-8" Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. I'll leave it to Venkat and others on the development team to share more details about improvements. Steven A. Daniels Cross-brand Client Architect Senior Certified IT Specialist National Programs Fax and Voice: 3038101229 sadaniel at us.ibm.com https://protect-au.mimecast.com/s/ZnryCr81nyt88D8ZkuztwY-?domain=ibm.com From: Stephen Ulmer To: gpfsug main discussion list Cc: bill.burke.860 at gmail.com Date: 03/11/2021 06:47 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync Sent by: gpfsug-discuss-bounces at spectrumscale.org Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. 
I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Wednesday, March 10, 2021 3:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync Yup, you want to use the policy engine: https://protect-au.mimecast.com/s/5FXFCvl1rKi77y78YhzCNU5?domain=ibm.com Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ \\UTGERS, |---------------------------*O*--------------------------- _// the State | Ryan Novosielski - novosirj at rutgers.edu \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Mar 9, 2021, at 9:19 PM, William Burke wrote: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. 
-- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/uNqKCwV1vMfGGRGxqcKIIVS?domain=urldefense.proofpoint.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A816397.jpg Type: image/jpeg Size: 4919 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org End of gpfsug-discuss Digest, Vol 110, Issue 20 *********************************************** From juergen.hannappel at desy.de Mon Mar 15 16:20:51 2021 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Mon, 15 Mar 2021 17:20:51 +0100 (CET) Subject: [gpfsug-discuss] Detecting open files Message-ID: <1985303510.24419797.1615825251660.JavaMail.zimbra@desy.de> Hi, when unlinking filesets that sometimes fails because some open files on that fileset still exist. Is there a way to find which files are open, and from which node? Without running a mmdsh -N all lsof on serveral (big) remote clusters, that is. -- Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1711 bytes Desc: S/MIME Cryptographic Signature URL: From Robert.Oesterlin at nuance.com Wed Mar 17 11:59:57 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 11:59:57 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: Anyone run into this error from the GUI task ?FILESYSTEM_MOUNT? or ideas on how to fix it? Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg1_tools','ems1-hs','RO','2021-03-17 07:55:14.051000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg1_tools) already exists. 
Call getNextException to see other errors in the batch.,Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg5_tools','ems1-hs','RO','2021-03-17 07:55:15.686000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg5_tools) already exists. Call getNextException to see other errors in the batch. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 14:18:56 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 14:18:56 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898090.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898091.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898092.png Type: image/png Size: 1172 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Wed Mar 17 14:30:36 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 14:30:36 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: Can you give me details on how to do this? I tried this: [root at ess1ems ~]# su postgres -c 'psql -d postgres -c "delete from fscc.filesystem_mounts"' could not change directory to "/root" psql: FATAL: Peer authentication failed for user "postgres" Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Alexander Wolf Reply-To: gpfsug main discussion list Date: Wednesday, March 17, 2021 at 9:19 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ This is strange, the Java code should only try to insert rows that are not already there. If it was just the insert for the duplicate row we could ignore it. But this is a batch insert failing and therefore the FILESYSTEM_MOUNTS table does not get updated anymore. A quick fix is to launch the psql client and do a "delete from fscc.filesystem_mounts" to clear the table and run the FILESYSTEM_MOUNT task afterwards to repopulate it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 15:09:51 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 15:09:51 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.16159683898093.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898094.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898095.png Type: image/png Size: 1172 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Wed Mar 17 15:33:54 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 15:33:54 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> The command completed, and I re-ran the FILESYSTEM_MOUNT, but it failed the same way. [root at ess1ems ~]# psql postgres postgres -c "delete from fscc.filesystem_mounts" DELETE 20 /usr/lpp/mmfs/gui/cli/runtask FILESYSTEM_MOUNT -debug 10:32 AM Operation Failed 10:32 AM Error: debug: locale=en_US debug: Running 'mmlsmount 'fs1' -Y ' on node localhost debug: Running 'mmlsmount 'fs2' -Y ' on node localhost debug: Running 'mmlsmount 'fs3' -Y ' on node localhost debug: Running 'mmlsmount 'fs4' -Y ' on node localhost debug: Running 'mmlsmount 'nrg1_tools' -Y ' on node localhost debug: Running 'mmlsmount 'nrg5_tools' -Y ' on node localhost err: java.sql.BatchUpdateException: Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg1_tools','ems1-hs','RO','2021-03-17 11:32:38.830000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg1_tools) already exists. Call getNextException to see other errors in the batch. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Alexander Wolf Reply-To: gpfsug main discussion list Date: Wednesday, March 17, 2021 at 10:10 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ I think psql postgres postgres -c "delete from fscc.filesystem_mounts"' ran as root should do the trick. Mit freundlichen Gr??en / Kind regards [cid:image001.png at 01D71B19.07732D00] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1135 bytes Desc: image001.png URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 17:05:11 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 17:05:11 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> References: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898096.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.16159683898097.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image001.png at 01D71B19.07732D00.png Type: image/png Size: 1135 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898098.png Type: image/png Size: 1172 bytes Desc: not available URL: From robert.horton at icr.ac.uk Thu Mar 18 15:47:07 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Thu, 18 Mar 2021 15:47:07 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Message-ID: Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Mar 19 06:32:00 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 19 Mar 2021 12:02:00 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: Robert, What is the scale version ? This issue may be related to these alerts. https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? 
_ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? __audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.horton at icr.ac.uk Fri Mar 19 09:42:22 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Fri, 19 Mar 2021 09:42:22 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> Hi Venkat, Thanks for getting back to me. On the cache side we're running 5.0.4-3 on the nsd servers and 5.0.5-2 everywhere else, including gateway nodes. The home cluster is 4.2.3-22 - unfortunately we're stuck on 4.x due to the licensing but we're in the process of replacing that system. The actual AFM seems to be behaving fine though so I'm not sure that's our issue. I guess our next job is to see if we can reproduce it in a non-AFM fileset. Rob On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. Robert, What is the scale version ? This issue may be related to these alerts. 
https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk| W www.icr.ac.uk| Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Mar 19 09:50:04 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 19 Mar 2021 15:20:04 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> References: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> Message-ID: Hi Robert, So you might have started seeing problem after upgrading the gateway nodes to 5.0.5.2. Upgrading gateway nodes at cache cluster to 5.0.5.6 would resolve this problem. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/19/2021 03:13 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Venkat, Thanks for getting back to me. On the cache side we're running 5.0.4-3 on the nsd servers and 5.0.5-2 everywhere else, including gateway nodes. The home cluster is 4.2.3-22 - unfortunately we're stuck on 4.x due to the licensing but we're in the process of replacing that system. The actual AFM seems to be behaving fine though so I'm not sure that's our issue. I guess our next job is to see if we can reproduce it in a non-AFM fileset. Rob On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. 
Robert, What is the scale version ? This issue may be related to these alerts. https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk| W www.icr.ac.uk| Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=KgYs-kXBKE5JoAaGYRiU9iIxNkJSZeicxpSTmL39_B8&s=6FodZ_EQ8VAOE_xoEkfoUzmJpaiF7bgbERvA9avLZfg&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Mar 22 09:32:10 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 10:32:10 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly Message-ID: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Hello, we usually create filesets for project dirs and homes. Unfortunately we have discovered that this convention has been ignored for some dirs and their data no resides in the root fileset. We would like to move the data to independent filesets. Is there a way to do this without having to schedule a downtime for the dirs in question? I mean, is there a way to transparently move data to an independent fileset at the same path? Kind regards, Ulrich Sibiller -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From janfrode at tanso.net Mon Mar 22 09:54:28 2021 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 22 Mar 2021 10:54:28 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: No ? all copying between filesets require full data copy. No simple rename. This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. -jf man. 22. mar. 2021 kl. 10:39 skrev Ulrich Sibiller < u.sibiller at science-computing.de>: > Hello, > > we usually create filesets for project dirs and homes. > > Unfortunately we have discovered that this convention has been ignored for > some dirs and their data > no resides in the root fileset. We would like to move the data to > independent filesets. > > Is there a way to do this without having to schedule a downtime for the > dirs in question? > > I mean, is there a way to transparently move data to an independent > fileset at the same path? > > > Kind regards, > > Ulrich Sibiller > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Mar 22 12:24:59 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 22 Mar 2021 12:24:59 +0000 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: You could maybe create the new file-set, link in a different place, copy the data ? Then at somepoint, unlink and relink and resync. Still some user access, but you are potentially reducing the time to do the copy. Simon From: on behalf of "janfrode at tanso.net" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 22 March 2021 at 09:54 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Move data to fileset seamlessly No ? all copying between filesets require full data copy. No simple rename. This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. -jf man. 22. mar. 2021 kl. 10:39 skrev Ulrich Sibiller >: Hello, we usually create filesets for project dirs and homes. Unfortunately we have discovered that this convention has been ignored for some dirs and their data no resides in the root fileset. We would like to move the data to independent filesets. Is there a way to do this without having to schedule a downtime for the dirs in question? 
I mean, is there a way to transparently move data to an independent fileset at the same path? Kind regards, Ulrich Sibiller -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Mar 22 13:20:46 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 14:20:46 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: On 22.03.21 13:24, Simon Thompson wrote: > You could maybe create the new file-set, link in a different place, copy the data ? > > Then at somepoint, unlink and relink and resync. Still some user access, but you are potentially > reducing the time to do the copy. Yes, but this does not help if a file is open all the time, e.g. during a long-running job. Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From u.sibiller at science-computing.de Mon Mar 22 13:41:39 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 14:41:39 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: <6f626186-cb7a-46d5-781c-8f3a21b7e270@science-computing.de> On 22.03.21 10:54, Jan-Frode Myklebust wrote: > No ? all copying between filesets require full data copy. No simple rename. > > This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. Yes, your are right. So please vote here: http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=149429 Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From robert.horton at icr.ac.uk Tue Mar 23 19:02:05 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Tue, 23 Mar 2021 19:02:05 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: Hi, Sorry for the delay... On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: > ... > 1. 
Stop all AFM filesets at cache using "mmafmctl device stop -j > fileset" command. > 2. Perform rolling upgrade parallely at both cache and home clusters > a. All nodes on home cluster to 5.0.5.6 > b. All gateway nodes in cache cluster to 5.0.5.6 > 3. At home cluster, for each fileset target path, repeat below steps > a. Remove .afmctl file > mmafmlocal rm /.afm/.afmctl > b. Enable AFM At point 3 I'm getting: # mmafmlocal rm /.afm/.afmctl /bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted afmconfig disable is the same. Any idea what the issue is? Thanks, Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. From vpuvvada at in.ibm.com Wed Mar 24 02:36:31 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Wed, 24 Mar 2021 08:06:31 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: ># mmafmlocal rm /.afm/.afmctl >/bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted This step is only required if home cluster is on 5.0.5.2/5.0.5.3. You can ignore this issue, and restart AFM filesets at cache. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/24/2021 12:33 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Sorry for the delay... On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: > ... > 1. Stop all AFM filesets at cache using "mmafmctl device stop -j > fileset" command. > 2. Perform rolling upgrade parallely at both cache and home clusters > a. All nodes on home cluster to 5.0.5.6 > b. All gateway nodes in cache cluster to 5.0.5.6 > 3. At home cluster, for each fileset target path, repeat below steps > a. Remove .afmctl file > mmafmlocal rm /.afm/.afmctl > b. Enable AFM At point 3 I'm getting: # mmafmlocal rm /.afm/.afmctl /bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted afmconfig disable is the same. Any idea what the issue is? Thanks, Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. 
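For reference, the stop/start steps above lend themselves to a small loop. A minimal sketch, assuming bash on a node in the cache cluster; the filesystem device name fs0 and the fileset names are placeholders, and "mmlsfileset <device> --afm" should list the AFM filesets if you would rather generate the list than hard-code it:

  fs=fs0                              # placeholder filesystem (device) name
  afm_filesets="fileset1 fileset2"    # placeholder AFM fileset names

  # Step 1: stop all AFM filesets at the cache before the rolling upgrade
  for f in $afm_filesets; do
      mmafmctl $fs stop -j $f
  done

  # ... rolling upgrade of the home nodes and the cache gateway nodes to 5.0.5.6,
  # plus the per-target mmafmconfig enable at home where applicable ...

  # Step 4: start the filesets again at the cache
  for f in $afm_filesets; do
      mmafmctl $fs start -j $f
  done
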
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=OLf3tBvTItpLRieM34xb8Xd69tBYbwTDYAecT0D_B7k&s=FCJEEoTWGIoM4eY4SMzE55qskwhAnxC_noZu7fJHoqw&e=
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From prasad.surampudi at theatsgroup.com Wed Mar 24 14:32:30 2021
From: prasad.surampudi at theatsgroup.com (Prasad Surampudi)
Date: Wed, 24 Mar 2021 14:32:30 +0000
Subject: [gpfsug-discuss] mmrepquota is not reporting root fileset statistics for some filesystems
Message-ID: 
Recently while checking fileset quotas in an ESS cluster, we noticed that the mmrepquota command is not reporting the root fileset quota and inode details for some filesystems. Has anyone else seen this issue? Please see the output below. The root fileset shows up for the 'prod' filesystem but does not show up for 'prod-private'. I could not figure out why it does not show up for prod-private. Any ideas?
/usr/lpp/mmfs/bin/mmrepquota -j prod-private Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace xFIN root FILESET 12028144 0 0 0 none | 4524237 0 0 0 none /usr/lpp/mmfs/bin/mmrepquota -j prod Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace root root FILESET 7106656 0 0 1273643728 none | 7 0 0 400 none xxx_tick root FILESET 0 0 0 0 none | 1 0 0 0 none _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oluwasijibomi.saula at ndsu.edu Mon Mar 29 19:38:00 2021 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Mon, 29 Mar 2021 18:38:00 +0000 Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Hello Folks, So we are experiencing a mind-boggling issue where just a couple of nodes in our cluster, at GPFS boot up, get hung so badly that the node must be power reset. These AMD client nodes are diskless in nature and have at least 256G of memory. We have other AMD nodes that are working just fine in a separate GPFS cluster albeit on RHEL7. Just before GPFS (or related processes) seize up the node, the following lines of /var/mmfs/gen/mmfslog are noted: 2021-03-29_12:47:37.343-0500: [N] mmfsd ready 2021-03-29_12:47:37.426-0500: mmcommon mmfsup invoked. Parameters: 10.12.50.47 10.12.50.242 all 2021-03-29_12:47:37.587-0500: mounting /dev/mmfs1 2021-03-29_12:47:37.590-0500: [I] Command: mount mmfs1 2021-03-29_12:47:37.859-0500: [N] Connecting to 10.12.50.243 tier1-sn-02.pixstor 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01.pixstor) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.866-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.867-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.868-0500: [I] Connected to 10.12.50.243 tier1-sn-02 There have been hunches that this might be a network issue, however, other nodes connected to the IB network switch are mounting the filesystem without incident. I'm inclined to believe there's a GPFS/OS-specific setting that might be causing these crashes especially when we note that disabling the automount on the client node doesn't result in the node hanging. However, once we issue mmmount, we see the node seize up shortly... Please let me know if you have any thoughts on where to look for root-causes as I and a few fellows are stuck here ? Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... 
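To help narrow down whether a hang like this is on the GPFS side or in the kernel/RDMA stack, here is a rough sketch of things worth capturing in the first seconds after issuing mmmount, from a second session on the client and on the NSD servers. The commands are standard Spectrum Scale admin tools, but the exact sequence is only a suggestion, not a prescribed procedure:

  # on the client, immediately after "mmmount mmfs1":
  mmdiag --waiters        # long-running waiters usually name the blocked operation
  mmdiag --network        # state of the connections to the NSD servers
  dmesg -T | tail -n 50   # soft-lockup, OOM or RDMA errors from the kernel side

  # on the NSD servers at the same time:
  mmdiag --waiters        # check whether the servers are stuck waiting on this client
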
URL: From olaf.weiser at de.ibm.com Tue Mar 30 07:06:54 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 30 Mar 2021 06:06:54 +0000 Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Tue Mar 30 19:24:00 2021 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Tue, 30 Mar 2021 18:24:00 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 110, Issue 34 In-Reply-To: References: Message-ID: Hey Olaf, We'll investigate as suggested. I'm hopeful the journald logs would provide some additional insight. As for OFED versions, we use the same Mellanox version across the cluster and haven't seen any issues with working nodes that mount the filesystem. We also have a PMR open with IBM but we'll send a follow-up if we discover something more for group discussion. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Tuesday, March 30, 2021 1:07 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 110, Issue 34 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Filesystem mount attempt hangs GPFS client node (Saula, Oluwasijibomi) 2. Re: Filesystem mount attempt hangs GPFS client node (Olaf Weiser) ---------------------------------------------------------------------- Message: 1 Date: Mon, 29 Mar 2021 18:38:00 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Content-Type: text/plain; charset="utf-8" Hello Folks, So we are experiencing a mind-boggling issue where just a couple of nodes in our cluster, at GPFS boot up, get hung so badly that the node must be power reset. These AMD client nodes are diskless in nature and have at least 256G of memory. We have other AMD nodes that are working just fine in a separate GPFS cluster albeit on RHEL7. Just before GPFS (or related processes) seize up the node, the following lines of /var/mmfs/gen/mmfslog are noted: 2021-03-29_12:47:37.343-0500: [N] mmfsd ready 2021-03-29_12:47:37.426-0500: mmcommon mmfsup invoked. 
Parameters: 10.12.50.47 10.12.50.242 all 2021-03-29_12:47:37.587-0500: mounting /dev/mmfs1 2021-03-29_12:47:37.590-0500: [I] Command: mount mmfs1 2021-03-29_12:47:37.859-0500: [N] Connecting to 10.12.50.243 tier1-sn-02.pixstor 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01.pixstor) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.866-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.867-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.868-0500: [I] Connected to 10.12.50.243 tier1-sn-02 There have been hunches that this might be a network issue, however, other nodes connected to the IB network switch are mounting the filesystem without incident. I'm inclined to believe there's a GPFS/OS-specific setting that might be causing these crashes especially when we note that disabling the automount on the client node doesn't result in the node hanging. However, once we issue mmmount, we see the node seize up shortly... Please let me know if you have any thoughts on where to look for root-causes as I and a few fellows are stuck here ? Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Tue, 30 Mar 2021 06:06:54 +0000 From: "Olaf Weiser" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 110, Issue 34 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL:
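Since the thread closes with a PMR in progress, a short sketch of the data that usually helps with this kind of hang: a gpfs.snap from the affected client plus the kernel journal for the boot in which the hang occurred. The node name client-47 is a placeholder, and "journalctl -b -1" assumes persistent journald storage, which a diskless node may not have, so the journal may need to be shipped to a remote collector instead:

  # from a node that can reach the affected client (placeholder name: client-47)
  gpfs.snap -N client-47

  # after the forced reboot, kernel messages from the previous boot:
  ssh client-47 'journalctl -k -b -1' > client-47-prev-boot-kernel.log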