From madhu.punjabi at in.ibm.com Mon Nov 2 08:17:23 2020 From: madhu.punjabi at in.ibm.com (Madhu P Punjabi) Date: Mon, 2 Nov 2020 08:17:23 +0000 Subject: [gpfsug-discuss] [NFS-Ganesha-Support] 'ganesha_mgr display_export - client not listed In-Reply-To: <660DD807-C723-44EF-BC51-57EFB296FFC4@id.ethz.ch> References: <660DD807-C723-44EF-BC51-57EFB296FFC4@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: From christian.vieser at 1und1.de Mon Nov 2 13:44:50 2020 From: christian.vieser at 1und1.de (Christian Vieser) Date: Mon, 2 Nov 2020 14:44:50 +0100 Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: <1109480230.484366.1603799162955@privateemail.com> References: <1109480230.484366.1603799162955@privateemail.com> Message-ID: <1dffa509-1bcc-0d5a-bf79-fa82746dca07@1und1.de> Hi Andi, we suffer from the same issue. IBM support told me that Spectrum Scale 5.1 will come with a new release of the underlying Openstack components, so we still hope that some/most of limitations will vanish then. But I already know, that the new S3 policies won't be available, only the "legacy" S3 ACLs. We also tried MinIO but deemed that it's not "production ready". It's fine for quickly setting up a S3 service for development, but they release too often and with breaking changes, and documentation is lacking all aspects regarding maintenance. Regards, Christian Am 27.10.20 um 12:46 schrieb Andi Christiansen: > Hi all, > > We have over a longer period used the S3 API within spectrum Scale.. > And that has shown that it does not support very many applications > because of limitations of the API.. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmtick at us.ibm.com Tue Nov 3 00:21:43 2020 From: jmtick at us.ibm.com (Jacob M Tick) Date: Tue, 3 Nov 2020 00:21:43 +0000 Subject: [gpfsug-discuss] Use cases for file audit logging and clustered watch folder Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Nov 3 17:00:54 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 3 Nov 2020 17:00:54 +0000 Subject: [gpfsug-discuss] SSUG::Digital Scalable multi-node training for AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale Message-ID: Apologies, looks like the calendar invite for this week?s SSUG::Digital didn?t get sent! Nvidia and IBM did a complex proof-of-concept to demonstrate the scaling of AI workload using Nvidia DGX, Red Hat OpenShift and IBM Spectrum Scale at the example of ResNet-50 and the segmentation of images using the Audi A2D2 dataset. The project team published an IBM Redpaper with all the technical details and will present the key learnings and results. >>> Join Here <<< This episode will start 15 minutes later as usual. * San Francisco, USA at 08:15 PST * New York, USA at 11:15 EST * London, United Kingdom at 16:15 GMT * Frankfurt, Germany at 17:15 CET * Pune, India at 21:45 IST -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2488 bytes Desc: not available URL: From andi at christiansen.xxx Wed Nov 4 07:14:41 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 4 Nov 2020 08:14:41 +0100 (CET) Subject: [gpfsug-discuss] Alternative to Scale S3 API. 
In-Reply-To: <1dffa509-1bcc-0d5a-bf79-fa82746dca07@1und1.de> References: <1109480230.484366.1603799162955@privateemail.com> <1dffa509-1bcc-0d5a-bf79-fa82746dca07@1und1.de> Message-ID: <1512108314.679947.1604474081488@privateemail.com> Hi Christian, Thanks for the information! My question also triggered IBM to tell me the same so i think we will stay on S3 with Scale and hoping the same with the new release.. Yes, MinIO is really lacking some good documentation.. but definatly a cool software package that i will keep an eye on in the future... Best Regards Andi Christiansen > On 11/02/2020 2:44 PM Christian Vieser wrote: > > > > Hi Andi, > > we suffer from the same issue. IBM support told me that Spectrum Scale 5.1 will come with a new release of the underlying Openstack components, so we still hope that some/most of limitations will vanish then. But I already know, that the new S3 policies won't be available, only the "legacy" S3 ACLs. > > We also tried MinIO but deemed that it's not "production ready". It's fine for quickly setting up a S3 service for development, but they release too often and with breaking changes, and documentation is lacking all aspects regarding maintenance. > > Regards, > > Christian > > Am 27.10.20 um 12:46 schrieb Andi Christiansen: > > > > Hi all, > > > > We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joe at excelero.com Wed Nov 4 12:19:07 2020 From: joe at excelero.com (joe at excelero.com) Date: Wed, 4 Nov 2020 06:19:07 -0600 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 106, Issue 3 Message-ID: <924bb673-0b2a-420a-8ce2-be24c5e6e4e8@Spark> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: From oluwasijibomi.saula at ndsu.edu Wed Nov 4 16:05:50 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Wed, 4 Nov 2020 16:05:50 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 106, Issue 3 In-Reply-To: References: Message-ID: Could someone share the password for the event today? Thanks! Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Wednesday, November 4, 2020 6:00 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 106, Issue 3 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
SSUG::Digital Scalable multi-node training for AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale (Simon Thompson) 2. Re: Alternative to Scale S3 API. (Andi Christiansen) ---------------------------------------------------------------------- Message: 1 Date: Tue, 3 Nov 2020 17:00:54 +0000 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] SSUG::Digital Scalable multi-node training for AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale Message-ID: Content-Type: text/plain; charset="utf-8" Apologies, looks like the calendar invite for this week?s SSUG::Digital didn?t get sent! Nvidia and IBM did a complex proof-of-concept to demonstrate the scaling of AI workload using Nvidia DGX, Red Hat OpenShift and IBM Spectrum Scale at the example of ResNet-50 and the segmentation of images using the Audi A2D2 dataset. The project team published an IBM Redpaper with all the technical details and will present the key learnings and results. >>> Join Here <<< This episode will start 15 minutes later as usual. * San Francisco, USA at 08:15 PST * New York, USA at 11:15 EST * London, United Kingdom at 16:15 GMT * Frankfurt, Germany at 17:15 CET * Pune, India at 21:45 IST -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2488 bytes Desc: not available URL: ------------------------------ Message: 2 Date: Wed, 4 Nov 2020 08:14:41 +0100 (CET) From: Andi Christiansen To: gpfsug main discussion list , Christian Vieser Subject: Re: [gpfsug-discuss] Alternative to Scale S3 API. Message-ID: <1512108314.679947.1604474081488 at privateemail.com> Content-Type: text/plain; charset="utf-8" Hi Christian, Thanks for the information! My question also triggered IBM to tell me the same so i think we will stay on S3 with Scale and hoping the same with the new release.. Yes, MinIO is really lacking some good documentation.. but definatly a cool software package that i will keep an eye on in the future... Best Regards Andi Christiansen > On 11/02/2020 2:44 PM Christian Vieser wrote: > > > > Hi Andi, > > we suffer from the same issue. IBM support told me that Spectrum Scale 5.1 will come with a new release of the underlying Openstack components, so we still hope that some/most of limitations will vanish then. But I already know, that the new S3 policies won't be available, only the "legacy" S3 ACLs. > > We also tried MinIO but deemed that it's not "production ready". It's fine for quickly setting up a S3 service for development, but they release too often and with breaking changes, and documentation is lacking all aspects regarding maintenance. > > Regards, > > Christian > > Am 27.10.20 um 12:46 schrieb Andi Christiansen: > > > > Hi all, > > > > We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 106, Issue 3 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From herrmann at sprintmail.com Sat Nov 7 21:10:36 2020 From: herrmann at sprintmail.com (Ron H) Date: Sat, 7 Nov 2020 16:10:36 -0500 Subject: [gpfsug-discuss] Use cases for file audit logging and clusteredwatch folder In-Reply-To: References: Message-ID: <8F771847BDEB4447919D30A16FE48FAB@rone8PC> Hi Jacob, Can you point me to a good overview of each of these features? I know File Audit and Watch is part of the DME Scale edition license, but I can?t seem to find a good explanation of what these features can offer. Thanks Ron From: Jacob M Tick Sent: Monday, November 02, 2020 7:21 PM To: gpfsug-discuss at spectrumscale.org Cc: April Brown Subject: [gpfsug-discuss] Use cases for file audit logging and clusteredwatch folder Hi All, I am reaching out on behalf of the Spectrum Scale development team to get some insight on how our customers are using the file audit logging and the clustered watch folder features. If you have it enabled in your test or production environment, could you please elaborate on how and why you are using the feature? Also, knowing how you have the function configured (ie: watching or auditing for certain events, only enabling on certain filesets, ect..) would help us out. Please respond back to April, John (both on CC), and I with any info you are willing to provide. Thanks in advance! Regards, Jake Tick Manager Spectrum Scale - Scalable Data Interfaces IBM Systems Group Email:jmtick at us.ibm.com IBM -------------------------------------------------------------------------------- _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmtick at us.ibm.com Mon Nov 9 17:31:00 2020 From: jmtick at us.ibm.com (Jacob M Tick) Date: Mon, 9 Nov 2020 17:31:00 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Use_cases_for_file_audit_logging_and?= =?utf-8?q?=09clusteredwatch_folder?= In-Reply-To: <8F771847BDEB4447919D30A16FE48FAB@rone8PC> References: <8F771847BDEB4447919D30A16FE48FAB@rone8PC>, Message-ID: An HTML attachment was scrubbed... URL: From Kamil.Czauz at Squarepoint-Capital.com Wed Nov 11 22:29:31 2020 From: Kamil.Czauz at Squarepoint-Capital.com (Czauz, Kamil) Date: Wed, 11 Nov 2020 22:29:31 +0000 Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Message-ID: We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. 
Is there something similar to iotop or a trace that I can enable that can tell me what files/processes are being heavily used by the mmfsd process on the client?

-Kamil

Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From UWEFALKE at de.ibm.com Thu Nov 12 01:56:46 2020
From: UWEFALKE at de.ibm.com (Uwe Falke)
Date: Thu, 12 Nov 2020 02:56:46 +0100
Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process
In-Reply-To:
References:
Message-ID:

Hi, Kamil,

I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes.

In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client:

   -> /var/adm/ras/mmfs.log.latest
   mmdiag --waiters

That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example).

GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) IO trace can be achieved by

   mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N

then, when the issue is seen, stop the trace by

   mmtracectl --stop -N

Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.* (usually in /tmp/mmfs, check the command output).

There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. Example:

   0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150

-> inode is 248415

There is a utility, tsfindinode, to translate that into the file path. You need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i

For the IO trace analysis there is an older tool: /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README).

Hope that helps a bit.

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services
+49 175 575 2877 Mobile
Rathausstr.
7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From luis.bolinches at fi.ibm.com Thu Nov 12 13:19:05 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 12 Nov 2020 13:19:05 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Poor_client_performance_with_high_cpu_?= =?utf-8?q?usage_of=09mmfsd_process?= In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From jyyum at kr.ibm.com Thu Nov 12 14:10:17 2020 From: jyyum at kr.ibm.com (Jae Yoon Yum) Date: Thu, 12 Nov 2020 14:10:17 +0000 Subject: [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14713274163322.png Type: image/png Size: 262 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14713274163323.jpg Type: image/jpeg Size: 2457 bytes Desc: not available URL: From Eric.Wendel at ibm.com Thu Nov 12 15:43:46 2020 From: Eric.Wendel at ibm.com (Eric Wendel - Eric.Wendel@ibm.com) Date: Thu, 12 Nov 2020 15:43:46 +0000 Subject: [gpfsug-discuss] Problems reading emails to the mailing list Message-ID: <31233620a4324240885aed7ad18a729a@ibm.com> Hi Folks, As you are no doubt aware, Lotus Notes and its ecosystem is virtually extinct. 
For those of us who have moved on to more modern email clients (including an increasing number of IBMERs like me), the email links we receive from SSUG (for example) 'OF0433B7F4.580A7B75-ON0025861E.004DD432-0025861E.004DD8A4 at notes.na.collabserv.com are useless because they can only be read if you have the Notes client installed. This is especially problematic for Linux users as the Linux client for Notes is discontinued. It would be very helpful if the SSUG could move to a modern email platform. Thanks, Eric Wendel eric.wendel at ibm.com -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: Thursday, November 12, 2020 8:10 AM To: gpfsug-discuss at spectrumscale.org Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 106, Issue 8 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Poor client performance with high cpu usage of mmfsd process (Luis Bolinches) 2. Question about the Clearing Spectrum Scale GUI event (Jae Yoon Yum) ---------------------------------------------------------------------- Message: 1 Date: Thu, 12 Nov 2020 13:19:05 +0000 From: "Luis Bolinches" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Thu, 12 Nov 2020 14:10:17 +0000 From: "Jae Yoon Yum" To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14713274163322.png Type: image/png Size: 262 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14713274163323.jpg Type: image/jpeg Size: 2457 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 106, Issue 8 ********************************************** From stefan.roth at de.ibm.com Thu Nov 12 17:13:38 2020 From: stefan.roth at de.ibm.com (Stefan Roth) Date: Thu, 12 Nov 2020 18:13:38 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Question_about_the_Clearing_Spectrum_S?= =?utf-8?q?cale_GUI=09event?= In-Reply-To: References: Message-ID: Hello Jay, as long as those errors are still shown by "mmhealth node show" CLI command, they will again appear in the GUI. In the GUI events table you can show an "Event Type" column which is hidden by default. Events that have event type "Notice" can be cleared by the "Mark as Read" action. Events that have event type "State" can not be cleared by the "Mark as Read" action. They have to disappear by solving the problem. 
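As a rough illustration (only a sketch pulling together commands already mentioned in this thread, not an official procedure), the health state that feeds the GUI can be checked from the CLI before trying to clear anything:

   # Check what the health monitor itself reports, per node and cluster-wide
   mmhealth node show
   mmhealth cluster show

   # If the CLI is clean but the GUI still lists old events, reset the GUI event list
   /usr/lpp/mmfs/gui/cli/lshealth --reset
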
If a problem is solved the error should disappear from "mmhealth node show" and after that it will disappear from the GUI as well. Mit freundlichen Gr??en / Kind regards Stefan Roth Spectrum Scale Developement Phone: +49 162 4159934 IBM Deutschland Research & Development GmbH Email: stefan.roth at de.ibm.com Am Weiher 24 65451 Kelsterbach IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Jae Yoon Yum" To: gpfsug-discuss at spectrumscale.org Date: 12.11.2020 15:10 Subject: [EXTERNAL] [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Team, I hope you all stay safe from COVID 19, One of my client wants to clear their ?ERROR? events on the Scale GUI. As you know, there is ?mark as read? for ?warning? messages but there isn?t for ?ERROR?. (In fact, the ?mark as read? button is exist but it does not work.) So I sent him to run this command on cli. /usr/lpp/mmfs/gui/cli/lshealth --reset On my test VM, all of the error messages has been cleared when I run the command?. But, for the client?s system, client said that ?All of the error / warning messages had been appeared again include the one which I had delete by clicking ?mark as read?.? Does anyone who has similar experience like this? and How Could I solve this problem? Or, Is there any way to clear the event one by one? * I sent the same message to the Slack 'scale-help' channel. Thanks. Jay. Best Regards, JaeYoon(Jay) IBM Korea, Three IFC, Yum 10 Gukjegeumyung-ro, Yeongdeungpo-gu, IBM Systems Seoul, Korea Hardware, Storage Technical Sales Mobile : +82-10-4995-4814 07326 e-mail: jyyum at kr.ibm.com ? ??? ??? ??, ??? ??, ???? ???? ???? ?????. ??? ??? ??? ??? ????, ????? ??? ?? ?????,??IBM ??? ? ?? ??? ????, ????? ???? ????. (If you don't wish to receive e-mail from sender, please send e-mail directly. For IBM e-mail, please click here). ??? ???? ??,??, ?? ??? ???????(??: 02-3781-7800,? ?? mktg at kr.ibm.com )? ?? ?? ???? ?? ???? ? ????. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1E506389.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1E764757.gif Type: image/gif Size: 262 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1E982001.jpg Type: image/jpeg Size: 2457 bytes Desc: not available URL: From arc at b4restore.com Thu Nov 12 17:33:01 2020 From: arc at b4restore.com (=?utf-8?B?QW5kaSBOw7hyIENocmlzdGlhbnNlbg==?=) Date: Thu, 12 Nov 2020 17:33:01 +0000 Subject: [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event In-Reply-To: References: Message-ID: Hi Jay, First of you need to make sure your system is actually healthy. 
Events that are not fixed will reappear. I have had a lot of ?stale? entries happening over the last years and more often than not ?/usr/lpp/mmfs/gui/cli/lshealth ?reset? clears the entries if they are not actual faults.. As Stefan says if the errors/warnings are shown in ?mmhealth node show or mmhealth cluster show? they will reappear as they should. (I have sometimes seen stale entries there aswell) When I have encountered stale entries which wasn?t cleared with ?lshealth ?reset? I could clear them with ?mmsysmoncontrol restart?. I think I actually run that command maybe once or twice every month because of stale entries in the GUI og mmhealth itself.. don?t know why they happen but they seem to appear more frequently for me atleast.. I have high hopes for the 5.1.0.0/5.1.0.1 release as I have heard there should be some new things for the GUI as well.. not sure what they are yet though 😊 Hope this helps. Cheers A. Christiansen Fra: gpfsug-discuss-bounces at spectrumscale.org P? vegne af Stefan Roth Sendt: Thursday, November 12, 2020 6:14 PM Til: gpfsug main discussion list Emne: Re: [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event Hello Jay, as long as those errors are still shown by "mmhealth node show" CLI command, they will again appear in the GUI. In the GUI events table you can show an "Event Type" column which is hidden by default. Events that have event type "Notice" can be cleared by the "Mark as Read" action. Events that have event type "State" can not be cleared by the "Mark as Read" action. They have to disappear by solving the problem. If a problem is solved the error should disappear from "mmhealth node show" and after that it will disappear from the GUI as well. Mit freundlichen Gr??en / Kind regards Stefan Roth Spectrum Scale Developement ________________________________ Phone: +49 162 4159934 IBM Deutschland Research & Development GmbH [cid:image002.gif at 01D6B922.3FE99E70] Email: stefan.roth at de.ibm.com Am Weiher 24 65451 Kelsterbach ________________________________ IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 [cid:image003.gif at 01D6B922.3FE99E70]"Jae Yoon Yum" ---12.11.2020 15:10:35---Hi Team, I hope you all stay safe from COVID 19, One of my client wants to clear their ?ERROR? ev From: "Jae Yoon Yum" > To: gpfsug-discuss at spectrumscale.org Date: 12.11.2020 15:10 Subject: [EXTERNAL] [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Team, I hope you all stay safe from COVID 19, One of my client wants to clear their ?ERROR? events on the Scale GUI. As you know, there is ?mark as read? for ?warning? messages but there isn?t for ?ERROR?. (In fact, the ?mark as read? button is exist but it does not work.) So I sent him to run this command on cli. /usr/lpp/mmfs/gui/cli/lshealth --reset On my test VM, all of the error messages has been cleared when I run the command?. But, for the client?s system, client said that ?All of the error / warning messages had been appeared again include the one which I had delete by clicking ?mark as read?.? Does anyone who has similar experience like this? and How Could I solve this problem? Or, Is there any way to clear the event one by one? * I sent the same message to the Slack 'scale-help' channel. Thanks. Jay. 
Best Regards, JaeYoon(Jay) Yum IBM Korea, Three IFC, [cid:image005.jpg at 01D6B922.3FE99E70] 10 Gukjegeumyung-ro, Yeongdeungpo-gu, IBM Systems Hardware, Storage Technical Sales Seoul, Korea Mobile : +82-10-4995-4814 07326 e-mail: jyyum at kr.ibm.com ? ??? ??? ??, ??? ??, ???? ???? ???? ?????. ??? ??? ??? ??? ????, ????? ??? ?? ?????,??IBM ??? ??? ??? ????, ????? ???? ????. (If you don't wish to receive e-mail from sender, please send e-mail directly. For IBM e-mail, please click here). ??? ???? ??,??, ?? ??? ???????(??: 02-3781-7800,??? mktg at kr.ibm.com )? ?? ?? ???? ?? ???? ? ????. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.gif Type: image/gif Size: 1851 bytes Desc: image002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 105 bytes Desc: image003.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.jpg Type: image/jpeg Size: 2457 bytes Desc: image005.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.png Type: image/png Size: 166 bytes Desc: image006.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 616 bytes Desc: image007.png URL: From Kamil.Czauz at Squarepoint-Capital.com Fri Nov 13 02:33:17 2020 From: Kamil.Czauz at Squarepoint-Capital.com (Czauz, Kamil) Date: Fri, 13 Nov 2020 02:33:17 +0000 Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process In-Reply-To: References: Message-ID: Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? The 'lookup' operations are taking a hell of a long time, what does that mean? 
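For reference, this is roughly the sequence I used to capture and summarize the traces (a sketch based on your earlier instructions; the node name and the trcrpt file name below are placeholders):

   # low-level IO trace on the affected client, started before / stopped after the hang
   mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N <client>
   mmtracectl --stop -N <client>

   # summarize the resulting ASCII trace report
   awk -f /usr/lpp/mmfs/samples/debugtools/trsum.awk details=0 /tmp/mmfs/trcrpt.<timestamp>

The two captures below are the summaries this produced.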
Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 
61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 
0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. 
you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. 
Thank you for your cooperation.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From UWEFALKE at de.ibm.com Fri Nov 13 09:21:17 2020
From: UWEFALKE at de.ibm.com (Uwe Falke)
Date: Fri, 13 Nov 2020 10:21:17 +0100
Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process
In-Reply-To:
References:
Message-ID:

Hi, Kamil,

looks like your trace file setting has been too low:

   all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here
   trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here

means you effectively captured a period of about 5ms only ... you can't see much from that. I'd assumed the default trace file size would be sufficient here but it doesn't seem to. Try running with something like

   mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N .

However, if you say "no major waiter" - how many waiters did you see at any time? What kind of waiters were the oldest, and how long had they waited? It could indeed well be that some job is just creating a killer workload. The very short cycle time of the trace points, OTOH, to high activity; OTOH the trace file setting appears quite low (trace=io doesn't collect many trace infos, just basic IO stuff).

If I might ask: what version of GPFS are you running?

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefalke at de.ibm.com

IBM Services
IBM Data Privacy Statement
IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122

From: "Czauz, Kamil"
To: gpfsug main discussion list
Date: 13/11/2020 03:33
Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hi Uwe -

I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart.

The beginning of the traces look something like this:

Overwrite trace parameters:
buffer size: 134217728
64 kernel trace streams, indices 0-63 (selected by low bits of processor ID)
128 daemon trace streams, indices 64-191 (selected by low bits of thread ID)
Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles
Measured cycle count update rate to be 2599997559 per second <---- using this value
OS reported cycle count update rate as 2599999000 per second
Trace milestones:
kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220)
daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152)
all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here
trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here

Approximate number of times the trace buffer was filled: 553.529

Here is the output of trsum.awk details=0. I'm not quite sure what to make of it, can you help me decipher it?
The 'lookup' operations are taking a hell of a long time, what does that mean? Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 
0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 
2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. 
you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. 
Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From UWEFALKE at de.ibm.com Fri Nov 13 09:37:04 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Fri, 13 Nov 2020 10:37:04 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Poor_client_performance_with_high_cpu_?= =?utf-8?q?usage=09of=09mmfsd_process?= In-Reply-To: References: Message-ID: Hi Kamil, in my mail just a few minutes ago I'd overlooked that the buffer size in your trace was indeed 128M (I suppose the trace file is adapting that size if not set in particular). That is very strange, even under high load, the trace should then capture some longer time than 10 secs, and , most of all, it should contain much more activities than just these few you had. That is very mysterious. I am out of ideas for the moment, and a bit short of time to dig here. To check your tracing, you could run a trace like before but when everything is normal and check that out - you should see many records, the trcsum.awk should list just a small portion of unfinished ops at the end, ... If that is fine, then the tracing itself is affected by your crritical condition (never experienced that before - rather GPFS grinds to a halt than the trace is abandoned), and that might well be worth a support ticket. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Uwe Falke/Germany/IBM To: gpfsug main discussion list Date: 13/11/2020 10:21 Subject: Re: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, looks your tracefile setting has been too low: all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here means you effectively captured a period of about 5ms only ... you can't see much from that. I'd assumed the default trace file size would be sufficient here but it doesn't seem to. try running with something like mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N . However, if you say "no major waiter" - how many waiters did you see at any time? what kind of waiters were the oldest, how long they'd waited? it could indeed well be that some job is just creating a killer workload. The very short cyle time of the trace points, OTOH, to high activity, OTOH the trace file setting appears quite low (trace=io doesnt' collect many trace infos, just basic IO stuff). If I might ask: what version of GPFS are you running? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 
7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 03:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? The 'lookup' operations are taking a hell of a long time, what does that mean? 
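One rough way to condense the "User thread stats" sections in the captures below is to sort them by the GPFS-time column; this is only a sketch, assuming the trsum.awk output was saved to a file (trsum.out is a made-up name) and that each thread occupies one line with the six columns shown:

  # print the per-thread lines after the "User thread stats" header,
  # sorted by GPFS time (column 2), largest first
  awk '/User thread stats/ {found=1; next} found && NF == 6 && $1 ~ /^[0-9]+$/' trsum.out | sort -k2,2 -gr | head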
Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 
61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 
0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. 
you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. 
Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kamil.Czauz at Squarepoint-Capital.com Fri Nov 13 13:31:21 2020 From: Kamil.Czauz at Squarepoint-Capital.com (Czauz, Kamil) Date: Fri, 13 Nov 2020 13:31:21 +0000 Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process In-Reply-To: References: Message-ID: Hi Uwe - Regarding your previous message - waiters were coming / going with just 1-2 waiters when I ran the mmdiag command, with very low wait times (<0.01s). We are running version 4.2.3 I did another capture today while the client is functioning normally and this was the header result: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 25.996957 seconds and 67592121252 cycles Measured cycle count update rate to be 2600001271 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Fri Nov 13 08:20:01.800558000 2020 (TOD 1605273601.800558, cycles 20807897445779444) daemon trace enabled Fri Nov 13 08:20:01.910017000 2020 (TOD 1605273601.910017, cycles 20807897730372442) all streams included Fri Nov 13 08:20:26.423085049 2020 (TOD 1605273626.423085, cycles 20807961464381068) <---- useful part of trace extends from here trace quiesced Fri Nov 13 08:20:27.797515000 2020 (TOD 1605273627.000797, cycles 20807965037900696) <---- to here Approximate number of times the trace buffer was filled: 14.631 Still a very small capture (1.3s), but the trsum.awk output was not filled with lookup commands / large lookup times. Can you help debug what those long lookup operations mean? 
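For completeness, the capture cycle behind traces like the ones in this thread, assembled from the commands quoted above (the node name client-node-01 is only a placeholder and the 20-second window is an arbitrary example):

  # start a cyclic I/O trace with the larger buffer suggested earlier
  mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N client-node-01

  # keep the trace running while the problem is visible, e.g. about 20 seconds
  sleep 20

  # stop the trace; the ASCII trcrpt.* report is then written, usually under /tmp/mmfs
  mmtracectl --stop -N client-node-01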
Unfinished operations: 27967 ***************** pagein ************** 1.362382116 27967 ***************** readpage ************** 1.362381516 139130 1.362448448 ********* Unfinished IO: buffer/disk 3002F670000 20:107498951168^\archive_data_16 104686 1.362022068 ********* Unfinished IO: buffer/disk 50011878000 1:47169618944^\archive_data_1 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFE 341710 1.362423815 ********* Unfinished IO: buffer/disk 20022218000 19:107498951680^\archive_data_15 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFF 139150 1.361122006 ********* Unfinished IO: buffer/disk 50012018000 2:47169622016^\archive_data_2 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\00000000FFFFFFFF 95782 1.361112791 ********* Unfinished IO: buffer/disk 40016300000 20:107498950656^\archive_data_16 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\00000000FFFFFFFF 271076 1.361579585 ********* Unfinished IO: buffer/disk 20023DB8000 4:47169606656^\archive_data_4 341676 1.362018599 ********* Unfinished IO: buffer/disk 40038140000 5:47169614336^\archive_data_5 139150 1.361131599 MSG FSnd: nsdMsgReadExt msg_id 2930654492 Sduration 13292.382 + us 341676 1.362027104 MSG FSnd: nsdMsgReadExt msg_id 2930654495 Sduration 12396.877 + us 95782 1.361124739 MSG FSnd: nsdMsgReadExt msg_id 2930654491 Sduration 13299.242 + us 271076 1.361587653 MSG FSnd: nsdMsgReadExt msg_id 2930654493 Sduration 12836.328 + us 92182 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 341710 1.362429643 MSG FSnd: nsdMsgReadExt msg_id 2930654497 Sduration 11994.338 + us 341662 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 139130 1.362458376 MSG FSnd: nsdMsgReadExt msg_id 2930654498 Sduration 11965.605 + us 104686 1.362028772 MSG FSnd: nsdMsgReadExt msg_id 2930654496 Sduration 12395.209 + us 412373 0.775676657 MSG FRep: nsdMsgReadExt msg_id 304915249 Rduration 598747.324 us Rlen 262144 Hduration 598752.112 + us 341770 0.589739579 MSG FRep: nsdMsgReadExt msg_id 338079050 Rduration 784684.402 us Rlen 4 Hduration 784692.651 + us 143315 0.536252844 MSG FRep: nsdMsgReadExt msg_id 631945522 Rduration 838171.137 us Rlen 233472 Hduration 838174.299 + us 341878 0.134331812 MSG FRep: nsdMsgReadExt msg_id 338079023 Rduration 1240092.169 us Rlen 262144 Hduration 1240094.403 + us 175478 0.587353287 MSG FRep: nsdMsgReadExt msg_id 338079047 Rduration 787070.694 us Rlen 262144 Hduration 787073.990 + us 139558 0.633517347 MSG FRep: nsdMsgReadExt msg_id 631945538 Rduration 740906.634 us Rlen 102400 Hduration 740910.172 + us 143308 0.958832110 MSG FRep: nsdMsgReadExt msg_id 631945542 Rduration 415591.871 us Rlen 262144 Hduration 415597.056 + us Elapsed trace time: 1.374423981 seconds Elapsed trace time from first VFS call to last: 1.374423980 Time idle between VFS calls: 0.001603738 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs readpage 1.151660085 1874 614.546 rdwr 0.431456904 581 742.611 read_inode2 0.001180648 934 1.264 follow_link 0.000029502 7 
4.215 getattr 0.000048413 9 5.379 revalidate 0.000007080 67 0.106 pagein 1.149699537 1877 612.520 create 0.007664829 9 851.648 open 0.001032657 19 54.350 unlink 0.002563726 14 183.123 delete_inode 0.000764598 826 0.926 lookup 0.312847947 953 328.277 setattr 0.020651226 824 25.062 permission 0.000015018 1 15.018 rename 0.000529023 4 132.256 release 0.001613800 22 73.355 getxattr 0.000030494 6 5.082 mmap 0.000054767 1 54.767 llseek 0.000001130 4 0.283 readdir 0.000033947 2 16.973 removexattr 0.002119736 820 2.585 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 42625 0.000000138 0.000031017 0.44% 99.56% 3 42378 0.000586959 0.011596801 4.82% 95.18% 32 42627 0.000000272 0.000013421 1.99% 98.01% 2 42641 0.003284590 0.012593594 20.69% 79.31% 35 42628 0.001522335 0.000002748 99.82% 0.18% 2 25464 0.003462795 0.500281914 0.69% 99.31% 12 301420 0.000016711 0.052848218 0.03% 99.97% 38 95103 0.000000544 0.000000000 100.00% 0.00% 1 145858 0.000000659 0.000794896 0.08% 99.92% 2 42221 0.000011484 0.000039445 22.55% 77.45% 5 371718 0.000000707 0.001805425 0.04% 99.96% 2 95109 0.000000880 0.008998763 0.01% 99.99% 2 95337 0.000010330 0.503057866 0.00% 100.00% 8 42700 0.002442175 0.012504429 16.34% 83.66% 35 189680 0.003466450 0.500128627 0.69% 99.31% 9 42681 0.006685396 0.000391575 94.47% 5.53% 16 42702 0.000048203 0.000000500 98.97% 1.03% 2 42703 0.000033280 0.140102087 0.02% 99.98% 9 224423 0.000000195 0.000000000 100.00% 0.00% 1 42706 0.000541098 0.000014713 97.35% 2.65% 3 106275 0.000000456 0.000000000 100.00% 0.00% 1 42721 0.000372857 0.000000000 100.00% 0.00% 1 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Friday, November 13, 2020 4:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi Kamil, in my mail just a few minutes ago I'd overlooked that the buffer size in your trace was indeed 128M (I suppose the trace file is adapting that size if not set in particular). That is very strange, even under high load, the trace should then capture some longer time than 10 secs, and , most of all, it should contain much more activities than just these few you had. That is very mysterious. I am out of ideas for the moment, and a bit short of time to dig here. To check your tracing, you could run a trace like before but when everything is normal and check that out - you should see many records, the trcsum.awk should list just a small portion of unfinished ops at the end, ... If that is fine, then the tracing itself is affected by your crritical condition (never experienced that before - rather GPFS grinds to a halt than the trace is abandoned), and that might well be worth a support ticket. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 
7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Uwe Falke/Germany/IBM To: gpfsug main discussion list Date: 13/11/2020 10:21 Subject: Re: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, looks your tracefile setting has been too low: all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here means you effectively captured a period of about 5ms only ... you can't see much from that. I'd assumed the default trace file size would be sufficient here but it doesn't seem to. try running with something like mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N . However, if you say "no major waiter" - how many waiters did you see at any time? what kind of waiters were the oldest, how long they'd waited? it could indeed well be that some job is just creating a killer workload. The very short cyle time of the trace points, OTOH, to high activity, OTOH the trace file setting appears quite low (trace=io doesnt' collect many trace infos, just basic IO stuff). If I might ask: what version of GPFS are you running? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 03:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. 
The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? The 'lookup' operations are taking a hell of a long time, what does that mean? Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 
32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 
2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. 
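As a minimal first-look sketch of the steps just named (the log path and the mmdiag flags are the ones quoted above; the tail length and the repeat interval are arbitrary choices, not anything GPFS prescribes):

  # recent GPFS log entries on the affected client
  tail -n 100 /var/adm/ras/mmfs.log.latest

  # currently waiting threads; repeat a few times to see whether the same waiters persist
  for i in 1 2 3; do mmdiag --waiters; sleep 5; done

  # most recent I/Os with their service times and sizes
  mmdiag --iohist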
Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. 
We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Fri Nov 13 13:38:48 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 13 Nov 2020 13:38:48 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Poor_client_performance_with_high_cpu?= =?utf-8?q?=09usage=09of=09mmfsd_process?= In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Nov 13 21:11:16 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 13 Nov 2020 13:11:16 -0800 Subject: [gpfsug-discuss] REMINDER - SC20 Sessions - Monday Nov. 16 and Wednesday Nov. 18 Message-ID: <7B85E526-88D4-44AE-B034-4EC5A61E524C@lbl.gov> Hi all, A Reminder to attend and also submit any panel questions for the Wednesday session. So far, there are 3 questions around these topics: 1) excessive prefetch when reading small fractions of many large files 2) improved the integration between TSM and GPFS 3) number of security vulnerabilities in GPFS, the GUI, ESS, or something else related Bring on your tough questions and make it interesting. 
Cheers, Kristy ?original email--- The Spectrum Scale User Group will be hosting two 90 minute sessions at SC20 this year and we hope you can join us. The first one is: "Storage for AI" and will be held Monday, Nov. 16th, from 11:00-12:30 EST and the second one is "What's new in Spectrum Scale 5.1?" and will be held Wednesday, Nov. 18th from 11:00-12:30 EST. Please see the calendar at https://www.spectrumscaleug.org/eventslist/2020-11/ and register by clicking on a session on the calendar and then the "Please register here to join the session" link. Best, Kristy Kristy Kallback-Rose Senior HPC Storage Systems Analyst National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory From UWEFALKE at de.ibm.com Mon Nov 16 13:45:57 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Mon, 16 Nov 2020 14:45:57 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Poor_client_performance_with_high_cpu?= =?utf-8?q?=09usage=09of=09mmfsd_process?= In-Reply-To: References: Message-ID: Hi, while the other nodes can well block the local one, as Frederick suggests, there should at least be something visible locally waiting for these other nodes. Looking at all waiters might be a good thing, but this case looks strange in other ways. Mind statement there are almost no local waiters and none of them gets older than 10 ms. I am no developer nor do I have the code, so don't expect too much. Can you tell what lookups you see (check in the trcrpt file, could be like gpfs_i_lookup or gpfs_v_lookup)? Lookups are metadata ops, do you have a separate pool for your metadata? How is that pool set up (doen to the physical block devices)? Your trcsum down revealed 36 lookups, each one on avg taking >30ms. That is a lot (albeit the respective waiters won't show up at first glance as suspicious ...). So, which waiters did you see (hope you saved them, if not, do it next time). What are the node you see this on and the whole cluster used for? What is the MaxFilesToCache setting (for that node and for others)? what HW is that, how big are your nodes (memory,CPU)? To check the unreasonably short trace capture time: how large are the trcrpt files you obtain? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 14:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - Regarding your previous message - waiters were coming / going with just 1-2 waiters when I ran the mmdiag command, with very low wait times (<0.01s). 
We are running version 4.2.3 I did another capture today while the client is functioning normally and this was the header result: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 25.996957 seconds and 67592121252 cycles Measured cycle count update rate to be 2600001271 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Fri Nov 13 08:20:01.800558000 2020 (TOD 1605273601.800558, cycles 20807897445779444) daemon trace enabled Fri Nov 13 08:20:01.910017000 2020 (TOD 1605273601.910017, cycles 20807897730372442) all streams included Fri Nov 13 08:20:26.423085049 2020 (TOD 1605273626.423085, cycles 20807961464381068) <---- useful part of trace extends from here trace quiesced Fri Nov 13 08:20:27.797515000 2020 (TOD 1605273627.000797, cycles 20807965037900696) <---- to here Approximate number of times the trace buffer was filled: 14.631 Still a very small capture (1.3s), but the trsum.awk output was not filled with lookup commands / large lookup times. Can you help debug what those long lookup operations mean? Unfinished operations: 27967 ***************** pagein ************** 1.362382116 27967 ***************** readpage ************** 1.362381516 139130 1.362448448 ********* Unfinished IO: buffer/disk 3002F670000 20:107498951168^\archive_data_16 104686 1.362022068 ********* Unfinished IO: buffer/disk 50011878000 1:47169618944^\archive_data_1 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFE 341710 1.362423815 ********* Unfinished IO: buffer/disk 20022218000 19:107498951680^\archive_data_15 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFF 139150 1.361122006 ********* Unfinished IO: buffer/disk 50012018000 2:47169622016^\archive_data_2 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\00000000FFFFFFFF 95782 1.361112791 ********* Unfinished IO: buffer/disk 40016300000 20:107498950656^\archive_data_16 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\00000000FFFFFFFF 271076 1.361579585 ********* Unfinished IO: buffer/disk 20023DB8000 4:47169606656^\archive_data_4 341676 1.362018599 ********* Unfinished IO: buffer/disk 40038140000 5:47169614336^\archive_data_5 139150 1.361131599 MSG FSnd: nsdMsgReadExt msg_id 2930654492 Sduration 13292.382 + us 341676 1.362027104 MSG FSnd: nsdMsgReadExt msg_id 2930654495 Sduration 12396.877 + us 95782 1.361124739 MSG FSnd: nsdMsgReadExt msg_id 2930654491 Sduration 13299.242 + us 271076 1.361587653 MSG FSnd: nsdMsgReadExt msg_id 2930654493 Sduration 12836.328 + us 92182 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 341710 1.362429643 MSG FSnd: nsdMsgReadExt msg_id 2930654497 Sduration 11994.338 + us 341662 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 139130 1.362458376 MSG FSnd: nsdMsgReadExt msg_id 2930654498 
Sduration 11965.605 + us 104686 1.362028772 MSG FSnd: nsdMsgReadExt msg_id 2930654496 Sduration 12395.209 + us 412373 0.775676657 MSG FRep: nsdMsgReadExt msg_id 304915249 Rduration 598747.324 us Rlen 262144 Hduration 598752.112 + us 341770 0.589739579 MSG FRep: nsdMsgReadExt msg_id 338079050 Rduration 784684.402 us Rlen 4 Hduration 784692.651 + us 143315 0.536252844 MSG FRep: nsdMsgReadExt msg_id 631945522 Rduration 838171.137 us Rlen 233472 Hduration 838174.299 + us 341878 0.134331812 MSG FRep: nsdMsgReadExt msg_id 338079023 Rduration 1240092.169 us Rlen 262144 Hduration 1240094.403 + us 175478 0.587353287 MSG FRep: nsdMsgReadExt msg_id 338079047 Rduration 787070.694 us Rlen 262144 Hduration 787073.990 + us 139558 0.633517347 MSG FRep: nsdMsgReadExt msg_id 631945538 Rduration 740906.634 us Rlen 102400 Hduration 740910.172 + us 143308 0.958832110 MSG FRep: nsdMsgReadExt msg_id 631945542 Rduration 415591.871 us Rlen 262144 Hduration 415597.056 + us Elapsed trace time: 1.374423981 seconds Elapsed trace time from first VFS call to last: 1.374423980 Time idle between VFS calls: 0.001603738 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs readpage 1.151660085 1874 614.546 rdwr 0.431456904 581 742.611 read_inode2 0.001180648 934 1.264 follow_link 0.000029502 7 4.215 getattr 0.000048413 9 5.379 revalidate 0.000007080 67 0.106 pagein 1.149699537 1877 612.520 create 0.007664829 9 851.648 open 0.001032657 19 54.350 unlink 0.002563726 14 183.123 delete_inode 0.000764598 826 0.926 lookup 0.312847947 953 328.277 setattr 0.020651226 824 25.062 permission 0.000015018 1 15.018 rename 0.000529023 4 132.256 release 0.001613800 22 73.355 getxattr 0.000030494 6 5.082 mmap 0.000054767 1 54.767 llseek 0.000001130 4 0.283 readdir 0.000033947 2 16.973 removexattr 0.002119736 820 2.585 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 42625 0.000000138 0.000031017 0.44% 99.56% 3 42378 0.000586959 0.011596801 4.82% 95.18% 32 42627 0.000000272 0.000013421 1.99% 98.01% 2 42641 0.003284590 0.012593594 20.69% 79.31% 35 42628 0.001522335 0.000002748 99.82% 0.18% 2 25464 0.003462795 0.500281914 0.69% 99.31% 12 301420 0.000016711 0.052848218 0.03% 99.97% 38 95103 0.000000544 0.000000000 100.00% 0.00% 1 145858 0.000000659 0.000794896 0.08% 99.92% 2 42221 0.000011484 0.000039445 22.55% 77.45% 5 371718 0.000000707 0.001805425 0.04% 99.96% 2 95109 0.000000880 0.008998763 0.01% 99.99% 2 95337 0.000010330 0.503057866 0.00% 100.00% 8 42700 0.002442175 0.012504429 16.34% 83.66% 35 189680 0.003466450 0.500128627 0.69% 99.31% 9 42681 0.006685396 0.000391575 94.47% 5.53% 16 42702 0.000048203 0.000000500 98.97% 1.03% 2 42703 0.000033280 0.140102087 0.02% 99.98% 9 224423 0.000000195 0.000000000 100.00% 0.00% 1 42706 0.000541098 0.000014713 97.35% 2.65% 3 106275 0.000000456 0.000000000 100.00% 0.00% 1 42721 0.000372857 0.000000000 100.00% 0.00% 1 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Friday, November 13, 2020 4:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi Kamil, in my mail just a few minutes ago I'd overlooked that the buffer size in your trace was indeed 128M (I suppose the trace file is adapting that size if not set in particular). That is very strange, even under high load, the trace should then capture some longer time than 10 secs, and , most of all, it should contain much more activities than just these few you had. 
That is very mysterious. I am out of ideas for the moment, and a bit short of time to dig here. To check your tracing, you could run a trace like before but when everything is normal and check that out - you should see many records, the trcsum.awk should list just a small portion of unfinished ops at the end, ... If that is fine, then the tracing itself is affected by your crritical condition (never experienced that before - rather GPFS grinds to a halt than the trace is abandoned), and that might well be worth a support ticket. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Uwe Falke/Germany/IBM To: gpfsug main discussion list Date: 13/11/2020 10:21 Subject: Re: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, looks your tracefile setting has been too low: all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here means you effectively captured a period of about 5ms only ... you can't see much from that. I'd assumed the default trace file size would be sufficient here but it doesn't seem to. try running with something like mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N . However, if you say "no major waiter" - how many waiters did you see at any time? what kind of waiters were the oldest, how long they'd waited? it could indeed well be that some job is just creating a killer workload. The very short cyle time of the trace points, OTOH, to high activity, OTOH the trace file setting appears quite low (trace=io doesnt' collect many trace infos, just basic IO stuff). If I might ask: what version of GPFS are you running? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 03:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. 
The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? The 'lookup' operations are taking a hell of a long time, what does that mean? Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 
32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 
2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. 
Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used, which might slow down things, for example).

GPFS has a nice tracing facility which you can configure, or just run the default trace. Running a dedicated (low-level) IO trace can be achieved by
mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N 
then, when the issue is seen, stop the trace by
mmtracectl --stop -N 
Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.* (usually in /tmp/mmfs, check the command output).

There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. Example:
0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150
-> inode is 248415

There is a utility, tsfindinode, to translate that into the file path. You need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make, then run ./tsfindinode -i 

For the IO trace analysis there is an older tool: /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README).

Hope that helps a bit.

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefalke at de.ibm.com

IBM Services
IBM Data Privacy Statement
IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122

From: "Czauz, Kamil"
To: "gpfsug-discuss at spectrumscale.org"
Date: 11/11/2020 23:36
Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process
Sent by: gpfsug-discuss-bounces at spectrumscale.org

We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like an ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu.

The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one.

My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes are being heavily used by the mmfsd process on the client?

-Kamil
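For reference, the capture-and-analyse sequence Uwe describes above can be run roughly as follows. This is only a minimal sketch: the node name, mount point and report file names are placeholders, and the exact tsfindinode arguments and trsum.awk invocation are assumptions that should be checked against the READMEs in the samples directories before relying on them.

    # start a low-level IO trace on the affected client
    mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N client01

    # ... reproduce or wait for the hang, then stop promptly so the
    # cyclic buffer still holds the interesting window
    mmtracectl --stop -N client01

    # the ASCII report normally lands in /tmp/mmfs as trcrpt.*
    ls -lh /tmp/mmfs/trcrpt.*

    # summarise the trace; details=0 gives the compact summary seen in this thread
    awk -f /usr/lpp/mmfs/samples/debugtools/trsum.awk details=0 /tmp/mmfs/trcrpt.<date>.<node>

    # pull out the FIO records and read the inode number after the "tag" keyword
    grep 'FIO' /tmp/mmfs/trcrpt.<date>.<node> | head

    # build tsfindinode once, then map an inode (e.g. 248415) back to a path
    cd /usr/lpp/mmfs/samples/util && make
    ./tsfindinode -i 248415 /gpfs/fs1   # assumed argument order - check the usage message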
From andi at christiansen.xxx  Mon Nov 16 19:44:14 2020
From: andi at christiansen.xxx (Andi Christiansen)
Date: Mon, 16 Nov 2020 20:44:14 +0100 (CET)
Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS?
Message-ID: <1388247256.209171.1605555854969@privateemail.com>

Hi all,

I have got a case where a customer wants 700TB migrated from Isilon to Scale, and the only way for him is exporting the same directory on NFS from two different nodes...

As of now we are using multiple rsync processes on different parts of folders within the main directory. This is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2..

Does anyone know of a way to speed it up? Right now we see from 1Gbit to 3Gbit if we are lucky (total bandwidth), and there is a total of 30Gbit from the Scale nodes and 20Gbit from the Isilon, so we should be able to reach just under 20Gbit...

If anyone has any ideas they are welcome!
Thanks in advance Andi Christiansen -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Nov 16 21:44:30 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 16 Nov 2020 21:44:30 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Migrate/syncronize_data_from_Isilon_to?= =?utf-8?q?_Scale_over=09NFS=3F?= In-Reply-To: <1388247256.209171.1605555854969@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Mon Nov 16 21:58:19 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Mon, 16 Nov 2020 13:58:19 -0800 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <1388247256.209171.1605555854969@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: <20201116215819.wda6nophekamzs3v@thargelion> When we did a similar (though larger, at ~2.5PB) migration, we used rsync as well, but ran one rsync process per Isilon node, and made sure the NFS clients were hitting separate Isilon nodes for their reads. We also didn't have more than one rsync process running per client, as the Linux NFS client (at least in CentOS 6) was terrible when it came to concurrent access. Whatever method you end up using, I can guarantee you will be much happier once you are on GPFS. :) On Mon, Nov 16, 2020 at 08:44:14PM +0100, Andi Christiansen wrote: > Hi all, > > i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... > > as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. > > does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... > > > if anyone have any ideas they are welcome! > > > Thanks in advance > Andi Christiansen > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From jonathan.buzzard at strath.ac.uk Mon Nov 16 22:58:49 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 16 Nov 2020 22:58:49 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <1388247256.209171.1605555854969@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: <4de1fa02-a074-0901-cf12-31be9e843f5f@strath.ac.uk> On 16/11/2020 19:44, Andi Christiansen wrote: > Hi all, > > i have got a case where a customer wants 700TB migrated from isilon to > Scale and the only way for him is exporting the same directory on NFS > from two different nodes... > > as of now we are using multiple rsync processes on different parts of > folders within the main directory. this is really slow and will take > forever.. right now 14 rsync processes spread across 3 nodes fetching > from 2.. > > does anyone know of a way to speed it up? 
right now we see from 1Gbit to > 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit > from scale nodes and 20Gbits from isilon so we should be able to reach > just under 20Gbit... > > > if anyone have any ideas they are welcome! > My biggest recommendation when doing this is to use a sqlite database to keep track of what is going on. The main issue is that you are almost certainly going to need to do more than one rsync pass unless your source Isilon system has no user activity, and with 700TB to move that seems unlikely. Typically you do an initial rsync to move the bulk of the data while the users are still live, then shutdown user access to the source system and do the final rsync which hopefully has a significantly smaller amount of data to actually move. So this is what I have done on a number of occasions now. I create a very simple sqlite DB with a list of source and destination folders and a status code. Initially the status code is set to -1. Then I have a perl script which looks at the sqlite DB, picks a row with a status code of -1, and sets the status code to -2, aka that directory is in progress. It then proceeds to run the rsync and when it finishes it updates the status code to the exit code of the rsync process. As long as all the rsync processes have access to the same copy of the sqlite DB (simplest to put it on either the source or destination file system) then all is good. You can fire off multiple rsync's on multiple nodes and they will all keep churning away till there is no more work to be done. The advantage is you can easily interrogate the DB to find out the state of play. That is how many of your transfers have completed, how many are yet to be done, which ones are currently being transferred etc. without logging onto multiple nodes. *MOST* importantly you can see if any of the rsync's had an error, by simply looking for status codes greater than zero. I cannot stress how important this is. Noting that if the source is still active you will see errors down to files being deleted on the source file system before rsync has a chance to copy them. However this has a specific exit code (24) so is easy to spot and not worry about. Finally it is also very simple to set the status codes to -1 again and set the process away again. So the final run is easier to do. If you want to mail me off list I can dig out a copy of the perl code I used if your interested. There are several version as I have tended to tailor to each transfer. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Mon Nov 16 23:12:47 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 16 Nov 2020 23:12:47 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <20201116215819.wda6nophekamzs3v@thargelion> References: <1388247256.209171.1605555854969@privateemail.com> <20201116215819.wda6nophekamzs3v@thargelion> Message-ID: <8d4d2987-77dd-e3e1-1c98-a635f1b96ddd@strath.ac.uk> On 16/11/2020 21:58, Skylar Thompson wrote: > When we did a similar (though larger, at ~2.5PB) migration, we used rsync > as well, but ran one rsync process per Isilon node, and made sure the NFS > clients were hitting separate Isilon nodes for their reads. 
We also didn't > have more than one rsync process running per client, as the Linux NFS > client (at least in CentOS 6) was terrible when it came to concurrent access. > The million dollar question IMHO is the number of files and their sizes. Basically if you have a million 1KB files to move it is going to take much longer than a 100 1GB files. That is the overhead of dealing with each file is a real bitch and kills your attainable transfer speed stone dead. One option I have used in the past is to use your last backup and restore to the new system, then rsync in the changes. That way you don't impact the source file system which is live. Another option I have used is to inform users in advance that data will be transferred based on a metric of how many files and how much data they have. So the less data and fewer files the quicker you will get access to the new system once access to the old system is turned off. It is amazing how much users clear up junk under this scenario. Last time I did this a single user went from over 17 million files to 11 thousand! In total many many TB of data just vanished from the system (around half of the data when puff) as users actually got around to some house keeping LOL. Moving less data and files is always less painful. > Whatever method you end up using, I can guarantee you will be much happier > once you are on GPFS. :) > Goes without saying :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From UWEFALKE at de.ibm.com Tue Nov 17 08:50:56 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 17 Nov 2020 09:50:56 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Migrate/syncronize_data_from_Isilon_to?= =?utf-8?q?_Scale_over=09NFS=3F?= In-Reply-To: <1388247256.209171.1605555854969@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: Hi Andi, what about leaving NFS completeley out and using rsync (multiple rsyncs in parallel, of course) directly between your source and target servers? I am not sure how many TCP connections (suppose it is NFS4) in parallel are opened between client and server, using a 2x bonded interface well requires at least two. That combined with the DB approach suggested by Jonathan to control the activity of the rsync streams would be my best guess. If you have many small files, the overhead might still kill you. Tarring them up into larger aggregates for transfer would help a lot, but then you must be sure they won't change or you need to implement your own version control for that class of files. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 16/11/2020 20:44 Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... 
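As a rough illustration of the tarring idea (host names and paths here are invented, and as said this is only safe for subtrees that will not change during the copy):

    # aggregate many small files into one stream instead of per-file NFS round trips,
    # assuming the source tree is visible on this host and a Scale node is reachable by ssh
    tar -C /mnt/isilon/projectA -cf - . | ssh scale-node01 'tar -C /gpfs/fs1/projectA -xpf -'

    # recent GNU tar builds can also carry ACLs and extended attributes, where supported on both ends
    tar -C /mnt/isilon/projectA --acls --xattrs -cf - . | ssh scale-node01 'tar -C /gpfs/fs1/projectA --acls --xattrs -xpf -'

Each stream like this still has to be driven per directory, so it combines naturally with the status-tracking approach suggested earlier in the thread.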
as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... if anyone have any ideas they are welcome! Thanks in advance Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From UWEFALKE at de.ibm.com Tue Nov 17 08:57:07 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 17 Nov 2020 09:57:07 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Migrate/syncronize_data_from_Isilon_to?= =?utf-8?q?_Scale_over=09NFS=3F?= In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: Hi, Andi, sorry I just took your 20Gbit for the sign of 2x10Gbps bons, but it is over two nodes, so no bonding. But still, I'd expect to open several TCP connections in parallel per source-target pair (like with several rsyncs per source node) would bear an advantage (and still I thing NFS doesn't do that, but I can be wrong). If more nodes have access to the Isilon data they could also participate (and don't need NFS exports for that). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Uwe Falke/Germany/IBM To: gpfsug main discussion list Date: 17/11/2020 09:50 Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Hi Andi, what about leaving NFS completeley out and using rsync (multiple rsyncs in parallel, of course) directly between your source and target servers? I am not sure how many TCP connections (suppose it is NFS4) in parallel are opened between client and server, using a 2x bonded interface well requires at least two. That combined with the DB approach suggested by Jonathan to control the activity of the rsync streams would be my best guess. If you have many small files, the overhead might still kill you. Tarring them up into larger aggregates for transfer would help a lot, but then you must be sure they won't change or you need to implement your own version control for that class of files. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 16/11/2020 20:44 Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? 
Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... if anyone have any ideas they are welcome! Thanks in advance Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From andi at christiansen.xxx Tue Nov 17 11:51:58 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 17 Nov 2020 12:51:58 +0100 (CET) Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: <616234716.258600.1605613918767@privateemail.com> Hi all, thanks for all the information, there was some interesting things amount it.. I kept on going with rsync and ended up making a file with all top level user directories and splitting them into chunks of 347 per rsync session(total 42000 ish folders). yesterday we had only 14 sessions with 3000 folders in each and that was too much work for one rsync session.. i divided them out among all GPFS nodes to have them fetch an area each and actually doing that 3 times on each node and that has now boosted the bandwidth usage from 3Gbit to around 16Gbit in total.. all nodes have been seing doing work above 7Gbit individual which is actually near to what i was expecting without any modifications to the NFS server or TCP tuning.. CPU is around 30-50% on each server and mostly below or around 30% so it seems like it could have handled abit more sessions.. Small files are really a killer but with all 96+ sessions we have now its not often all sessions are handling small files at the same time so we have an average of about 10-12Gbit bandwidth usage. Thanks all! ill keep you in mind if for some reason we see it slowing down again but for now i think we will try to see if it will go the last mile with a bit more sessions on each :) Best Regards Andi Christiansen > On 11/17/2020 9:57 AM Uwe Falke wrote: > > > Hi, Andi, sorry I just took your 20Gbit for the sign of 2x10Gbps bons, but > it is over two nodes, so no bonding. But still, I'd expect to open several > TCP connections in parallel per source-target pair (like with several > rsyncs per source node) would bear an advantage (and still I thing NFS > doesn't do that, but I can be wrong). > If more nodes have access to the Isilon data they could also participate > (and don't need NFS exports for that). > > Mit freundlichen Gr??en / Kind regards > > Dr. Uwe Falke > IT Specialist > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > Services > +49 175 575 2877 Mobile > Rathausstr. 
7, 09111 Chemnitz, Germany > uwefalke at de.ibm.com > > IBM Services > > IBM Data Privacy Statement > > IBM Deutschland Business & Technology Services GmbH > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > Sitz der Gesellschaft: Ehningen > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > From: Uwe Falke/Germany/IBM > To: gpfsug main discussion list > Date: 17/11/2020 09:50 > Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data > from Isilon to Scale over NFS? > > > Hi Andi, > > what about leaving NFS completeley out and using rsync (multiple rsyncs > in parallel, of course) directly between your source and target servers? > I am not sure how many TCP connections (suppose it is NFS4) in parallel > are opened between client and server, using a 2x bonded interface well > requires at least two. That combined with the DB approach suggested by > Jonathan to control the activity of the rsync streams would be my best > guess. > If you have many small files, the overhead might still kill you. Tarring > them up into larger aggregates for transfer would help a lot, but then you > must be sure they won't change or you need to implement your own version > control for that class of files. > > Mit freundlichen Gr??en / Kind regards > > Dr. Uwe Falke > IT Specialist > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > Services > +49 175 575 2877 Mobile > Rathausstr. 7, 09111 Chemnitz, Germany > uwefalke at de.ibm.com > > IBM Services > > IBM Data Privacy Statement > > IBM Deutschland Business & Technology Services GmbH > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > Sitz der Gesellschaft: Ehningen > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > > Date: 16/11/2020 20:44 > Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from > Isilon to Scale over NFS? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi all, > > i have got a case where a customer wants 700TB migrated from isilon to > Scale and the only way for him is exporting the same directory on NFS from > two different nodes... > > as of now we are using multiple rsync processes on different parts of > folders within the main directory. this is really slow and will take > forever.. right now 14 rsync processes spread across 3 nodes fetching from > 2.. > > does anyone know of a way to speed it up? right now we see from 1Gbit to > 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from > scale nodes and 20Gbits from isilon so we should be able to reach just > under 20Gbit... > > > if anyone have any ideas they are welcome! > > > Thanks in advance > Andi Christiansen _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From janfrode at tanso.net Tue Nov 17 12:07:30 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 17 Nov 2020 13:07:30 +0100 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <616234716.258600.1605613918767@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> Message-ID: Nice to see it working well! But, what about ACLs? 
Does you rsync pull in all needed metadata, or do you also need to sync ACLs ? Any plans for how to solve that ? On Tue, Nov 17, 2020 at 12:52 PM Andi Christiansen wrote: > Hi all, > > thanks for all the information, there was some interesting things amount > it.. > > I kept on going with rsync and ended up making a file with all top level > user directories and splitting them into chunks of 347 per rsync > session(total 42000 ish folders). yesterday we had only 14 sessions with > 3000 folders in each and that was too much work for one rsync session.. > > i divided them out among all GPFS nodes to have them fetch an area each > and actually doing that 3 times on each node and that has now boosted the > bandwidth usage from 3Gbit to around 16Gbit in total.. > > all nodes have been seing doing work above 7Gbit individual which is > actually near to what i was expecting without any modifications to the NFS > server or TCP tuning.. > > CPU is around 30-50% on each server and mostly below or around 30% so it > seems like it could have handled abit more sessions.. > > Small files are really a killer but with all 96+ sessions we have now its > not often all sessions are handling small files at the same time so we have > an average of about 10-12Gbit bandwidth usage. > > Thanks all! ill keep you in mind if for some reason we see it slowing down > again but for now i think we will try to see if it will go the last mile > with a bit more sessions on each :) > > Best Regards > Andi Christiansen > > > On 11/17/2020 9:57 AM Uwe Falke wrote: > > > > > > Hi, Andi, sorry I just took your 20Gbit for the sign of 2x10Gbps bons, > but > > it is over two nodes, so no bonding. But still, I'd expect to open > several > > TCP connections in parallel per source-target pair (like with several > > rsyncs per source node) would bear an advantage (and still I thing NFS > > doesn't do that, but I can be wrong). > > If more nodes have access to the Isilon data they could also participate > > (and don't need NFS exports for that). > > > > Mit freundlichen Gr??en / Kind regards > > > > Dr. Uwe Falke > > IT Specialist > > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > > Services > > +49 175 575 2877 Mobile > > Rathausstr. 7, 09111 Chemnitz, Germany > > uwefalke at de.ibm.com > > > > IBM Services > > > > IBM Data Privacy Statement > > > > IBM Deutschland Business & Technology Services GmbH > > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > > Sitz der Gesellschaft: Ehningen > > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > > > > > From: Uwe Falke/Germany/IBM > > To: gpfsug main discussion list > > Date: 17/11/2020 09:50 > > Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data > > from Isilon to Scale over NFS? > > > > > > Hi Andi, > > > > what about leaving NFS completeley out and using rsync (multiple rsyncs > > in parallel, of course) directly between your source and target servers? > > I am not sure how many TCP connections (suppose it is NFS4) in parallel > > are opened between client and server, using a 2x bonded interface well > > requires at least two. That combined with the DB approach suggested by > > Jonathan to control the activity of the rsync streams would be my best > > guess. > > If you have many small files, the overhead might still kill you. Tarring > > them up into larger aggregates for transfer would help a lot, but then > you > > must be sure they won't change or you need to implement your own version > > control for that class of files. 
> > > > Mit freundlichen Gr??en / Kind regards > > > > Dr. Uwe Falke > > IT Specialist > > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > > Services > > +49 175 575 2877 Mobile > > Rathausstr. 7, 09111 Chemnitz, Germany > > uwefalke at de.ibm.com > > > > IBM Services > > > > IBM Data Privacy Statement > > > > IBM Deutschland Business & Technology Services GmbH > > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > > Sitz der Gesellschaft: Ehningen > > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > > > > > > > From: Andi Christiansen > > To: "gpfsug-discuss at spectrumscale.org" > > > > Date: 16/11/2020 20:44 > > Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from > > Isilon to Scale over NFS? > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > Hi all, > > > > i have got a case where a customer wants 700TB migrated from isilon to > > Scale and the only way for him is exporting the same directory on NFS > from > > two different nodes... > > > > as of now we are using multiple rsync processes on different parts of > > folders within the main directory. this is really slow and will take > > forever.. right now 14 rsync processes spread across 3 nodes fetching > from > > 2.. > > > > does anyone know of a way to speed it up? right now we see from 1Gbit to > > 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit > from > > scale nodes and 20Gbits from isilon so we should be able to reach just > > under 20Gbit... > > > > > > if anyone have any ideas they are welcome! > > > > > > Thanks in advance > > Andi Christiansen _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Tue Nov 17 12:24:22 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 17 Nov 2020 13:24:22 +0100 (CET) Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> Message-ID: <1023406427.259407.1605615862969@privateemail.com> Hi Jan, We are syncing ACLs, groups, owners and timestamps aswell :) /Andi Christiansen > On 11/17/2020 1:07 PM Jan-Frode Myklebust wrote: > > > Nice to see it working well! > > But, what about ACLs? Does you rsync pull in all needed metadata, or do you also need to sync ACLs ? Any plans for how to solve that ? > > On Tue, Nov 17, 2020 at 12:52 PM Andi Christiansen wrote: > > > > Hi all, > > > > thanks for all the information, there was some interesting things amount it.. > > > > I kept on going with rsync and ended up making a file with all top level user directories and splitting them into chunks of 347 per rsync session(total 42000 ish folders). yesterday we had only 14 sessions with 3000 folders in each and that was too much work for one rsync session.. 
> > > > i divided them out among all GPFS nodes to have them fetch an area each and actually doing that 3 times on each node and that has now boosted the bandwidth usage from 3Gbit to around 16Gbit in total.. > > > > all nodes have been seing doing work above 7Gbit individual which is actually near to what i was expecting without any modifications to the NFS server or TCP tuning.. > > > > CPU is around 30-50% on each server and mostly below or around 30% so it seems like it could have handled abit more sessions.. > > > > Small files are really a killer but with all 96+ sessions we have now its not often all sessions are handling small files at the same time so we have an average of about 10-12Gbit bandwidth usage. > > > > Thanks all! ill keep you in mind if for some reason we see it slowing down again but for now i think we will try to see if it will go the last mile with a bit more sessions on each :) > > > > Best Regards > > Andi Christiansen > > > > > On 11/17/2020 9:57 AM Uwe Falke wrote: > > > > > > > > > Hi, Andi, sorry I just took your 20Gbit for the sign of 2x10Gbps bons, but > > > it is over two nodes, so no bonding. But still, I'd expect to open several > > > TCP connections in parallel per source-target pair (like with several > > > rsyncs per source node) would bear an advantage (and still I thing NFS > > > doesn't do that, but I can be wrong). > > > If more nodes have access to the Isilon data they could also participate > > > (and don't need NFS exports for that). > > > > > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Uwe Falke > > > IT Specialist > > > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > > > Services > > > +49 175 575 2877 Mobile > > > Rathausstr. 7, 09111 Chemnitz, Germany > > > uwefalke at de.ibm.com mailto:uwefalke at de.ibm.com > > > > > > IBM Services > > > > > > IBM Data Privacy Statement > > > > > > IBM Deutschland Business & Technology Services GmbH > > > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > > > Sitz der Gesellschaft: Ehningen > > > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > > > > > > > > > From: Uwe Falke/Germany/IBM > > > To: gpfsug main discussion list > > > Date: 17/11/2020 09:50 > > > Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data > > > from Isilon to Scale over NFS? > > > > > > > > > Hi Andi, > > > > > > what about leaving NFS completeley out and using rsync (multiple rsyncs > > > in parallel, of course) directly between your source and target servers? > > > I am not sure how many TCP connections (suppose it is NFS4) in parallel > > > are opened between client and server, using a 2x bonded interface well > > > requires at least two. That combined with the DB approach suggested by > > > Jonathan to control the activity of the rsync streams would be my best > > > guess. > > > If you have many small files, the overhead might still kill you. Tarring > > > them up into larger aggregates for transfer would help a lot, but then you > > > must be sure they won't change or you need to implement your own version > > > control for that class of files. > > > > > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Uwe Falke > > > IT Specialist > > > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > > > Services > > > +49 175 575 2877 Mobile > > > Rathausstr. 
7, 09111 Chemnitz, Germany > > > uwefalke at de.ibm.com mailto:uwefalke at de.ibm.com > > > > > > IBM Services > > > > > > IBM Data Privacy Statement > > > > > > IBM Deutschland Business & Technology Services GmbH > > > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > > > Sitz der Gesellschaft: Ehningen > > > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > > > > > > > > > > > > From: Andi Christiansen > > > To: "gpfsug-discuss at spectrumscale.org mailto:gpfsug-discuss at spectrumscale.org " > > > > > > Date: 16/11/2020 20:44 > > > Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from > > > Isilon to Scale over NFS? > > > Sent by: gpfsug-discuss-bounces at spectrumscale.org mailto:gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > > > > > Hi all, > > > > > > i have got a case where a customer wants 700TB migrated from isilon to > > > Scale and the only way for him is exporting the same directory on NFS from > > > two different nodes... > > > > > > as of now we are using multiple rsync processes on different parts of > > > folders within the main directory. this is really slow and will take > > > forever.. right now 14 rsync processes spread across 3 nodes fetching from > > > 2.. > > > > > > does anyone know of a way to speed it up? right now we see from 1Gbit to > > > 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from > > > scale nodes and 20Gbits from isilon so we should be able to reach just > > > under 20Gbit... > > > > > > > > > if anyone have any ideas they are welcome! > > > > > > > > > Thanks in advance > > > Andi Christiansen _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss athttp://spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss athttp://spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss athttp://spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Nov 17 13:53:43 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 17 Nov 2020 13:53:43 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <616234716.258600.1605613918767@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> Message-ID: <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> On 17/11/2020 11:51, Andi Christiansen wrote: > Hi all, > > thanks for all the information, there was some interesting things > amount it.. > > I kept on going with rsync and ended up making a file with all top > level user directories and splitting them into chunks of 347 per > rsync session(total 42000 ish folders). yesterday we had only 14 > sessions with 3000 folders in each and that was too much work for one > rsync session.. 
Unless you use something similar to my DB suggestion it is almost inevitable that some of those rsync sessions are going to have issues and you will have no way to track it or even know it has happened unless you do a single final giant catchup/check rsync. I should add that a copy of the sqlite DB is cover your backside protection when a user pops up claiming that you failed to transfer one of their vitally important files six months down the line and the old system is turned off and scrapped. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From skylar2 at uw.edu Tue Nov 17 14:59:43 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 17 Nov 2020 06:59:43 -0800 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> Message-ID: <20201117145943.5cxyfpfyrk7udmn4@thargelion> On Tue, Nov 17, 2020 at 01:53:43PM +0000, Jonathan Buzzard wrote: > On 17/11/2020 11:51, Andi Christiansen wrote: > > Hi all, > > > > thanks for all the information, there was some interesting things > > amount it.. > > > > I kept on going with rsync and ended up making a file with all top > > level user directories and splitting them into chunks of 347 per > > rsync session(total 42000 ish folders). yesterday we had only 14 > > sessions with 3000 folders in each and that was too much work for one > > rsync session.. > > Unless you use something similar to my DB suggestion it is almost inevitable > that some of those rsync sessions are going to have issues and you will have > no way to track it or even know it has happened unless you do a single final > giant catchup/check rsync. > > I should add that a copy of the sqlite DB is cover your backside protection > when a user pops up claiming that you failed to transfer one of their > vitally important files six months down the line and the old system is > turned off and scrapped. That's not a bad idea, and I like it more than the method I setup where we captured the output of find from both sides of the transfer and preserved it for posterity, but obviously did require a hard-stop date on the source. Fortunately, we seem committed to GPFS so it might be we never have to do another bulk transfer outside of the filesystem... -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From S.J.Thompson at bham.ac.uk Tue Nov 17 15:55:41 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 17 Nov 2020 15:55:41 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <20201117145943.5cxyfpfyrk7udmn4@thargelion> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> <20201117145943.5cxyfpfyrk7udmn4@thargelion> Message-ID: <55E3401C-2F59-4B47-A176-CDF7BCACBE2E@bham.ac.uk> > Fortunately, we seem committed to GPFS so it might be we never have to do > another bulk transfer outside of the filesystem... Until you want to move a v3 or v4 created file-system to v5 block sizes __ I hopes we won't be doing that sort of thing again... 
Simon From jonathan.buzzard at strath.ac.uk Tue Nov 17 19:45:29 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 17 Nov 2020 19:45:29 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <55E3401C-2F59-4B47-A176-CDF7BCACBE2E@bham.ac.uk> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> <20201117145943.5cxyfpfyrk7udmn4@thargelion> <55E3401C-2F59-4B47-A176-CDF7BCACBE2E@bham.ac.uk> Message-ID: <1a1be12b-a4f2-f2b3-4cdf-e34bc5eace24@strath.ac.uk> On 17/11/2020 15:55, Simon Thompson wrote: > >> Fortunately, we seem committed to GPFS so it might be we never have to do >> another bulk transfer outside of the filesystem... > > Until you want to move a v3 or v4 created file-system to v5 block sizes __ You forget the v2 to v3 for more than two billion files switch. Either that or you where not using it back then. Then there was the v3.2 if you ever want to mount it on Windows. > > I hopes we won't be doing that sort of thing again... > Yep, going to be recycling my scripts in the coming week for a v4 to v5 with capacity upgrade on our DSS-G. That basically involves a trashing of the file system and a restore from backup. Going to be doing the your data will be restored based on a metric of how many files and how much data you have ploy again :-) I too hope that will be the last time I have to do anything similar but my experience of the last couple of decades says that is likely to be a forlorn hope :-( I speculate that one day the 10,000 file set limit will be lifted, but only if you reformat your file system... JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From andi at christiansen.xxx Tue Nov 17 20:40:39 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 17 Nov 2020 21:40:39 +0100 (CET) Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> Message-ID: <82434297.276248.1605645639435@privateemail.com> Hi Jonathan, yes you are correct! but we plan to resync this once or twice every week for the next 3-4months to be sure everything is as it should be. Right now we are focused on getting them synced up and then we will run scheduled resyncs/checks once or twice a week depending on the data growth :) Thanks Andi Christiansen > On 11/17/2020 2:53 PM Jonathan Buzzard wrote: > > > On 17/11/2020 11:51, Andi Christiansen wrote: > > Hi all, > > > > thanks for all the information, there was some interesting things > > amount it.. > > > > I kept on going with rsync and ended up making a file with all top > > level user directories and splitting them into chunks of 347 per > > rsync session(total 42000 ish folders). yesterday we had only 14 > > sessions with 3000 folders in each and that was too much work for one > > rsync session.. > > Unless you use something similar to my DB suggestion it is almost > inevitable that some of those rsync sessions are going to have issues > and you will have no way to track it or even know it has happened unless > you do a single final giant catchup/check rsync. 
> > I should add that a copy of the sqlite DB is cover your backside > protection when a user pops up claiming that you failed to transfer one > of their vitally important files six months down the line and the old > system is turned off and scrapped. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chris.schlipalius at pawsey.org.au Tue Nov 17 23:17:18 2020 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Wed, 18 Nov 2020 07:17:18 +0800 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Message-ID: <578BE691-DEE4-43AC-97D2-546AC406E14A@pawsey.org.au> So at my last job we used to rsync data between isilons across campus, and isilon to Windows File Cluster (and back). I recommend using dry run to generate a list of files and then use this to run with rysnc. This allows you also to be able to break up the transfer into batches, and check if files have changed before sync (say if your isilon files are not RO. Also ensure you have a recent version of rsync that preserves extended attributes and check your ACLS. A dry run example: https://unix.stackexchange.com/a/261372 I always felt more comfortable having a list of files before a sync?. Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Supercomputing Platforms, Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Nov 18 11:48:52 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 18 Nov 2020 11:48:52 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <578BE691-DEE4-43AC-97D2-546AC406E14A@pawsey.org.au> References: <578BE691-DEE4-43AC-97D2-546AC406E14A@pawsey.org.au> Message-ID: <7983810e-f51c-8cf7-a750-5c3285870bd4@strath.ac.uk> On 17/11/2020 23:17, Chris Schlipalius wrote: > So at my last job we used to rsync data between isilons across campus, > and isilon to Windows File Cluster (and back). > > I recommend using dry run to generate a list of files and then use this > to run with rysnc. > > This allows you also to be able to break up the transfer into batches, > and check if files have changed before sync (say if your isilon files > are not RO. > > Also ensure you have a recent version of rsync that preserves extended > attributes and check your ACLS. > > A dry run example: > > https://unix.stackexchange.com/a/261372 > > I always felt more comfortable having a list of files before a sync?. > I would counsel in the strongest possible terms against that approach. Basically you have to be assured that none of your file names have "wacky" characters in them, because handling "wacky" characters in file names is exceedingly difficult. I cannot stress how hard it is and the above example does not handle all "wacky" characters in file names. So what do I mean by "wacky" characters. 
Well remember a file name can have just about anything in it on Linux with the exception of '/', and users especially when using a GUI, and even more so if they are Mac users can and do use what I will call "wacky" characters in their file names. The obvious ones are spaces, but it's not just ASCII 0x20, but tabs too. Then there is the use of the wildcard characters, especially '?' but also '*'. Not too difficult to handle you might say. Right now deal with a file name with a newline character in it :-) Don't ask me how or why you even do that but let me assure you that I have seen them on more than one occasion. And now your dry run list is broken... Not only that if you have a few hundred million files to move a list just becomes unwieldy anyway. One thing I didn't mention is that I would run anything with in a screen (or tmux if that is your poison) and turn on logging. For those interested I am in the process of cleaning up the script a bit and will post it somewhere in due course. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From andi at christiansen.xxx Wed Nov 18 11:54:47 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 18 Nov 2020 12:54:47 +0100 (CET) Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <7983810e-f51c-8cf7-a750-5c3285870bd4@strath.ac.uk> References: <578BE691-DEE4-43AC-97D2-546AC406E14A@pawsey.org.au> <7983810e-f51c-8cf7-a750-5c3285870bd4@strath.ac.uk> Message-ID: <1947408989.293430.1605700487095@privateemail.com> Hi Jonathan, i would be very interested in seeing your scripts when they are posted. Let me know where to get them! Thanks a bunch! Andi Christiansen > On 11/18/2020 12:48 PM Jonathan Buzzard wrote: > > > On 17/11/2020 23:17, Chris Schlipalius wrote: > > So at my last job we used to rsync data between isilons across campus, > > and isilon to Windows File Cluster (and back). > > > > I recommend using dry run to generate a list of files and then use this > > to run with rysnc. > > > > This allows you also to be able to break up the transfer into batches, > > and check if files have changed before sync (say if your isilon files > > are not RO. > > > > Also ensure you have a recent version of rsync that preserves extended > > attributes and check your ACLS. > > > > A dry run example: > > > > https://unix.stackexchange.com/a/261372 > > > > I always felt more comfortable having a list of files before a sync?. > > > > I would counsel in the strongest possible terms against that approach. > > Basically you have to be assured that none of your file names have > "wacky" characters in them, because handling "wacky" characters in file > names is exceedingly difficult. I cannot stress how hard it is and the > above example does not handle all "wacky" characters in file names. > > So what do I mean by "wacky" characters. Well remember a file name can > have just about anything in it on Linux with the exception of '/', and > users especially when using a GUI, and even more so if they are Mac > users can and do use what I will call "wacky" characters in their file > names. > > The obvious ones are spaces, but it's not just ASCII 0x20, but tabs too. > Then there is the use of the wildcard characters, especially '?' but > also '*'. > > Not too difficult to handle you might say. 
Right now deal with a file > name with a newline character in it :-) Don't ask me how or why you even > do that but let me assure you that I have seen them on more than one > occasion. And now your dry run list is broken... > > Not only that if you have a few hundred million files to move a list > just becomes unwieldy anyway. > > One thing I didn't mention is that I would run anything with in a screen > (or tmux if that is your poison) and turn on logging. > > For those interested I am in the process of cleaning up the script a bit > and will post it somewhere in due course. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From cal.sawyer at framestore.com Wed Nov 18 12:18:57 2020 From: cal.sawyer at framestore.com (Cal Sawyer) Date: Wed, 18 Nov 2020 12:18:57 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 106, Issue 21 In-Reply-To: References: Message-ID: Hello Not a Scale user per se (we run a 3rdparty offshoot of Scale). In a past life managing Nexenta with OpenSolaris DR storage, I used nc/netcat for bulk data sync, which is far more efficient than rsync. With a bit of planning and analysis of directory structure on the target, nc runs could be parallelised as well, although not quite in the same way as running rsync via parallels. Of course, nc has to be available on Isilon but i have no experience with that platform. The only caveat in using nc is the amount of change to the target data as copying progresses (is the target datastore static or still seeing changes?). nc has to be followed with rsync to apply any changes and/or verify the integrity of the bulk copy. https://nakkaya.com/2009/04/15/using-netcat-for-file-transfers/ Are your Isilon and Scale systems located in the same network space? I'd also suggest that if possible, add a quad-port 10GbE (or larger: 25/100GbE) NIC to your servers to gain a wider data path and conduct your copy operations on those interfaces regards [image: Framestore] Cal Sawyer ? Senior Systems Engineer London ? New York ? Los Angeles ? Chicago ? Montr?al ? Mumbai 28 Chancery Lane London WC2A 1LB [T] +44 (0)20 7344 8000 W3W: warm.soil.patio On Wed, 18 Nov 2020 at 12:00, wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Migrate/syncronize data from Isilon to Scale over NFS? > (Chris Schlipalius) > 2. Re: Migrate/syncronize data from Isilon to Scale over NFS? > (Jonathan Buzzard) > 3. Re: Migrate/syncronize data from Isilon to Scale over NFS? > (Andi Christiansen) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 18 Nov 2020 07:17:18 +0800 > From: Chris Schlipalius > To: > Subject: Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to > Scale over NFS? 
> [snip: remainder of the quoted digest trimmed; it repeats verbatim the messages from Chris Schlipalius, Jonathan Buzzard and Andi Christiansen that already appear in full earlier in this thread]
> > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 106, Issue 21 > *********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Nov 18 23:05:40 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Wed, 18 Nov 2020 18:05:40 -0500 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <7983810e-f51c-8cf7-a750-5c3285870bd4@strath.ac.uk> References: <578BE691-DEE4-43AC-97D2-546AC406E14A@pawsey.org.au> <7983810e-f51c-8cf7-a750-5c3285870bd4@strath.ac.uk> Message-ID: <39863.1605740740@turing-police> On Wed, 18 Nov 2020 11:48:52 +0000, Jonathan Buzzard said: > So what do I mean by "wacky" characters. Well remember a file name can > have just about anything in it on Linux with the exception of '/', and You want to see some fireworks? At least at one time, it was possible to use a file system debugger that's all too trusting of hexadecimal input and create a directory entry of '../'. Let's just say that fs/namei.c was also far too trusting, and fsck was more than happy to make *different* errors than the kernel was.... > The obvious ones are spaces, but it's not just ASCII 0x20, but tabs too. > Then there is the use of the wildcard characters, especially '?' but > also '*'. Don't forget ESC, CR, LF, backticks, forward ticks, semicolons, and pretty much anything else that will give a shell indigestion. SQL isn't the only thing prone to injection attacks.. :) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From chris.schlipalius at pawsey.org.au Wed Nov 18 23:57:26 2020 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Thu, 19 Nov 2020 07:57:26 +0800 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: References: Message-ID: <6288DF78-A9DF-4BE9-B166-4478EF8C2A20@pawsey.org.au> ? I would counsel in the strongest possible terms against that approach. ? Basically you have to be assured that none of your file names have "wacky" characters in them, because handling "wacky" characters in file ? names is exceedingly difficult. I cannot stress how hard it is and the above example does not handle all "wacky" characters in file names. Well that?s indeed another kettle of fish if you have irregular/special naming of files, no I didn?t cover that and if you have millions of files, yes a list would be unwieldy, then I would be tarring up dirs. before moving? and then untarring on GPFS ?or breaking up the list into sets or sub lists. If you have these wacky types of file names well there are fixes as in the rsync manpages? yes not easy but possible.. Ie 1. -s, --protect-args 2. As per usual you can escape the spaces, or substitute for spaces. rsync -avuz user at server1.com:"${remote_path// /\\ }" . 3. Single quote the file name and path inside double quotes. ? 
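For names with embedded newlines a NUL-delimited list sidesteps the quoting problem entirely. A minimal sketch (the mount points and project directory below are placeholders, not paths from this thread):

# Build a NUL-separated list so spaces, tabs, wildcards and even newlines
# in file names survive intact, then hand the list to rsync with --from0.
cd /mnt/isilon
find ./projectA -print0 > /tmp/projectA.list
rsync -aHAX --from0 --files-from=/tmp/projectA.list ./ /mnt/scale/projectA/
# For pull or push over ssh, adding -s (--protect-args) stops the remote
# shell from re-interpreting spaces and wildcards in the paths.

Because every path is listed explicitly no recursion is needed, and -aHAX also carries hard links, ACLs and extended attributes, which touches on the ACL question raised earlier in the thread.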
One thing I didn't mention is that I would run anything with in a screen (or tmux if that is your poison) and turn on logging. Absolutely agree? ? For those interested I am in the process of cleaning up the script a bit and will post it somewhere in due course. ? JAB. Would be interesting to see?. I?ve also had success on GPFS with DCP and possibly this would be another option Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Supercomputing Platforms, Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.caubet at psi.ch Thu Nov 19 15:34:39 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 19 Nov 2020 15:34:39 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Message-ID: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> Hi, I have a filesystem holding many projects (i.e., mounted under /projects), each project is managed with filesets. I have a new big project which should be placed on a separate filesystem (blocksize, replication policy, etc. will be different, and subprojects of it will be managed with filesets). Ideally, this filesystem should be mounted in /projects/newproject. Technically, mounting a filesystem on top of an existing filesystem should be possible, but, is this discouraged for any reason? How GPFS would behave with that and is there a technical reason for avoiding this setup? Another alternative would be independent mount point + symlink, but I really would prefer to avoid symlinks. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Nov 19 15:49:30 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 19 Nov 2020 15:49:30 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> Message-ID: <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> On 19/11/2020 15:34, Caubet Serrabou Marc (PSI) wrote: > Hi, > > > I have a filesystem holding many projects (i.e., mounted under > /projects), each project is managed with filesets. > > I have a new big project which should be placed on a separate filesystem > (blocksize, replication policy, etc. will be different, and subprojects > of it will be managed with filesets). Ideally, this filesystem should be > mounted in /projects/newproject. > > > Technically, mounting a filesystem on top of an existing filesystem > should be possible, but, is this discouraged for any reason? How GPFS > would behave with that and is there a technical reason for avoiding this > setup? > > Another alternative would be independent mount point + symlink, but I > really would prefer to avoid symlinks. This has all the hallmarks of either a Windows admin or a newbie Linux/Unix admin :-) Simply put /projects is mounted on top of whatever file system is providing the root file system in the first place LOL. Linux/Unix and/or GPFS does not give a monkeys about mounting another file system *ANYWHERE* in it period because there is no other way of doing it. 
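If the concern is ordering, i.e. the parent file system must be mounted before the one nested inside it, here is a rough sketch of two ways to handle it (the file system names are made up for illustration):

# GPFS mount priority: lower non-zero numbers are mounted first at daemon
# startup, so the parent comes up before the file system nested inside it.
mmchfs projects --mount-priority 1
mmchfs newproject --mount-priority 2
# Alternative: keep the new file system on its own mount point and bind
# mount it into place rather than nesting one GPFS mount inside another.
mount --bind /gpfs/newproject /projects/newproject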
JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From spectrumscale at kiranghag.com Thu Nov 19 16:40:47 2020 From: spectrumscale at kiranghag.com (KG) Date: Thu, 19 Nov 2020 22:10:47 +0530 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> Message-ID: You can also set mount priority on filesystems so that gpfs can try to mount them in order...parent first On Thu, Nov 19, 2020, 21:19 Jonathan Buzzard wrote: > On 19/11/2020 15:34, Caubet Serrabou Marc (PSI) wrote: > > Hi, > > > > > > I have a filesystem holding many projects (i.e., mounted under > > /projects), each project is managed with filesets. > > > > I have a new big project which should be placed on a separate filesystem > > (blocksize, replication policy, etc. will be different, and subprojects > > of it will be managed with filesets). Ideally, this filesystem should be > > mounted in /projects/newproject. > > > > > > Technically, mounting a filesystem on top of an existing filesystem > > should be possible, but, is this discouraged for any reason? How GPFS > > would behave with that and is there a technical reason for avoiding this > > setup? > > > > Another alternative would be independent mount point + symlink, but I > > really would prefer to avoid symlinks. > > This has all the hallmarks of either a Windows admin or a newbie > Linux/Unix admin :-) > > Simply put /projects is mounted on top of whatever file system is > providing the root file system in the first place LOL. > > Linux/Unix and/or GPFS does not give a monkeys about mounting another > file system *ANYWHERE* in it period because there is no other way of > doing it. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Nov 19 16:42:07 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 19 Nov 2020 16:42:07 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> Message-ID: <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> If it is a remote cluster mount from your clients (hopefully!), you might want to look at priority to order mounting of the file-systems. I don?t know what would happen if the overmounted file-system went away, you would likely want to test. Simon From: on behalf of "marc.caubet at psi.ch" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 November 2020 at 15:39 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Hi, I have a filesystem holding many projects (i.e., mounted under /projects), each project is managed with filesets. I have a new big project which should be placed on a separate filesystem (blocksize, replication policy, etc. will be different, and subprojects of it will be managed with filesets). 
Ideally, this filesystem should be mounted in /projects/newproject. Technically, mounting a filesystem on top of an existing filesystem should be possible, but, is this discouraged for any reason? How GPFS would behave with that and is there a technical reason for avoiding this setup? Another alternative would be independent mount point + symlink, but I really would prefer to avoid symlinks. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.caubet at psi.ch Thu Nov 19 16:48:07 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 19 Nov 2020 16:48:07 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch>, <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> Message-ID: Hi Jonathan, thanks for sharing your opinions. In the sentence "Technically, mounting a filesystem on top of an existing filesystem should be possible" , I guess I was referring to that... I was concerned about other technical reasons, such like how would this would affect GPFS policies, or how to properly proceed with proper mounting, or any other technical reasons to consider. For the GPFS policies, I usually applied some of the existing GPFS policies based on directories, but after checking I realized that one can manage via device (never used policies in that way, at least for the simple but necessary use cases I have on the existing filesystems). Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard Sent: Thursday, November 19, 2020 4:49:30 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem On 19/11/2020 15:34, Caubet Serrabou Marc (PSI) wrote: > Hi, > > > I have a filesystem holding many projects (i.e., mounted under > /projects), each project is managed with filesets. > > I have a new big project which should be placed on a separate filesystem > (blocksize, replication policy, etc. will be different, and subprojects > of it will be managed with filesets). Ideally, this filesystem should be > mounted in /projects/newproject. > > > Technically, mounting a filesystem on top of an existing filesystem > should be possible, but, is this discouraged for any reason? How GPFS > would behave with that and is there a technical reason for avoiding this > setup? > > Another alternative would be independent mount point + symlink, but I > really would prefer to avoid symlinks. This has all the hallmarks of either a Windows admin or a newbie Linux/Unix admin :-) Simply put /projects is mounted on top of whatever file system is providing the root file system in the first place LOL. 
Linux/Unix and/or GPFS does not give a monkeys about mounting another file system *ANYWHERE* in it period because there is no other way of doing it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.caubet at psi.ch Thu Nov 19 17:01:37 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 19 Nov 2020 17:01:37 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch>, <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> Message-ID: <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> Hi Simon, that's a very good point, thanks a lot :) I have it remotely mounted on a client cluster, so I will consider priorities when mounting the filesystems with remote cluster mount. That's very useful. Also, as far as I saw, same approach can be also applied to local mounts (via mmchfs) during daemon startup with the same option --mount-priority. Thanks a lot for the hints, these are very useful. I'll test that. Cheers, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: Thursday, November 19, 2020 5:42:07 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem If it is a remote cluster mount from your clients (hopefully!), you might want to look at priority to order mounting of the file-systems. I don?t know what would happen if the overmounted file-system went away, you would likely want to test. Simon From: on behalf of "marc.caubet at psi.ch" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 November 2020 at 15:39 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Hi, I have a filesystem holding many projects (i.e., mounted under /projects), each project is managed with filesets. I have a new big project which should be placed on a separate filesystem (blocksize, replication policy, etc. will be different, and subprojects of it will be managed with filesets). Ideally, this filesystem should be mounted in /projects/newproject. Technically, mounting a filesystem on top of an existing filesystem should be possible, but, is this discouraged for any reason? How GPFS would behave with that and is there a technical reason for avoiding this setup? Another alternative would be independent mount point + symlink, but I really would prefer to avoid symlinks. 
Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Thu Nov 19 17:34:05 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Thu, 19 Nov 2020 18:34:05 +0100 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> Message-ID: I would not mount a GPFS filesystem within a GPFS filesystem. Technically it should work, but I?d expect it to cause surprises if ever the lower filesystem experienced problems. Alone, a filesystem might recover automatically by remounting. But if there?s another filesystem mounted within, I expect it will be a problem.. Much better to use symlinks. -jf tor. 19. nov. 2020 kl. 18:01 skrev Caubet Serrabou Marc (PSI) < marc.caubet at psi.ch>: > Hi Simon, > > > that's a very good point, thanks a lot :) I have it remotely mounted on a > client cluster, so I will consider priorities when mounting the filesystems > with remote cluster mount. That's very useful. > > Also, as far as I saw, same approach can be also applied to local mounts > (via mmchfs) during daemon startup with the same option --mount-priority. > > > Thanks a lot for the hints, these are very useful. I'll test that. > > > Cheers, > > Marc > _________________________________________________________ > Paul Scherrer Institut > High Performance Computing & Emerging Technologies > Marc Caubet Serrabou > Building/Room: OHSA/014 > Forschungsstrasse, 111 > 5232 Villigen PSI > Switzerland > > Telephone: +41 56 310 46 67 > E-Mail: marc.caubet at psi.ch > ------------------------------ > *From:* gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Simon Thompson < > S.J.Thompson at bham.ac.uk> > *Sent:* Thursday, November 19, 2020 5:42:07 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Mounting filesystem on top of an existing > filesystem > > > If it is a remote cluster mount from your clients (hopefully!), you might > want to look at priority to order mounting of the file-systems. I don?t > know what would happen if the overmounted file-system went away, you would > likely want to test. > > > > Simon > > > > *From: * on behalf of " > marc.caubet at psi.ch" > *Reply to: *"gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > *Date: *Thursday, 19 November 2020 at 15:39 > *To: *"gpfsug-discuss at spectrumscale.org" > > *Subject: *[gpfsug-discuss] Mounting filesystem on top of an existing > filesystem > > > > Hi, > > > > I have a filesystem holding many projects (i.e., mounted under /projects), > each project is managed with filesets. > > I have a new big project which should be placed on a separate filesystem > (blocksize, replication policy, etc. will be different, and subprojects of > it will be managed with filesets). Ideally, this filesystem should be > mounted in /projects/newproject. > > > > Technically, mounting a filesystem on top of an existing filesystem should > be possible, but, is this discouraged for any reason? 
How GPFS would behave > with that and is there a technical reason for avoiding this setup? > > Another alternative would be independent mount point + symlink, but I > really would prefer to avoid symlinks. > > > > Thanks a lot, > > Marc > > _________________________________________________________ > Paul Scherrer Institut > High Performance Computing & Emerging Technologies > Marc Caubet Serrabou > Building/Room: OHSA/014 > > Forschungsstrasse, 111 > > 5232 Villigen PSI > Switzerland > > Telephone: +41 56 310 46 67 > E-Mail: marc.caubet at psi.ch > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Thu Nov 19 17:38:07 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 19 Nov 2020 09:38:07 -0800 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> Message-ID: <20201119173807.kormirvbweqs3un6@thargelion> Agreed, not sure how the GPFS tools would react. An alternative to symlinks would be bind mounts, if for some reason a tool doesn't behave properly with a symlink in the path. On Thu, Nov 19, 2020 at 06:34:05PM +0100, Jan-Frode Myklebust wrote: > I would not mount a GPFS filesystem within a GPFS filesystem. Technically > it should work, but I???d expect it to cause surprises if ever the lower > filesystem experienced problems. Alone, a filesystem might recover > automatically by remounting. But if there???s another filesystem mounted > within, I expect it will be a problem.. > > Much better to use symlinks. > > > > -jf > > tor. 19. nov. 2020 kl. 18:01 skrev Caubet Serrabou Marc (PSI) < > marc.caubet at psi.ch>: > > > Hi Simon, > > > > > > that's a very good point, thanks a lot :) I have it remotely mounted on a > > client cluster, so I will consider priorities when mounting the filesystems > > with remote cluster mount. That's very useful. > > > > Also, as far as I saw, same approach can be also applied to local mounts > > (via mmchfs) during daemon startup with the same option --mount-priority. > > > > > > Thanks a lot for the hints, these are very useful. I'll test that. > > > > > > Cheers, > > > > Marc > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > Forschungsstrasse, 111 > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > ------------------------------ > > *From:* gpfsug-discuss-bounces at spectrumscale.org < > > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Simon Thompson < > > S.J.Thompson at bham.ac.uk> > > *Sent:* Thursday, November 19, 2020 5:42:07 PM > > *To:* gpfsug main discussion list > > *Subject:* Re: [gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > If it is a remote cluster mount from your clients (hopefully!), you might > > want to look at priority to order mounting of the file-systems. I don???t > > know what would happen if the overmounted file-system went away, you would > > likely want to test. 
> > > > > > > > Simon > > > > > > > > *From: * on behalf of " > > marc.caubet at psi.ch" > > *Reply to: *"gpfsug-discuss at spectrumscale.org" < > > gpfsug-discuss at spectrumscale.org> > > *Date: *Thursday, 19 November 2020 at 15:39 > > *To: *"gpfsug-discuss at spectrumscale.org" > > > > *Subject: *[gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > > > Hi, > > > > > > > > I have a filesystem holding many projects (i.e., mounted under /projects), > > each project is managed with filesets. > > > > I have a new big project which should be placed on a separate filesystem > > (blocksize, replication policy, etc. will be different, and subprojects of > > it will be managed with filesets). Ideally, this filesystem should be > > mounted in /projects/newproject. > > > > > > > > Technically, mounting a filesystem on top of an existing filesystem should > > be possible, but, is this discouraged for any reason? How GPFS would behave > > with that and is there a technical reason for avoiding this setup? > > > > Another alternative would be independent mount point + symlink, but I > > really would prefer to avoid symlinks. > > > > > > > > Thanks a lot, > > > > Marc > > > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > > > Forschungsstrasse, 111 > > > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From jonathan.buzzard at strath.ac.uk Thu Nov 19 18:08:13 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 19 Nov 2020 18:08:13 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> Message-ID: On 19/11/2020 17:34, Jan-Frode Myklebust wrote: > > I would not mount a GPFS filesystem within a GPFS filesystem. > Technically it should work, but I?d expect it to cause surprises if ever > the lower filesystem experienced problems. Alone, a filesystem might > recover automatically by remounting. But if there?s another filesystem > mounted within, I expect it will be a problem.. > > Much better to use symlinks. > Think about that for a minute... I guess if you are worried about /projects going away (which would suggest something really bad has happened anyway) would be to mount the GPFS file system that is currently holding /projects somewhere else and then bind mount everything into /projects At this point I would note that bind mounts are much better than symlinks which suck for this sort of application. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From jonathan.buzzard at strath.ac.uk Thu Nov 19 18:12:03 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 19 Nov 2020 18:12:03 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> Message-ID: <2f789d09-3704-2d41-ef2a-953de178dce2@strath.ac.uk> On 19/11/2020 16:40, KG wrote: > You can also set mount priority on filesystems so that gpfs can try to > mount them in order...parent first > One of the things that systemd brings to the table https://github.com/systemd/systemd/commit/3519d230c8bafe834b2dac26ace49fcfba139823 JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From marc.caubet at psi.ch Thu Nov 19 18:13:08 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 19 Nov 2020 18:13:08 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <20201119173807.kormirvbweqs3un6@thargelion> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> , <20201119173807.kormirvbweqs3un6@thargelion> Message-ID: <0963457f2dfd418eabf8e1681ef2f801@psi.ch> Hi all, thanks a lot for your comments. Agreed, I better avoid it for now. I was concerned about how GPFS would behave in such case. For production I will take the safe route, but, just out of curiosity, I'll give it a try on a couple of test filesystems. Thanks a lot for your help, it was very helpful, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Skylar Thompson Sent: Thursday, November 19, 2020 6:38:07 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Agreed, not sure how the GPFS tools would react. An alternative to symlinks would be bind mounts, if for some reason a tool doesn't behave properly with a symlink in the path. On Thu, Nov 19, 2020 at 06:34:05PM +0100, Jan-Frode Myklebust wrote: > I would not mount a GPFS filesystem within a GPFS filesystem. Technically > it should work, but I???d expect it to cause surprises if ever the lower > filesystem experienced problems. Alone, a filesystem might recover > automatically by remounting. But if there???s another filesystem mounted > within, I expect it will be a problem.. > > Much better to use symlinks. > > > > -jf > > tor. 19. nov. 2020 kl. 18:01 skrev Caubet Serrabou Marc (PSI) < > marc.caubet at psi.ch>: > > > Hi Simon, > > > > > > that's a very good point, thanks a lot :) I have it remotely mounted on a > > client cluster, so I will consider priorities when mounting the filesystems > > with remote cluster mount. That's very useful. > > > > Also, as far as I saw, same approach can be also applied to local mounts > > (via mmchfs) during daemon startup with the same option --mount-priority. > > > > > > Thanks a lot for the hints, these are very useful. I'll test that. 
> > > > > > Cheers, > > > > Marc > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > Forschungsstrasse, 111 > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > ------------------------------ > > *From:* gpfsug-discuss-bounces at spectrumscale.org < > > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Simon Thompson < > > S.J.Thompson at bham.ac.uk> > > *Sent:* Thursday, November 19, 2020 5:42:07 PM > > *To:* gpfsug main discussion list > > *Subject:* Re: [gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > If it is a remote cluster mount from your clients (hopefully!), you might > > want to look at priority to order mounting of the file-systems. I don???t > > know what would happen if the overmounted file-system went away, you would > > likely want to test. > > > > > > > > Simon > > > > > > > > *From: * on behalf of " > > marc.caubet at psi.ch" > > *Reply to: *"gpfsug-discuss at spectrumscale.org" < > > gpfsug-discuss at spectrumscale.org> > > *Date: *Thursday, 19 November 2020 at 15:39 > > *To: *"gpfsug-discuss at spectrumscale.org" > > > > *Subject: *[gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > > > Hi, > > > > > > > > I have a filesystem holding many projects (i.e., mounted under /projects), > > each project is managed with filesets. > > > > I have a new big project which should be placed on a separate filesystem > > (blocksize, replication policy, etc. will be different, and subprojects of > > it will be managed with filesets). Ideally, this filesystem should be > > mounted in /projects/newproject. > > > > > > > > Technically, mounting a filesystem on top of an existing filesystem should > > be possible, but, is this discouraged for any reason? How GPFS would behave > > with that and is there a technical reason for avoiding this setup? > > > > Another alternative would be independent mount point + symlink, but I > > really would prefer to avoid symlinks. > > > > > > > > Thanks a lot, > > > > Marc > > > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > > > Forschungsstrasse, 111 > > > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Thu Nov 19 18:32:39 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 19 Nov 2020 18:32:39 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <0963457f2dfd418eabf8e1681ef2f801@psi.ch> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> <20201119173807.kormirvbweqs3un6@thargelion> <0963457f2dfd418eabf8e1681ef2f801@psi.ch> Message-ID: <5b8edf06-a4ab-a39e-5a02-86fd7565b90a@strath.ac.uk> On 19/11/2020 18:13, Caubet Serrabou Marc (PSI) wrote: > > Hi all, > > > thanks a lot for your comments. Agreed, I?better avoid it for now. I was > concerned about how GPFS would behave in such case. For production I > will take the safe route, but, just out of curiosity, I'll give it a try > on a couple of test filesystems. > Don't use symlinks there is a range of applications that will break and you will confuse the hell out of your users as the fact you are not under /projects/new but /random/new is not hidden. Besides which if the symlink goes away because /projects goes away then it is all a bust anyway. If you are worried about /projects going away then the best plan is to mount the GPFS file systems somewhere else and then bind mount the directories into /projects on all the machines where they are mounted. GPFS is quite happy with this. We bind mount /gpfs/users into /users and /gpfs/software into /opt/software by default. In the past I have bind mounted random paths for every user (hundred plus) into /home JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From novosirj at rutgers.edu Thu Nov 19 18:34:09 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 19 Nov 2020 18:34:09 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> Message-ID: > On Nov 19, 2020, at 10:49 AM, Jonathan Buzzard wrote: > > On 19/11/2020 15:34, Caubet Serrabou Marc (PSI) wrote: >> Hi, >> I have a filesystem holding many projects (i.e., mounted under /projects), each project is managed with filesets. >> I have a new big project which should be placed on a separate filesystem (blocksize, replication policy, etc. will be different, and subprojects of it will be managed with filesets). Ideally, this filesystem should be mounted in /projects/newproject. >> Technically, mounting a filesystem on top of an existing filesystem should be possible, but, is this discouraged for any reason? How GPFS would behave with that and is there a technical reason for avoiding this setup? >> Another alternative would be independent mount point + symlink, but I really would prefer to avoid symlinks. > > This has all the hallmarks of either a Windows admin or a newbie Linux/Unix admin :-) > > Simply put /projects is mounted on top of whatever file system is providing the root file system in the first place LOL. > > Linux/Unix and/or GPFS does not give a monkeys about mounting another file system *ANYWHERE* in it period because there is no other way of doing it. Some others have said, but I disagree. It wasn?t that long ago that GPFS acted really screwy with systemd because it did something in a way other than Linux expected. 
As it is now, their devices are not /dev/whatever or server:/wherever like just about every other filesystem type. Not unreasonable to believe it would "act funny" compared to other FS. I like GPFS a lot, but this is not one of my favorite characteristics of it. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From UWEFALKE at de.ibm.com Thu Nov 19 19:18:41 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 19 Nov 2020 20:18:41 +0100 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch><0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> Message-ID: Just the risk your parent system dies which will block your access to the child file system mounted on a mount point within. If that is not bothering you, go ahead and mount stacks. As for the symlink though: it is also gone if the parent dies :-). Mit freundlichen Grüßen / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Geschäftsführung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: KG To: gpfsug main discussion list Date: 19/11/2020 17:41 Subject: [EXTERNAL] Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Sent by: gpfsug-discuss-bounces at spectrumscale.org You can also set mount priority on filesystems so that gpfs can try to mount them in order...parent first On Thu, Nov 19, 2020, 21:19 Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: On 19/11/2020 15:34, Caubet Serrabou Marc (PSI) wrote: > Hi, > > > I have a filesystem holding many projects (i.e., mounted under > /projects), each project is managed with filesets. > > I have a new big project which should be placed on a separate filesystem > (blocksize, replication policy, etc. will be different, and subprojects > of it will be managed with filesets). Ideally, this filesystem should be > mounted in /projects/newproject. > > > Technically, mounting a filesystem on top of an existing filesystem > should be possible, but, is this discouraged for any reason? How GPFS > would behave with that and is there a technical reason for avoiding this > setup? > > Another alternative would be independent mount point + symlink, but I > really would prefer to avoid symlinks. This has all the hallmarks of either a Windows admin or a newbie Linux/Unix admin :-) Simply put /projects is mounted on top of whatever file system is providing the root file system in the first place LOL. Linux/Unix and/or GPFS does not give a monkeys about mounting another file system *ANYWHERE* in it period because there is no other way of doing it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Thu Nov 19 19:37:52 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 19 Nov 2020 19:37:52 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <0963457f2dfd418eabf8e1681ef2f801@psi.ch> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> <20201119173807.kormirvbweqs3un6@thargelion> <0963457f2dfd418eabf8e1681ef2f801@psi.ch> Message-ID: <738D41AC-6A07-453E-A2D1-C1882BE52EDC@bham.ac.uk> My understanding was that this was perfectly acceptable in a GPFS system. i.e. mounting parts of file-systems in others. It has been suggested to us as a way of using different vendor GPFS systems (e.g. an ESS with someone elses) as a way of working round the licensing rules about ESS and anything else, but still giving a single user ?name space?. We didn?t go that route, and of course I might have misunderstood what was being suggested. Simon From: on behalf of "marc.caubet at psi.ch" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 November 2020 at 18:13 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Hi all, thanks a lot for your comments. Agreed, I better avoid it for now. I was concerned about how GPFS would behave in such case. For production I will take the safe route, but, just out of curiosity, I'll give it a try on a couple of test filesystems. Thanks a lot for your help, it was very helpful, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Skylar Thompson Sent: Thursday, November 19, 2020 6:38:07 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Agreed, not sure how the GPFS tools would react. An alternative to symlinks would be bind mounts, if for some reason a tool doesn't behave properly with a symlink in the path. On Thu, Nov 19, 2020 at 06:34:05PM +0100, Jan-Frode Myklebust wrote: > I would not mount a GPFS filesystem within a GPFS filesystem. Technically > it should work, but I???d expect it to cause surprises if ever the lower > filesystem experienced problems. Alone, a filesystem might recover > automatically by remounting. But if there???s another filesystem mounted > within, I expect it will be a problem.. > > Much better to use symlinks. > > > > -jf > > tor. 19. nov. 2020 kl. 18:01 skrev Caubet Serrabou Marc (PSI) < > marc.caubet at psi.ch>: > > > Hi Simon, > > > > > > that's a very good point, thanks a lot :) I have it remotely mounted on a > > client cluster, so I will consider priorities when mounting the filesystems > > with remote cluster mount. That's very useful. 
> > > > Also, as far as I saw, same approach can be also applied to local mounts > > (via mmchfs) during daemon startup with the same option --mount-priority. > > > > > > Thanks a lot for the hints, these are very useful. I'll test that. > > > > > > Cheers, > > > > Marc > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > Forschungsstrasse, 111 > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > ------------------------------ > > *From:* gpfsug-discuss-bounces at spectrumscale.org < > > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Simon Thompson < > > S.J.Thompson at bham.ac.uk> > > *Sent:* Thursday, November 19, 2020 5:42:07 PM > > *To:* gpfsug main discussion list > > *Subject:* Re: [gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > If it is a remote cluster mount from your clients (hopefully!), you might > > want to look at priority to order mounting of the file-systems. I don???t > > know what would happen if the overmounted file-system went away, you would > > likely want to test. > > > > > > > > Simon > > > > > > > > *From: * on behalf of " > > marc.caubet at psi.ch" > > *Reply to: *"gpfsug-discuss at spectrumscale.org" < > > gpfsug-discuss at spectrumscale.org> > > *Date: *Thursday, 19 November 2020 at 15:39 > > *To: *"gpfsug-discuss at spectrumscale.org" > > > > *Subject: *[gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > > > Hi, > > > > > > > > I have a filesystem holding many projects (i.e., mounted under /projects), > > each project is managed with filesets. > > > > I have a new big project which should be placed on a separate filesystem > > (blocksize, replication policy, etc. will be different, and subprojects of > > it will be managed with filesets). Ideally, this filesystem should be > > mounted in /projects/newproject. > > > > > > > > Technically, mounting a filesystem on top of an existing filesystem should > > be possible, but, is this discouraged for any reason? How GPFS would behave > > with that and is there a technical reason for avoiding this setup? > > > > Another alternative would be independent mount point + symlink, but I > > really would prefer to avoid symlinks. 
> > > > > > > > Thanks a lot, > > > > Marc > > > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > > > Forschungsstrasse, 111 > > > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kamil.Czauz at Squarepoint-Capital.com Fri Nov 20 19:13:41 2020 From: Kamil.Czauz at Squarepoint-Capital.com (Czauz, Kamil) Date: Fri, 20 Nov 2020 19:13:41 +0000 Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process In-Reply-To: References: Message-ID: Here is the output of waiters on 2 hosts that were having the issue today: HOST 1 [2020-11-20 09:07:53 root at nyzls149m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters === mmdiag: waiters === Waiting 0.0035 sec since 09:08:07, monitored, thread 135497 FileBlockReadFetchHandlerThread: on ThCond 0x7F615C152468 (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.64.44.180 Waiting 0.0036 sec since 09:08:07, monitored, thread 139228 PrefetchWorkerThread: on ThCond 0x7F627000D5D8 (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.64.44.181 [2020-11-20 09:08:07 root at nyzls149m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters === mmdiag: waiters === HOST 2 [2020-11-20 09:08:49 root at nyzls150m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters === mmdiag: waiters === Waiting 0.0034 sec since 09:08:50, monitored, thread 345318 SharedHashTabFetchHandlerThread: on ThCond 0x7F049C001F08 (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.64.44.133 [2020-11-20 09:08:50 root at nyzls150m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters === mmdiag: waiters === [2020-11-20 09:08:52 root at nyzls150m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters === mmdiag: waiters === You can see the waiters go from 0 to 1-2 , but they are hardly blocking. Yes there are separate pools for metadata for all of the filesystems here. 
I did another trace today when the problem was happening - this time I was able to get a longer trace using the following command: /usr/lpp/mmfs/bin/mmtracectl --start --trace=io --trace-file-size=512M --tracedev-write-mode=blocking --tracedev-buffer-size=64M -N nyzls149m This is what the trsum output looks like: Elapsed trace time: 62.412092000 seconds Elapsed trace time from first VFS call to last: 62.412091999 Time idle between VFS calls: 0.002913000 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs readpage 0.003487000 9 387.444 rdwr 0.273721000 183 1495.743 read_inode2 0.007304000 325 22.474 follow_link 0.013952000 58 240.552 pagein 0.025974000 66 393.545 getattr 0.002792000 26 107.385 revalidate 0.009406000 2172 4.331 create 66.194479000 3 22064826.333 open 1.725505000 88 19608.011 unlink 18.685099000 1 18685099.000 setattr 0.011627000 14 830.500 lookup 2379.215514000 502 4739473.135 delete_inode 0.015553000 328 47.418 rename 98.099073000 5 19619814.600 release 0.050574000 89 568.247 permission 0.007454000 73 102.110 getxattr 0.002346000 32 73.312 statfs 0.000081000 6 13.500 mmap 0.049809000 18 2767.167 removexattr 0.000827000 14 59.071 llseek 0.000441000 47 9.383 readdir 0.002667000 34 78.441 Ops 4093 Secs 62.409178999 Ops/Sec 65.583 MaxFilesToCache is set to 12000 : [common] maxFilesToCache 12000 I only see gpfs_i_lookup in the tracefile, no gpfs_v_lookups # grep gpfs_i_lookup trcrpt.2020-11-20_09.20.38.283986.nyzls149m |wc -l 1097 They mostly look like this - 62.346560 238895 TRACE_VNODE: gpfs_i_lookup exit: new inode 0xFFFF922178971A40 iNum 21980113 (0x14F63D1) cnP 0xFFFF922178971C88 retP 0x0 code 0 rc 0 62.346955 238895 TRACE_VNODE: gpfs_i_lookup enter: diP 0xFFFF91A8A4991E00 dentryP 0xFFFF92C545A93500 name '20170323.txt' d_flags 0x80 d_count 1 unhashed 1 62.367701 218442 TRACE_VNODE: gpfs_i_lookup exit: new inode 0xFFFF922071300000 iNum 29629892 (0x1C41DC4) cnP 0xFFFF922071300248 retP 0x0 code 0 rc 0 62.367734 218444 TRACE_VNODE: gpfs_i_lookup enter: diP 0xFFFF9193CF457800 dentryP 0xFFFF9229527A89C0 name 'node.py' d_flags 0x80 d_count 1 unhashed 1 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Monday, November 16, 2020 8:46 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, while the other nodes can well block the local one, as Frederick suggests, there should at least be something visible locally waiting for these other nodes. Looking at all waiters might be a good thing, but this case looks strange in other ways. Mind statement there are almost no local waiters and none of them gets older than 10 ms. I am no developer nor do I have the code, so don't expect too much. Can you tell what lookups you see (check in the trcrpt file, could be like gpfs_i_lookup or gpfs_v_lookup)? Lookups are metadata ops, do you have a separate pool for your metadata? How is that pool set up (doen to the physical block devices)? Your trcsum down revealed 36 lookups, each one on avg taking >30ms. That is a lot (albeit the respective waiters won't show up at first glance as suspicious ...). So, which waiters did you see (hope you saved them, if not, do it next time). What are the node you see this on and the whole cluster used for? What is the MaxFilesToCache setting (for that node and for others)? what HW is that, how big are your nodes (memory,CPU)? 
To check the unreasonably short trace capture time: how large are the trcrpt files you obtain? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 14:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - Regarding your previous message - waiters were coming / going with just 1-2 waiters when I ran the mmdiag command, with very low wait times (<0.01s). We are running version 4.2.3 I did another capture today while the client is functioning normally and this was the header result: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 25.996957 seconds and 67592121252 cycles Measured cycle count update rate to be 2600001271 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Fri Nov 13 08:20:01.800558000 2020 (TOD 1605273601.800558, cycles 20807897445779444) daemon trace enabled Fri Nov 13 08:20:01.910017000 2020 (TOD 1605273601.910017, cycles 20807897730372442) all streams included Fri Nov 13 08:20:26.423085049 2020 (TOD 1605273626.423085, cycles 20807961464381068) <---- useful part of trace extends from here trace quiesced Fri Nov 13 08:20:27.797515000 2020 (TOD 1605273627.000797, cycles 20807965037900696) <---- to here Approximate number of times the trace buffer was filled: 14.631 Still a very small capture (1.3s), but the trsum.awk output was not filled with lookup commands / large lookup times. Can you help debug what those long lookup operations mean? 
Unfinished operations: 27967 ***************** pagein ************** 1.362382116 27967 ***************** readpage ************** 1.362381516 139130 1.362448448 ********* Unfinished IO: buffer/disk 3002F670000 20:107498951168^\archive_data_16 104686 1.362022068 ********* Unfinished IO: buffer/disk 50011878000 1:47169618944^\archive_data_1 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFE 341710 1.362423815 ********* Unfinished IO: buffer/disk 20022218000 19:107498951680^\archive_data_15 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFF 139150 1.361122006 ********* Unfinished IO: buffer/disk 50012018000 2:47169622016^\archive_data_2 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\00000000FFFFFFFF 95782 1.361112791 ********* Unfinished IO: buffer/disk 40016300000 20:107498950656^\archive_data_16 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\00000000FFFFFFFF 271076 1.361579585 ********* Unfinished IO: buffer/disk 20023DB8000 4:47169606656^\archive_data_4 341676 1.362018599 ********* Unfinished IO: buffer/disk 40038140000 5:47169614336^\archive_data_5 139150 1.361131599 MSG FSnd: nsdMsgReadExt msg_id 2930654492 Sduration 13292.382 + us 341676 1.362027104 MSG FSnd: nsdMsgReadExt msg_id 2930654495 Sduration 12396.877 + us 95782 1.361124739 MSG FSnd: nsdMsgReadExt msg_id 2930654491 Sduration 13299.242 + us 271076 1.361587653 MSG FSnd: nsdMsgReadExt msg_id 2930654493 Sduration 12836.328 + us 92182 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 341710 1.362429643 MSG FSnd: nsdMsgReadExt msg_id 2930654497 Sduration 11994.338 + us 341662 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 139130 1.362458376 MSG FSnd: nsdMsgReadExt msg_id 2930654498 Sduration 11965.605 + us 104686 1.362028772 MSG FSnd: nsdMsgReadExt msg_id 2930654496 Sduration 12395.209 + us 412373 0.775676657 MSG FRep: nsdMsgReadExt msg_id 304915249 Rduration 598747.324 us Rlen 262144 Hduration 598752.112 + us 341770 0.589739579 MSG FRep: nsdMsgReadExt msg_id 338079050 Rduration 784684.402 us Rlen 4 Hduration 784692.651 + us 143315 0.536252844 MSG FRep: nsdMsgReadExt msg_id 631945522 Rduration 838171.137 us Rlen 233472 Hduration 838174.299 + us 341878 0.134331812 MSG FRep: nsdMsgReadExt msg_id 338079023 Rduration 1240092.169 us Rlen 262144 Hduration 1240094.403 + us 175478 0.587353287 MSG FRep: nsdMsgReadExt msg_id 338079047 Rduration 787070.694 us Rlen 262144 Hduration 787073.990 + us 139558 0.633517347 MSG FRep: nsdMsgReadExt msg_id 631945538 Rduration 740906.634 us Rlen 102400 Hduration 740910.172 + us 143308 0.958832110 MSG FRep: nsdMsgReadExt msg_id 631945542 Rduration 415591.871 us Rlen 262144 Hduration 415597.056 + us Elapsed trace time: 1.374423981 seconds Elapsed trace time from first VFS call to last: 1.374423980 Time idle between VFS calls: 0.001603738 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs readpage 1.151660085 1874 614.546 rdwr 0.431456904 581 742.611 read_inode2 0.001180648 934 1.264 follow_link 0.000029502 7 
4.215 getattr 0.000048413 9 5.379 revalidate 0.000007080 67 0.106 pagein 1.149699537 1877 612.520 create 0.007664829 9 851.648 open 0.001032657 19 54.350 unlink 0.002563726 14 183.123 delete_inode 0.000764598 826 0.926 lookup 0.312847947 953 328.277 setattr 0.020651226 824 25.062 permission 0.000015018 1 15.018 rename 0.000529023 4 132.256 release 0.001613800 22 73.355 getxattr 0.000030494 6 5.082 mmap 0.000054767 1 54.767 llseek 0.000001130 4 0.283 readdir 0.000033947 2 16.973 removexattr 0.002119736 820 2.585 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 42625 0.000000138 0.000031017 0.44% 99.56% 3 42378 0.000586959 0.011596801 4.82% 95.18% 32 42627 0.000000272 0.000013421 1.99% 98.01% 2 42641 0.003284590 0.012593594 20.69% 79.31% 35 42628 0.001522335 0.000002748 99.82% 0.18% 2 25464 0.003462795 0.500281914 0.69% 99.31% 12 301420 0.000016711 0.052848218 0.03% 99.97% 38 95103 0.000000544 0.000000000 100.00% 0.00% 1 145858 0.000000659 0.000794896 0.08% 99.92% 2 42221 0.000011484 0.000039445 22.55% 77.45% 5 371718 0.000000707 0.001805425 0.04% 99.96% 2 95109 0.000000880 0.008998763 0.01% 99.99% 2 95337 0.000010330 0.503057866 0.00% 100.00% 8 42700 0.002442175 0.012504429 16.34% 83.66% 35 189680 0.003466450 0.500128627 0.69% 99.31% 9 42681 0.006685396 0.000391575 94.47% 5.53% 16 42702 0.000048203 0.000000500 98.97% 1.03% 2 42703 0.000033280 0.140102087 0.02% 99.98% 9 224423 0.000000195 0.000000000 100.00% 0.00% 1 42706 0.000541098 0.000014713 97.35% 2.65% 3 106275 0.000000456 0.000000000 100.00% 0.00% 1 42721 0.000372857 0.000000000 100.00% 0.00% 1 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Friday, November 13, 2020 4:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi Kamil, in my mail just a few minutes ago I'd overlooked that the buffer size in your trace was indeed 128M (I suppose the trace file is adapting that size if not set in particular). That is very strange, even under high load, the trace should then capture some longer time than 10 secs, and , most of all, it should contain much more activities than just these few you had. That is very mysterious. I am out of ideas for the moment, and a bit short of time to dig here. To check your tracing, you could run a trace like before but when everything is normal and check that out - you should see many records, the trcsum.awk should list just a small portion of unfinished ops at the end, ... If that is fine, then the tracing itself is affected by your crritical condition (never experienced that before - rather GPFS grinds to a halt than the trace is abandoned), and that might well be worth a support ticket. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 
7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Uwe Falke/Germany/IBM To: gpfsug main discussion list Date: 13/11/2020 10:21 Subject: Re: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, looks your tracefile setting has been too low: all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here means you effectively captured a period of about 5ms only ... you can't see much from that. I'd assumed the default trace file size would be sufficient here but it doesn't seem to. try running with something like mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N . However, if you say "no major waiter" - how many waiters did you see at any time? what kind of waiters were the oldest, how long they'd waited? it could indeed well be that some job is just creating a killer workload. The very short cyle time of the trace points, OTOH, to high activity, OTOH the trace file setting appears quite low (trace=io doesnt' collect many trace infos, just basic IO stuff). If I might ask: what version of GPFS are you running? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 03:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. 
The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? The 'lookup' operations are taking a hell of a long time, what does that mean? Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 
32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 
2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. 
Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. 
We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hooft at natlab.research.philips.com Sat Nov 21 00:37:01 2020 From: hooft at natlab.research.philips.com (Peter van Hooft) Date: Sat, 21 Nov 2020 01:37:01 +0100 Subject: [gpfsug-discuss] mmchdisk /dev/fs start -a progress Message-ID: <20201121003701.GA32509@pc67340132.natlab.research.philips.com> Hello, Is it possible to find out the progress of the 'mmchdisk /dev/fs start -a' command when the controlling terminal had been lost? We can see the task running on the fs manager node with 'mmdiag --commands' with attributes 'hold PIT/disk waitTime 0' We are starting to worry the mmchdisk is taking too long, and see continuously waiters like Waiting 3.1946 sec since 01:28:23, ignored, thread 22092 TSCHDISKCmdThread: on ThCond 0x180267573D0 (SGManagementMgrDataCondvar), reason 'waiting for stripe group to recover' Thanks for any hints. Peter van Hooft Philips Research From jonathan.buzzard at strath.ac.uk Sat Nov 21 10:13:42 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 21 Nov 2020 10:13:42 +0000 Subject: [gpfsug-discuss] mmchdisk /dev/fs start -a progress In-Reply-To: <20201121003701.GA32509@pc67340132.natlab.research.philips.com> References: <20201121003701.GA32509@pc67340132.natlab.research.philips.com> Message-ID: On 21/11/2020 00:37, Peter van Hooft wrote: > > Hello, > > Is it possible to find out the progress of the 'mmchdisk /dev/fs start -a' > command when the controlling terminal had been lost? > I don't think so. You are lucky it is still running > We can see the task running on the fs manager node with 'mmdiag --commands' with > attributes 'hold PIT/disk waitTime 0' > We are starting to worry the mmchdisk is taking too long, and see continuously waiters like > Waiting 3.1946 sec since 01:28:23, ignored, thread 22092 TSCHDISKCmdThread: on ThCond 0x180267573D0 (SGManagementMgrDataCondvar), reason 'waiting for stripe group to recover' > > Thanks for any hints. > Not that this is going to help this time, but it is why you should *ALWAYS* without exception run these sorts of commands within a screen/tmux session so when you loose the connection to the server you can just reconnect and pick it up again. This is introductory system administration 101. No critical or long running command should ever be dependant on a remote controlling terminal. If you can't run them locally then run them in a screen or tmux session. There are plenty of good howto's for both screen and tmux on the internet. Depending on which distribution you use I would note that RedHat have very annoyingly and for completely specious reasons removed screen from RHEL8 and left tmux. So if you are starting from scratch tmux is the one to learn :-( JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From robert.horton at icr.ac.uk Mon Nov 23 15:06:05 2020 From: robert.horton at icr.ac.uk (Robert Horton) Date: Mon, 23 Nov 2020 15:06:05 +0000 Subject: [gpfsug-discuss] AFM experiences? Message-ID: Hi all, We're thinking about deploying AFM and would be interested in hearing from anyone who has used it in anger - particularly independent writer. Our scenario is we have a relatively large but slow (mainly because it is stretched over two sites with a 10G link) cluster for long/medium- term storage and a smaller but faster cluster for scratch storage in our HPC system. 
What we're thinking of doing is using some/all of the scratch capacity as an IW cache of some/all of the main cluster, the idea being to reduce the need for people to manually move data between the two. It seems to generally work as expected in a small test environment, although we have a few concerns: - Quota management on the home cluster - we need a way of ensuring people don't write data to the cache which can't be accommodated on home. Probably not insurmountable but needs a bit of thought... - It seems inodes on the cache only get freed when they are deleted on the cache cluster - not if they get deleted from the home cluster or when the blocks are evicted from the cache. Does this become an issue in time? If anyone has done anything similar I'd be interested to hear how you got on. It would be interesting to know if you created a cache fileset for each home fileset or just one for the whole lot, as well as any other pearls of wisdom you may have to offer. Thanks! Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. From novosirj at rutgers.edu Mon Nov 23 15:30:47 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Mon, 23 Nov 2020 15:30:47 +0000 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: References: Message-ID: <440D8981-58C7-4CF1-BF06-BC11F27A47BC@rutgers.edu> We use it similar to how you describe it. We now run 5.0.4.1 on the client side (I mean actual client nodes, not the home or cache clusters). Before that, we had reliability problems (failure to cache libraries of programs that were executing, etc.). The storage clusters in our case are 5.0.3-2.3. We also got bit by the quotas thing. You have to set them the same on both sides, or you will have problems. It seems a little silly that they are not kept in sync by GPFS, but that's how it is. If memory serves, the result looked like an AFM failure (queue not being cleared), but it turned out to be that the files just could not be written at the home cluster because the user was over quota there. I also think I've seen load average increase due to this sort of thing, but I may be mixing that up with another problem scenario. We monitor via Nagios which I believe monitors using mmafmctl commands. Really can't think of a single time, apart from the other day, where the queue backed up. The instance the other day only lasted a few minutes (if you suddenly create many small files, like installing new software, it may not catch up instantly). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Nov 23, 2020, at 10:19, Robert Horton wrote: ?Hi all, We're thinking about deploying AFM and would be interested in hearing from anyone who has used it in anger - particularly independent writer. Our scenario is we have a relatively large but slow (mainly because it is stretched over two sites with a 10G link) cluster for long/medium- term storage and a smaller but faster cluster for scratch storage in our HPC system. What we're thinking of doing is using some/all of the scratch capacity as an IW cache of some/all of the main cluster, the idea to reduce the need for people to manually move data between the two. It seems to generally work as expected in a small test environment, although we have a few concerns: - Quota management on the home cluster - we need a way of ensuring people don't write data to the cache which can't be accomodated on home. Probably not insurmountable but needs a bit of thought... - It seems inodes on the cache only get freed when they are deleted on the cache cluster - not if they get deleted from the home cluster or when the blocks are evicted from the cache. Does this become an issue in time? If anyone has done anything similar I'd be interested to hear how you got on. It would be intresting to know if you created a cache fileset for each home fileset or just one for the whole lot, as well as any other pearls of wisdom you may have to offer. Thanks! Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From dean.flanders at fmi.ch Mon Nov 23 17:58:12 2020 From: dean.flanders at fmi.ch (Flanders, Dean) Date: Mon, 23 Nov 2020 17:58:12 +0000 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: <440D8981-58C7-4CF1-BF06-BC11F27A47BC@rutgers.edu> References: <440D8981-58C7-4CF1-BF06-BC11F27A47BC@rutgers.edu> Message-ID: Hello Rob, We looked at AFM years ago for DR, but after reading the bug reports, we avoided it, and also have had seen a case where it had to be removed from one customer, so we have kept things simple. Now looking again a few years later there are still issues, IBM Spectrum Scale Active File Management (AFM) issues which may result in undetected data corruption, and that was just my first google hit. We have kept it simple, and use a parallel rsync process with policy engine and can hit wire speed for copying of millions of small files in order to have isolation between the sites at GB/s. I am not saying it is bad, just that it needs an appropriate risk/reward ratio to implement as it increases overall complexity. 
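To make the parallel rsync approach above a little more concrete, here is a minimal sketch; the paths and the worker count of 8 are placeholders, and at scale the file list would typically come from an mmapplypolicy LIST rule rather than find:

  # build a relative file list on the source side (a policy-engine LIST rule scales better than find)
  cd /gpfs/source && find . -type f > /tmp/filelist
  # split the list into 8 chunks without breaking lines, then run 8 rsync workers in parallel
  split -n l/8 /tmp/filelist /tmp/chunk.
  ls /tmp/chunk.* | xargs -P 8 -I{} rsync -a --files-from={} /gpfs/source/ /gpfs/destination/
  # add -A -X to rsync if POSIX ACLs and extended attributes need to carry over

Whether a run like this approaches wire speed depends mostly on how evenly the chunks spread the small files across the workers.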
Kind regards, Dean From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Monday, November 23, 2020 4:31 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM experiences? We use it similar to how you describe it. We now run 5.0.4.1 on the client side (I mean actual client nodes, not the home or cache clusters). Before that, we had reliability problems (failure to cache libraries of programs that were executing, etc.). The storage clusters in our case are 5.0.3-2.3. We also got bit by the quotas thing. You have to set them the same on both sides, or you will have problems. It seems a little silly that they are not kept in sync by GPFS, but that?s how it is. If memory serves, the result looked like an AFM failure (queue not being cleared), but it turned out to be that the files just could not be written at the home cluster because the user was over quota there. I also think I?ve seen load average increase due to this sort of thing, but I may be mixing that up with another problem scenario. We monitor via Nagios which I believe monitors using mmafmctl commands. Really can?t think of a single time, apart from the other day, where the queue backed up. The instance the other day only lasted a few minutes (if you suddenly create many small files, like installing new software, it may not catch up instantly). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Nov 23, 2020, at 10:19, Robert Horton > wrote: ?Hi all, We're thinking about deploying AFM and would be interested in hearing from anyone who has used it in anger - particularly independent writer. Our scenario is we have a relatively large but slow (mainly because it is stretched over two sites with a 10G link) cluster for long/medium- term storage and a smaller but faster cluster for scratch storage in our HPC system. What we're thinking of doing is using some/all of the scratch capacity as an IW cache of some/all of the main cluster, the idea to reduce the need for people to manually move data between the two. It seems to generally work as expected in a small test environment, although we have a few concerns: - Quota management on the home cluster - we need a way of ensuring people don't write data to the cache which can't be accomodated on home. Probably not insurmountable but needs a bit of thought... - It seems inodes on the cache only get freed when they are deleted on the cache cluster - not if they get deleted from the home cluster or when the blocks are evicted from the cache. Does this become an issue in time? If anyone has done anything similar I'd be interested to hear how you got on. It would be intresting to know if you created a cache fileset for each home fileset or just one for the whole lot, as well as any other pearls of wisdom you may have to offer. Thanks! Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 
534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Mon Nov 23 21:54:39 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 23 Nov 2020 21:54:39 +0000 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: Message-ID: Rob, Talk to Jake Carroll from the University of Queensland, he has done a number of presentations at Scale User Groups of UQ?s MeDiCI data fabric which is based on Spectrum Scale and does very aggressive use of AFM. Their use of AFM is not only on campus, but to remote Storage clusters between 30km and 1500km away from their Home cluster. They have also tested AFM between Australia, Japan, and USA Sent from my iPhone > On 24 Nov 2020, at 01:20, Robert Horton wrote: > > ?Hi all, > > We're thinking about deploying AFM and would be interested in hearing > from anyone who has used it in anger - particularly independent writer. > > Our scenario is we have a relatively large but slow (mainly because it > is stretched over two sites with a 10G link) cluster for long/medium- > term storage and a smaller but faster cluster for scratch storage in > our HPC system. What we're thinking of doing is using some/all of the > scratch capacity as an IW cache of some/all of the main cluster, the > idea to reduce the need for people to manually move data between the > two. > > It seems to generally work as expected in a small test environment, > although we have a few concerns: > > - Quota management on the home cluster - we need a way of ensuring > people don't write data to the cache which can't be accomodated on > home. Probably not insurmountable but needs a bit of thought... > > - It seems inodes on the cache only get freed when they are deleted on > the cache cluster - not if they get deleted from the home cluster or > when the blocks are evicted from the cache. Does this become an issue > in time? > > If anyone has done anything similar I'd be interested to hear how you > got on. It would be intresting to know if you created a cache fileset > for each home fileset or just one for the whole lot, as well as any > other pearls of wisdom you may have to offer. > > Thanks! > Rob > > -- > Robert Horton | Research Data Storage Lead > The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB > T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | > Twitter @ICR_London > Facebook: www.facebook.com/theinstituteofcancerresearch > > The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. > > This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Mon Nov 23 23:14:08 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Mon, 23 Nov 2020 23:14:08 +0000 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: References: Message-ID: <2C7317A6-B9DF-450A-92A6-AE156396204A@rutgers.edu> Ours are about 50 and 100 km from the home cluster, but it?s over 100Gb fiber. > On Nov 23, 2020, at 4:54 PM, Andrew Beattie wrote: > > Rob, > > Talk to Jake Carroll from the University of Queensland, he has done a number of presentations at Scale User Groups of UQ?s MeDiCI data fabric which is based on Spectrum Scale and does very aggressive use of AFM. > > Their use of AFM is not only on campus, but to remote Storage clusters between 30km and 1500km away from their Home cluster. They have also tested AFM between Australia, Japan, and USA > > Sent from my iPhone > > > On 24 Nov 2020, at 01:20, Robert Horton wrote: > > > > ?Hi all, > > > > We're thinking about deploying AFM and would be interested in hearing > > from anyone who has used it in anger - particularly independent writer. > > > > Our scenario is we have a relatively large but slow (mainly because it > > is stretched over two sites with a 10G link) cluster for long/medium- > > term storage and a smaller but faster cluster for scratch storage in > > our HPC system. What we're thinking of doing is using some/all of the > > scratch capacity as an IW cache of some/all of the main cluster, the > > idea to reduce the need for people to manually move data between the > > two. > > > > It seems to generally work as expected in a small test environment, > > although we have a few concerns: > > > > - Quota management on the home cluster - we need a way of ensuring > > people don't write data to the cache which can't be accomodated on > > home. Probably not insurmountable but needs a bit of thought... > > > > - It seems inodes on the cache only get freed when they are deleted on > > the cache cluster - not if they get deleted from the home cluster or > > when the blocks are evicted from the cache. Does this become an issue > > in time? > > > > If anyone has done anything similar I'd be interested to hear how you > > got on. It would be intresting to know if you created a cache fileset > > for each home fileset or just one for the whole lot, as well as any > > other pearls of wisdom you may have to offer. > > > > Thanks! > > Rob > > > > -- > > Robert Horton | Research Data Storage Lead > > The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB > > T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | > > Twitter @ICR_London > > Facebook: www.facebook.com/theinstituteofcancerresearch > > > > The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. > > > > This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. 
-- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From vpuvvada at in.ibm.com Tue Nov 24 02:32:01 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 24 Nov 2020 08:02:01 +0530 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: References: Message-ID: >- Quota management on the home cluster - we need a way of ensuring >people don't write data to the cache which can't be accomodated on >home. Probably not insurmountable but needs a bit of thought... You could set same quotas between cache and home clusters. AFM does not support replication of filesystem metadata like quotas, fileset configuration etc... >- It seems inodes on the cache only get freed when they are deleted on >the cache cluster - not if they get deleted from the home cluster or >when the blocks are evicted from the cache. Does this become an issue >in time? AFM periodically revalidates with home cluster. If the files/dirs were already deleted at home cluster, AFM moves them to /.ptrash directory at cache cluster during the revalidation. These files can be removed manually by user or auto eviction process. If the .ptrash directory is not cleaned up on time, it might result into quota issues at cache cluster. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 11/23/2020 08:51 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM experiences? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We're thinking about deploying AFM and would be interested in hearing from anyone who has used it in anger - particularly independent writer. Our scenario is we have a relatively large but slow (mainly because it is stretched over two sites with a 10G link) cluster for long/medium- term storage and a smaller but faster cluster for scratch storage in our HPC system. What we're thinking of doing is using some/all of the scratch capacity as an IW cache of some/all of the main cluster, the idea to reduce the need for people to manually move data between the two. It seems to generally work as expected in a small test environment, although we have a few concerns: - Quota management on the home cluster - we need a way of ensuring people don't write data to the cache which can't be accomodated on home. Probably not insurmountable but needs a bit of thought... - It seems inodes on the cache only get freed when they are deleted on the cache cluster - not if they get deleted from the home cluster or when the blocks are evicted from the cache. Does this become an issue in time? If anyone has done anything similar I'd be interested to hear how you got on. It would be intresting to know if you created a cache fileset for each home fileset or just one for the whole lot, as well as any other pearls of wisdom you may have to offer. Thanks! Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. 
This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Tue Nov 24 02:37:18 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 24 Nov 2020 08:07:18 +0530 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: References: <440D8981-58C7-4CF1-BF06-BC11F27A47BC@rutgers.edu> Message-ID: Dean, This is one of the corner case which is associated with sparse files at the home cluster. You could try with latest versions of scale, AFM indepedent-writer mode have many performance/functional improvements in newer releases. ~Venkat (vpuvvada at in.ibm.com) From: "Flanders, Dean" To: gpfsug main discussion list Date: 11/23/2020 11:44 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] AFM experiences? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Rob, We looked at AFM years ago for DR, but after reading the bug reports, we avoided it, and also have had seen a case where it had to be removed from one customer, so we have kept things simple. Now looking again a few years later there are still issues, IBM Spectrum Scale Active File Management (AFM) issues which may result in undetected data corruption, and that was just my first google hit. We have kept it simple, and use a parallel rsync process with policy engine and can hit wire speed for copying of millions of small files in order to have isolation between the sites at GB/s. I am not saying it is bad, just that it needs an appropriate risk/reward ratio to implement as it increases overall complexity. Kind regards, Dean From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Monday, November 23, 2020 4:31 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM experiences? We use it similar to how you describe it. We now run 5.0.4.1 on the client side (I mean actual client nodes, not the home or cache clusters). Before that, we had reliability problems (failure to cache libraries of programs that were executing, etc.). The storage clusters in our case are 5.0.3-2.3. We also got bit by the quotas thing. You have to set them the same on both sides, or you will have problems. It seems a little silly that they are not kept in sync by GPFS, but that?s how it is. If memory serves, the result looked like an AFM failure (queue not being cleared), but it turned out to be that the files just could not be written at the home cluster because the user was over quota there. I also think I?ve seen load average increase due to this sort of thing, but I may be mixing that up with another problem scenario. We monitor via Nagios which I believe monitors using mmafmctl commands. Really can?t think of a single time, apart from the other day, where the queue backed up. The instance the other day only lasted a few minutes (if you suddenly create many small files, like installing new software, it may not catch up instantly). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Nov 23, 2020, at 10:19, Robert Horton wrote: ?Hi all, We're thinking about deploying AFM and would be interested in hearing from anyone who has used it in anger - particularly independent writer. Our scenario is we have a relatively large but slow (mainly because it is stretched over two sites with a 10G link) cluster for long/medium- term storage and a smaller but faster cluster for scratch storage in our HPC system. What we're thinking of doing is using some/all of the scratch capacity as an IW cache of some/all of the main cluster, the idea to reduce the need for people to manually move data between the two. It seems to generally work as expected in a small test environment, although we have a few concerns: - Quota management on the home cluster - we need a way of ensuring people don't write data to the cache which can't be accomodated on home. Probably not insurmountable but needs a bit of thought... - It seems inodes on the cache only get freed when they are deleted on the cache cluster - not if they get deleted from the home cluster or when the blocks are evicted from the cache. Does this become an issue in time? If anyone has done anything similar I'd be interested to hear how you got on. It would be intresting to know if you created a cache fileset for each home fileset or just one for the whole lot, as well as any other pearls of wisdom you may have to offer. Thanks! Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Tue Nov 24 02:41:21 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 24 Nov 2020 08:11:21 +0530 Subject: [gpfsug-discuss] =?utf-8?q?Migrate/syncronize_data_from_Isilon_to?= =?utf-8?q?_Scale_over=09NFS=3F?= In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: AFM provides near zero downtime for migration. As of today, AFM migration does not support ACLs or other EAs migration from non scale (GPFS) source. 
https://www.ibm.com/support/knowledgecenter/STXKQY_5.1.0/com.ibm.spectrum.scale.v5r10.doc/bl1ins_uc_migrationusingafmmigrationenhancements.htm ~Venkat (vpuvvada at in.ibm.com) From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 11/17/2020 03:14 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Sent by: gpfsug-discuss-bounces at spectrumscale.org Have you considered using the AFM feature of Spectrum Scale? I doubt it will provide any speed improvement but it would allow for data to be accessed as it was being migrated. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Andi Christiansen Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Date: Mon, Nov 16, 2020 2:44 PM Hi all, i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... if anyone have any ideas they are welcome! Thanks in advance Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From luke.raimbach at googlemail.com Tue Nov 24 12:16:55 2020 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Tue, 24 Nov 2020 12:16:55 +0000 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: References: Message-ID: Hi Rob, Some things to think about from experiences a year or so ago... If you intend to perform any HPC workload (writing / updating / deleting files) inside a cache, then appropriately specified gateway nodes will be your friend: 1. When creating, updating or deleting files in the cache, each operation requires acknowledgement from the gateway handling that particular cache, before returning ACK to the application. This will add a latency overhead to the workload - if your storage is IB connected to the compute cluster and using verbsRdmaSend for example, this will increase your happiness. Connecting low-spec gateway nodes over 10GbE with the expectation that they will "drain down" over time was a sore learning experience in the early days of AFM for me. 2. AFM queues can quickly eat up memory. I think around 350bytes of memory is consumed for each operation in the AFM queue, so if you have huge file churn inside a cache then the queue will grow very quickly. If you run out of memory, the node dies and you enter cache recovery when it comes back up (or another node takes over). 
This can end up cycling the node as it tries to revalidate a cache and keep up with any other queues. Get more memory! I've not used AFM for a while now and I think the latter enormity has some mitigation against create / delete cycles (i.e. the create operation is expunged from the queue instead of two operations being played back to the home). I expect IBM experts will tell you more about those improvements. Also, several smaller caches are better than one large one (parallel execution of queues helps utilise the available bandwidth and you have a better failover spread if you have multiple gateways, for example). Independent Writer mode comes with some small danger (user error or impatience mainly) inasmuch as whoever updates a file last will win; e.g. home user A writes a file, then cache user B updates the file after reading it and tells user A the update is complete, when really the gateway queue is long and the change is waiting to go back home. User A uses the file expecting the changes are made, then updates it with some results. Meanwhile the AFM queue drains down and user B's change arrives after user A has completed their changes. The interim version of the file user B modified will persist at home and user A's latest changes are lost. Some careful thought about workflow (or good user training about eventual consistency) will save some potential misery on this front. Hope this helps, Luke On Mon, 23 Nov 2020 at 15:19, Robert Horton wrote: > Hi all, > > We're thinking about deploying AFM and would be interested in hearing > from anyone who has used it in anger - particularly independent writer. > > Our scenario is we have a relatively large but slow (mainly because it > is stretched over two sites with a 10G link) cluster for long/medium- > term storage and a smaller but faster cluster for scratch storage in > our HPC system. What we're thinking of doing is using some/all of the > scratch capacity as an IW cache of some/all of the main cluster, the > idea to reduce the need for people to manually move data between the > two. > > It seems to generally work as expected in a small test environment, > although we have a few concerns: > > - Quota management on the home cluster - we need a way of ensuring > people don't write data to the cache which can't be accomodated on > home. Probably not insurmountable but needs a bit of thought... > > - It seems inodes on the cache only get freed when they are deleted on > the cache cluster - not if they get deleted from the home cluster or > when the blocks are evicted from the cache. Does this become an issue > in time? > > If anyone has done anything similar I'd be interested to hear how you > got on. It would be intresting to know if you created a cache fileset > for each home fileset or just one for the whole lot, as well as any > other pearls of wisdom you may have to offer. > > Thanks! > Rob > > -- > Robert Horton | Research Data Storage Lead > The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB > T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | > Twitter @ICR_London > Facebook: www.facebook.com/theinstituteofcancerresearch > > The Institute of Cancer Research: Royal Cancer Hospital, a charitable > Company Limited by Guarantee, Registered in England under Company No. > 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. > > This e-mail message is confidential and for use by the addressee only. 
If > the message is received by anyone other than the addressee, please return > the message to the sender by replying to it and then delete the message > from your computer and network. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yeep at robust.my Tue Nov 24 14:09:34 2020 From: yeep at robust.my (T.A. Yeep) Date: Tue, 24 Nov 2020 22:09:34 +0800 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: Hi Venkat, If ACLs and other EAs migration from non scale is not supported by AFM, is there any 3rd party tool that could complement that when paired with AFM? On Tue, Nov 24, 2020 at 10:41 AM Venkateswara R Puvvada wrote: > AFM provides near zero downtime for migration. As of today, AFM > migration does not support ACLs or other EAs migration from non scale > (GPFS) source. > > > https://www.ibm.com/support/knowledgecenter/STXKQY_5.1.0/com.ibm.spectrum.scale.v5r10.doc/bl1ins_uc_migrationusingafmmigrationenhancements.htm > > ~Venkat (vpuvvada at in.ibm.com) > > > > From: "Frederick Stock" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 11/17/2020 03:14 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Migrate/syncronize data > from Isilon to Scale over NFS? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Have you considered using the AFM feature of Spectrum Scale? I doubt it > will provide any speed improvement but it would allow for data to be > accessed as it was being migrated. > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > ----- Original message ----- > From: Andi Christiansen > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon > to Scale over NFS? > Date: Mon, Nov 16, 2020 2:44 PM > > Hi all, > > i have got a case where a customer wants 700TB migrated from isilon to > Scale and the only way for him is exporting the same directory on NFS from > two different nodes... > > as of now we are using multiple rsync processes on different parts of > folders within the main directory. this is really slow and will take > forever.. right now 14 rsync processes spread across 3 nodes fetching from > 2.. > > does anyone know of a way to speed it up? right now we see from 1Gbit to > 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from > scale nodes and 20Gbits from isilon so we should be able to reach just > under 20Gbit... > > > if anyone have any ideas they are welcome! > > > Thanks in advance > Andi Christiansen > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Best regards *T.A. 
Yeep*Mobile: +6-016-719 8506 | Tel: +6-03-7628 0526 | www.robusthpc.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Tue Nov 24 09:39:47 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Tue, 24 Nov 2020 09:39:47 +0000 Subject: [gpfsug-discuss] SSUG::Digital with CIUK Message-ID: <> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: meeting.ics Type: text/calendar Size: 2623 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 3499622 bytes Desc: not available URL: From prasad.surampudi at theatsgroup.com Tue Nov 24 16:05:19 2020 From: prasad.surampudi at theatsgroup.com (Prasad Surampudi) Date: Tue, 24 Nov 2020 16:05:19 +0000 Subject: [gpfsug-discuss] mmhealth reports fserrinvalid errors on CNFS servers Message-ID: We are seeing fserrinvalid error on couple of filesystems in Spectrum Scale cluster. These errors are reported but mmhealth only couple of nodes (CNFS servers) in the cluster, but mmhealth on other nodes shows no issues. Any idea what this error means? And why its reported on CNFS servers and not on other nodes? What need to be done to fix this issue? sudo /usr/lpp/mmfs/bin/mmhealth node show FILESYSTEM -v Node name: cnfs05-gpfs Component Status Reasons ------------------------------------------------------------------- FILESYSTEM DEGRADED fserrinvalid(vol) argus HEALTHY - dytech HEALTHY - enlnt_E HEALTHY - enlnt_Es HEALTHY - haaforfs HEALTHY - haaforfs2 HEALTHY - historical HEALTHY - prcfs HEALTHY - qmtfs HEALTHY - research HEALTHY - research2 HEALTHY - schon_raw HEALTHY - uhdb_vol1 HEALTHY - vol DEGRADED fserrinvalid(vol) Event Parameter Severity Event Message ---------------------------------------------------------------------------------------------------------- fserrinvalid vol ERROR FS=vol,ErrNo=1124,Unknown error=0464000000010000000180A108BC000079B4000000000000003400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 -------------- next part -------------- An HTML attachment was scrubbed... URL: From NSCHULD at de.ibm.com Tue Nov 24 16:44:35 2020 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Tue, 24 Nov 2020 17:44:35 +0100 Subject: [gpfsug-discuss] =?utf-8?q?mmhealth_reports_fserrinvalid_errors_o?= =?utf-8?q?n_CNFS=09servers?= In-Reply-To: References: Message-ID: To get an explanation for any event one can ask the system: # mmhealth event show fserrinvalid Event Name: fserrinvalid Event ID: 999338 Description: Unrecognized FSSTRUCT error received. Check documentation Cause: A filesystem corruption detected User Action: Check error message for details and the mmfs.log.latest log for further details. See the topic Checking and repairing a file system in the IBM Spectrum Scale documentation: Administering. Managing file systems. If the file system is severely damaged, the best course of action is to follow the procedures in section: Additional information to collect for file system corruption or MMFS_FSSTRUCT errors Severity: ERROR State: DEGRADED The event is triggered by a callback which may not fire on all nodes, that is why only a subset of nodes have the information. 
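For a quick view of which nodes are currently raising the event, the standard mmhealth views should show it (FILESYSTEM is the component name from the output above; nothing here is specific to this particular event):

  # cluster-wide rollup of the FILESYSTEM component across all nodes
  mmhealth cluster show FILESYSTEM
  # full event details on a node that reports the degradation
  mmhealth node show FILESYSTEM -v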
Depending on the version of scale the procedure to remove the event varies: For newer release please use # mmhealth event resolve Missing arguments. Usage: mmhealth event resolve {EventName} [Identifier] For older releases it is described here: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_fsstruc.htm mmsysmonc event filesystem fsstruct_fixed Mit freundlichen Gr??en / Kind regards Norbert Schuld M925:IBM Spectrum Scale Software Development Phone: +49-160 70 70 335 IBM Deutschland Research & Development GmbH Email: nschuld at de.ibm.com Am Weiher 24 65451 Kelsterbach Knowing is not enough; we must apply. Willing is not enough; we must do. IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Prasad Surampudi To: "gpfsug-discuss at spectrumscale.org" Date: 24.11.2020 17:05 Subject: [EXTERNAL] [gpfsug-discuss] mmhealth reports fserrinvalid errors on CNFS servers Sent by: gpfsug-discuss-bounces at spectrumscale.org We are seeing fserrinvalid error on couple of filesystems in Spectrum Scale cluster. These errors are reported but mmhealth only couple of nodes (CNFS servers) in the cluster, but mmhealth on other nodes shows no issues. Any idea what this error means? And why its reported on CNFS servers and not on other nodes? What need to be done to fix this issue? sudo /usr/lpp/mmfs/bin/mmhealth node show FILESYSTEM -v Node name: cnfs05-gpfs Component Status Reasons ------------------------------------------------------------------- FILESYSTEM DEGRADED fserrinvalid(vol) argus HEALTHY - dytech HEALTHY - enlnt_E HEALTHY - enlnt_Es HEALTHY - haaforfs HEALTHY - haaforfs2 HEALTHY - historical HEALTHY - prcfs HEALTHY - qmtfs HEALTHY - research HEALTHY - research2 HEALTHY - schon_raw HEALTHY - uhdb_vol1 HEALTHY - vol DEGRADED fserrinvalid(vol) Event Parameter Severity Event Message ---------------------------------------------------------------------------------------------------------- fserrinvalid vol ERROR FS=vol,ErrNo=1124,Unknown error=0464000000010000000180A108BC000079B4000000000000003400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1D963707.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jake.carroll at uq.edu.au Wed Nov 25 21:29:24 2020 From: jake.carroll at uq.edu.au (Jake Carroll) Date: Wed, 25 Nov 2020 21:29:24 +0000 Subject: [gpfsug-discuss] IB routers in ESS configuration + 3 different subnets - valid config? Message-ID: Hi. I am just in the process of sanity-checking a potential future configuration. 
Let's say I have an ESS 5000 and an ESS 3000 placed on the data centre floor to form the basis of a new scratch array. Let's then suppose that I have three existing supercomputers in that same location. Each of those supercomputers has a separate IB subnet and their networks are unrelated to each other, IB-wise. My understanding is that it is valid and possible to use MLNX EDR IB *routers* in order to be able to transport NSD communications back and forth across those separate subnets, back to the ESS (which lives on its own unique subnet). So at this point, I've got four unique subnets - one for the ESS, one for each super. As I understand it, there is an upper limit of *SIX* unique subnets on those EDR IB routers. As I understand it - for IPoIB transport, I'd also need some "gateway" boxes more or less - essentially some decent servers which I put EDR/HDR cards in as dog legs that act as an IPoIB gateway interface to each subnet. I appreciate that there is devil in the detail - but what I'm asking is if it is valid to "route" NSD with IB Routers (not switches) this way to separate subnets. Colleagues at IBM have all said "yeah....should work....we've not done it....but should be fine?" Colleagues at Mellanox (uhhh...nvidia...) say "Yes, this is valid and does exactly as the IB Router should and there is nothing unusual about this". If someone has experience doing this or could call out any oddity/weirdness/gotchas, I'd be very appreciative. I'm fairly sure this is all very low risk - but given nobody locally could tell me "Yeah, all certified and valid!" I'd like the wisdom of the wider crowd. Thank you. --jc -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Nov 27 11:46:05 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 27 Nov 2020 17:16:05 +0530 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: Hi Yeep, >If ACLs and other EAs migration from non scale is not supported by AFM, is there any 3rd party tool that could complement that when paired with AFM? rsync can be used to just fix metadata like ACLs and EAs. AFM does not revalidate the files with source system if rsync changes the ACLs on them. So ACLs can only be fixed after or during the cutover. ACL inheritance may be used by setting on ACLs on required parent dirs upfront if this option is sufficient, there was an user who migrated to scale using this method. ~Venkat (vpuvvada at in.ibm.com) From: "T.A. Yeep" To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Date: 11/24/2020 07:40 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Venkat, If ACLs and other EAs migration from non scale is not supported by AFM, is there any 3rd party tool that could complement that when paired with AFM? On Tue, Nov 24, 2020 at 10:41 AM Venkateswara R Puvvada < vpuvvada at in.ibm.com> wrote: AFM provides near zero downtime for migration. As of today, AFM migration does not support ACLs or other EAs migration from non scale (GPFS) source. 
https://www.ibm.com/support/knowledgecenter/STXKQY_5.1.0/com.ibm.spectrum.scale.v5r10.doc/bl1ins_uc_migrationusingafmmigrationenhancements.htm ~Venkat (vpuvvada at in.ibm.com) From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 11/17/2020 03:14 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Sent by: gpfsug-discuss-bounces at spectrumscale.org Have you considered using the AFM feature of Spectrum Scale? I doubt it will provide any speed improvement but it would allow for data to be accessed as it was being migrated. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Andi Christiansen Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Date: Mon, Nov 16, 2020 2:44 PM Hi all, i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... if anyone have any ideas they are welcome! Thanks in advance Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Best regards T.A. Yeep Mobile: +6-016-719 8506 | Tel: +6-03-7628 0526 | www.robusthpc.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Mon Nov 30 13:49:12 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Mon, 30 Nov 2020 13:49:12 +0000 Subject: [gpfsug-discuss] Licensing costs for data lakes (SSUG follow-up) Message-ID: I am seeking some help on a topic I know many of you care deeply about: licensing costs I am trying to gather some more information about a request that has come up a couple of times, pricing for ?data lakes?. I would like to understand better what people are looking for here. - Is it as simple as ?much steeper discounts for very large deployments?? Or is a ?data lake? something specific, e.g. a large deployment that is not performance/latency sensitive; a storage pool that is [primarily] HDD; a tier that has specific read/write patterns such as moving entire large datasets in or out; or something else? 
Bear in mind that if we have special licensing for data lakes, we need a rigorous definition so that both you and we know whether your use of that licensing is compliant. Nobody likes ambiguity in licensing! - Are you expecting pricing to get very flat/discounting to get steep for large deployments? Or a different price tier/structure for "data lakes" if we can rigorously define what one means? Do you agree or disagree with the proposition that if you keep adding storage hardware/capacity, that the software licensing cost should rise in proportion (even if that proportion is much smaller for a "data lake" than for a performance tier)? - Feel free to be creative and imaginative. For example, would you be interested in a low-cost pricing model for storage that is an AFM Home and is _only_ accessed by using AFM to move data in and out of an AFM Cache (probably on the performance tier)? This would be conceptually similar to the way you can now (5.1) use AFM-Object to park data in a cheap object store. - Also feel free to answer questions I didn't ask... If you prefer to discuss this in Slack rather than email, I started a discussion there a little while ago (please thread your comments!): https://ssug-poweraiug.slack.com/archives/CEVVCEE8M/p1605815075188800 Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1545794140] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL:
From david_johnson at brown.edu Mon Nov 30 21:41:30 2020 From: david_johnson at brown.edu (David Johnson) Date: Mon, 30 Nov 2020 16:41:30 -0500 Subject: [gpfsug-discuss] internal details on GPFS inode expansion Message-ID: When GPFS needs to add inodes to the filesystem, it seems to pre-create about 4 million of them. Judging by the logs, it seems it only takes a few (13 maybe) seconds to do this. However we are suspecting that this might only be to request the additional inodes and that there is some background activity for some time afterwards. Would someone who has knowledge of the actual internals be willing to confirm or deny this, and if there is background activity, is it on all nodes in the cluster, NSD nodes, "default worker nodes"? Thanks, -- ddj Dave Johnson ddj at brown.edu
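As an aside for anyone wanting to observe an expansion from the outside, a few stock commands show the inode counts and whether related work is still queued; gpfs0 is a placeholder device name, and this only observes the behaviour rather than answering the internals question:

  # allocated vs. maximum inodes per fileset / inode space
  mmlsfileset gpfs0 -L
  # file-system-wide inode counts
  mmdf gpfs0 -F
  # long-running commands and current waiters on the file system manager
  mmdiag --commands
  mmdiag --waiters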
>>> Join Here <<< This episode will start 15 minutes later as usual. * San Francisco, USA at 08:15 PST * New York, USA at 11:15 EST * London, United Kingdom at 16:15 GMT * Frankfurt, Germany at 17:15 CET * Pune, India at 21:45 IST -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2488 bytes Desc: not available URL: ------------------------------ Message: 2 Date: Wed, 4 Nov 2020 08:14:41 +0100 (CET) From: Andi Christiansen To: gpfsug main discussion list , Christian Vieser Subject: Re: [gpfsug-discuss] Alternative to Scale S3 API. Message-ID: <1512108314.679947.1604474081488 at privateemail.com> Content-Type: text/plain; charset="utf-8" Hi Christian, Thanks for the information! My question also triggered IBM to tell me the same so i think we will stay on S3 with Scale and hoping the same with the new release.. Yes, MinIO is really lacking some good documentation.. but definatly a cool software package that i will keep an eye on in the future... Best Regards Andi Christiansen > On 11/02/2020 2:44 PM Christian Vieser wrote: > > > > Hi Andi, > > we suffer from the same issue. IBM support told me that Spectrum Scale 5.1 will come with a new release of the underlying Openstack components, so we still hope that some/most of limitations will vanish then. But I already know, that the new S3 policies won't be available, only the "legacy" S3 ACLs. > > We also tried MinIO but deemed that it's not "production ready". It's fine for quickly setting up a S3 service for development, but they release too often and with breaking changes, and documentation is lacking all aspects regarding maintenance. > > Regards, > > Christian > > Am 27.10.20 um 12:46 schrieb Andi Christiansen: > > > > Hi all, > > > > We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 106, Issue 3 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From herrmann at sprintmail.com Sat Nov 7 21:10:36 2020 From: herrmann at sprintmail.com (Ron H) Date: Sat, 7 Nov 2020 16:10:36 -0500 Subject: [gpfsug-discuss] Use cases for file audit logging and clusteredwatch folder In-Reply-To: References: Message-ID: <8F771847BDEB4447919D30A16FE48FAB@rone8PC> Hi Jacob, Can you point me to a good overview of each of these features? I know File Audit and Watch is part of the DME Scale edition license, but I can?t seem to find a good explanation of what these features can offer. 
Thanks Ron From: Jacob M Tick Sent: Monday, November 02, 2020 7:21 PM To: gpfsug-discuss at spectrumscale.org Cc: April Brown Subject: [gpfsug-discuss] Use cases for file audit logging and clusteredwatch folder Hi All, I am reaching out on behalf of the Spectrum Scale development team to get some insight on how our customers are using the file audit logging and the clustered watch folder features. If you have it enabled in your test or production environment, could you please elaborate on how and why you are using the feature? Also, knowing how you have the function configured (ie: watching or auditing for certain events, only enabling on certain filesets, ect..) would help us out. Please respond back to April, John (both on CC), and I with any info you are willing to provide. Thanks in advance! Regards, Jake Tick Manager Spectrum Scale - Scalable Data Interfaces IBM Systems Group Email:jmtick at us.ibm.com IBM -------------------------------------------------------------------------------- _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmtick at us.ibm.com Mon Nov 9 17:31:00 2020 From: jmtick at us.ibm.com (Jacob M Tick) Date: Mon, 9 Nov 2020 17:31:00 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Use_cases_for_file_audit_logging_and?= =?utf-8?q?=09clusteredwatch_folder?= In-Reply-To: <8F771847BDEB4447919D30A16FE48FAB@rone8PC> References: <8F771847BDEB4447919D30A16FE48FAB@rone8PC>, Message-ID: An HTML attachment was scrubbed... URL: From Kamil.Czauz at Squarepoint-Capital.com Wed Nov 11 22:29:31 2020 From: Kamil.Czauz at Squarepoint-Capital.com (Czauz, Kamil) Date: Wed, 11 Nov 2020 22:29:31 +0000 Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Message-ID: We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From UWEFALKE at de.ibm.com Thu Nov 12 01:56:46 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 12 Nov 2020 02:56:46 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Poor_client_performance_with_high_cpu_?= =?utf-8?q?usage_of=09mmfsd_process?= In-Reply-To: References: Message-ID: Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. 
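A minimal shell sketch of the capture-and-analysis workflow described above, assuming the default sample paths under /usr/lpp/mmfs and trace reports landing in /tmp/mmfs; the node name, mount point and inode number are placeholders, and the exact tsfindinode argument form is an assumption since the message above elides it:

NODE=client01          # affected client node (placeholder)
FSMOUNT=/gpfs/fs1      # mount point of the affected file system (placeholder)

# Start a cyclic (overwrite-mode) I/O trace on the affected client
mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N "$NODE"

# ... reproduce or observe the slowdown, then stop promptly so the
# interesting window is not overwritten by the cyclic buffer ...
mmtracectl --stop -N "$NODE"

# Stopping converts the binary trace into an ASCII report, usually /tmp/mmfs/trcrpt.*
TRCRPT=$(ls -t /tmp/mmfs/trcrpt.* 2>/dev/null | head -1)

# Summarize the trace with the sample analysis script shipped with GPFS
awk -f /usr/lpp/mmfs/samples/debugtools/trsum.awk details=0 "$TRCRPT" | less

# Build tsfindinode once, then map an inode taken from an FIO line (e.g. "tag 248415")
# back to a file path on the mounted file system
(cd /usr/lpp/mmfs/samples/util && make)
/usr/lpp/mmfs/samples/util/tsfindinode -i 248415 "$FSMOUNT"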
The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From luis.bolinches at fi.ibm.com Thu Nov 12 13:19:05 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 12 Nov 2020 13:19:05 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Poor_client_performance_with_high_cpu_?= =?utf-8?q?usage_of=09mmfsd_process?= In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From jyyum at kr.ibm.com Thu Nov 12 14:10:17 2020 From: jyyum at kr.ibm.com (Jae Yoon Yum) Date: Thu, 12 Nov 2020 14:10:17 +0000 Subject: [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14713274163322.png Type: image/png Size: 262 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14713274163323.jpg Type: image/jpeg Size: 2457 bytes Desc: not available URL: From Eric.Wendel at ibm.com Thu Nov 12 15:43:46 2020 From: Eric.Wendel at ibm.com (Eric Wendel - Eric.Wendel@ibm.com) Date: Thu, 12 Nov 2020 15:43:46 +0000 Subject: [gpfsug-discuss] Problems reading emails to the mailing list Message-ID: <31233620a4324240885aed7ad18a729a@ibm.com> Hi Folks, As you are no doubt aware, Lotus Notes and its ecosystem is virtually extinct. For those of us who have moved on to more modern email clients (including an increasing number of IBMERs like me), the email links we receive from SSUG (for example) 'OF0433B7F4.580A7B75-ON0025861E.004DD432-0025861E.004DD8A4 at notes.na.collabserv.com are useless because they can only be read if you have the Notes client installed. This is especially problematic for Linux users as the Linux client for Notes is discontinued. It would be very helpful if the SSUG could move to a modern email platform. 
Thanks, Eric Wendel eric.wendel at ibm.com -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: Thursday, November 12, 2020 8:10 AM To: gpfsug-discuss at spectrumscale.org Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 106, Issue 8 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Poor client performance with high cpu usage of mmfsd process (Luis Bolinches) 2. Question about the Clearing Spectrum Scale GUI event (Jae Yoon Yum) ---------------------------------------------------------------------- Message: 1 Date: Thu, 12 Nov 2020 13:19:05 +0000 From: "Luis Bolinches" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Thu, 12 Nov 2020 14:10:17 +0000 From: "Jae Yoon Yum" To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14713274163322.png Type: image/png Size: 262 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14713274163323.jpg Type: image/jpeg Size: 2457 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 106, Issue 8 ********************************************** From stefan.roth at de.ibm.com Thu Nov 12 17:13:38 2020 From: stefan.roth at de.ibm.com (Stefan Roth) Date: Thu, 12 Nov 2020 18:13:38 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Question_about_the_Clearing_Spectrum_S?= =?utf-8?q?cale_GUI=09event?= In-Reply-To: References: Message-ID: Hello Jay, as long as those errors are still shown by "mmhealth node show" CLI command, they will again appear in the GUI. In the GUI events table you can show an "Event Type" column which is hidden by default. Events that have event type "Notice" can be cleared by the "Mark as Read" action. Events that have event type "State" can not be cleared by the "Mark as Read" action. They have to disappear by solving the problem. If a problem is solved the error should disappear from "mmhealth node show" and after that it will disappear from the GUI as well. 
Mit freundlichen Gr??en / Kind regards Stefan Roth Spectrum Scale Developement Phone: +49 162 4159934 IBM Deutschland Research & Development GmbH Email: stefan.roth at de.ibm.com Am Weiher 24 65451 Kelsterbach IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Jae Yoon Yum" To: gpfsug-discuss at spectrumscale.org Date: 12.11.2020 15:10 Subject: [EXTERNAL] [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Team, I hope you all stay safe from COVID 19, One of my client wants to clear their ?ERROR? events on the Scale GUI. As you know, there is ?mark as read? for ?warning? messages but there isn?t for ?ERROR?. (In fact, the ?mark as read? button is exist but it does not work.) So I sent him to run this command on cli. /usr/lpp/mmfs/gui/cli/lshealth --reset On my test VM, all of the error messages has been cleared when I run the command?. But, for the client?s system, client said that ?All of the error / warning messages had been appeared again include the one which I had delete by clicking ?mark as read?.? Does anyone who has similar experience like this? and How Could I solve this problem? Or, Is there any way to clear the event one by one? * I sent the same message to the Slack 'scale-help' channel. Thanks. Jay. Best Regards, JaeYoon(Jay) IBM Korea, Three IFC, Yum 10 Gukjegeumyung-ro, Yeongdeungpo-gu, IBM Systems Seoul, Korea Hardware, Storage Technical Sales Mobile : +82-10-4995-4814 07326 e-mail: jyyum at kr.ibm.com ? ??? ??? ??, ??? ??, ???? ???? ???? ?????. ??? ??? ??? ??? ????, ????? ??? ?? ?????,??IBM ??? ? ?? ??? ????, ????? ???? ????. (If you don't wish to receive e-mail from sender, please send e-mail directly. For IBM e-mail, please click here). ??? ???? ??,??, ?? ??? ???????(??: 02-3781-7800,? ?? mktg at kr.ibm.com )? ?? ?? ???? ?? ???? ? ????. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1E506389.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1E764757.gif Type: image/gif Size: 262 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1E982001.jpg Type: image/jpeg Size: 2457 bytes Desc: not available URL: From arc at b4restore.com Thu Nov 12 17:33:01 2020 From: arc at b4restore.com (=?utf-8?B?QW5kaSBOw7hyIENocmlzdGlhbnNlbg==?=) Date: Thu, 12 Nov 2020 17:33:01 +0000 Subject: [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event In-Reply-To: References: Message-ID: Hi Jay, First of you need to make sure your system is actually healthy. Events that are not fixed will reappear. I have had a lot of ?stale? 
entries happening over the last years and more often than not ?/usr/lpp/mmfs/gui/cli/lshealth ?reset? clears the entries if they are not actual faults.. As Stefan says if the errors/warnings are shown in ?mmhealth node show or mmhealth cluster show? they will reappear as they should. (I have sometimes seen stale entries there aswell) When I have encountered stale entries which wasn?t cleared with ?lshealth ?reset? I could clear them with ?mmsysmoncontrol restart?. I think I actually run that command maybe once or twice every month because of stale entries in the GUI og mmhealth itself.. don?t know why they happen but they seem to appear more frequently for me atleast.. I have high hopes for the 5.1.0.0/5.1.0.1 release as I have heard there should be some new things for the GUI as well.. not sure what they are yet though 😊 Hope this helps. Cheers A. Christiansen Fra: gpfsug-discuss-bounces at spectrumscale.org P? vegne af Stefan Roth Sendt: Thursday, November 12, 2020 6:14 PM Til: gpfsug main discussion list Emne: Re: [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event Hello Jay, as long as those errors are still shown by "mmhealth node show" CLI command, they will again appear in the GUI. In the GUI events table you can show an "Event Type" column which is hidden by default. Events that have event type "Notice" can be cleared by the "Mark as Read" action. Events that have event type "State" can not be cleared by the "Mark as Read" action. They have to disappear by solving the problem. If a problem is solved the error should disappear from "mmhealth node show" and after that it will disappear from the GUI as well. Mit freundlichen Gr??en / Kind regards Stefan Roth Spectrum Scale Developement ________________________________ Phone: +49 162 4159934 IBM Deutschland Research & Development GmbH [cid:image002.gif at 01D6B922.3FE99E70] Email: stefan.roth at de.ibm.com Am Weiher 24 65451 Kelsterbach ________________________________ IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 [cid:image003.gif at 01D6B922.3FE99E70]"Jae Yoon Yum" ---12.11.2020 15:10:35---Hi Team, I hope you all stay safe from COVID 19, One of my client wants to clear their ?ERROR? ev From: "Jae Yoon Yum" > To: gpfsug-discuss at spectrumscale.org Date: 12.11.2020 15:10 Subject: [EXTERNAL] [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Team, I hope you all stay safe from COVID 19, One of my client wants to clear their ?ERROR? events on the Scale GUI. As you know, there is ?mark as read? for ?warning? messages but there isn?t for ?ERROR?. (In fact, the ?mark as read? button is exist but it does not work.) So I sent him to run this command on cli. /usr/lpp/mmfs/gui/cli/lshealth --reset On my test VM, all of the error messages has been cleared when I run the command?. But, for the client?s system, client said that ?All of the error / warning messages had been appeared again include the one which I had delete by clicking ?mark as read?.? Does anyone who has similar experience like this? and How Could I solve this problem? Or, Is there any way to clear the event one by one? * I sent the same message to the Slack 'scale-help' channel. Thanks. Jay. 
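A short sketch pulling together the commands mentioned by Stefan and Andi above, assuming the GUI CLI sits in /usr/lpp/mmfs/gui/cli as on a default install; run it on the GUI node, and only after the underlying problems reported by mmhealth are actually fixed:

# 1. Anything still reported here will reappear in the GUI regardless of clearing
mmhealth node show
mmhealth cluster show

# 2. If only stale entries remain, reset the GUI event list
/usr/lpp/mmfs/gui/cli/lshealth --reset

# 3. If stale entries survive the reset, restart the system health monitor
mmsysmoncontrol restart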
Best Regards, JaeYoon(Jay) Yum IBM Korea, Three IFC, [cid:image005.jpg at 01D6B922.3FE99E70] 10 Gukjegeumyung-ro, Yeongdeungpo-gu, IBM Systems Hardware, Storage Technical Sales Seoul, Korea Mobile : +82-10-4995-4814 07326 e-mail: jyyum at kr.ibm.com ? ??? ??? ??, ??? ??, ???? ???? ???? ?????. ??? ??? ??? ??? ????, ????? ??? ?? ?????,??IBM ??? ??? ??? ????, ????? ???? ????. (If you don't wish to receive e-mail from sender, please send e-mail directly. For IBM e-mail, please click here). ??? ???? ??,??, ?? ??? ???????(??: 02-3781-7800,??? mktg at kr.ibm.com )? ?? ?? ???? ?? ???? ? ????. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.gif Type: image/gif Size: 1851 bytes Desc: image002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 105 bytes Desc: image003.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.jpg Type: image/jpeg Size: 2457 bytes Desc: image005.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.png Type: image/png Size: 166 bytes Desc: image006.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 616 bytes Desc: image007.png URL: From Kamil.Czauz at Squarepoint-Capital.com Fri Nov 13 02:33:17 2020 From: Kamil.Czauz at Squarepoint-Capital.com (Czauz, Kamil) Date: Fri, 13 Nov 2020 02:33:17 +0000 Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process In-Reply-To: References: Message-ID: Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? The 'lookup' operations are taking a hell of a long time, what does that mean? 
Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 
61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 
0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. 
you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. 
Thank you for your cooperation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Fri Nov 13 09:21:17 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Fri, 13 Nov 2020 10:21:17 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Poor_client_performance_with_high_cpu_?= =?utf-8?q?usage=09of=09mmfsd_process?= In-Reply-To: References: Message-ID: Hi, Kamil, looks your tracefile setting has been too low: all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here means you effectively captured a period of about 5ms only ... you can't see much from that. I'd assumed the default trace file size would be sufficient here but it doesn't seem to. try running with something like mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N . However, if you say "no major waiter" - how many waiters did you see at any time? what kind of waiters were the oldest, how long they'd waited? it could indeed well be that some job is just creating a killer workload. The very short cyle time of the trace points, OTOH, to high activity, OTOH the trace file setting appears quite low (trace=io doesnt' collect many trace infos, just basic IO stuff). If I might ask: what version of GPFS are you running? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 03:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? 
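A minimal sketch of the follow-up capture suggested above, combining the larger trace buffer with a periodic waiter snapshot; the 512M size comes from the message above, while the node name, sampling interval and loop length are placeholder values:

NODE=client01    # affected client (placeholder)

# Cyclic I/O trace with a larger trace file so more than a few milliseconds fit
mmtracectl --start --trace-file-size=512M --trace=io \
    --tracedev-write-mode=overwrite -N "$NODE"

# While the slowdown is visible, sample the waiters so their age and kind can be reviewed later
for i in $(seq 1 30); do date; mmdiag --waiters; sleep 2; done > /tmp/waiters.log 2>&1

mmtracectl --stop -N "$NODE"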
Thank you for your cooperation.
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From UWEFALKE at de.ibm.com Fri Nov 13 09:37:04 2020
From: UWEFALKE at de.ibm.com (Uwe Falke)
Date: Fri, 13 Nov 2020 10:37:04 +0100
Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process
In-Reply-To:
References:
Message-ID:

Hi Kamil,

in my mail just a few minutes ago I'd overlooked that the buffer size in your trace was indeed 128M (I suppose the trace file is adapting that size if not set in particular). That is very strange: even under high load, the trace should then capture a longer period than 10 secs, and, most of all, it should contain many more activities than just the few you had.

That is very mysterious. I am out of ideas for the moment, and a bit short of time to dig here. To check your tracing, you could run a trace like before but when everything is normal and check that out - you should see many records, and trsum.awk should list just a small portion of unfinished ops at the end, ... If that is fine, then the tracing itself is affected by your critical condition (never experienced that before - rather GPFS grinds to a halt than the trace is abandoned), and that might well be worth a support ticket.

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefalke at de.ibm.com

IBM Services
IBM Data Privacy Statement
IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122

From: Uwe Falke/Germany/IBM
To: gpfsug main discussion list
Date: 13/11/2020 10:21
Subject: Re: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process

Hi, Kamil,

looks like your tracefile setting has been too low:

all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here
trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here

means you effectively captured a period of about 5ms only ... you can't see much from that. I'd assumed the default trace file size would be sufficient here but it doesn't seem to. Try running with something like mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N .

However, if you say "no major waiter" - how many waiters did you see at any time? What kind of waiters were the oldest, and how long had they waited? It could indeed well be that some job is just creating a killer workload. The very short cycle time of the trace, OTOH, points to high activity; OTOH the trace file setting appears quite low (trace=io doesn't collect many trace infos, just basic IO stuff).

If I might ask: what version of GPFS are you running?

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services
+49 175 575 2877 Mobile
Rathausstr.
7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 03:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? The 'lookup' operations are taking a hell of a long time, what does that mean? 
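(For anyone following along: a capture and summary like the one below can be reproduced roughly as follows. This is only a sketch assembled from the commands mentioned earlier in this thread - the node name "client01", the trcrpt file pattern and the mount point /gpfs/fs1 are placeholders, the inode 248415 is simply the one from the earlier FIO example, and the tsfindinode argument order "-i <inode> <mountpoint>" is an assumption since the exact arguments were elided above.)

  # start a cyclic IO trace on the affected client, reproduce the slowdown, then stop it
  mmtracectl --start --trace=io --trace-file-size=512M --tracedev-write-mode=overwrite -N client01
  mmtracectl --stop -N client01

  # summarize the converted ASCII trace report (written to /tmp/mmfs by default, per the earlier advice)
  awk -f /usr/lpp/mmfs/samples/debugtools/trsum.awk details=0 /tmp/mmfs/trcrpt.*

  # map an inode seen in an FIO record (the number after "tag") back to a file path
  cd /usr/lpp/mmfs/samples/util && make
  ./tsfindinode -i 248415 /gpfs/fs1
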
Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 
61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 
0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. 
you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. 
Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kamil.Czauz at Squarepoint-Capital.com Fri Nov 13 13:31:21 2020 From: Kamil.Czauz at Squarepoint-Capital.com (Czauz, Kamil) Date: Fri, 13 Nov 2020 13:31:21 +0000 Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process In-Reply-To: References: Message-ID: Hi Uwe - Regarding your previous message - waiters were coming / going with just 1-2 waiters when I ran the mmdiag command, with very low wait times (<0.01s). We are running version 4.2.3 I did another capture today while the client is functioning normally and this was the header result: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 25.996957 seconds and 67592121252 cycles Measured cycle count update rate to be 2600001271 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Fri Nov 13 08:20:01.800558000 2020 (TOD 1605273601.800558, cycles 20807897445779444) daemon trace enabled Fri Nov 13 08:20:01.910017000 2020 (TOD 1605273601.910017, cycles 20807897730372442) all streams included Fri Nov 13 08:20:26.423085049 2020 (TOD 1605273626.423085, cycles 20807961464381068) <---- useful part of trace extends from here trace quiesced Fri Nov 13 08:20:27.797515000 2020 (TOD 1605273627.000797, cycles 20807965037900696) <---- to here Approximate number of times the trace buffer was filled: 14.631 Still a very small capture (1.3s), but the trsum.awk output was not filled with lookup commands / large lookup times. Can you help debug what those long lookup operations mean? 
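(A sketch of one way to dig into what those lookups touch: pull the raw lookup records out of the converted trace report. The record names gpfs_i_lookup / gpfs_v_lookup are the ones mentioned later in this thread, and the trcrpt path assumes the default /tmp/mmfs location noted earlier.)

  # count lookup records in the converted trace, then sample a few of them
  grep -c -E 'gpfs_i_lookup|gpfs_v_lookup' /tmp/mmfs/trcrpt.*
  grep -E 'gpfs_i_lookup|gpfs_v_lookup' /tmp/mmfs/trcrpt.* | head -20
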
Unfinished operations: 27967 ***************** pagein ************** 1.362382116 27967 ***************** readpage ************** 1.362381516 139130 1.362448448 ********* Unfinished IO: buffer/disk 3002F670000 20:107498951168^\archive_data_16 104686 1.362022068 ********* Unfinished IO: buffer/disk 50011878000 1:47169618944^\archive_data_1 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFE 341710 1.362423815 ********* Unfinished IO: buffer/disk 20022218000 19:107498951680^\archive_data_15 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFF 139150 1.361122006 ********* Unfinished IO: buffer/disk 50012018000 2:47169622016^\archive_data_2 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\00000000FFFFFFFF 95782 1.361112791 ********* Unfinished IO: buffer/disk 40016300000 20:107498950656^\archive_data_16 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\00000000FFFFFFFF 271076 1.361579585 ********* Unfinished IO: buffer/disk 20023DB8000 4:47169606656^\archive_data_4 341676 1.362018599 ********* Unfinished IO: buffer/disk 40038140000 5:47169614336^\archive_data_5 139150 1.361131599 MSG FSnd: nsdMsgReadExt msg_id 2930654492 Sduration 13292.382 + us 341676 1.362027104 MSG FSnd: nsdMsgReadExt msg_id 2930654495 Sduration 12396.877 + us 95782 1.361124739 MSG FSnd: nsdMsgReadExt msg_id 2930654491 Sduration 13299.242 + us 271076 1.361587653 MSG FSnd: nsdMsgReadExt msg_id 2930654493 Sduration 12836.328 + us 92182 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 341710 1.362429643 MSG FSnd: nsdMsgReadExt msg_id 2930654497 Sduration 11994.338 + us 341662 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 139130 1.362458376 MSG FSnd: nsdMsgReadExt msg_id 2930654498 Sduration 11965.605 + us 104686 1.362028772 MSG FSnd: nsdMsgReadExt msg_id 2930654496 Sduration 12395.209 + us 412373 0.775676657 MSG FRep: nsdMsgReadExt msg_id 304915249 Rduration 598747.324 us Rlen 262144 Hduration 598752.112 + us 341770 0.589739579 MSG FRep: nsdMsgReadExt msg_id 338079050 Rduration 784684.402 us Rlen 4 Hduration 784692.651 + us 143315 0.536252844 MSG FRep: nsdMsgReadExt msg_id 631945522 Rduration 838171.137 us Rlen 233472 Hduration 838174.299 + us 341878 0.134331812 MSG FRep: nsdMsgReadExt msg_id 338079023 Rduration 1240092.169 us Rlen 262144 Hduration 1240094.403 + us 175478 0.587353287 MSG FRep: nsdMsgReadExt msg_id 338079047 Rduration 787070.694 us Rlen 262144 Hduration 787073.990 + us 139558 0.633517347 MSG FRep: nsdMsgReadExt msg_id 631945538 Rduration 740906.634 us Rlen 102400 Hduration 740910.172 + us 143308 0.958832110 MSG FRep: nsdMsgReadExt msg_id 631945542 Rduration 415591.871 us Rlen 262144 Hduration 415597.056 + us Elapsed trace time: 1.374423981 seconds Elapsed trace time from first VFS call to last: 1.374423980 Time idle between VFS calls: 0.001603738 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs readpage 1.151660085 1874 614.546 rdwr 0.431456904 581 742.611 read_inode2 0.001180648 934 1.264 follow_link 0.000029502 7 
4.215 getattr 0.000048413 9 5.379 revalidate 0.000007080 67 0.106 pagein 1.149699537 1877 612.520 create 0.007664829 9 851.648 open 0.001032657 19 54.350 unlink 0.002563726 14 183.123 delete_inode 0.000764598 826 0.926 lookup 0.312847947 953 328.277 setattr 0.020651226 824 25.062 permission 0.000015018 1 15.018 rename 0.000529023 4 132.256 release 0.001613800 22 73.355 getxattr 0.000030494 6 5.082 mmap 0.000054767 1 54.767 llseek 0.000001130 4 0.283 readdir 0.000033947 2 16.973 removexattr 0.002119736 820 2.585 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 42625 0.000000138 0.000031017 0.44% 99.56% 3 42378 0.000586959 0.011596801 4.82% 95.18% 32 42627 0.000000272 0.000013421 1.99% 98.01% 2 42641 0.003284590 0.012593594 20.69% 79.31% 35 42628 0.001522335 0.000002748 99.82% 0.18% 2 25464 0.003462795 0.500281914 0.69% 99.31% 12 301420 0.000016711 0.052848218 0.03% 99.97% 38 95103 0.000000544 0.000000000 100.00% 0.00% 1 145858 0.000000659 0.000794896 0.08% 99.92% 2 42221 0.000011484 0.000039445 22.55% 77.45% 5 371718 0.000000707 0.001805425 0.04% 99.96% 2 95109 0.000000880 0.008998763 0.01% 99.99% 2 95337 0.000010330 0.503057866 0.00% 100.00% 8 42700 0.002442175 0.012504429 16.34% 83.66% 35 189680 0.003466450 0.500128627 0.69% 99.31% 9 42681 0.006685396 0.000391575 94.47% 5.53% 16 42702 0.000048203 0.000000500 98.97% 1.03% 2 42703 0.000033280 0.140102087 0.02% 99.98% 9 224423 0.000000195 0.000000000 100.00% 0.00% 1 42706 0.000541098 0.000014713 97.35% 2.65% 3 106275 0.000000456 0.000000000 100.00% 0.00% 1 42721 0.000372857 0.000000000 100.00% 0.00% 1 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Friday, November 13, 2020 4:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi Kamil, in my mail just a few minutes ago I'd overlooked that the buffer size in your trace was indeed 128M (I suppose the trace file is adapting that size if not set in particular). That is very strange, even under high load, the trace should then capture some longer time than 10 secs, and , most of all, it should contain much more activities than just these few you had. That is very mysterious. I am out of ideas for the moment, and a bit short of time to dig here. To check your tracing, you could run a trace like before but when everything is normal and check that out - you should see many records, the trcsum.awk should list just a small portion of unfinished ops at the end, ... If that is fine, then the tracing itself is affected by your crritical condition (never experienced that before - rather GPFS grinds to a halt than the trace is abandoned), and that might well be worth a support ticket. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 
7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Uwe Falke/Germany/IBM To: gpfsug main discussion list Date: 13/11/2020 10:21 Subject: Re: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, looks your tracefile setting has been too low: all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here means you effectively captured a period of about 5ms only ... you can't see much from that. I'd assumed the default trace file size would be sufficient here but it doesn't seem to. try running with something like mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N . However, if you say "no major waiter" - how many waiters did you see at any time? what kind of waiters were the oldest, how long they'd waited? it could indeed well be that some job is just creating a killer workload. The very short cyle time of the trace points, OTOH, to high activity, OTOH the trace file setting appears quite low (trace=io doesnt' collect many trace infos, just basic IO stuff). If I might ask: what version of GPFS are you running? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 03:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. 
The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? The 'lookup' operations are taking a hell of a long time, what does that mean? Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 
32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 
2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. 
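A minimal first-look sequence on the hanging client, using just the log file and commands named above, might be:

  # recent GPFS log entries on the affected client
  tail -n 50 /var/adm/ras/mmfs.log.latest

  # currently waiting threads - run it a few times to see whether waiters persist or churn
  mmdiag --waiters

  # recent IO history with per-IO service times and sizes
  mmdiag --iohist
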
Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. 
We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Fri Nov 13 13:38:48 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 13 Nov 2020 13:38:48 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Poor_client_performance_with_high_cpu?= =?utf-8?q?=09usage=09of=09mmfsd_process?= In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Nov 13 21:11:16 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 13 Nov 2020 13:11:16 -0800 Subject: [gpfsug-discuss] REMINDER - SC20 Sessions - Monday Nov. 16 and Wednesday Nov. 18 Message-ID: <7B85E526-88D4-44AE-B034-4EC5A61E524C@lbl.gov> Hi all, A Reminder to attend and also submit any panel questions for the Wednesday session. So far, there are 3 questions around these topics: 1) excessive prefetch when reading small fractions of many large files 2) improved the integration between TSM and GPFS 3) number of security vulnerabilities in GPFS, the GUI, ESS, or something else related Bring on your tough questions and make it interesting. 
Cheers,
Kristy

--- original email ---

The Spectrum Scale User Group will be hosting two 90 minute sessions at SC20 this year and we hope you can join us. The first one is: "Storage for AI" and will be held Monday, Nov. 16th, from 11:00-12:30 EST and the second one is "What's new in Spectrum Scale 5.1?" and will be held Wednesday, Nov. 18th from 11:00-12:30 EST.

Please see the calendar at https://www.spectrumscaleug.org/eventslist/2020-11/ and register by clicking on a session on the calendar and then the "Please register here to join the session" link.

Best,
Kristy

Kristy Kallback-Rose
Senior HPC Storage Systems Analyst
National Energy Research Scientific Computing Center
Lawrence Berkeley National Laboratory

From UWEFALKE at de.ibm.com Mon Nov 16 13:45:57 2020
From: UWEFALKE at de.ibm.com (Uwe Falke)
Date: Mon, 16 Nov 2020 14:45:57 +0100
Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process
In-Reply-To:
References:
Message-ID:

Hi,

while the other nodes can well block the local one, as Frederick suggests, there should at least be something visible locally waiting for these other nodes. Looking at all waiters might be a good thing, but this case looks strange in other ways. Mind the statement that there are almost no local waiters and none of them gets older than 10 ms. I am no developer nor do I have the code, so don't expect too much.

Can you tell what lookups you see (check in the trcrpt file, could be like gpfs_i_lookup or gpfs_v_lookup)? Lookups are metadata ops; do you have a separate pool for your metadata? How is that pool set up (down to the physical block devices)? Your trsum output further down revealed 36 lookups, each one on avg taking >30ms. That is a lot (albeit the respective waiters won't show up at first glance as suspicious ...). So, which waiters did you see (hope you saved them, if not, do it next time).

What are the node you see this on and the whole cluster used for? What is the MaxFilesToCache setting (for that node and for others)? What HW is that, how big are your nodes (memory, CPU)? To check the unreasonably short trace capture time: how large are the trcrpt files you obtain?

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefalke at de.ibm.com

IBM Services
IBM Data Privacy Statement
IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122

From: "Czauz, Kamil"
To: gpfsug main discussion list
Date: 13/11/2020 14:33
Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hi Uwe -

Regarding your previous message - waiters were coming / going with just 1-2 waiters when I ran the mmdiag command, with very low wait times (<0.01s).
We are running version 4.2.3 I did another capture today while the client is functioning normally and this was the header result: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 25.996957 seconds and 67592121252 cycles Measured cycle count update rate to be 2600001271 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Fri Nov 13 08:20:01.800558000 2020 (TOD 1605273601.800558, cycles 20807897445779444) daemon trace enabled Fri Nov 13 08:20:01.910017000 2020 (TOD 1605273601.910017, cycles 20807897730372442) all streams included Fri Nov 13 08:20:26.423085049 2020 (TOD 1605273626.423085, cycles 20807961464381068) <---- useful part of trace extends from here trace quiesced Fri Nov 13 08:20:27.797515000 2020 (TOD 1605273627.000797, cycles 20807965037900696) <---- to here Approximate number of times the trace buffer was filled: 14.631 Still a very small capture (1.3s), but the trsum.awk output was not filled with lookup commands / large lookup times. Can you help debug what those long lookup operations mean? Unfinished operations: 27967 ***************** pagein ************** 1.362382116 27967 ***************** readpage ************** 1.362381516 139130 1.362448448 ********* Unfinished IO: buffer/disk 3002F670000 20:107498951168^\archive_data_16 104686 1.362022068 ********* Unfinished IO: buffer/disk 50011878000 1:47169618944^\archive_data_1 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFE 341710 1.362423815 ********* Unfinished IO: buffer/disk 20022218000 19:107498951680^\archive_data_15 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFF 139150 1.361122006 ********* Unfinished IO: buffer/disk 50012018000 2:47169622016^\archive_data_2 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\00000000FFFFFFFF 95782 1.361112791 ********* Unfinished IO: buffer/disk 40016300000 20:107498950656^\archive_data_16 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\00000000FFFFFFFF 271076 1.361579585 ********* Unfinished IO: buffer/disk 20023DB8000 4:47169606656^\archive_data_4 341676 1.362018599 ********* Unfinished IO: buffer/disk 40038140000 5:47169614336^\archive_data_5 139150 1.361131599 MSG FSnd: nsdMsgReadExt msg_id 2930654492 Sduration 13292.382 + us 341676 1.362027104 MSG FSnd: nsdMsgReadExt msg_id 2930654495 Sduration 12396.877 + us 95782 1.361124739 MSG FSnd: nsdMsgReadExt msg_id 2930654491 Sduration 13299.242 + us 271076 1.361587653 MSG FSnd: nsdMsgReadExt msg_id 2930654493 Sduration 12836.328 + us 92182 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 341710 1.362429643 MSG FSnd: nsdMsgReadExt msg_id 2930654497 Sduration 11994.338 + us 341662 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 139130 1.362458376 MSG FSnd: nsdMsgReadExt msg_id 2930654498 
Sduration 11965.605 + us 104686 1.362028772 MSG FSnd: nsdMsgReadExt msg_id 2930654496 Sduration 12395.209 + us 412373 0.775676657 MSG FRep: nsdMsgReadExt msg_id 304915249 Rduration 598747.324 us Rlen 262144 Hduration 598752.112 + us 341770 0.589739579 MSG FRep: nsdMsgReadExt msg_id 338079050 Rduration 784684.402 us Rlen 4 Hduration 784692.651 + us 143315 0.536252844 MSG FRep: nsdMsgReadExt msg_id 631945522 Rduration 838171.137 us Rlen 233472 Hduration 838174.299 + us 341878 0.134331812 MSG FRep: nsdMsgReadExt msg_id 338079023 Rduration 1240092.169 us Rlen 262144 Hduration 1240094.403 + us 175478 0.587353287 MSG FRep: nsdMsgReadExt msg_id 338079047 Rduration 787070.694 us Rlen 262144 Hduration 787073.990 + us 139558 0.633517347 MSG FRep: nsdMsgReadExt msg_id 631945538 Rduration 740906.634 us Rlen 102400 Hduration 740910.172 + us 143308 0.958832110 MSG FRep: nsdMsgReadExt msg_id 631945542 Rduration 415591.871 us Rlen 262144 Hduration 415597.056 + us Elapsed trace time: 1.374423981 seconds Elapsed trace time from first VFS call to last: 1.374423980 Time idle between VFS calls: 0.001603738 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs readpage 1.151660085 1874 614.546 rdwr 0.431456904 581 742.611 read_inode2 0.001180648 934 1.264 follow_link 0.000029502 7 4.215 getattr 0.000048413 9 5.379 revalidate 0.000007080 67 0.106 pagein 1.149699537 1877 612.520 create 0.007664829 9 851.648 open 0.001032657 19 54.350 unlink 0.002563726 14 183.123 delete_inode 0.000764598 826 0.926 lookup 0.312847947 953 328.277 setattr 0.020651226 824 25.062 permission 0.000015018 1 15.018 rename 0.000529023 4 132.256 release 0.001613800 22 73.355 getxattr 0.000030494 6 5.082 mmap 0.000054767 1 54.767 llseek 0.000001130 4 0.283 readdir 0.000033947 2 16.973 removexattr 0.002119736 820 2.585 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 42625 0.000000138 0.000031017 0.44% 99.56% 3 42378 0.000586959 0.011596801 4.82% 95.18% 32 42627 0.000000272 0.000013421 1.99% 98.01% 2 42641 0.003284590 0.012593594 20.69% 79.31% 35 42628 0.001522335 0.000002748 99.82% 0.18% 2 25464 0.003462795 0.500281914 0.69% 99.31% 12 301420 0.000016711 0.052848218 0.03% 99.97% 38 95103 0.000000544 0.000000000 100.00% 0.00% 1 145858 0.000000659 0.000794896 0.08% 99.92% 2 42221 0.000011484 0.000039445 22.55% 77.45% 5 371718 0.000000707 0.001805425 0.04% 99.96% 2 95109 0.000000880 0.008998763 0.01% 99.99% 2 95337 0.000010330 0.503057866 0.00% 100.00% 8 42700 0.002442175 0.012504429 16.34% 83.66% 35 189680 0.003466450 0.500128627 0.69% 99.31% 9 42681 0.006685396 0.000391575 94.47% 5.53% 16 42702 0.000048203 0.000000500 98.97% 1.03% 2 42703 0.000033280 0.140102087 0.02% 99.98% 9 224423 0.000000195 0.000000000 100.00% 0.00% 1 42706 0.000541098 0.000014713 97.35% 2.65% 3 106275 0.000000456 0.000000000 100.00% 0.00% 1 42721 0.000372857 0.000000000 100.00% 0.00% 1 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Friday, November 13, 2020 4:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi Kamil, in my mail just a few minutes ago I'd overlooked that the buffer size in your trace was indeed 128M (I suppose the trace file is adapting that size if not set in particular). That is very strange, even under high load, the trace should then capture some longer time than 10 secs, and , most of all, it should contain much more activities than just these few you had. 
That is very mysterious. I am out of ideas for the moment, and a bit short of time to dig here. To check your tracing, you could run a trace like before but when everything is normal and check that out - you should see many records, and the trsum.awk output should list just a small portion of unfinished ops at the end, ... If that is fine, then the tracing itself is affected by your critical condition (never experienced that before - rather GPFS grinds to a halt than the trace is abandoned), and that might well be worth a support ticket. Mit freundlichen Grüßen / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Geschäftsführung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Uwe Falke/Germany/IBM To: gpfsug main discussion list Date: 13/11/2020 10:21 Subject: Re: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, it looks like your trace file size setting has been too low: all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here means you effectively captured a period of about 5ms only ... you can't see much from that. I'd assumed the default trace file size would be sufficient here but it doesn't seem to be. Try running with something like mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N . However, if you say "no major waiter" - how many waiters did you see at any time? What kind of waiters were the oldest, and how long had they waited? It could indeed well be that some job is just creating a killer workload. The very short cycle time of the trace points, OTOH, to high activity; OTOH the trace file size setting appears quite low (trace=io doesn't collect many trace infos, just basic IO stuff). If I might ask: what version of GPFS are you running? Mit freundlichen Grüßen / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Geschäftsführung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 03:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart.
The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? The 'lookup' operations are taking a hell of a long time, what does that mean? Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 
32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 
2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. 
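As a rough way to condense that waiter list into counts and ages, something along the following lines can be used. It is a sketch only, not an IBM-provided tool; the exact "mmdiag --waiters" line format differs between Scale releases, so the patterns here are assumptions to adapt:

#!/usr/bin/env python3
# Rough sketch: summarise "mmdiag --waiters" by age and reason.
# Assumes lines containing "waiting <seconds> sec" and a quoted reason text;
# the format varies between Scale releases, so adjust the regular expressions.
import re
import subprocess
from collections import Counter

out = subprocess.run(["/usr/lpp/mmfs/bin/mmdiag", "--waiters"],
                     capture_output=True, text=True, check=True).stdout

ages = []            # seconds each waiter has been waiting
reasons = Counter()  # number of waiters per reason text
for line in out.splitlines():
    m = re.search(r"waiting\s+([0-9.]+)\s*sec", line, re.IGNORECASE)
    if not m:
        continue
    ages.append(float(m.group(1)))
    r = re.search(r"reason\s+'([^']+)'", line)
    reasons[r.group(1) if r else "unspecified"] += 1

print(f"{len(ages)} waiters, oldest {max(ages, default=0.0):.3f}s")
for reason, count in reasons.most_common(5):
    print(f"{count:5d}  {reason}")

Run on the affected client while it is slow, that answers the "how many waiters, how old, waiting on what" questions quickly.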
Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. 
We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
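To turn the trace workflow described in the replies above (mmtracectl, the FIO records in the converted trcrpt file, tsfindinode) into something closer to the iotop-style view asked about at the start of the thread, a small post-processing sketch like the following can rank inodes by I/O activity. It is an illustration only; the field positions and the 512-byte sector size are assumptions taken from the example FIO line quoted earlier, not a documented format:

#!/usr/bin/env python3
# Rough sketch (not an IBM tool): rank inodes by I/O activity in a converted
# ASCII trace report (trcrpt.* in /tmp/mmfs). Assumes FIO lines of the form
# shown above, where the inode follows the "tag" keyword and the transfer
# size in sectors follows "nSectors"; 512-byte sectors are assumed.
import sys
from collections import Counter

ios = Counter()       # inode -> number of FIO records
sectors = Counter()   # inode -> sectors transferred

with open(sys.argv[1], errors="replace") as trc:
    for line in trc:
        if "FIO:" not in line:
            continue
        fields = line.split()
        if "tag" not in fields:
            continue
        inode = fields[fields.index("tag") + 1]
        ios[inode] += 1
        if "nSectors" in fields:
            sectors[inode] += int(fields[fields.index("nSectors") + 1])

for inode, count in ios.most_common(20):
    print(f"inode {inode}: {count} IOs, ~{sectors[inode] * 512 // 1024} KiB")

The busiest inodes can then be mapped back to paths with tsfindinode, built and run as described in the earlier reply, and from there it is usually clear which job or process to look at.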
From andi at christiansen.xxx Mon Nov 16 19:44:14 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Mon, 16 Nov 2020 20:44:14 +0100 (CET) Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Message-ID: <1388247256.209171.1605555854969@privateemail.com> Hi all, i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... if anyone have any ideas they are welcome!
Thanks in advance Andi Christiansen -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Nov 16 21:44:30 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 16 Nov 2020 21:44:30 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Migrate/syncronize_data_from_Isilon_to?= =?utf-8?q?_Scale_over=09NFS=3F?= In-Reply-To: <1388247256.209171.1605555854969@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Mon Nov 16 21:58:19 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Mon, 16 Nov 2020 13:58:19 -0800 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <1388247256.209171.1605555854969@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: <20201116215819.wda6nophekamzs3v@thargelion> When we did a similar (though larger, at ~2.5PB) migration, we used rsync as well, but ran one rsync process per Isilon node, and made sure the NFS clients were hitting separate Isilon nodes for their reads. We also didn't have more than one rsync process running per client, as the Linux NFS client (at least in CentOS 6) was terrible when it came to concurrent access. Whatever method you end up using, I can guarantee you will be much happier once you are on GPFS. :) On Mon, Nov 16, 2020 at 08:44:14PM +0100, Andi Christiansen wrote: > Hi all, > > i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... > > as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. > > does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... > > > if anyone have any ideas they are welcome! > > > Thanks in advance > Andi Christiansen > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From jonathan.buzzard at strath.ac.uk Mon Nov 16 22:58:49 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 16 Nov 2020 22:58:49 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <1388247256.209171.1605555854969@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: <4de1fa02-a074-0901-cf12-31be9e843f5f@strath.ac.uk> On 16/11/2020 19:44, Andi Christiansen wrote: > Hi all, > > i have got a case where a customer wants 700TB migrated from isilon to > Scale and the only way for him is exporting the same directory on NFS > from two different nodes... > > as of now we are using multiple rsync processes on different parts of > folders within the main directory. this is really slow and will take > forever.. right now 14 rsync processes spread across 3 nodes fetching > from 2.. > > does anyone know of a way to speed it up? 
right now we see from 1Gbit to > 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit > from scale nodes and 20Gbits from isilon so we should be able to reach > just under 20Gbit... > > > if anyone have any ideas they are welcome! > My biggest recommendation when doing this is to use a sqlite database to keep track of what is going on. The main issue is that you are almost certainly going to need to do more than one rsync pass unless your source Isilon system has no user activity, and with 700TB to move that seems unlikely. Typically you do an initial rsync to move the bulk of the data while the users are still live, then shut down user access to the source system and do the final rsync which hopefully has a significantly smaller amount of data to actually move. So this is what I have done on a number of occasions now. I create a very simple sqlite DB with a list of source and destination folders and a status code. Initially the status code is set to -1. Then I have a perl script which looks at the sqlite DB, picks a row with a status code of -1, and sets the status code to -2, aka that directory is in progress. It then proceeds to run the rsync and when it finishes it updates the status code to the exit code of the rsync process. As long as all the rsync processes have access to the same copy of the sqlite DB (simplest to put it on either the source or destination file system) then all is good. You can fire off multiple rsyncs on multiple nodes and they will all keep churning away till there is no more work to be done. The advantage is you can easily interrogate the DB to find out the state of play. That is, how many of your transfers have completed, how many are yet to be done, which ones are currently being transferred etc. without logging onto multiple nodes. *MOST* importantly you can see if any of the rsyncs had an error, by simply looking for status codes greater than zero. I cannot stress how important this is. Note that if the source is still active you will see errors down to files being deleted on the source file system before rsync has a chance to copy them. However this has a specific exit code (24) so is easy to spot and not worry about. Finally it is also very simple to set the status codes to -1 again and set the process away again. So the final run is easier to do. If you want to mail me off list I can dig out a copy of the perl code I used if you're interested. There are several versions as I have tended to tailor it to each transfer. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
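A minimal sketch of such a worker, for illustration only - this is not the perl script referred to above, and the table layout, paths and rsync options are assumptions to adapt:

#!/usr/bin/env python3
# Sketch of the sqlite-tracked rsync scheme described above (illustrative only,
# not the original perl). Assumed table, created once:
#   CREATE TABLE transfers(id INTEGER PRIMARY KEY, dir TEXT, status INTEGER DEFAULT -1);
# status -1 = pending, -2 = in progress, anything else = rsync exit code
# (24 just means files vanished on the live source during the run).
import sqlite3
import subprocess

DB  = "/gpfs/target/migrate.db"   # assumed: DB file on the destination file system
SRC = "isilon-nfs:/ifs/data"      # assumed source prefix
DST = "/gpfs/target/data"         # assumed destination prefix

conn = sqlite3.connect(DB, timeout=300, isolation_level=None)   # autocommit mode

def claim():
    """Atomically pick one pending directory and mark it in progress."""
    conn.execute("BEGIN IMMEDIATE")   # take the write lock before selecting
    row = conn.execute(
        "SELECT id, dir FROM transfers WHERE status = -1 LIMIT 1").fetchone()
    if row is None:
        conn.execute("COMMIT")
        return None
    conn.execute("UPDATE transfers SET status = -2 WHERE id = ?", (row[0],))
    conn.execute("COMMIT")
    return row

while True:
    claimed = claim()
    if claimed is None:
        break                         # no pending directories left
    rowid, d = claimed
    # -aAX --numeric-ids keeps owners, POSIX ACLs and xattrs without uid remapping
    rc = subprocess.run(["rsync", "-aAX", "--numeric-ids",
                         f"{SRC}/{d}/", f"{DST}/{d}/"]).returncode
    conn.execute("UPDATE transfers SET status = ? WHERE id = ?", (rc, rowid))

Several of these can run on each node against the same DB file; SELECT status, COUNT(*) FROM transfers GROUP BY status; shows the state of play, any status greater than zero flags a failed directory, and resetting statuses to -1 queues everything up again for the next pass.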
From jonathan.buzzard at strath.ac.uk Mon Nov 16 23:12:47 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 16 Nov 2020 23:12:47 +0000 Subject: Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <20201116215819.wda6nophekamzs3v@thargelion> References: <1388247256.209171.1605555854969@privateemail.com> <20201116215819.wda6nophekamzs3v@thargelion> Message-ID: <8d4d2987-77dd-e3e1-1c98-a635f1b96ddd@strath.ac.uk> On 16/11/2020 21:58, Skylar Thompson wrote: > When we did a similar (though larger, at ~2.5PB) migration, we used rsync > as well, but ran one rsync process per Isilon node, and made sure the NFS > clients were hitting separate Isilon nodes for their reads.
We also didn't > have more than one rsync process running per client, as the Linux NFS > client (at least in CentOS 6) was terrible when it came to concurrent access. > The million dollar question IMHO is the number of files and their sizes. Basically if you have a million 1KB files to move it is going to take much longer than a 100 1GB files. That is the overhead of dealing with each file is a real bitch and kills your attainable transfer speed stone dead. One option I have used in the past is to use your last backup and restore to the new system, then rsync in the changes. That way you don't impact the source file system which is live. Another option I have used is to inform users in advance that data will be transferred based on a metric of how many files and how much data they have. So the less data and fewer files the quicker you will get access to the new system once access to the old system is turned off. It is amazing how much users clear up junk under this scenario. Last time I did this a single user went from over 17 million files to 11 thousand! In total many many TB of data just vanished from the system (around half of the data when puff) as users actually got around to some house keeping LOL. Moving less data and files is always less painful. > Whatever method you end up using, I can guarantee you will be much happier > once you are on GPFS. :) > Goes without saying :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From UWEFALKE at de.ibm.com Tue Nov 17 08:50:56 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 17 Nov 2020 09:50:56 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Migrate/syncronize_data_from_Isilon_to?= =?utf-8?q?_Scale_over=09NFS=3F?= In-Reply-To: <1388247256.209171.1605555854969@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: Hi Andi, what about leaving NFS completeley out and using rsync (multiple rsyncs in parallel, of course) directly between your source and target servers? I am not sure how many TCP connections (suppose it is NFS4) in parallel are opened between client and server, using a 2x bonded interface well requires at least two. That combined with the DB approach suggested by Jonathan to control the activity of the rsync streams would be my best guess. If you have many small files, the overhead might still kill you. Tarring them up into larger aggregates for transfer would help a lot, but then you must be sure they won't change or you need to implement your own version control for that class of files. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 16/11/2020 20:44 Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... 
as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... if anyone have any ideas they are welcome! Thanks in advance Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From UWEFALKE at de.ibm.com Tue Nov 17 08:57:07 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 17 Nov 2020 09:57:07 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Migrate/syncronize_data_from_Isilon_to?= =?utf-8?q?_Scale_over=09NFS=3F?= In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: Hi, Andi, sorry I just took your 20Gbit for the sign of 2x10Gbps bons, but it is over two nodes, so no bonding. But still, I'd expect to open several TCP connections in parallel per source-target pair (like with several rsyncs per source node) would bear an advantage (and still I thing NFS doesn't do that, but I can be wrong). If more nodes have access to the Isilon data they could also participate (and don't need NFS exports for that). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Uwe Falke/Germany/IBM To: gpfsug main discussion list Date: 17/11/2020 09:50 Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Hi Andi, what about leaving NFS completeley out and using rsync (multiple rsyncs in parallel, of course) directly between your source and target servers? I am not sure how many TCP connections (suppose it is NFS4) in parallel are opened between client and server, using a 2x bonded interface well requires at least two. That combined with the DB approach suggested by Jonathan to control the activity of the rsync streams would be my best guess. If you have many small files, the overhead might still kill you. Tarring them up into larger aggregates for transfer would help a lot, but then you must be sure they won't change or you need to implement your own version control for that class of files. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 16/11/2020 20:44 Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? 
Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... if anyone have any ideas they are welcome! Thanks in advance Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From andi at christiansen.xxx Tue Nov 17 11:51:58 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 17 Nov 2020 12:51:58 +0100 (CET) Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: <616234716.258600.1605613918767@privateemail.com> Hi all, thanks for all the information, there was some interesting things amount it.. I kept on going with rsync and ended up making a file with all top level user directories and splitting them into chunks of 347 per rsync session(total 42000 ish folders). yesterday we had only 14 sessions with 3000 folders in each and that was too much work for one rsync session.. i divided them out among all GPFS nodes to have them fetch an area each and actually doing that 3 times on each node and that has now boosted the bandwidth usage from 3Gbit to around 16Gbit in total.. all nodes have been seing doing work above 7Gbit individual which is actually near to what i was expecting without any modifications to the NFS server or TCP tuning.. CPU is around 30-50% on each server and mostly below or around 30% so it seems like it could have handled abit more sessions.. Small files are really a killer but with all 96+ sessions we have now its not often all sessions are handling small files at the same time so we have an average of about 10-12Gbit bandwidth usage. Thanks all! ill keep you in mind if for some reason we see it slowing down again but for now i think we will try to see if it will go the last mile with a bit more sessions on each :) Best Regards Andi Christiansen > On 11/17/2020 9:57 AM Uwe Falke wrote: > > > Hi, Andi, sorry I just took your 20Gbit for the sign of 2x10Gbps bons, but > it is over two nodes, so no bonding. But still, I'd expect to open several > TCP connections in parallel per source-target pair (like with several > rsyncs per source node) would bear an advantage (and still I thing NFS > doesn't do that, but I can be wrong). > If more nodes have access to the Isilon data they could also participate > (and don't need NFS exports for that). > > Mit freundlichen Gr??en / Kind regards > > Dr. Uwe Falke > IT Specialist > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > Services > +49 175 575 2877 Mobile > Rathausstr. 
7, 09111 Chemnitz, Germany > uwefalke at de.ibm.com > > IBM Services > > IBM Data Privacy Statement > > IBM Deutschland Business & Technology Services GmbH > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > Sitz der Gesellschaft: Ehningen > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > From: Uwe Falke/Germany/IBM > To: gpfsug main discussion list > Date: 17/11/2020 09:50 > Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data > from Isilon to Scale over NFS? > > > Hi Andi, > > what about leaving NFS completeley out and using rsync (multiple rsyncs > in parallel, of course) directly between your source and target servers? > I am not sure how many TCP connections (suppose it is NFS4) in parallel > are opened between client and server, using a 2x bonded interface well > requires at least two. That combined with the DB approach suggested by > Jonathan to control the activity of the rsync streams would be my best > guess. > If you have many small files, the overhead might still kill you. Tarring > them up into larger aggregates for transfer would help a lot, but then you > must be sure they won't change or you need to implement your own version > control for that class of files. > > Mit freundlichen Gr??en / Kind regards > > Dr. Uwe Falke > IT Specialist > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > Services > +49 175 575 2877 Mobile > Rathausstr. 7, 09111 Chemnitz, Germany > uwefalke at de.ibm.com > > IBM Services > > IBM Data Privacy Statement > > IBM Deutschland Business & Technology Services GmbH > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > Sitz der Gesellschaft: Ehningen > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > > Date: 16/11/2020 20:44 > Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from > Isilon to Scale over NFS? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi all, > > i have got a case where a customer wants 700TB migrated from isilon to > Scale and the only way for him is exporting the same directory on NFS from > two different nodes... > > as of now we are using multiple rsync processes on different parts of > folders within the main directory. this is really slow and will take > forever.. right now 14 rsync processes spread across 3 nodes fetching from > 2.. > > does anyone know of a way to speed it up? right now we see from 1Gbit to > 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from > scale nodes and 20Gbits from isilon so we should be able to reach just > under 20Gbit... > > > if anyone have any ideas they are welcome! > > > Thanks in advance > Andi Christiansen _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From janfrode at tanso.net Tue Nov 17 12:07:30 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 17 Nov 2020 13:07:30 +0100 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <616234716.258600.1605613918767@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> Message-ID: Nice to see it working well! But, what about ACLs? 
Does you rsync pull in all needed metadata, or do you also need to sync ACLs ? Any plans for how to solve that ? On Tue, Nov 17, 2020 at 12:52 PM Andi Christiansen wrote: > Hi all, > > thanks for all the information, there was some interesting things amount > it.. > > I kept on going with rsync and ended up making a file with all top level > user directories and splitting them into chunks of 347 per rsync > session(total 42000 ish folders). yesterday we had only 14 sessions with > 3000 folders in each and that was too much work for one rsync session.. > > i divided them out among all GPFS nodes to have them fetch an area each > and actually doing that 3 times on each node and that has now boosted the > bandwidth usage from 3Gbit to around 16Gbit in total.. > > all nodes have been seing doing work above 7Gbit individual which is > actually near to what i was expecting without any modifications to the NFS > server or TCP tuning.. > > CPU is around 30-50% on each server and mostly below or around 30% so it > seems like it could have handled abit more sessions.. > > Small files are really a killer but with all 96+ sessions we have now its > not often all sessions are handling small files at the same time so we have > an average of about 10-12Gbit bandwidth usage. > > Thanks all! ill keep you in mind if for some reason we see it slowing down > again but for now i think we will try to see if it will go the last mile > with a bit more sessions on each :) > > Best Regards > Andi Christiansen > > > On 11/17/2020 9:57 AM Uwe Falke wrote: > > > > > > Hi, Andi, sorry I just took your 20Gbit for the sign of 2x10Gbps bons, > but > > it is over two nodes, so no bonding. But still, I'd expect to open > several > > TCP connections in parallel per source-target pair (like with several > > rsyncs per source node) would bear an advantage (and still I thing NFS > > doesn't do that, but I can be wrong). > > If more nodes have access to the Isilon data they could also participate > > (and don't need NFS exports for that). > > > > Mit freundlichen Gr??en / Kind regards > > > > Dr. Uwe Falke > > IT Specialist > > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > > Services > > +49 175 575 2877 Mobile > > Rathausstr. 7, 09111 Chemnitz, Germany > > uwefalke at de.ibm.com > > > > IBM Services > > > > IBM Data Privacy Statement > > > > IBM Deutschland Business & Technology Services GmbH > > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > > Sitz der Gesellschaft: Ehningen > > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > > > > > From: Uwe Falke/Germany/IBM > > To: gpfsug main discussion list > > Date: 17/11/2020 09:50 > > Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data > > from Isilon to Scale over NFS? > > > > > > Hi Andi, > > > > what about leaving NFS completeley out and using rsync (multiple rsyncs > > in parallel, of course) directly between your source and target servers? > > I am not sure how many TCP connections (suppose it is NFS4) in parallel > > are opened between client and server, using a 2x bonded interface well > > requires at least two. That combined with the DB approach suggested by > > Jonathan to control the activity of the rsync streams would be my best > > guess. > > If you have many small files, the overhead might still kill you. Tarring > > them up into larger aggregates for transfer would help a lot, but then > you > > must be sure they won't change or you need to implement your own version > > control for that class of files. 
> > > > Mit freundlichen Gr??en / Kind regards > > > > Dr. Uwe Falke > > IT Specialist > > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > > Services > > +49 175 575 2877 Mobile > > Rathausstr. 7, 09111 Chemnitz, Germany > > uwefalke at de.ibm.com > > > > IBM Services > > > > IBM Data Privacy Statement > > > > IBM Deutschland Business & Technology Services GmbH > > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > > Sitz der Gesellschaft: Ehningen > > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > > > > > > > From: Andi Christiansen > > To: "gpfsug-discuss at spectrumscale.org" > > > > Date: 16/11/2020 20:44 > > Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from > > Isilon to Scale over NFS? > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > Hi all, > > > > i have got a case where a customer wants 700TB migrated from isilon to > > Scale and the only way for him is exporting the same directory on NFS > from > > two different nodes... > > > > as of now we are using multiple rsync processes on different parts of > > folders within the main directory. this is really slow and will take > > forever.. right now 14 rsync processes spread across 3 nodes fetching > from > > 2.. > > > > does anyone know of a way to speed it up? right now we see from 1Gbit to > > 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit > from > > scale nodes and 20Gbits from isilon so we should be able to reach just > > under 20Gbit... > > > > > > if anyone have any ideas they are welcome! > > > > > > Thanks in advance > > Andi Christiansen _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Tue Nov 17 12:24:22 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 17 Nov 2020 13:24:22 +0100 (CET) Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> Message-ID: <1023406427.259407.1605615862969@privateemail.com> Hi Jan, We are syncing ACLs, groups, owners and timestamps aswell :) /Andi Christiansen > On 11/17/2020 1:07 PM Jan-Frode Myklebust wrote: > > > Nice to see it working well! > > But, what about ACLs? Does you rsync pull in all needed metadata, or do you also need to sync ACLs ? Any plans for how to solve that ? > > On Tue, Nov 17, 2020 at 12:52 PM Andi Christiansen wrote: > > > > Hi all, > > > > thanks for all the information, there was some interesting things amount it.. > > > > I kept on going with rsync and ended up making a file with all top level user directories and splitting them into chunks of 347 per rsync session(total 42000 ish folders). yesterday we had only 14 sessions with 3000 folders in each and that was too much work for one rsync session.. 
> > > > i divided them out among all GPFS nodes to have them fetch an area each and actually doing that 3 times on each node and that has now boosted the bandwidth usage from 3Gbit to around 16Gbit in total.. > > > > all nodes have been seing doing work above 7Gbit individual which is actually near to what i was expecting without any modifications to the NFS server or TCP tuning.. > > > > CPU is around 30-50% on each server and mostly below or around 30% so it seems like it could have handled abit more sessions.. > > > > Small files are really a killer but with all 96+ sessions we have now its not often all sessions are handling small files at the same time so we have an average of about 10-12Gbit bandwidth usage. > > > > Thanks all! ill keep you in mind if for some reason we see it slowing down again but for now i think we will try to see if it will go the last mile with a bit more sessions on each :) > > > > Best Regards > > Andi Christiansen > > > > > On 11/17/2020 9:57 AM Uwe Falke wrote: > > > > > > > > > Hi, Andi, sorry I just took your 20Gbit for the sign of 2x10Gbps bons, but > > > it is over two nodes, so no bonding. But still, I'd expect to open several > > > TCP connections in parallel per source-target pair (like with several > > > rsyncs per source node) would bear an advantage (and still I thing NFS > > > doesn't do that, but I can be wrong). > > > If more nodes have access to the Isilon data they could also participate > > > (and don't need NFS exports for that). > > > > > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Uwe Falke > > > IT Specialist > > > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > > > Services > > > +49 175 575 2877 Mobile > > > Rathausstr. 7, 09111 Chemnitz, Germany > > > uwefalke at de.ibm.com mailto:uwefalke at de.ibm.com > > > > > > IBM Services > > > > > > IBM Data Privacy Statement > > > > > > IBM Deutschland Business & Technology Services GmbH > > > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > > > Sitz der Gesellschaft: Ehningen > > > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > > > > > > > > > From: Uwe Falke/Germany/IBM > > > To: gpfsug main discussion list > > > Date: 17/11/2020 09:50 > > > Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data > > > from Isilon to Scale over NFS? > > > > > > > > > Hi Andi, > > > > > > what about leaving NFS completeley out and using rsync (multiple rsyncs > > > in parallel, of course) directly between your source and target servers? > > > I am not sure how many TCP connections (suppose it is NFS4) in parallel > > > are opened between client and server, using a 2x bonded interface well > > > requires at least two. That combined with the DB approach suggested by > > > Jonathan to control the activity of the rsync streams would be my best > > > guess. > > > If you have many small files, the overhead might still kill you. Tarring > > > them up into larger aggregates for transfer would help a lot, but then you > > > must be sure they won't change or you need to implement your own version > > > control for that class of files. > > > > > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Uwe Falke > > > IT Specialist > > > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > > > Services > > > +49 175 575 2877 Mobile > > > Rathausstr. 
7, 09111 Chemnitz, Germany > > > uwefalke at de.ibm.com mailto:uwefalke at de.ibm.com > > > > > > IBM Services > > > > > > IBM Data Privacy Statement > > > > > > IBM Deutschland Business & Technology Services GmbH > > > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > > > Sitz der Gesellschaft: Ehningen > > > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > > > > > > > > > > > > From: Andi Christiansen > > > To: "gpfsug-discuss at spectrumscale.org mailto:gpfsug-discuss at spectrumscale.org " > > > > > > Date: 16/11/2020 20:44 > > > Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from > > > Isilon to Scale over NFS? > > > Sent by: gpfsug-discuss-bounces at spectrumscale.org mailto:gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > > > > > Hi all, > > > > > > i have got a case where a customer wants 700TB migrated from isilon to > > > Scale and the only way for him is exporting the same directory on NFS from > > > two different nodes... > > > > > > as of now we are using multiple rsync processes on different parts of > > > folders within the main directory. this is really slow and will take > > > forever.. right now 14 rsync processes spread across 3 nodes fetching from > > > 2.. > > > > > > does anyone know of a way to speed it up? right now we see from 1Gbit to > > > 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from > > > scale nodes and 20Gbits from isilon so we should be able to reach just > > > under 20Gbit... > > > > > > > > > if anyone have any ideas they are welcome! > > > > > > > > > Thanks in advance > > > Andi Christiansen _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss athttp://spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss athttp://spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss athttp://spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Nov 17 13:53:43 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 17 Nov 2020 13:53:43 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <616234716.258600.1605613918767@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> Message-ID: <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> On 17/11/2020 11:51, Andi Christiansen wrote: > Hi all, > > thanks for all the information, there was some interesting things > amount it.. > > I kept on going with rsync and ended up making a file with all top > level user directories and splitting them into chunks of 347 per > rsync session(total 42000 ish folders). yesterday we had only 14 > sessions with 3000 folders in each and that was too much work for one > rsync session.. 
Unless you use something similar to my DB suggestion it is almost inevitable that some of those rsync sessions are going to have issues and you will have no way to track it or even know it has happened unless you do a single final giant catchup/check rsync. I should add that a copy of the sqlite DB is cover your backside protection when a user pops up claiming that you failed to transfer one of their vitally important files six months down the line and the old system is turned off and scrapped. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From skylar2 at uw.edu Tue Nov 17 14:59:43 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 17 Nov 2020 06:59:43 -0800 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> Message-ID: <20201117145943.5cxyfpfyrk7udmn4@thargelion> On Tue, Nov 17, 2020 at 01:53:43PM +0000, Jonathan Buzzard wrote: > On 17/11/2020 11:51, Andi Christiansen wrote: > > Hi all, > > > > thanks for all the information, there was some interesting things > > amount it.. > > > > I kept on going with rsync and ended up making a file with all top > > level user directories and splitting them into chunks of 347 per > > rsync session(total 42000 ish folders). yesterday we had only 14 > > sessions with 3000 folders in each and that was too much work for one > > rsync session.. > > Unless you use something similar to my DB suggestion it is almost inevitable > that some of those rsync sessions are going to have issues and you will have > no way to track it or even know it has happened unless you do a single final > giant catchup/check rsync. > > I should add that a copy of the sqlite DB is cover your backside protection > when a user pops up claiming that you failed to transfer one of their > vitally important files six months down the line and the old system is > turned off and scrapped. That's not a bad idea, and I like it more than the method I setup where we captured the output of find from both sides of the transfer and preserved it for posterity, but obviously did require a hard-stop date on the source. Fortunately, we seem committed to GPFS so it might be we never have to do another bulk transfer outside of the filesystem... -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From S.J.Thompson at bham.ac.uk Tue Nov 17 15:55:41 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 17 Nov 2020 15:55:41 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <20201117145943.5cxyfpfyrk7udmn4@thargelion> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> <20201117145943.5cxyfpfyrk7udmn4@thargelion> Message-ID: <55E3401C-2F59-4B47-A176-CDF7BCACBE2E@bham.ac.uk> > Fortunately, we seem committed to GPFS so it might be we never have to do > another bulk transfer outside of the filesystem... Until you want to move a v3 or v4 created file-system to v5 block sizes __ I hopes we won't be doing that sort of thing again... 
Simon From jonathan.buzzard at strath.ac.uk Tue Nov 17 19:45:29 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 17 Nov 2020 19:45:29 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <55E3401C-2F59-4B47-A176-CDF7BCACBE2E@bham.ac.uk> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> <20201117145943.5cxyfpfyrk7udmn4@thargelion> <55E3401C-2F59-4B47-A176-CDF7BCACBE2E@bham.ac.uk> Message-ID: <1a1be12b-a4f2-f2b3-4cdf-e34bc5eace24@strath.ac.uk> On 17/11/2020 15:55, Simon Thompson wrote: > >> Fortunately, we seem committed to GPFS so it might be we never have to do >> another bulk transfer outside of the filesystem... > > Until you want to move a v3 or v4 created file-system to v5 block sizes __ You forget the v2 to v3 for more than two billion files switch. Either that or you where not using it back then. Then there was the v3.2 if you ever want to mount it on Windows. > > I hopes we won't be doing that sort of thing again... > Yep, going to be recycling my scripts in the coming week for a v4 to v5 with capacity upgrade on our DSS-G. That basically involves a trashing of the file system and a restore from backup. Going to be doing the your data will be restored based on a metric of how many files and how much data you have ploy again :-) I too hope that will be the last time I have to do anything similar but my experience of the last couple of decades says that is likely to be a forlorn hope :-( I speculate that one day the 10,000 file set limit will be lifted, but only if you reformat your file system... JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From andi at christiansen.xxx Tue Nov 17 20:40:39 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 17 Nov 2020 21:40:39 +0100 (CET) Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> Message-ID: <82434297.276248.1605645639435@privateemail.com> Hi Jonathan, yes you are correct! but we plan to resync this once or twice every week for the next 3-4months to be sure everything is as it should be. Right now we are focused on getting them synced up and then we will run scheduled resyncs/checks once or twice a week depending on the data growth :) Thanks Andi Christiansen > On 11/17/2020 2:53 PM Jonathan Buzzard wrote: > > > On 17/11/2020 11:51, Andi Christiansen wrote: > > Hi all, > > > > thanks for all the information, there was some interesting things > > amount it.. > > > > I kept on going with rsync and ended up making a file with all top > > level user directories and splitting them into chunks of 347 per > > rsync session(total 42000 ish folders). yesterday we had only 14 > > sessions with 3000 folders in each and that was too much work for one > > rsync session.. > > Unless you use something similar to my DB suggestion it is almost > inevitable that some of those rsync sessions are going to have issues > and you will have no way to track it or even know it has happened unless > you do a single final giant catchup/check rsync. 
> > I should add that a copy of the sqlite DB is cover your backside > protection when a user pops up claiming that you failed to transfer one > of their vitally important files six months down the line and the old > system is turned off and scrapped. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chris.schlipalius at pawsey.org.au Tue Nov 17 23:17:18 2020 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Wed, 18 Nov 2020 07:17:18 +0800 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Message-ID: <578BE691-DEE4-43AC-97D2-546AC406E14A@pawsey.org.au> So at my last job we used to rsync data between isilons across campus, and isilon to Windows File Cluster (and back). I recommend using dry run to generate a list of files and then use this to run with rysnc. This allows you also to be able to break up the transfer into batches, and check if files have changed before sync (say if your isilon files are not RO. Also ensure you have a recent version of rsync that preserves extended attributes and check your ACLS. A dry run example: https://unix.stackexchange.com/a/261372 I always felt more comfortable having a list of files before a sync?. Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Supercomputing Platforms, Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Nov 18 11:48:52 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 18 Nov 2020 11:48:52 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <578BE691-DEE4-43AC-97D2-546AC406E14A@pawsey.org.au> References: <578BE691-DEE4-43AC-97D2-546AC406E14A@pawsey.org.au> Message-ID: <7983810e-f51c-8cf7-a750-5c3285870bd4@strath.ac.uk> On 17/11/2020 23:17, Chris Schlipalius wrote: > So at my last job we used to rsync data between isilons across campus, > and isilon to Windows File Cluster (and back). > > I recommend using dry run to generate a list of files and then use this > to run with rysnc. > > This allows you also to be able to break up the transfer into batches, > and check if files have changed before sync (say if your isilon files > are not RO. > > Also ensure you have a recent version of rsync that preserves extended > attributes and check your ACLS. > > A dry run example: > > https://unix.stackexchange.com/a/261372 > > I always felt more comfortable having a list of files before a sync?. > I would counsel in the strongest possible terms against that approach. Basically you have to be assured that none of your file names have "wacky" characters in them, because handling "wacky" characters in file names is exceedingly difficult. I cannot stress how hard it is and the above example does not handle all "wacky" characters in file names. So what do I mean by "wacky" characters. 
Well remember a file name can have just about anything in it on Linux with the exception of '/', and users especially when using a GUI, and even more so if they are Mac users can and do use what I will call "wacky" characters in their file names. The obvious ones are spaces, but it's not just ASCII 0x20, but tabs too. Then there is the use of the wildcard characters, especially '?' but also '*'. Not too difficult to handle you might say. Right now deal with a file name with a newline character in it :-) Don't ask me how or why you even do that but let me assure you that I have seen them on more than one occasion. And now your dry run list is broken... Not only that if you have a few hundred million files to move a list just becomes unwieldy anyway. One thing I didn't mention is that I would run anything with in a screen (or tmux if that is your poison) and turn on logging. For those interested I am in the process of cleaning up the script a bit and will post it somewhere in due course. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From andi at christiansen.xxx Wed Nov 18 11:54:47 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 18 Nov 2020 12:54:47 +0100 (CET) Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <7983810e-f51c-8cf7-a750-5c3285870bd4@strath.ac.uk> References: <578BE691-DEE4-43AC-97D2-546AC406E14A@pawsey.org.au> <7983810e-f51c-8cf7-a750-5c3285870bd4@strath.ac.uk> Message-ID: <1947408989.293430.1605700487095@privateemail.com> Hi Jonathan, i would be very interested in seeing your scripts when they are posted. Let me know where to get them! Thanks a bunch! Andi Christiansen > On 11/18/2020 12:48 PM Jonathan Buzzard wrote: > > > On 17/11/2020 23:17, Chris Schlipalius wrote: > > So at my last job we used to rsync data between isilons across campus, > > and isilon to Windows File Cluster (and back). > > > > I recommend using dry run to generate a list of files and then use this > > to run with rysnc. > > > > This allows you also to be able to break up the transfer into batches, > > and check if files have changed before sync (say if your isilon files > > are not RO. > > > > Also ensure you have a recent version of rsync that preserves extended > > attributes and check your ACLS. > > > > A dry run example: > > > > https://unix.stackexchange.com/a/261372 > > > > I always felt more comfortable having a list of files before a sync?. > > > > I would counsel in the strongest possible terms against that approach. > > Basically you have to be assured that none of your file names have > "wacky" characters in them, because handling "wacky" characters in file > names is exceedingly difficult. I cannot stress how hard it is and the > above example does not handle all "wacky" characters in file names. > > So what do I mean by "wacky" characters. Well remember a file name can > have just about anything in it on Linux with the exception of '/', and > users especially when using a GUI, and even more so if they are Mac > users can and do use what I will call "wacky" characters in their file > names. > > The obvious ones are spaces, but it's not just ASCII 0x20, but tabs too. > Then there is the use of the wildcard characters, especially '?' but > also '*'. > > Not too difficult to handle you might say. 
Right now deal with a file > name with a newline character in it :-) Don't ask me how or why you even > do that but let me assure you that I have seen them on more than one > occasion. And now your dry run list is broken... > > Not only that if you have a few hundred million files to move a list > just becomes unwieldy anyway. > > One thing I didn't mention is that I would run anything with in a screen > (or tmux if that is your poison) and turn on logging. > > For those interested I am in the process of cleaning up the script a bit > and will post it somewhere in due course. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From cal.sawyer at framestore.com Wed Nov 18 12:18:57 2020 From: cal.sawyer at framestore.com (Cal Sawyer) Date: Wed, 18 Nov 2020 12:18:57 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 106, Issue 21 In-Reply-To: References: Message-ID: Hello Not a Scale user per se (we run a 3rdparty offshoot of Scale). In a past life managing Nexenta with OpenSolaris DR storage, I used nc/netcat for bulk data sync, which is far more efficient than rsync. With a bit of planning and analysis of directory structure on the target, nc runs could be parallelised as well, although not quite in the same way as running rsync via parallels. Of course, nc has to be available on Isilon but i have no experience with that platform. The only caveat in using nc is the amount of change to the target data as copying progresses (is the target datastore static or still seeing changes?). nc has to be followed with rsync to apply any changes and/or verify the integrity of the bulk copy. https://nakkaya.com/2009/04/15/using-netcat-for-file-transfers/ Are your Isilon and Scale systems located in the same network space? I'd also suggest that if possible, add a quad-port 10GbE (or larger: 25/100GbE) NIC to your servers to gain a wider data path and conduct your copy operations on those interfaces regards [image: Framestore] Cal Sawyer ? Senior Systems Engineer London ? New York ? Los Angeles ? Chicago ? Montr?al ? Mumbai 28 Chancery Lane London WC2A 1LB [T] +44 (0)20 7344 8000 W3W: warm.soil.patio On Wed, 18 Nov 2020 at 12:00, wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Migrate/syncronize data from Isilon to Scale over NFS? > (Chris Schlipalius) > 2. Re: Migrate/syncronize data from Isilon to Scale over NFS? > (Jonathan Buzzard) > 3. Re: Migrate/syncronize data from Isilon to Scale over NFS? > (Andi Christiansen) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 18 Nov 2020 07:17:18 +0800 > From: Chris Schlipalius > To: > Subject: Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to > Scale over NFS? 
> Message-ID: <578BE691-DEE4-43AC-97D2-546AC406E14A at pawsey.org.au> > Content-Type: text/plain; charset="utf-8" > > So at my last job we used to rsync data between isilons across campus, and > isilon to Windows File Cluster (and back). > > I recommend using dry run to generate a list of files and then use this to > run with rysnc. > > This allows you also to be able to break up the transfer into batches, and > check if files have changed before sync (say if your isilon files are not > RO. > > Also ensure you have a recent version of rsync that preserves extended > attributes and check your ACLS. > > > > A dry run example: > > https://unix.stackexchange.com/a/261372 > > > > I always felt more comfortable having a list of files before a sync?. > > > > > > > > Regards, > > Chris Schlipalius > > > > Team Lead, Data Storage Infrastructure, Supercomputing Platforms, Pawsey > Supercomputing Centre (CSIRO) > > 1 Bryce Avenue > > Kensington WA 6151 > > Australia > > > > Tel +61 8 6436 8815 > > Email chris.schlipalius at pawsey.org.au > > Web www.pawsey.org.au > > > > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20201118/c99c2fb1/attachment-0001.html > > > > ------------------------------ > > Message: 2 > Date: Wed, 18 Nov 2020 11:48:52 +0000 > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to > Scale over NFS? > Message-ID: <7983810e-f51c-8cf7-a750-5c3285870bd4 at strath.ac.uk> > Content-Type: text/plain; charset=utf-8; format=flowed > > On 17/11/2020 23:17, Chris Schlipalius wrote: > > So at my last job we used to rsync data between isilons across campus, > > and isilon to Windows File Cluster (and back). > > > > I recommend using dry run to generate a list of files and then use this > > to run with rysnc. > > > > This allows you also to be able to break up the transfer into batches, > > and check if files have changed before sync (say if your isilon files > > are not RO. > > > > Also ensure you have a recent version of rsync that preserves extended > > attributes and check your ACLS. > > > > A dry run example: > > > > https://unix.stackexchange.com/a/261372 > > > > I always felt more comfortable having a list of files before a sync?. > > > > I would counsel in the strongest possible terms against that approach. > > Basically you have to be assured that none of your file names have > "wacky" characters in them, because handling "wacky" characters in file > names is exceedingly difficult. I cannot stress how hard it is and the > above example does not handle all "wacky" characters in file names. > > So what do I mean by "wacky" characters. Well remember a file name can > have just about anything in it on Linux with the exception of '/', and > users especially when using a GUI, and even more so if they are Mac > users can and do use what I will call "wacky" characters in their file > names. > > The obvious ones are spaces, but it's not just ASCII 0x20, but tabs too. > Then there is the use of the wildcard characters, especially '?' but > also '*'. > > Not too difficult to handle you might say. Right now deal with a file > name with a newline character in it :-) Don't ask me how or why you even > do that but let me assure you that I have seen them on more than one > occasion. And now your dry run list is broken... 
> > Not only that if you have a few hundred million files to move a list > just becomes unwieldy anyway. > > One thing I didn't mention is that I would run anything with in a screen > (or tmux if that is your poison) and turn on logging. > > For those interested I am in the process of cleaning up the script a bit > and will post it somewhere in due course. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > ------------------------------ > > Message: 3 > Date: Wed, 18 Nov 2020 12:54:47 +0100 (CET) > From: Andi Christiansen > To: gpfsug main discussion list , > Jonathan Buzzard > Subject: Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to > Scale over NFS? > Message-ID: <1947408989.293430.1605700487095 at privateemail.com> > Content-Type: text/plain; charset=UTF-8 > > Hi Jonathan, > > i would be very interested in seeing your scripts when they are posted. > Let me know where to get them! > > Thanks a bunch! > Andi Christiansen > > > On 11/18/2020 12:48 PM Jonathan Buzzard > wrote: > > > > > > On 17/11/2020 23:17, Chris Schlipalius wrote: > > > So at my last job we used to rsync data between isilons across campus, > > > and isilon to Windows File Cluster (and back). > > > > > > I recommend using dry run to generate a list of files and then use > this > > > to run with rysnc. > > > > > > This allows you also to be able to break up the transfer into batches, > > > and check if files have changed before sync (say if your isilon files > > > are not RO. > > > > > > Also ensure you have a recent version of rsync that preserves extended > > > attributes and check your ACLS. > > > > > > A dry run example: > > > > > > https://unix.stackexchange.com/a/261372 > > > > > > I always felt more comfortable having a list of files before a sync?. > > > > > > > I would counsel in the strongest possible terms against that approach. > > > > Basically you have to be assured that none of your file names have > > "wacky" characters in them, because handling "wacky" characters in file > > names is exceedingly difficult. I cannot stress how hard it is and the > > above example does not handle all "wacky" characters in file names. > > > > So what do I mean by "wacky" characters. Well remember a file name can > > have just about anything in it on Linux with the exception of '/', and > > users especially when using a GUI, and even more so if they are Mac > > users can and do use what I will call "wacky" characters in their file > > names. > > > > The obvious ones are spaces, but it's not just ASCII 0x20, but tabs too. > > Then there is the use of the wildcard characters, especially '?' but > > also '*'. > > > > Not too difficult to handle you might say. Right now deal with a file > > name with a newline character in it :-) Don't ask me how or why you even > > do that but let me assure you that I have seen them on more than one > > occasion. And now your dry run list is broken... > > > > Not only that if you have a few hundred million files to move a list > > just becomes unwieldy anyway. > > > > One thing I didn't mention is that I would run anything with in a screen > > (or tmux if that is your poison) and turn on logging. > > > > For those interested I am in the process of cleaning up the script a bit > > and will post it somewhere in due course. > > > > > > JAB. > > > > -- > > Jonathan A. Buzzard Tel: +44141-5483420 > > HPC System Administrator, ARCHIE-WeSt. 
> > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 106, Issue 21 > *********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Nov 18 23:05:40 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Wed, 18 Nov 2020 18:05:40 -0500 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <7983810e-f51c-8cf7-a750-5c3285870bd4@strath.ac.uk> References: <578BE691-DEE4-43AC-97D2-546AC406E14A@pawsey.org.au> <7983810e-f51c-8cf7-a750-5c3285870bd4@strath.ac.uk> Message-ID: <39863.1605740740@turing-police> On Wed, 18 Nov 2020 11:48:52 +0000, Jonathan Buzzard said: > So what do I mean by "wacky" characters. Well remember a file name can > have just about anything in it on Linux with the exception of '/', and You want to see some fireworks? At least at one time, it was possible to use a file system debugger that's all too trusting of hexadecimal input and create a directory entry of '../'. Let's just say that fs/namei.c was also far too trusting, and fsck was more than happy to make *different* errors than the kernel was.... > The obvious ones are spaces, but it's not just ASCII 0x20, but tabs too. > Then there is the use of the wildcard characters, especially '?' but > also '*'. Don't forget ESC, CR, LF, backticks, forward ticks, semicolons, and pretty much anything else that will give a shell indigestion. SQL isn't the only thing prone to injection attacks.. :) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From chris.schlipalius at pawsey.org.au Wed Nov 18 23:57:26 2020 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Thu, 19 Nov 2020 07:57:26 +0800 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: References: Message-ID: <6288DF78-A9DF-4BE9-B166-4478EF8C2A20@pawsey.org.au> ? I would counsel in the strongest possible terms against that approach. ? Basically you have to be assured that none of your file names have "wacky" characters in them, because handling "wacky" characters in file ? names is exceedingly difficult. I cannot stress how hard it is and the above example does not handle all "wacky" characters in file names. Well that?s indeed another kettle of fish if you have irregular/special naming of files, no I didn?t cover that and if you have millions of files, yes a list would be unwieldy, then I would be tarring up dirs. before moving? and then untarring on GPFS ?or breaking up the list into sets or sub lists. If you have these wacky types of file names well there are fixes as in the rsync manpages? yes not easy but possible.. Ie 1. -s, --protect-args 2. As per usual you can escape the spaces, or substitute for spaces. rsync -avuz user at server1.com:"${remote_path// /\\ }" . 3. Single quote the file name and path inside double quotes. ? 
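A further option along the same lines is a NUL-delimited file list,
since NUL (and '/') are the only bytes that cannot appear in a Linux
file name -- embedded newlines included. A rough sketch, with made-up
paths:

    # generate and consume a NUL-delimited list; survives spaces, quotes
    # and even newlines in file names
    cd /mnt/isilon_nfs
    find . -mindepth 1 -print0 > /tmp/filelist.0
    rsync -aH -0 --files-from=/tmp/filelist.0 /mnt/isilon_nfs/ /gpfs/projects/
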
One thing I didn't mention is that I would run anything with in a screen (or tmux if that is your poison) and turn on logging. Absolutely agree? ? For those interested I am in the process of cleaning up the script a bit and will post it somewhere in due course. ? JAB. Would be interesting to see?. I?ve also had success on GPFS with DCP and possibly this would be another option Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Supercomputing Platforms, Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.caubet at psi.ch Thu Nov 19 15:34:39 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 19 Nov 2020 15:34:39 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Message-ID: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> Hi, I have a filesystem holding many projects (i.e., mounted under /projects), each project is managed with filesets. I have a new big project which should be placed on a separate filesystem (blocksize, replication policy, etc. will be different, and subprojects of it will be managed with filesets). Ideally, this filesystem should be mounted in /projects/newproject. Technically, mounting a filesystem on top of an existing filesystem should be possible, but, is this discouraged for any reason? How GPFS would behave with that and is there a technical reason for avoiding this setup? Another alternative would be independent mount point + symlink, but I really would prefer to avoid symlinks. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Nov 19 15:49:30 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 19 Nov 2020 15:49:30 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> Message-ID: <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> On 19/11/2020 15:34, Caubet Serrabou Marc (PSI) wrote: > Hi, > > > I have a filesystem holding many projects (i.e., mounted under > /projects), each project is managed with filesets. > > I have a new big project which should be placed on a separate filesystem > (blocksize, replication policy, etc. will be different, and subprojects > of it will be managed with filesets). Ideally, this filesystem should be > mounted in /projects/newproject. > > > Technically, mounting a filesystem on top of an existing filesystem > should be possible, but, is this discouraged for any reason? How GPFS > would behave with that and is there a technical reason for avoiding this > setup? > > Another alternative would be independent mount point + symlink, but I > really would prefer to avoid symlinks. This has all the hallmarks of either a Windows admin or a newbie Linux/Unix admin :-) Simply put /projects is mounted on top of whatever file system is providing the root file system in the first place LOL. Linux/Unix and/or GPFS does not give a monkeys about mounting another file system *ANYWHERE* in it period because there is no other way of doing it. 
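For concreteness, the nested layout being asked about amounts to
something along these lines (a hedged sketch with invented device names;
--mount-priority, which comes up later in the thread, simply orders the
mounts at daemon startup so the parent is there first):

    # parent file system gets the lower priority number so it mounts first
    # (changing -T requires the file system to be unmounted everywhere)
    mmchfs projects_fs   -T /projects              --mount-priority 1
    mmchfs newproject_fs -T /projects/newproject   --mount-priority 2
    mmmount projects_fs -a
    mmmount newproject_fs -a
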
JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From spectrumscale at kiranghag.com Thu Nov 19 16:40:47 2020 From: spectrumscale at kiranghag.com (KG) Date: Thu, 19 Nov 2020 22:10:47 +0530 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> Message-ID: You can also set mount priority on filesystems so that gpfs can try to mount them in order...parent first On Thu, Nov 19, 2020, 21:19 Jonathan Buzzard wrote: > On 19/11/2020 15:34, Caubet Serrabou Marc (PSI) wrote: > > Hi, > > > > > > I have a filesystem holding many projects (i.e., mounted under > > /projects), each project is managed with filesets. > > > > I have a new big project which should be placed on a separate filesystem > > (blocksize, replication policy, etc. will be different, and subprojects > > of it will be managed with filesets). Ideally, this filesystem should be > > mounted in /projects/newproject. > > > > > > Technically, mounting a filesystem on top of an existing filesystem > > should be possible, but, is this discouraged for any reason? How GPFS > > would behave with that and is there a technical reason for avoiding this > > setup? > > > > Another alternative would be independent mount point + symlink, but I > > really would prefer to avoid symlinks. > > This has all the hallmarks of either a Windows admin or a newbie > Linux/Unix admin :-) > > Simply put /projects is mounted on top of whatever file system is > providing the root file system in the first place LOL. > > Linux/Unix and/or GPFS does not give a monkeys about mounting another > file system *ANYWHERE* in it period because there is no other way of > doing it. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Nov 19 16:42:07 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 19 Nov 2020 16:42:07 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> Message-ID: <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> If it is a remote cluster mount from your clients (hopefully!), you might want to look at priority to order mounting of the file-systems. I don?t know what would happen if the overmounted file-system went away, you would likely want to test. Simon From: on behalf of "marc.caubet at psi.ch" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 November 2020 at 15:39 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Hi, I have a filesystem holding many projects (i.e., mounted under /projects), each project is managed with filesets. I have a new big project which should be placed on a separate filesystem (blocksize, replication policy, etc. will be different, and subprojects of it will be managed with filesets). 
Ideally, this filesystem should be mounted in /projects/newproject. Technically, mounting a filesystem on top of an existing filesystem should be possible, but, is this discouraged for any reason? How GPFS would behave with that and is there a technical reason for avoiding this setup? Another alternative would be independent mount point + symlink, but I really would prefer to avoid symlinks. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.caubet at psi.ch Thu Nov 19 16:48:07 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 19 Nov 2020 16:48:07 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch>, <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> Message-ID: Hi Jonathan, thanks for sharing your opinions. In the sentence "Technically, mounting a filesystem on top of an existing filesystem should be possible" , I guess I was referring to that... I was concerned about other technical reasons, such like how would this would affect GPFS policies, or how to properly proceed with proper mounting, or any other technical reasons to consider. For the GPFS policies, I usually applied some of the existing GPFS policies based on directories, but after checking I realized that one can manage via device (never used policies in that way, at least for the simple but necessary use cases I have on the existing filesystems). Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard Sent: Thursday, November 19, 2020 4:49:30 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem On 19/11/2020 15:34, Caubet Serrabou Marc (PSI) wrote: > Hi, > > > I have a filesystem holding many projects (i.e., mounted under > /projects), each project is managed with filesets. > > I have a new big project which should be placed on a separate filesystem > (blocksize, replication policy, etc. will be different, and subprojects > of it will be managed with filesets). Ideally, this filesystem should be > mounted in /projects/newproject. > > > Technically, mounting a filesystem on top of an existing filesystem > should be possible, but, is this discouraged for any reason? How GPFS > would behave with that and is there a technical reason for avoiding this > setup? > > Another alternative would be independent mount point + symlink, but I > really would prefer to avoid symlinks. This has all the hallmarks of either a Windows admin or a newbie Linux/Unix admin :-) Simply put /projects is mounted on top of whatever file system is providing the root file system in the first place LOL. 
Linux/Unix and/or GPFS does not give a monkeys about mounting another file system *ANYWHERE* in it period because there is no other way of doing it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.caubet at psi.ch Thu Nov 19 17:01:37 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 19 Nov 2020 17:01:37 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch>, <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> Message-ID: <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> Hi Simon, that's a very good point, thanks a lot :) I have it remotely mounted on a client cluster, so I will consider priorities when mounting the filesystems with remote cluster mount. That's very useful. Also, as far as I saw, same approach can be also applied to local mounts (via mmchfs) during daemon startup with the same option --mount-priority. Thanks a lot for the hints, these are very useful. I'll test that. Cheers, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: Thursday, November 19, 2020 5:42:07 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem If it is a remote cluster mount from your clients (hopefully!), you might want to look at priority to order mounting of the file-systems. I don?t know what would happen if the overmounted file-system went away, you would likely want to test. Simon From: on behalf of "marc.caubet at psi.ch" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 November 2020 at 15:39 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Hi, I have a filesystem holding many projects (i.e., mounted under /projects), each project is managed with filesets. I have a new big project which should be placed on a separate filesystem (blocksize, replication policy, etc. will be different, and subprojects of it will be managed with filesets). Ideally, this filesystem should be mounted in /projects/newproject. Technically, mounting a filesystem on top of an existing filesystem should be possible, but, is this discouraged for any reason? How GPFS would behave with that and is there a technical reason for avoiding this setup? Another alternative would be independent mount point + symlink, but I really would prefer to avoid symlinks. 
Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Thu Nov 19 17:34:05 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Thu, 19 Nov 2020 18:34:05 +0100 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> Message-ID: I would not mount a GPFS filesystem within a GPFS filesystem. Technically it should work, but I?d expect it to cause surprises if ever the lower filesystem experienced problems. Alone, a filesystem might recover automatically by remounting. But if there?s another filesystem mounted within, I expect it will be a problem.. Much better to use symlinks. -jf tor. 19. nov. 2020 kl. 18:01 skrev Caubet Serrabou Marc (PSI) < marc.caubet at psi.ch>: > Hi Simon, > > > that's a very good point, thanks a lot :) I have it remotely mounted on a > client cluster, so I will consider priorities when mounting the filesystems > with remote cluster mount. That's very useful. > > Also, as far as I saw, same approach can be also applied to local mounts > (via mmchfs) during daemon startup with the same option --mount-priority. > > > Thanks a lot for the hints, these are very useful. I'll test that. > > > Cheers, > > Marc > _________________________________________________________ > Paul Scherrer Institut > High Performance Computing & Emerging Technologies > Marc Caubet Serrabou > Building/Room: OHSA/014 > Forschungsstrasse, 111 > 5232 Villigen PSI > Switzerland > > Telephone: +41 56 310 46 67 > E-Mail: marc.caubet at psi.ch > ------------------------------ > *From:* gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Simon Thompson < > S.J.Thompson at bham.ac.uk> > *Sent:* Thursday, November 19, 2020 5:42:07 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Mounting filesystem on top of an existing > filesystem > > > If it is a remote cluster mount from your clients (hopefully!), you might > want to look at priority to order mounting of the file-systems. I don?t > know what would happen if the overmounted file-system went away, you would > likely want to test. > > > > Simon > > > > *From: * on behalf of " > marc.caubet at psi.ch" > *Reply to: *"gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > *Date: *Thursday, 19 November 2020 at 15:39 > *To: *"gpfsug-discuss at spectrumscale.org" > > *Subject: *[gpfsug-discuss] Mounting filesystem on top of an existing > filesystem > > > > Hi, > > > > I have a filesystem holding many projects (i.e., mounted under /projects), > each project is managed with filesets. > > I have a new big project which should be placed on a separate filesystem > (blocksize, replication policy, etc. will be different, and subprojects of > it will be managed with filesets). Ideally, this filesystem should be > mounted in /projects/newproject. > > > > Technically, mounting a filesystem on top of an existing filesystem should > be possible, but, is this discouraged for any reason? 
How GPFS would behave > with that and is there a technical reason for avoiding this setup? > > Another alternative would be independent mount point + symlink, but I > really would prefer to avoid symlinks. > > > > Thanks a lot, > > Marc > > _________________________________________________________ > Paul Scherrer Institut > High Performance Computing & Emerging Technologies > Marc Caubet Serrabou > Building/Room: OHSA/014 > > Forschungsstrasse, 111 > > 5232 Villigen PSI > Switzerland > > Telephone: +41 56 310 46 67 > E-Mail: marc.caubet at psi.ch > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Thu Nov 19 17:38:07 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 19 Nov 2020 09:38:07 -0800 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> Message-ID: <20201119173807.kormirvbweqs3un6@thargelion> Agreed, not sure how the GPFS tools would react. An alternative to symlinks would be bind mounts, if for some reason a tool doesn't behave properly with a symlink in the path. On Thu, Nov 19, 2020 at 06:34:05PM +0100, Jan-Frode Myklebust wrote: > I would not mount a GPFS filesystem within a GPFS filesystem. Technically > it should work, but I???d expect it to cause surprises if ever the lower > filesystem experienced problems. Alone, a filesystem might recover > automatically by remounting. But if there???s another filesystem mounted > within, I expect it will be a problem.. > > Much better to use symlinks. > > > > -jf > > tor. 19. nov. 2020 kl. 18:01 skrev Caubet Serrabou Marc (PSI) < > marc.caubet at psi.ch>: > > > Hi Simon, > > > > > > that's a very good point, thanks a lot :) I have it remotely mounted on a > > client cluster, so I will consider priorities when mounting the filesystems > > with remote cluster mount. That's very useful. > > > > Also, as far as I saw, same approach can be also applied to local mounts > > (via mmchfs) during daemon startup with the same option --mount-priority. > > > > > > Thanks a lot for the hints, these are very useful. I'll test that. > > > > > > Cheers, > > > > Marc > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > Forschungsstrasse, 111 > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > ------------------------------ > > *From:* gpfsug-discuss-bounces at spectrumscale.org < > > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Simon Thompson < > > S.J.Thompson at bham.ac.uk> > > *Sent:* Thursday, November 19, 2020 5:42:07 PM > > *To:* gpfsug main discussion list > > *Subject:* Re: [gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > If it is a remote cluster mount from your clients (hopefully!), you might > > want to look at priority to order mounting of the file-systems. I don???t > > know what would happen if the overmounted file-system went away, you would > > likely want to test. 
> > > > > > > > Simon > > > > > > > > *From: * on behalf of " > > marc.caubet at psi.ch" > > *Reply to: *"gpfsug-discuss at spectrumscale.org" < > > gpfsug-discuss at spectrumscale.org> > > *Date: *Thursday, 19 November 2020 at 15:39 > > *To: *"gpfsug-discuss at spectrumscale.org" > > > > *Subject: *[gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > > > Hi, > > > > > > > > I have a filesystem holding many projects (i.e., mounted under /projects), > > each project is managed with filesets. > > > > I have a new big project which should be placed on a separate filesystem > > (blocksize, replication policy, etc. will be different, and subprojects of > > it will be managed with filesets). Ideally, this filesystem should be > > mounted in /projects/newproject. > > > > > > > > Technically, mounting a filesystem on top of an existing filesystem should > > be possible, but, is this discouraged for any reason? How GPFS would behave > > with that and is there a technical reason for avoiding this setup? > > > > Another alternative would be independent mount point + symlink, but I > > really would prefer to avoid symlinks. > > > > > > > > Thanks a lot, > > > > Marc > > > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > > > Forschungsstrasse, 111 > > > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From jonathan.buzzard at strath.ac.uk Thu Nov 19 18:08:13 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 19 Nov 2020 18:08:13 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> Message-ID: On 19/11/2020 17:34, Jan-Frode Myklebust wrote: > > I would not mount a GPFS filesystem within a GPFS filesystem. > Technically it should work, but I?d expect it to cause surprises if ever > the lower filesystem experienced problems. Alone, a filesystem might > recover automatically by remounting. But if there?s another filesystem > mounted within, I expect it will be a problem.. > > Much better to use symlinks. > Think about that for a minute... I guess if you are worried about /projects going away (which would suggest something really bad has happened anyway) would be to mount the GPFS file system that is currently holding /projects somewhere else and then bind mount everything into /projects At this point I would note that bind mounts are much better than symlinks which suck for this sort of application. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From jonathan.buzzard at strath.ac.uk Thu Nov 19 18:12:03 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 19 Nov 2020 18:12:03 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> Message-ID: <2f789d09-3704-2d41-ef2a-953de178dce2@strath.ac.uk> On 19/11/2020 16:40, KG wrote: > You can also set mount priority on filesystems so that gpfs can try to > mount them in order...parent first > One of the things that systemd brings to the table https://github.com/systemd/systemd/commit/3519d230c8bafe834b2dac26ace49fcfba139823 JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From marc.caubet at psi.ch Thu Nov 19 18:13:08 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 19 Nov 2020 18:13:08 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <20201119173807.kormirvbweqs3un6@thargelion> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> , <20201119173807.kormirvbweqs3un6@thargelion> Message-ID: <0963457f2dfd418eabf8e1681ef2f801@psi.ch> Hi all, thanks a lot for your comments. Agreed, I better avoid it for now. I was concerned about how GPFS would behave in such case. For production I will take the safe route, but, just out of curiosity, I'll give it a try on a couple of test filesystems. Thanks a lot for your help, it was very helpful, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Skylar Thompson Sent: Thursday, November 19, 2020 6:38:07 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Agreed, not sure how the GPFS tools would react. An alternative to symlinks would be bind mounts, if for some reason a tool doesn't behave properly with a symlink in the path. On Thu, Nov 19, 2020 at 06:34:05PM +0100, Jan-Frode Myklebust wrote: > I would not mount a GPFS filesystem within a GPFS filesystem. Technically > it should work, but I???d expect it to cause surprises if ever the lower > filesystem experienced problems. Alone, a filesystem might recover > automatically by remounting. But if there???s another filesystem mounted > within, I expect it will be a problem.. > > Much better to use symlinks. > > > > -jf > > tor. 19. nov. 2020 kl. 18:01 skrev Caubet Serrabou Marc (PSI) < > marc.caubet at psi.ch>: > > > Hi Simon, > > > > > > that's a very good point, thanks a lot :) I have it remotely mounted on a > > client cluster, so I will consider priorities when mounting the filesystems > > with remote cluster mount. That's very useful. > > > > Also, as far as I saw, same approach can be also applied to local mounts > > (via mmchfs) during daemon startup with the same option --mount-priority. > > > > > > Thanks a lot for the hints, these are very useful. I'll test that. 
> > > > > > Cheers, > > > > Marc > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > Forschungsstrasse, 111 > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > ------------------------------ > > *From:* gpfsug-discuss-bounces at spectrumscale.org < > > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Simon Thompson < > > S.J.Thompson at bham.ac.uk> > > *Sent:* Thursday, November 19, 2020 5:42:07 PM > > *To:* gpfsug main discussion list > > *Subject:* Re: [gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > If it is a remote cluster mount from your clients (hopefully!), you might > > want to look at priority to order mounting of the file-systems. I don???t > > know what would happen if the overmounted file-system went away, you would > > likely want to test. > > > > > > > > Simon > > > > > > > > *From: * on behalf of " > > marc.caubet at psi.ch" > > *Reply to: *"gpfsug-discuss at spectrumscale.org" < > > gpfsug-discuss at spectrumscale.org> > > *Date: *Thursday, 19 November 2020 at 15:39 > > *To: *"gpfsug-discuss at spectrumscale.org" > > > > *Subject: *[gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > > > Hi, > > > > > > > > I have a filesystem holding many projects (i.e., mounted under /projects), > > each project is managed with filesets. > > > > I have a new big project which should be placed on a separate filesystem > > (blocksize, replication policy, etc. will be different, and subprojects of > > it will be managed with filesets). Ideally, this filesystem should be > > mounted in /projects/newproject. > > > > > > > > Technically, mounting a filesystem on top of an existing filesystem should > > be possible, but, is this discouraged for any reason? How GPFS would behave > > with that and is there a technical reason for avoiding this setup? > > > > Another alternative would be independent mount point + symlink, but I > > really would prefer to avoid symlinks. > > > > > > > > Thanks a lot, > > > > Marc > > > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > > > Forschungsstrasse, 111 > > > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Thu Nov 19 18:32:39 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 19 Nov 2020 18:32:39 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <0963457f2dfd418eabf8e1681ef2f801@psi.ch> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> <20201119173807.kormirvbweqs3un6@thargelion> <0963457f2dfd418eabf8e1681ef2f801@psi.ch> Message-ID: <5b8edf06-a4ab-a39e-5a02-86fd7565b90a@strath.ac.uk> On 19/11/2020 18:13, Caubet Serrabou Marc (PSI) wrote: > > Hi all, > > > thanks a lot for your comments. Agreed, I?better avoid it for now. I was > concerned about how GPFS would behave in such case. For production I > will take the safe route, but, just out of curiosity, I'll give it a try > on a couple of test filesystems. > Don't use symlinks there is a range of applications that will break and you will confuse the hell out of your users as the fact you are not under /projects/new but /random/new is not hidden. Besides which if the symlink goes away because /projects goes away then it is all a bust anyway. If you are worried about /projects going away then the best plan is to mount the GPFS file systems somewhere else and then bind mount the directories into /projects on all the machines where they are mounted. GPFS is quite happy with this. We bind mount /gpfs/users into /users and /gpfs/software into /opt/software by default. In the past I have bind mounted random paths for every user (hundred plus) into /home JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From novosirj at rutgers.edu Thu Nov 19 18:34:09 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 19 Nov 2020 18:34:09 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> Message-ID: > On Nov 19, 2020, at 10:49 AM, Jonathan Buzzard wrote: > > On 19/11/2020 15:34, Caubet Serrabou Marc (PSI) wrote: >> Hi, >> I have a filesystem holding many projects (i.e., mounted under /projects), each project is managed with filesets. >> I have a new big project which should be placed on a separate filesystem (blocksize, replication policy, etc. will be different, and subprojects of it will be managed with filesets). Ideally, this filesystem should be mounted in /projects/newproject. >> Technically, mounting a filesystem on top of an existing filesystem should be possible, but, is this discouraged for any reason? How GPFS would behave with that and is there a technical reason for avoiding this setup? >> Another alternative would be independent mount point + symlink, but I really would prefer to avoid symlinks. > > This has all the hallmarks of either a Windows admin or a newbie Linux/Unix admin :-) > > Simply put /projects is mounted on top of whatever file system is providing the root file system in the first place LOL. > > Linux/Unix and/or GPFS does not give a monkeys about mounting another file system *ANYWHERE* in it period because there is no other way of doing it. Some others have said, but I disagree. It wasn?t that long ago that GPFS acted really screwy with systemd because it did something in a way other than Linux expected. 
As it is now, their devices are not /dev/whatever or server:/wherever like just about every other filesystem type. Not unreasonable to believe it would 'act funny' compared to other FS. I like GPFS a lot, but this is not one of my favorite characteristics of it. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From UWEFALKE at de.ibm.com Thu Nov 19 19:18:41 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 19 Nov 2020 20:18:41 +0100 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch><0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> Message-ID: Just the risk that your parent file system dies, which will block your access to the child file system mounted on a mount point within it. If that does not bother you, go ahead and mount stacks. As for the symlink though: it is also gone if the parent dies :-). Mit freundlichen Grüßen / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Geschäftsführung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: KG To: gpfsug main discussion list Date: 19/11/2020 17:41 Subject: [EXTERNAL] Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Sent by: gpfsug-discuss-bounces at spectrumscale.org You can also set mount priority on filesystems so that gpfs can try to mount them in order...parent first On Thu, Nov 19, 2020, 21:19 Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: On 19/11/2020 15:34, Caubet Serrabou Marc (PSI) wrote: > Hi, > > > I have a filesystem holding many projects (i.e., mounted under > /projects), each project is managed with filesets. > > I have a new big project which should be placed on a separate filesystem > (blocksize, replication policy, etc. will be different, and subprojects > of it will be managed with filesets). Ideally, this filesystem should be > mounted in /projects/newproject. > > > Technically, mounting a filesystem on top of an existing filesystem > should be possible, but, is this discouraged for any reason? How GPFS > would behave with that and is there a technical reason for avoiding this > setup? > > Another alternative would be independent mount point + symlink, but I > really would prefer to avoid symlinks. This has all the hallmarks of either a Windows admin or a newbie Linux/Unix admin :-) Simply put /projects is mounted on top of whatever file system is providing the root file system in the first place LOL. Linux/Unix and/or GPFS does not give a monkeys about mounting another file system *ANYWHERE* in it period because there is no other way of doing it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Thu Nov 19 19:37:52 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 19 Nov 2020 19:37:52 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <0963457f2dfd418eabf8e1681ef2f801@psi.ch> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> <20201119173807.kormirvbweqs3un6@thargelion> <0963457f2dfd418eabf8e1681ef2f801@psi.ch> Message-ID: <738D41AC-6A07-453E-A2D1-C1882BE52EDC@bham.ac.uk> My understanding was that this was perfectly acceptable in a GPFS system. i.e. mounting parts of file-systems in others. It has been suggested to us as a way of using different vendor GPFS systems (e.g. an ESS with someone elses) as a way of working round the licensing rules about ESS and anything else, but still giving a single user ?name space?. We didn?t go that route, and of course I might have misunderstood what was being suggested. Simon From: on behalf of "marc.caubet at psi.ch" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 November 2020 at 18:13 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Hi all, thanks a lot for your comments. Agreed, I better avoid it for now. I was concerned about how GPFS would behave in such case. For production I will take the safe route, but, just out of curiosity, I'll give it a try on a couple of test filesystems. Thanks a lot for your help, it was very helpful, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Skylar Thompson Sent: Thursday, November 19, 2020 6:38:07 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Agreed, not sure how the GPFS tools would react. An alternative to symlinks would be bind mounts, if for some reason a tool doesn't behave properly with a symlink in the path. On Thu, Nov 19, 2020 at 06:34:05PM +0100, Jan-Frode Myklebust wrote: > I would not mount a GPFS filesystem within a GPFS filesystem. Technically > it should work, but I???d expect it to cause surprises if ever the lower > filesystem experienced problems. Alone, a filesystem might recover > automatically by remounting. But if there???s another filesystem mounted > within, I expect it will be a problem.. > > Much better to use symlinks. > > > > -jf > > tor. 19. nov. 2020 kl. 18:01 skrev Caubet Serrabou Marc (PSI) < > marc.caubet at psi.ch>: > > > Hi Simon, > > > > > > that's a very good point, thanks a lot :) I have it remotely mounted on a > > client cluster, so I will consider priorities when mounting the filesystems > > with remote cluster mount. That's very useful. 
> > > > Also, as far as I saw, same approach can be also applied to local mounts > > (via mmchfs) during daemon startup with the same option --mount-priority. > > > > > > Thanks a lot for the hints, these are very useful. I'll test that. > > > > > > Cheers, > > > > Marc > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > Forschungsstrasse, 111 > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > ------------------------------ > > *From:* gpfsug-discuss-bounces at spectrumscale.org < > > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Simon Thompson < > > S.J.Thompson at bham.ac.uk> > > *Sent:* Thursday, November 19, 2020 5:42:07 PM > > *To:* gpfsug main discussion list > > *Subject:* Re: [gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > If it is a remote cluster mount from your clients (hopefully!), you might > > want to look at priority to order mounting of the file-systems. I don???t > > know what would happen if the overmounted file-system went away, you would > > likely want to test. > > > > > > > > Simon > > > > > > > > *From: * on behalf of " > > marc.caubet at psi.ch" > > *Reply to: *"gpfsug-discuss at spectrumscale.org" < > > gpfsug-discuss at spectrumscale.org> > > *Date: *Thursday, 19 November 2020 at 15:39 > > *To: *"gpfsug-discuss at spectrumscale.org" > > > > *Subject: *[gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > > > Hi, > > > > > > > > I have a filesystem holding many projects (i.e., mounted under /projects), > > each project is managed with filesets. > > > > I have a new big project which should be placed on a separate filesystem > > (blocksize, replication policy, etc. will be different, and subprojects of > > it will be managed with filesets). Ideally, this filesystem should be > > mounted in /projects/newproject. > > > > > > > > Technically, mounting a filesystem on top of an existing filesystem should > > be possible, but, is this discouraged for any reason? How GPFS would behave > > with that and is there a technical reason for avoiding this setup? > > > > Another alternative would be independent mount point + symlink, but I > > really would prefer to avoid symlinks. 
> > > > > > > > Thanks a lot, > > > > Marc > > > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > > > Forschungsstrasse, 111 > > > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kamil.Czauz at Squarepoint-Capital.com Fri Nov 20 19:13:41 2020 From: Kamil.Czauz at Squarepoint-Capital.com (Czauz, Kamil) Date: Fri, 20 Nov 2020 19:13:41 +0000 Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process In-Reply-To: References: Message-ID: Here is the output of waiters on 2 hosts that were having the issue today: HOST 1 [2020-11-20 09:07:53 root at nyzls149m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters === mmdiag: waiters === Waiting 0.0035 sec since 09:08:07, monitored, thread 135497 FileBlockReadFetchHandlerThread: on ThCond 0x7F615C152468 (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.64.44.180 Waiting 0.0036 sec since 09:08:07, monitored, thread 139228 PrefetchWorkerThread: on ThCond 0x7F627000D5D8 (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.64.44.181 [2020-11-20 09:08:07 root at nyzls149m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters === mmdiag: waiters === HOST 2 [2020-11-20 09:08:49 root at nyzls150m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters === mmdiag: waiters === Waiting 0.0034 sec since 09:08:50, monitored, thread 345318 SharedHashTabFetchHandlerThread: on ThCond 0x7F049C001F08 (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.64.44.133 [2020-11-20 09:08:50 root at nyzls150m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters === mmdiag: waiters === [2020-11-20 09:08:52 root at nyzls150m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters === mmdiag: waiters === You can see the waiters go from 0 to 1-2 , but they are hardly blocking. Yes there are separate pools for metadata for all of the filesystems here. 
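Since these waiters live for only a few milliseconds, a single mmdiag call can easily miss them; one crude way to catch them is to sample in a loop on the affected node while the problem is happening. This is only a sketch - the one-second interval, the five-minute duration and the log path are arbitrary choices:

# sample the waiter list once a second for roughly five minutes on the affected node
for i in $(seq 1 300); do
    date
    /usr/lpp/mmfs/bin/mmdiag --waiters
    sleep 1
done > /tmp/waiters.$(hostname -s).log

The resulting log can then be scanned afterwards for the longest-lived waiters and the nodes they point at.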
I did another trace today when the problem was happening - this time I was able to get a longer trace using the following command: /usr/lpp/mmfs/bin/mmtracectl --start --trace=io --trace-file-size=512M --tracedev-write-mode=blocking --tracedev-buffer-size=64M -N nyzls149m This is what the trsum output looks like: Elapsed trace time: 62.412092000 seconds Elapsed trace time from first VFS call to last: 62.412091999 Time idle between VFS calls: 0.002913000 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs readpage 0.003487000 9 387.444 rdwr 0.273721000 183 1495.743 read_inode2 0.007304000 325 22.474 follow_link 0.013952000 58 240.552 pagein 0.025974000 66 393.545 getattr 0.002792000 26 107.385 revalidate 0.009406000 2172 4.331 create 66.194479000 3 22064826.333 open 1.725505000 88 19608.011 unlink 18.685099000 1 18685099.000 setattr 0.011627000 14 830.500 lookup 2379.215514000 502 4739473.135 delete_inode 0.015553000 328 47.418 rename 98.099073000 5 19619814.600 release 0.050574000 89 568.247 permission 0.007454000 73 102.110 getxattr 0.002346000 32 73.312 statfs 0.000081000 6 13.500 mmap 0.049809000 18 2767.167 removexattr 0.000827000 14 59.071 llseek 0.000441000 47 9.383 readdir 0.002667000 34 78.441 Ops 4093 Secs 62.409178999 Ops/Sec 65.583 MaxFilesToCache is set to 12000 : [common] maxFilesToCache 12000 I only see gpfs_i_lookup in the tracefile, no gpfs_v_lookups # grep gpfs_i_lookup trcrpt.2020-11-20_09.20.38.283986.nyzls149m |wc -l 1097 They mostly look like this - 62.346560 238895 TRACE_VNODE: gpfs_i_lookup exit: new inode 0xFFFF922178971A40 iNum 21980113 (0x14F63D1) cnP 0xFFFF922178971C88 retP 0x0 code 0 rc 0 62.346955 238895 TRACE_VNODE: gpfs_i_lookup enter: diP 0xFFFF91A8A4991E00 dentryP 0xFFFF92C545A93500 name '20170323.txt' d_flags 0x80 d_count 1 unhashed 1 62.367701 218442 TRACE_VNODE: gpfs_i_lookup exit: new inode 0xFFFF922071300000 iNum 29629892 (0x1C41DC4) cnP 0xFFFF922071300248 retP 0x0 code 0 rc 0 62.367734 218444 TRACE_VNODE: gpfs_i_lookup enter: diP 0xFFFF9193CF457800 dentryP 0xFFFF9229527A89C0 name 'node.py' d_flags 0x80 d_count 1 unhashed 1 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Monday, November 16, 2020 8:46 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, while the other nodes can well block the local one, as Frederick suggests, there should at least be something visible locally waiting for these other nodes. Looking at all waiters might be a good thing, but this case looks strange in other ways. Mind statement there are almost no local waiters and none of them gets older than 10 ms. I am no developer nor do I have the code, so don't expect too much. Can you tell what lookups you see (check in the trcrpt file, could be like gpfs_i_lookup or gpfs_v_lookup)? Lookups are metadata ops, do you have a separate pool for your metadata? How is that pool set up (doen to the physical block devices)? Your trcsum down revealed 36 lookups, each one on avg taking >30ms. That is a lot (albeit the respective waiters won't show up at first glance as suspicious ...). So, which waiters did you see (hope you saved them, if not, do it next time). What are the node you see this on and the whole cluster used for? What is the MaxFilesToCache setting (for that node and for others)? what HW is that, how big are your nodes (memory,CPU)? 
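Just as a sketch, both the configured value and the value the local daemon is actually running with can be checked along these lines (the grep pattern is only there to pick the relevant line out of the full config dump):

# value in the cluster configuration (may differ per node or node class)
mmlsconfig maxFilesToCache

# value currently in effect in the local mmfsd
/usr/lpp/mmfs/bin/mmdiag --config | grep -i maxfilestocache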
To check the unreasonably short trace capture time: how large are the trcrpt files you obtain? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 14:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - Regarding your previous message - waiters were coming / going with just 1-2 waiters when I ran the mmdiag command, with very low wait times (<0.01s). We are running version 4.2.3 I did another capture today while the client is functioning normally and this was the header result: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 25.996957 seconds and 67592121252 cycles Measured cycle count update rate to be 2600001271 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Fri Nov 13 08:20:01.800558000 2020 (TOD 1605273601.800558, cycles 20807897445779444) daemon trace enabled Fri Nov 13 08:20:01.910017000 2020 (TOD 1605273601.910017, cycles 20807897730372442) all streams included Fri Nov 13 08:20:26.423085049 2020 (TOD 1605273626.423085, cycles 20807961464381068) <---- useful part of trace extends from here trace quiesced Fri Nov 13 08:20:27.797515000 2020 (TOD 1605273627.000797, cycles 20807965037900696) <---- to here Approximate number of times the trace buffer was filled: 14.631 Still a very small capture (1.3s), but the trsum.awk output was not filled with lookup commands / large lookup times. Can you help debug what those long lookup operations mean? 
Unfinished operations: 27967 ***************** pagein ************** 1.362382116 27967 ***************** readpage ************** 1.362381516 139130 1.362448448 ********* Unfinished IO: buffer/disk 3002F670000 20:107498951168^\archive_data_16 104686 1.362022068 ********* Unfinished IO: buffer/disk 50011878000 1:47169618944^\archive_data_1 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFE 341710 1.362423815 ********* Unfinished IO: buffer/disk 20022218000 19:107498951680^\archive_data_15 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFF 139150 1.361122006 ********* Unfinished IO: buffer/disk 50012018000 2:47169622016^\archive_data_2 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\00000000FFFFFFFF 95782 1.361112791 ********* Unfinished IO: buffer/disk 40016300000 20:107498950656^\archive_data_16 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\00000000FFFFFFFF 271076 1.361579585 ********* Unfinished IO: buffer/disk 20023DB8000 4:47169606656^\archive_data_4 341676 1.362018599 ********* Unfinished IO: buffer/disk 40038140000 5:47169614336^\archive_data_5 139150 1.361131599 MSG FSnd: nsdMsgReadExt msg_id 2930654492 Sduration 13292.382 + us 341676 1.362027104 MSG FSnd: nsdMsgReadExt msg_id 2930654495 Sduration 12396.877 + us 95782 1.361124739 MSG FSnd: nsdMsgReadExt msg_id 2930654491 Sduration 13299.242 + us 271076 1.361587653 MSG FSnd: nsdMsgReadExt msg_id 2930654493 Sduration 12836.328 + us 92182 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 341710 1.362429643 MSG FSnd: nsdMsgReadExt msg_id 2930654497 Sduration 11994.338 + us 341662 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 139130 1.362458376 MSG FSnd: nsdMsgReadExt msg_id 2930654498 Sduration 11965.605 + us 104686 1.362028772 MSG FSnd: nsdMsgReadExt msg_id 2930654496 Sduration 12395.209 + us 412373 0.775676657 MSG FRep: nsdMsgReadExt msg_id 304915249 Rduration 598747.324 us Rlen 262144 Hduration 598752.112 + us 341770 0.589739579 MSG FRep: nsdMsgReadExt msg_id 338079050 Rduration 784684.402 us Rlen 4 Hduration 784692.651 + us 143315 0.536252844 MSG FRep: nsdMsgReadExt msg_id 631945522 Rduration 838171.137 us Rlen 233472 Hduration 838174.299 + us 341878 0.134331812 MSG FRep: nsdMsgReadExt msg_id 338079023 Rduration 1240092.169 us Rlen 262144 Hduration 1240094.403 + us 175478 0.587353287 MSG FRep: nsdMsgReadExt msg_id 338079047 Rduration 787070.694 us Rlen 262144 Hduration 787073.990 + us 139558 0.633517347 MSG FRep: nsdMsgReadExt msg_id 631945538 Rduration 740906.634 us Rlen 102400 Hduration 740910.172 + us 143308 0.958832110 MSG FRep: nsdMsgReadExt msg_id 631945542 Rduration 415591.871 us Rlen 262144 Hduration 415597.056 + us Elapsed trace time: 1.374423981 seconds Elapsed trace time from first VFS call to last: 1.374423980 Time idle between VFS calls: 0.001603738 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs readpage 1.151660085 1874 614.546 rdwr 0.431456904 581 742.611 read_inode2 0.001180648 934 1.264 follow_link 0.000029502 7 
4.215 getattr 0.000048413 9 5.379 revalidate 0.000007080 67 0.106 pagein 1.149699537 1877 612.520 create 0.007664829 9 851.648 open 0.001032657 19 54.350 unlink 0.002563726 14 183.123 delete_inode 0.000764598 826 0.926 lookup 0.312847947 953 328.277 setattr 0.020651226 824 25.062 permission 0.000015018 1 15.018 rename 0.000529023 4 132.256 release 0.001613800 22 73.355 getxattr 0.000030494 6 5.082 mmap 0.000054767 1 54.767 llseek 0.000001130 4 0.283 readdir 0.000033947 2 16.973 removexattr 0.002119736 820 2.585 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 42625 0.000000138 0.000031017 0.44% 99.56% 3 42378 0.000586959 0.011596801 4.82% 95.18% 32 42627 0.000000272 0.000013421 1.99% 98.01% 2 42641 0.003284590 0.012593594 20.69% 79.31% 35 42628 0.001522335 0.000002748 99.82% 0.18% 2 25464 0.003462795 0.500281914 0.69% 99.31% 12 301420 0.000016711 0.052848218 0.03% 99.97% 38 95103 0.000000544 0.000000000 100.00% 0.00% 1 145858 0.000000659 0.000794896 0.08% 99.92% 2 42221 0.000011484 0.000039445 22.55% 77.45% 5 371718 0.000000707 0.001805425 0.04% 99.96% 2 95109 0.000000880 0.008998763 0.01% 99.99% 2 95337 0.000010330 0.503057866 0.00% 100.00% 8 42700 0.002442175 0.012504429 16.34% 83.66% 35 189680 0.003466450 0.500128627 0.69% 99.31% 9 42681 0.006685396 0.000391575 94.47% 5.53% 16 42702 0.000048203 0.000000500 98.97% 1.03% 2 42703 0.000033280 0.140102087 0.02% 99.98% 9 224423 0.000000195 0.000000000 100.00% 0.00% 1 42706 0.000541098 0.000014713 97.35% 2.65% 3 106275 0.000000456 0.000000000 100.00% 0.00% 1 42721 0.000372857 0.000000000 100.00% 0.00% 1 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Friday, November 13, 2020 4:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi Kamil, in my mail just a few minutes ago I'd overlooked that the buffer size in your trace was indeed 128M (I suppose the trace file is adapting that size if not set in particular). That is very strange, even under high load, the trace should then capture some longer time than 10 secs, and , most of all, it should contain much more activities than just these few you had. That is very mysterious. I am out of ideas for the moment, and a bit short of time to dig here. To check your tracing, you could run a trace like before but when everything is normal and check that out - you should see many records, the trcsum.awk should list just a small portion of unfinished ops at the end, ... If that is fine, then the tracing itself is affected by your crritical condition (never experienced that before - rather GPFS grinds to a halt than the trace is abandoned), and that might well be worth a support ticket. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 
7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Uwe Falke/Germany/IBM To: gpfsug main discussion list Date: 13/11/2020 10:21 Subject: Re: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, looks your tracefile setting has been too low: all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here means you effectively captured a period of about 5ms only ... you can't see much from that. I'd assumed the default trace file size would be sufficient here but it doesn't seem to. try running with something like mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N . However, if you say "no major waiter" - how many waiters did you see at any time? what kind of waiters were the oldest, how long they'd waited? it could indeed well be that some job is just creating a killer workload. The very short cyle time of the trace points, OTOH, to high activity, OTOH the trace file setting appears quite low (trace=io doesnt' collect many trace infos, just basic IO stuff). If I might ask: what version of GPFS are you running? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 03:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. 
The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? The 'lookup' operations are taking a hell of a long time, what does that mean? Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 
32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 
2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. 
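A rough sketch of grabbing all three of those in one go on the hanging client, the moment the problem is visible (the output file name is arbitrary and the commands need to run as root):

{
    date
    # last lines of the local GPFS log
    tail -n 50 /var/adm/ras/mmfs.log.latest
    # currently waiting threads
    /usr/lpp/mmfs/bin/mmdiag --waiters
    # recent IOs with service times and sizes
    /usr/lpp/mmfs/bin/mmdiag --iohist
} > /tmp/gpfs-snapshot.$(date +%Y%m%d-%H%M%S).txt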
Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. 
We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hooft at natlab.research.philips.com Sat Nov 21 00:37:01 2020 From: hooft at natlab.research.philips.com (Peter van Hooft) Date: Sat, 21 Nov 2020 01:37:01 +0100 Subject: [gpfsug-discuss] mmchdisk /dev/fs start -a progress Message-ID: <20201121003701.GA32509@pc67340132.natlab.research.philips.com> Hello, Is it possible to find out the progress of the 'mmchdisk /dev/fs start -a' command when the controlling terminal has been lost? We can see the task running on the fs manager node with 'mmdiag --commands' with attributes 'hold PIT/disk waitTime 0' We are starting to worry the mmchdisk is taking too long, and continuously see waiters like Waiting 3.1946 sec since 01:28:23, ignored, thread 22092 TSCHDISKCmdThread: on ThCond 0x180267573D0 (SGManagementMgrDataCondvar), reason 'waiting for stripe group to recover' Thanks for any hints. Peter van Hooft Philips Research From jonathan.buzzard at strath.ac.uk Sat Nov 21 10:13:42 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 21 Nov 2020 10:13:42 +0000 Subject: [gpfsug-discuss] mmchdisk /dev/fs start -a progress In-Reply-To: <20201121003701.GA32509@pc67340132.natlab.research.philips.com> References: <20201121003701.GA32509@pc67340132.natlab.research.philips.com> Message-ID: On 21/11/2020 00:37, Peter van Hooft wrote: > > Hello, > > Is it possible to find out the progress of the 'mmchdisk /dev/fs start -a' > command when the controlling terminal has been lost? > I don't think so. You are lucky it is still running. > We can see the task running on the fs manager node with 'mmdiag --commands' with > attributes 'hold PIT/disk waitTime 0' > We are starting to worry the mmchdisk is taking too long, and continuously see waiters like > Waiting 3.1946 sec since 01:28:23, ignored, thread 22092 TSCHDISKCmdThread: on ThCond 0x180267573D0 (SGManagementMgrDataCondvar), reason 'waiting for stripe group to recover' > > Thanks for any hints. > Not that this is going to help this time, but it is why you should *ALWAYS* without exception run these sorts of commands within a screen/tmux session so when you lose the connection to the server you can just reconnect and pick it up again. This is introductory system administration 101. No critical or long running command should ever be dependent on a remote controlling terminal. If you can't run them locally then run them in a screen or tmux session. There are plenty of good howtos for both screen and tmux on the internet. Depending on which distribution you use I would note that RedHat have very annoyingly and for completely specious reasons removed screen from RHEL8 and left tmux. So if you are starting from scratch tmux is the one to learn :-( JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From robert.horton at icr.ac.uk Mon Nov 23 15:06:05 2020 From: robert.horton at icr.ac.uk (Robert Horton) Date: Mon, 23 Nov 2020 15:06:05 +0000 Subject: [gpfsug-discuss] AFM experiences? Message-ID: Hi all, We're thinking about deploying AFM and would be interested in hearing from anyone who has used it in anger - particularly independent writer. Our scenario is we have a relatively large but slow (mainly because it is stretched over two sites with a 10G link) cluster for long/medium-term storage and a smaller but faster cluster for scratch storage in our HPC system.
What we're thinking of doing is using some/all of the scratch capacity as an IW cache of some/all of the main cluster, the idea being to reduce the need for people to manually move data between the two. It seems to generally work as expected in a small test environment, although we have a few concerns: - Quota management on the home cluster - we need a way of ensuring people don't write data to the cache which can't be accommodated on home. Probably not insurmountable but needs a bit of thought... - It seems inodes on the cache only get freed when they are deleted on the cache cluster - not if they get deleted from the home cluster or when the blocks are evicted from the cache. Does this become an issue in time? If anyone has done anything similar I'd be interested to hear how you got on. It would be interesting to know if you created a cache fileset for each home fileset or just one for the whole lot, as well as any other pearls of wisdom you may have to offer. Thanks! Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. From novosirj at rutgers.edu Mon Nov 23 15:30:47 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Mon, 23 Nov 2020 15:30:47 +0000 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: References: Message-ID: <440D8981-58C7-4CF1-BF06-BC11F27A47BC@rutgers.edu> We use it similar to how you describe it. We now run 5.0.4.1 on the client side (I mean actual client nodes, not the home or cache clusters). Before that, we had reliability problems (failure to cache libraries of programs that were executing, etc.). The storage clusters in our case are 5.0.3-2.3. We also got bit by the quotas thing. You have to set them the same on both sides, or you will have problems. It seems a little silly that they are not kept in sync by GPFS, but that's how it is. If memory serves, the result looked like an AFM failure (queue not being cleared), but it turned out to be that the files just could not be written at the home cluster because the user was over quota there. I also think I've seen load average increase due to this sort of thing, but I may be mixing that up with another problem scenario. We monitor via Nagios which I believe monitors using mmafmctl commands. Really can't think of a single time, apart from the other day, where the queue backed up. The instance the other day only lasted a few minutes (if you suddenly create many small files, like installing new software, it may not catch up instantly). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr.
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Nov 23, 2020, at 10:19, Robert Horton wrote: Hi all, We're thinking about deploying AFM and would be interested in hearing from anyone who has used it in anger - particularly independent writer. Our scenario is we have a relatively large but slow (mainly because it is stretched over two sites with a 10G link) cluster for long/medium-term storage and a smaller but faster cluster for scratch storage in our HPC system. What we're thinking of doing is using some/all of the scratch capacity as an IW cache of some/all of the main cluster, the idea being to reduce the need for people to manually move data between the two. It seems to generally work as expected in a small test environment, although we have a few concerns: - Quota management on the home cluster - we need a way of ensuring people don't write data to the cache which can't be accommodated on home. Probably not insurmountable but needs a bit of thought... - It seems inodes on the cache only get freed when they are deleted on the cache cluster - not if they get deleted from the home cluster or when the blocks are evicted from the cache. Does this become an issue in time? If anyone has done anything similar I'd be interested to hear how you got on. It would be interesting to know if you created a cache fileset for each home fileset or just one for the whole lot, as well as any other pearls of wisdom you may have to offer. Thanks! Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From dean.flanders at fmi.ch Mon Nov 23 17:58:12 2020 From: dean.flanders at fmi.ch (Flanders, Dean) Date: Mon, 23 Nov 2020 17:58:12 +0000 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: <440D8981-58C7-4CF1-BF06-BC11F27A47BC@rutgers.edu> References: <440D8981-58C7-4CF1-BF06-BC11F27A47BC@rutgers.edu> Message-ID: Hello Rob, We looked at AFM years ago for DR, but after reading the bug reports we avoided it, and we have also seen a case where it had to be removed from one customer, so we have kept things simple. Now, looking again a few years later, there are still issues: "IBM Spectrum Scale Active File Management (AFM) issues which may result in undetected data corruption" was just my first Google hit. We have kept it simple, and use a parallel rsync process driven by the policy engine, and can hit wire speed for copying of millions of small files in order to have isolation between the sites at GB/s. I am not saying it is bad, just that it needs an appropriate risk/reward ratio to implement as it increases overall complexity.
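For illustration only, one way such a policy-driven parallel copy can be wired together; the filesystem path, the policy rule, the chunk count and the destination host below are all made up rather than the actual setup described above:

# policy file (e.g. /tmp/mirror.policy) selecting what to copy, here files changed in the last day:
#   RULE EXTERNAL LIST 'tocopy' EXEC ''
#   RULE 'mirror' LIST 'tocopy' WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '1' DAYS

# have the policy engine build the candidate list without acting on it
mmapplypolicy /gpfs/fs1/projects -P /tmp/mirror.policy -f /tmp/mirror -I defer

# the list file puts the path after ' -- '; strip the leading bookkeeping columns
# (filenames containing newlines or other oddities need more care than this)
sed 's/^.* -- //' /tmp/mirror.list.tocopy > /tmp/mirror.paths

# split into 8 chunks and run one rsync per chunk in parallel
split -n l/8 /tmp/mirror.paths /tmp/mirror.chunk.
for f in /tmp/mirror.chunk.*; do
    rsync -a --files-from="$f" / desthost:/ &
done
wait

The attraction of this approach is that the policy engine does the filesystem scan in parallel, so building the candidate list stays fast even with many millions of files.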
Kind regards, Dean From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Monday, November 23, 2020 4:31 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM experiences? We use it similar to how you describe it. We now run 5.0.4.1 on the client side (I mean actual client nodes, not the home or cache clusters). Before that, we had reliability problems (failure to cache libraries of programs that were executing, etc.). The storage clusters in our case are 5.0.3-2.3. We also got bit by the quotas thing. You have to set them the same on both sides, or you will have problems. It seems a little silly that they are not kept in sync by GPFS, but that?s how it is. If memory serves, the result looked like an AFM failure (queue not being cleared), but it turned out to be that the files just could not be written at the home cluster because the user was over quota there. I also think I?ve seen load average increase due to this sort of thing, but I may be mixing that up with another problem scenario. We monitor via Nagios which I believe monitors using mmafmctl commands. Really can?t think of a single time, apart from the other day, where the queue backed up. The instance the other day only lasted a few minutes (if you suddenly create many small files, like installing new software, it may not catch up instantly). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Nov 23, 2020, at 10:19, Robert Horton > wrote: ?Hi all, We're thinking about deploying AFM and would be interested in hearing from anyone who has used it in anger - particularly independent writer. Our scenario is we have a relatively large but slow (mainly because it is stretched over two sites with a 10G link) cluster for long/medium- term storage and a smaller but faster cluster for scratch storage in our HPC system. What we're thinking of doing is using some/all of the scratch capacity as an IW cache of some/all of the main cluster, the idea to reduce the need for people to manually move data between the two. It seems to generally work as expected in a small test environment, although we have a few concerns: - Quota management on the home cluster - we need a way of ensuring people don't write data to the cache which can't be accomodated on home. Probably not insurmountable but needs a bit of thought... - It seems inodes on the cache only get freed when they are deleted on the cache cluster - not if they get deleted from the home cluster or when the blocks are evicted from the cache. Does this become an issue in time? If anyone has done anything similar I'd be interested to hear how you got on. It would be intresting to know if you created a cache fileset for each home fileset or just one for the whole lot, as well as any other pearls of wisdom you may have to offer. Thanks! Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 
534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Mon Nov 23 21:54:39 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 23 Nov 2020 21:54:39 +0000 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: Message-ID: Rob, Talk to Jake Carroll from the University of Queensland, he has done a number of presentations at Scale User Groups of UQ?s MeDiCI data fabric which is based on Spectrum Scale and does very aggressive use of AFM. Their use of AFM is not only on campus, but to remote Storage clusters between 30km and 1500km away from their Home cluster. They have also tested AFM between Australia, Japan, and USA Sent from my iPhone > On 24 Nov 2020, at 01:20, Robert Horton wrote: > > ?Hi all, > > We're thinking about deploying AFM and would be interested in hearing > from anyone who has used it in anger - particularly independent writer. > > Our scenario is we have a relatively large but slow (mainly because it > is stretched over two sites with a 10G link) cluster for long/medium- > term storage and a smaller but faster cluster for scratch storage in > our HPC system. What we're thinking of doing is using some/all of the > scratch capacity as an IW cache of some/all of the main cluster, the > idea to reduce the need for people to manually move data between the > two. > > It seems to generally work as expected in a small test environment, > although we have a few concerns: > > - Quota management on the home cluster - we need a way of ensuring > people don't write data to the cache which can't be accomodated on > home. Probably not insurmountable but needs a bit of thought... > > - It seems inodes on the cache only get freed when they are deleted on > the cache cluster - not if they get deleted from the home cluster or > when the blocks are evicted from the cache. Does this become an issue > in time? > > If anyone has done anything similar I'd be interested to hear how you > got on. It would be intresting to know if you created a cache fileset > for each home fileset or just one for the whole lot, as well as any > other pearls of wisdom you may have to offer. > > Thanks! > Rob > > -- > Robert Horton | Research Data Storage Lead > The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB > T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | > Twitter @ICR_London > Facebook: www.facebook.com/theinstituteofcancerresearch > > The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. > > This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Mon Nov 23 23:14:08 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Mon, 23 Nov 2020 23:14:08 +0000 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: References: Message-ID: <2C7317A6-B9DF-450A-92A6-AE156396204A@rutgers.edu> Ours are about 50 and 100 km from the home cluster, but it?s over 100Gb fiber. > On Nov 23, 2020, at 4:54 PM, Andrew Beattie wrote: > > Rob, > > Talk to Jake Carroll from the University of Queensland, he has done a number of presentations at Scale User Groups of UQ?s MeDiCI data fabric which is based on Spectrum Scale and does very aggressive use of AFM. > > Their use of AFM is not only on campus, but to remote Storage clusters between 30km and 1500km away from their Home cluster. They have also tested AFM between Australia, Japan, and USA > > Sent from my iPhone > > > On 24 Nov 2020, at 01:20, Robert Horton wrote: > > > > ?Hi all, > > > > We're thinking about deploying AFM and would be interested in hearing > > from anyone who has used it in anger - particularly independent writer. > > > > Our scenario is we have a relatively large but slow (mainly because it > > is stretched over two sites with a 10G link) cluster for long/medium- > > term storage and a smaller but faster cluster for scratch storage in > > our HPC system. What we're thinking of doing is using some/all of the > > scratch capacity as an IW cache of some/all of the main cluster, the > > idea to reduce the need for people to manually move data between the > > two. > > > > It seems to generally work as expected in a small test environment, > > although we have a few concerns: > > > > - Quota management on the home cluster - we need a way of ensuring > > people don't write data to the cache which can't be accomodated on > > home. Probably not insurmountable but needs a bit of thought... > > > > - It seems inodes on the cache only get freed when they are deleted on > > the cache cluster - not if they get deleted from the home cluster or > > when the blocks are evicted from the cache. Does this become an issue > > in time? > > > > If anyone has done anything similar I'd be interested to hear how you > > got on. It would be intresting to know if you created a cache fileset > > for each home fileset or just one for the whole lot, as well as any > > other pearls of wisdom you may have to offer. > > > > Thanks! > > Rob > > > > -- > > Robert Horton | Research Data Storage Lead > > The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB > > T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | > > Twitter @ICR_London > > Facebook: www.facebook.com/theinstituteofcancerresearch > > > > The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. > > > > This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. 
-- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From vpuvvada at in.ibm.com Tue Nov 24 02:32:01 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 24 Nov 2020 08:02:01 +0530 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: References: Message-ID: >- Quota management on the home cluster - we need a way of ensuring >people don't write data to the cache which can't be accomodated on >home. Probably not insurmountable but needs a bit of thought... You could set same quotas between cache and home clusters. AFM does not support replication of filesystem metadata like quotas, fileset configuration etc... >- It seems inodes on the cache only get freed when they are deleted on >the cache cluster - not if they get deleted from the home cluster or >when the blocks are evicted from the cache. Does this become an issue >in time? AFM periodically revalidates with home cluster. If the files/dirs were already deleted at home cluster, AFM moves them to /.ptrash directory at cache cluster during the revalidation. These files can be removed manually by user or auto eviction process. If the .ptrash directory is not cleaned up on time, it might result into quota issues at cache cluster. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 11/23/2020 08:51 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM experiences? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We're thinking about deploying AFM and would be interested in hearing from anyone who has used it in anger - particularly independent writer. Our scenario is we have a relatively large but slow (mainly because it is stretched over two sites with a 10G link) cluster for long/medium- term storage and a smaller but faster cluster for scratch storage in our HPC system. What we're thinking of doing is using some/all of the scratch capacity as an IW cache of some/all of the main cluster, the idea to reduce the need for people to manually move data between the two. It seems to generally work as expected in a small test environment, although we have a few concerns: - Quota management on the home cluster - we need a way of ensuring people don't write data to the cache which can't be accomodated on home. Probably not insurmountable but needs a bit of thought... - It seems inodes on the cache only get freed when they are deleted on the cache cluster - not if they get deleted from the home cluster or when the blocks are evicted from the cache. Does this become an issue in time? If anyone has done anything similar I'd be interested to hear how you got on. It would be intresting to know if you created a cache fileset for each home fileset or just one for the whole lot, as well as any other pearls of wisdom you may have to offer. Thanks! Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. 
This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Tue Nov 24 02:37:18 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 24 Nov 2020 08:07:18 +0530 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: References: <440D8981-58C7-4CF1-BF06-BC11F27A47BC@rutgers.edu> Message-ID: Dean, This is one of the corner case which is associated with sparse files at the home cluster. You could try with latest versions of scale, AFM indepedent-writer mode have many performance/functional improvements in newer releases. ~Venkat (vpuvvada at in.ibm.com) From: "Flanders, Dean" To: gpfsug main discussion list Date: 11/23/2020 11:44 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] AFM experiences? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Rob, We looked at AFM years ago for DR, but after reading the bug reports, we avoided it, and also have had seen a case where it had to be removed from one customer, so we have kept things simple. Now looking again a few years later there are still issues, IBM Spectrum Scale Active File Management (AFM) issues which may result in undetected data corruption, and that was just my first google hit. We have kept it simple, and use a parallel rsync process with policy engine and can hit wire speed for copying of millions of small files in order to have isolation between the sites at GB/s. I am not saying it is bad, just that it needs an appropriate risk/reward ratio to implement as it increases overall complexity. Kind regards, Dean From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Monday, November 23, 2020 4:31 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM experiences? We use it similar to how you describe it. We now run 5.0.4.1 on the client side (I mean actual client nodes, not the home or cache clusters). Before that, we had reliability problems (failure to cache libraries of programs that were executing, etc.). The storage clusters in our case are 5.0.3-2.3. We also got bit by the quotas thing. You have to set them the same on both sides, or you will have problems. It seems a little silly that they are not kept in sync by GPFS, but that?s how it is. If memory serves, the result looked like an AFM failure (queue not being cleared), but it turned out to be that the files just could not be written at the home cluster because the user was over quota there. I also think I?ve seen load average increase due to this sort of thing, but I may be mixing that up with another problem scenario. We monitor via Nagios which I believe monitors using mmafmctl commands. Really can?t think of a single time, apart from the other day, where the queue backed up. The instance the other day only lasted a few minutes (if you suddenly create many small files, like installing new software, it may not catch up instantly). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Nov 23, 2020, at 10:19, Robert Horton wrote: ?Hi all, We're thinking about deploying AFM and would be interested in hearing from anyone who has used it in anger - particularly independent writer. Our scenario is we have a relatively large but slow (mainly because it is stretched over two sites with a 10G link) cluster for long/medium- term storage and a smaller but faster cluster for scratch storage in our HPC system. What we're thinking of doing is using some/all of the scratch capacity as an IW cache of some/all of the main cluster, the idea to reduce the need for people to manually move data between the two. It seems to generally work as expected in a small test environment, although we have a few concerns: - Quota management on the home cluster - we need a way of ensuring people don't write data to the cache which can't be accomodated on home. Probably not insurmountable but needs a bit of thought... - It seems inodes on the cache only get freed when they are deleted on the cache cluster - not if they get deleted from the home cluster or when the blocks are evicted from the cache. Does this become an issue in time? If anyone has done anything similar I'd be interested to hear how you got on. It would be intresting to know if you created a cache fileset for each home fileset or just one for the whole lot, as well as any other pearls of wisdom you may have to offer. Thanks! Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Tue Nov 24 02:41:21 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 24 Nov 2020 08:11:21 +0530 Subject: [gpfsug-discuss] =?utf-8?q?Migrate/syncronize_data_from_Isilon_to?= =?utf-8?q?_Scale_over=09NFS=3F?= In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: AFM provides near zero downtime for migration. As of today, AFM migration does not support ACLs or other EAs migration from non scale (GPFS) source. 
https://www.ibm.com/support/knowledgecenter/STXKQY_5.1.0/com.ibm.spectrum.scale.v5r10.doc/bl1ins_uc_migrationusingafmmigrationenhancements.htm ~Venkat (vpuvvada at in.ibm.com) From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 11/17/2020 03:14 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Sent by: gpfsug-discuss-bounces at spectrumscale.org Have you considered using the AFM feature of Spectrum Scale? I doubt it will provide any speed improvement but it would allow for data to be accessed as it was being migrated. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Andi Christiansen Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Date: Mon, Nov 16, 2020 2:44 PM Hi all, i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... if anyone have any ideas they are welcome! Thanks in advance Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From luke.raimbach at googlemail.com Tue Nov 24 12:16:55 2020 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Tue, 24 Nov 2020 12:16:55 +0000 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: References: Message-ID: Hi Rob, Some things to think about from experiences a year or so ago... If you intend to perform any HPC workload (writing / updating / deleting files) inside a cache, then appropriately specified gateway nodes will be your friend: 1. When creating, updating or deleting files in the cache, each operation requires acknowledgement from the gateway handling that particular cache, before returning ACK to the application. This will add a latency overhead to the workload - if your storage is IB connected to the compute cluster and using verbsRdmaSend for example, this will increase your happiness. Connecting low-spec gateway nodes over 10GbE with the expectation that they will "drain down" over time was a sore learning experience in the early days of AFM for me. 2. AFM queues can quickly eat up memory. I think around 350bytes of memory is consumed for each operation in the AFM queue, so if you have huge file churn inside a cache then the queue will grow very quickly. If you run out of memory, the node dies and you enter cache recovery when it comes back up (or another node takes over). 
This can end up cycling the node as it tries to revalidate a cache and keep up with any other queues. Get more memory! I've not used AFM for a while now and I think the latter enormity has some mitigation against create / delete cycles (i.e. the create operation is expunged from the queue instead of two operations being played back to the home). I expect IBM experts will tell you more about those improvements. Also, several smaller caches are better than one large one (parallel execution of queues helps utilise the available bandwidth and you have a better failover spread if you have multiple gateways, for example). Independent Writer mode comes with some small danger (user error or impatience mainly) inasmuch as whoever updates a file last will win; e.g. home user A writes a file, then cache user B updates the file after reading it and tells user A the update is complete, when really the gateway queue is long and the change is waiting to go back home. User A uses the file expecting the changes are made, then updates it with some results. Meanwhile the AFM queue drains down and user B's change arrives after user A has completed their changes. The interim version of the file user B modified will persist at home and user A's latest changes are lost. Some careful thought about workflow (or good user training about eventual consistency) will save some potential misery on this front. Hope this helps, Luke On Mon, 23 Nov 2020 at 15:19, Robert Horton wrote: > Hi all, > > We're thinking about deploying AFM and would be interested in hearing > from anyone who has used it in anger - particularly independent writer. > > Our scenario is we have a relatively large but slow (mainly because it > is stretched over two sites with a 10G link) cluster for long/medium- > term storage and a smaller but faster cluster for scratch storage in > our HPC system. What we're thinking of doing is using some/all of the > scratch capacity as an IW cache of some/all of the main cluster, the > idea to reduce the need for people to manually move data between the > two. > > It seems to generally work as expected in a small test environment, > although we have a few concerns: > > - Quota management on the home cluster - we need a way of ensuring > people don't write data to the cache which can't be accomodated on > home. Probably not insurmountable but needs a bit of thought... > > - It seems inodes on the cache only get freed when they are deleted on > the cache cluster - not if they get deleted from the home cluster or > when the blocks are evicted from the cache. Does this become an issue > in time? > > If anyone has done anything similar I'd be interested to hear how you > got on. It would be intresting to know if you created a cache fileset > for each home fileset or just one for the whole lot, as well as any > other pearls of wisdom you may have to offer. > > Thanks! > Rob > > -- > Robert Horton | Research Data Storage Lead > The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB > T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | > Twitter @ICR_London > Facebook: www.facebook.com/theinstituteofcancerresearch > > The Institute of Cancer Research: Royal Cancer Hospital, a charitable > Company Limited by Guarantee, Registered in England under Company No. > 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. > > This e-mail message is confidential and for use by the addressee only. 
If > the message is received by anyone other than the addressee, please return > the message to the sender by replying to it and then delete the message > from your computer and network. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yeep at robust.my Tue Nov 24 14:09:34 2020 From: yeep at robust.my (T.A. Yeep) Date: Tue, 24 Nov 2020 22:09:34 +0800 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: Hi Venkat, If ACLs and other EAs migration from non scale is not supported by AFM, is there any 3rd party tool that could complement that when paired with AFM? On Tue, Nov 24, 2020 at 10:41 AM Venkateswara R Puvvada wrote: > AFM provides near zero downtime for migration. As of today, AFM > migration does not support ACLs or other EAs migration from non scale > (GPFS) source. > > > https://www.ibm.com/support/knowledgecenter/STXKQY_5.1.0/com.ibm.spectrum.scale.v5r10.doc/bl1ins_uc_migrationusingafmmigrationenhancements.htm > > ~Venkat (vpuvvada at in.ibm.com) > > > > From: "Frederick Stock" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 11/17/2020 03:14 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Migrate/syncronize data > from Isilon to Scale over NFS? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Have you considered using the AFM feature of Spectrum Scale? I doubt it > will provide any speed improvement but it would allow for data to be > accessed as it was being migrated. > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > ----- Original message ----- > From: Andi Christiansen > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon > to Scale over NFS? > Date: Mon, Nov 16, 2020 2:44 PM > > Hi all, > > i have got a case where a customer wants 700TB migrated from isilon to > Scale and the only way for him is exporting the same directory on NFS from > two different nodes... > > as of now we are using multiple rsync processes on different parts of > folders within the main directory. this is really slow and will take > forever.. right now 14 rsync processes spread across 3 nodes fetching from > 2.. > > does anyone know of a way to speed it up? right now we see from 1Gbit to > 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from > scale nodes and 20Gbits from isilon so we should be able to reach just > under 20Gbit... > > > if anyone have any ideas they are welcome! > > > Thanks in advance > Andi Christiansen > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Best regards *T.A. 
Yeep*Mobile: +6-016-719 8506 | Tel: +6-03-7628 0526 | www.robusthpc.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Tue Nov 24 09:39:47 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Tue, 24 Nov 2020 09:39:47 +0000 Subject: [gpfsug-discuss] SSUG::Digital with CIUK Message-ID: <> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: meeting.ics Type: text/calendar Size: 2623 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 3499622 bytes Desc: not available URL: From prasad.surampudi at theatsgroup.com Tue Nov 24 16:05:19 2020 From: prasad.surampudi at theatsgroup.com (Prasad Surampudi) Date: Tue, 24 Nov 2020 16:05:19 +0000 Subject: [gpfsug-discuss] mmhealth reports fserrinvalid errors on CNFS servers Message-ID: We are seeing fserrinvalid error on couple of filesystems in Spectrum Scale cluster. These errors are reported but mmhealth only couple of nodes (CNFS servers) in the cluster, but mmhealth on other nodes shows no issues. Any idea what this error means? And why its reported on CNFS servers and not on other nodes? What need to be done to fix this issue? sudo /usr/lpp/mmfs/bin/mmhealth node show FILESYSTEM -v Node name: cnfs05-gpfs Component Status Reasons ------------------------------------------------------------------- FILESYSTEM DEGRADED fserrinvalid(vol) argus HEALTHY - dytech HEALTHY - enlnt_E HEALTHY - enlnt_Es HEALTHY - haaforfs HEALTHY - haaforfs2 HEALTHY - historical HEALTHY - prcfs HEALTHY - qmtfs HEALTHY - research HEALTHY - research2 HEALTHY - schon_raw HEALTHY - uhdb_vol1 HEALTHY - vol DEGRADED fserrinvalid(vol) Event Parameter Severity Event Message ---------------------------------------------------------------------------------------------------------- fserrinvalid vol ERROR FS=vol,ErrNo=1124,Unknown error=0464000000010000000180A108BC000079B4000000000000003400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 -------------- next part -------------- An HTML attachment was scrubbed... URL: From NSCHULD at de.ibm.com Tue Nov 24 16:44:35 2020 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Tue, 24 Nov 2020 17:44:35 +0100 Subject: [gpfsug-discuss] =?utf-8?q?mmhealth_reports_fserrinvalid_errors_o?= =?utf-8?q?n_CNFS=09servers?= In-Reply-To: References: Message-ID: To get an explanation for any event one can ask the system: # mmhealth event show fserrinvalid Event Name: fserrinvalid Event ID: 999338 Description: Unrecognized FSSTRUCT error received. Check documentation Cause: A filesystem corruption detected User Action: Check error message for details and the mmfs.log.latest log for further details. See the topic Checking and repairing a file system in the IBM Spectrum Scale documentation: Administering. Managing file systems. If the file system is severely damaged, the best course of action is to follow the procedures in section: Additional information to collect for file system corruption or MMFS_FSSTRUCT errors Severity: ERROR State: DEGRADED The event is triggered by a callback which may not fire on all nodes, that is why only a subset of nodes have the information. 
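If it helps, the same component can also be checked cluster-wide rather than node by node, for example (a sketch, assuming a recent 5.0.x mmhealth and that mmdsh/adminMode allow it):

  mmhealth cluster show FILESYSTEM
  # or query every node at once
  mmdsh -N all mmhealth node show FILESYSTEM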
Depending on the version of scale the procedure to remove the event varies: For newer release please use # mmhealth event resolve Missing arguments. Usage: mmhealth event resolve {EventName} [Identifier] For older releases it is described here: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_fsstruc.htm mmsysmonc event filesystem fsstruct_fixed Mit freundlichen Gr??en / Kind regards Norbert Schuld M925:IBM Spectrum Scale Software Development Phone: +49-160 70 70 335 IBM Deutschland Research & Development GmbH Email: nschuld at de.ibm.com Am Weiher 24 65451 Kelsterbach Knowing is not enough; we must apply. Willing is not enough; we must do. IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Prasad Surampudi To: "gpfsug-discuss at spectrumscale.org" Date: 24.11.2020 17:05 Subject: [EXTERNAL] [gpfsug-discuss] mmhealth reports fserrinvalid errors on CNFS servers Sent by: gpfsug-discuss-bounces at spectrumscale.org We are seeing fserrinvalid error on couple of filesystems in Spectrum Scale cluster. These errors are reported but mmhealth only couple of nodes (CNFS servers) in the cluster, but mmhealth on other nodes shows no issues. Any idea what this error means? And why its reported on CNFS servers and not on other nodes? What need to be done to fix this issue? sudo /usr/lpp/mmfs/bin/mmhealth node show FILESYSTEM -v Node name: cnfs05-gpfs Component Status Reasons ------------------------------------------------------------------- FILESYSTEM DEGRADED fserrinvalid(vol) argus HEALTHY - dytech HEALTHY - enlnt_E HEALTHY - enlnt_Es HEALTHY - haaforfs HEALTHY - haaforfs2 HEALTHY - historical HEALTHY - prcfs HEALTHY - qmtfs HEALTHY - research HEALTHY - research2 HEALTHY - schon_raw HEALTHY - uhdb_vol1 HEALTHY - vol DEGRADED fserrinvalid(vol) Event Parameter Severity Event Message ---------------------------------------------------------------------------------------------------------- fserrinvalid vol ERROR FS=vol,ErrNo=1124,Unknown error=0464000000010000000180A108BC000079B4000000000000003400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1D963707.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jake.carroll at uq.edu.au Wed Nov 25 21:29:24 2020 From: jake.carroll at uq.edu.au (Jake Carroll) Date: Wed, 25 Nov 2020 21:29:24 +0000 Subject: [gpfsug-discuss] IB routers in ESS configuration + 3 different subnets - valid config? Message-ID: Hi. I am just in the process of sanity-checking a potential future configuration. 
Let's say I have an ESS 5000 and an ESS 3000 placed on the data centre floor to form the basis of a new scratch array. Let's then suppose that I have three existing supercomputers in that same location. Each of those supercomputers has a separate IB subnet and their networks are unrelated to each other, IB-wise. My understanding is that it is valid and possible to use MLNX EDR IB *routers* in order to be able to transport NSD communications back and forth across those separate subnets, back to the ESS (which lives on its own unique subnet). So at this point, I've got four unique subnets - one for the ESS, one for each super. As I understand it, there is an upper limit of *SIX* unique subnets on those EDR IB routers. As I understand it - for IPoIB transport, I'd also need some "gateway" boxes more or less - essentially some decent servers which I put EDR/HDR cards in as dog legs that act as an IPoIB gateway interface to each subnet. I appreciate that there is devil in the detail - but what I'm asking is if it is valid to "route" NSD with IB Routers (not switches) this way to separate subnets. Colleagues at IBM have all said "yeah....should work....we've not done it....but should be fine?" Colleagues at Mellanox (uhhh...nvidia...) say "Yes, this is valid and does exactly as the IB Router should and there is nothing unusual about this". If someone has experience doing this or could call out any oddity/weirdness/gotchas, I'd be very appreciative. I'm fairly sure this is all very low risk - but given nobody locally could tell me "Yeah, all certified and valid!" I'd like the wisdom of the wider crowd. Thank you. --jc -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Nov 27 11:46:05 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 27 Nov 2020 17:16:05 +0530 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: Hi Yeep, >If ACLs and other EAs migration from non scale is not supported by AFM, is there any 3rd party tool that could complement that when paired with AFM? rsync can be used to fix just the metadata, such as ACLs and EAs. AFM does not revalidate files with the source system if rsync changes the ACLs on them, so ACLs can only be fixed during or after the cutover. If that is sufficient, ACL inheritance can also be used by setting ACLs on the required parent directories up front; one user migrated to Scale with this method.
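A rough sketch of such a metadata-only fix-up pass (the paths are hypothetical, and this assumes the source ACLs are visible as POSIX ACLs on the NFS mount; NFSv4 ACLs would need a different approach):

  # -A/-X carry POSIX ACLs and extended attributes; files whose data was already
  # moved by AFM or rsync are not re-transferred, only their metadata is updated
  rsync -aAX --itemize-changes /mnt/isilon_export/ /gpfs/fs1/migrated/
  # spot-check the result on the Scale side
  mmgetacl /gpfs/fs1/migrated/some/dir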
https://www.ibm.com/support/knowledgecenter/STXKQY_5.1.0/com.ibm.spectrum.scale.v5r10.doc/bl1ins_uc_migrationusingafmmigrationenhancements.htm ~Venkat (vpuvvada at in.ibm.com) From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 11/17/2020 03:14 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Sent by: gpfsug-discuss-bounces at spectrumscale.org Have you considered using the AFM feature of Spectrum Scale? I doubt it will provide any speed improvement but it would allow for data to be accessed as it was being migrated. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Andi Christiansen Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Date: Mon, Nov 16, 2020 2:44 PM Hi all, i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... if anyone have any ideas they are welcome! Thanks in advance Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Best regards T.A. Yeep Mobile: +6-016-719 8506 | Tel: +6-03-7628 0526 | www.robusthpc.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Mon Nov 30 13:49:12 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Mon, 30 Nov 2020 13:49:12 +0000 Subject: [gpfsug-discuss] Licensing costs for data lakes (SSUG follow-up) Message-ID: I am seeking some help on a topic I know many of you care deeply about: licensing costs I am trying to gather some more information about a request that has come up a couple of times, pricing for ?data lakes?. I would like to understand better what people are looking for here. - Is it as simple as ?much steeper discounts for very large deployments?? Or is a ?data lake? something specific, e.g. a large deployment that is not performance/latency sensitive; a storage pool that is [primarily] HDD; a tier that has specific read/write patterns such as moving entire large datasets in or out; or something else? 
Bear in mind that if we have special licensing for data lakes, we need a rigorous definition so that both you and we know whether your use of that licensing is compliant. Nobody likes ambiguity in licensing! - Are you expecting pricing to get very flat/discounting to get steep for large deployments? Or a different price tier/structure for ?data lakes? if we can rigorously define what one means? Do you agree or disagree with the proposition that if you keep adding storage hardware/capacity, that the software licensing cost should rise in proportion (even if that proportion is much smaller for a ?data lake? than for a performance tier)? - Feel free to be creative and imaginative. For example, would you be interested in a low-cost pricing model for storage that is an AFM Home and is _only_ accessed by using AFM to move data in and out of an AFM Cache (probably on the performance tier)? This would be conceptually similar to the way you can now (5.1) use AFM-Object to park data in a cheap object store. - Also feel free to answer questions I didn?t ask? If you prefer to discuss this in Slack rather than email, I started a discussion there a little while ago (please thread your comments!): https://ssug-poweraiug.slack.com/archives/CEVVCEE8M/p1605815075188800 Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1545794140] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From david_johnson at brown.edu Mon Nov 30 21:41:30 2020 From: david_johnson at brown.edu (David Johnson) Date: Mon, 30 Nov 2020 16:41:30 -0500 Subject: [gpfsug-discuss] internal details on GPFS inode expansion Message-ID: When GPFS needs to add inodes to the filesystem, it seems to pre-create about 4 million of them. Judging by the logs, it seems it only takes a few (13 maybe) seconds to do this. However we are suspecting that this might only be to request the additional inodes and that there is some background activity for some time afterwards. Would someone who has knowledge of the actual internals be willing to confirm or deny this, and if there is background activity, is it on all nodes in the cluster, NSD nodes, "default worker nodes"? Thanks, -- ddj Dave Johnson ddj at brown.edu
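For anyone who wants to watch the allocation counters while such an expansion happens, a quick sketch (the file system name gpfs01 is just a placeholder):

  mmlsfileset gpfs01 -L    # MaxInodes / AllocInodes per inode space
  mmdf gpfs01              # the Inode Information section shows used/free/allocated/maximum inodes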
But I already know, that the new S3 policies won't be available, only the "legacy" S3 ACLs. We also tried MinIO but deemed that it's not "production ready". It's fine for quickly setting up a S3 service for development, but they release too often and with breaking changes, and documentation is lacking all aspects regarding maintenance. Regards, Christian Am 27.10.20 um 12:46 schrieb Andi Christiansen: > Hi all, > > We have over a longer period used the S3 API within spectrum Scale.. > And that has shown that it does not support very many applications > because of limitations of the API.. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmtick at us.ibm.com Tue Nov 3 00:21:43 2020 From: jmtick at us.ibm.com (Jacob M Tick) Date: Tue, 3 Nov 2020 00:21:43 +0000 Subject: [gpfsug-discuss] Use cases for file audit logging and clustered watch folder Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Nov 3 17:00:54 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 3 Nov 2020 17:00:54 +0000 Subject: [gpfsug-discuss] SSUG::Digital Scalable multi-node training for AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale Message-ID: Apologies, looks like the calendar invite for this week?s SSUG::Digital didn?t get sent! Nvidia and IBM did a complex proof-of-concept to demonstrate the scaling of AI workload using Nvidia DGX, Red Hat OpenShift and IBM Spectrum Scale at the example of ResNet-50 and the segmentation of images using the Audi A2D2 dataset. The project team published an IBM Redpaper with all the technical details and will present the key learnings and results. >>> Join Here <<< This episode will start 15 minutes later as usual. * San Francisco, USA at 08:15 PST * New York, USA at 11:15 EST * London, United Kingdom at 16:15 GMT * Frankfurt, Germany at 17:15 CET * Pune, India at 21:45 IST -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2488 bytes Desc: not available URL: From andi at christiansen.xxx Wed Nov 4 07:14:41 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 4 Nov 2020 08:14:41 +0100 (CET) Subject: [gpfsug-discuss] Alternative to Scale S3 API. In-Reply-To: <1dffa509-1bcc-0d5a-bf79-fa82746dca07@1und1.de> References: <1109480230.484366.1603799162955@privateemail.com> <1dffa509-1bcc-0d5a-bf79-fa82746dca07@1und1.de> Message-ID: <1512108314.679947.1604474081488@privateemail.com> Hi Christian, Thanks for the information! My question also triggered IBM to tell me the same so i think we will stay on S3 with Scale and hoping the same with the new release.. Yes, MinIO is really lacking some good documentation.. but definatly a cool software package that i will keep an eye on in the future... Best Regards Andi Christiansen > On 11/02/2020 2:44 PM Christian Vieser wrote: > > > > Hi Andi, > > we suffer from the same issue. IBM support told me that Spectrum Scale 5.1 will come with a new release of the underlying Openstack components, so we still hope that some/most of limitations will vanish then. But I already know, that the new S3 policies won't be available, only the "legacy" S3 ACLs. > > We also tried MinIO but deemed that it's not "production ready". 
It's fine for quickly setting up a S3 service for development, but they release too often and with breaking changes, and documentation is lacking all aspects regarding maintenance. > > Regards, > > Christian > > Am 27.10.20 um 12:46 schrieb Andi Christiansen: > > > > Hi all, > > > > We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joe at excelero.com Wed Nov 4 12:19:07 2020 From: joe at excelero.com (joe at excelero.com) Date: Wed, 4 Nov 2020 06:19:07 -0600 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 106, Issue 3 Message-ID: <924bb673-0b2a-420a-8ce2-be24c5e6e4e8@Spark> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: From oluwasijibomi.saula at ndsu.edu Wed Nov 4 16:05:50 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Wed, 4 Nov 2020 16:05:50 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 106, Issue 3 In-Reply-To: References: Message-ID: Could someone share the password for the event today? Thanks! Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Wednesday, November 4, 2020 6:00 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 106, Issue 3 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. SSUG::Digital Scalable multi-node training for AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale (Simon Thompson) 2. Re: Alternative to Scale S3 API. (Andi Christiansen) ---------------------------------------------------------------------- Message: 1 Date: Tue, 3 Nov 2020 17:00:54 +0000 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] SSUG::Digital Scalable multi-node training for AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale Message-ID: Content-Type: text/plain; charset="utf-8" Apologies, looks like the calendar invite for this week?s SSUG::Digital didn?t get sent! Nvidia and IBM did a complex proof-of-concept to demonstrate the scaling of AI workload using Nvidia DGX, Red Hat OpenShift and IBM Spectrum Scale at the example of ResNet-50 and the segmentation of images using the Audi A2D2 dataset. The project team published an IBM Redpaper with all the technical details and will present the key learnings and results. 
>>> Join Here <<< This episode will start 15 minutes later as usual. * San Francisco, USA at 08:15 PST * New York, USA at 11:15 EST * London, United Kingdom at 16:15 GMT * Frankfurt, Germany at 17:15 CET * Pune, India at 21:45 IST -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2488 bytes Desc: not available URL: ------------------------------ Message: 2 Date: Wed, 4 Nov 2020 08:14:41 +0100 (CET) From: Andi Christiansen To: gpfsug main discussion list , Christian Vieser Subject: Re: [gpfsug-discuss] Alternative to Scale S3 API. Message-ID: <1512108314.679947.1604474081488 at privateemail.com> Content-Type: text/plain; charset="utf-8" Hi Christian, Thanks for the information! My question also triggered IBM to tell me the same so i think we will stay on S3 with Scale and hoping the same with the new release.. Yes, MinIO is really lacking some good documentation.. but definatly a cool software package that i will keep an eye on in the future... Best Regards Andi Christiansen > On 11/02/2020 2:44 PM Christian Vieser wrote: > > > > Hi Andi, > > we suffer from the same issue. IBM support told me that Spectrum Scale 5.1 will come with a new release of the underlying Openstack components, so we still hope that some/most of limitations will vanish then. But I already know, that the new S3 policies won't be available, only the "legacy" S3 ACLs. > > We also tried MinIO but deemed that it's not "production ready". It's fine for quickly setting up a S3 service for development, but they release too often and with breaking changes, and documentation is lacking all aspects regarding maintenance. > > Regards, > > Christian > > Am 27.10.20 um 12:46 schrieb Andi Christiansen: > > > > Hi all, > > > > We have over a longer period used the S3 API within spectrum Scale.. And that has shown that it does not support very many applications because of limitations of the API.. > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 106, Issue 3 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From herrmann at sprintmail.com Sat Nov 7 21:10:36 2020 From: herrmann at sprintmail.com (Ron H) Date: Sat, 7 Nov 2020 16:10:36 -0500 Subject: [gpfsug-discuss] Use cases for file audit logging and clusteredwatch folder In-Reply-To: References: Message-ID: <8F771847BDEB4447919D30A16FE48FAB@rone8PC> Hi Jacob, Can you point me to a good overview of each of these features? I know File Audit and Watch is part of the DME Scale edition license, but I can?t seem to find a good explanation of what these features can offer. 
Thanks Ron From: Jacob M Tick Sent: Monday, November 02, 2020 7:21 PM To: gpfsug-discuss at spectrumscale.org Cc: April Brown Subject: [gpfsug-discuss] Use cases for file audit logging and clusteredwatch folder Hi All, I am reaching out on behalf of the Spectrum Scale development team to get some insight on how our customers are using the file audit logging and the clustered watch folder features. If you have it enabled in your test or production environment, could you please elaborate on how and why you are using the feature? Also, knowing how you have the function configured (ie: watching or auditing for certain events, only enabling on certain filesets, ect..) would help us out. Please respond back to April, John (both on CC), and I with any info you are willing to provide. Thanks in advance! Regards, Jake Tick Manager Spectrum Scale - Scalable Data Interfaces IBM Systems Group Email:jmtick at us.ibm.com IBM -------------------------------------------------------------------------------- _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmtick at us.ibm.com Mon Nov 9 17:31:00 2020 From: jmtick at us.ibm.com (Jacob M Tick) Date: Mon, 9 Nov 2020 17:31:00 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Use_cases_for_file_audit_logging_and?= =?utf-8?q?=09clusteredwatch_folder?= In-Reply-To: <8F771847BDEB4447919D30A16FE48FAB@rone8PC> References: <8F771847BDEB4447919D30A16FE48FAB@rone8PC>, Message-ID: An HTML attachment was scrubbed... URL: From Kamil.Czauz at Squarepoint-Capital.com Wed Nov 11 22:29:31 2020 From: Kamil.Czauz at Squarepoint-Capital.com (Czauz, Kamil) Date: Wed, 11 Nov 2020 22:29:31 +0000 Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Message-ID: We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From UWEFALKE at de.ibm.com Thu Nov 12 01:56:46 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 12 Nov 2020 02:56:46 +0100 Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process In-Reply-To: References: Message-ID: Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first things to look at are the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and the current waiters (that is, the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.* (usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility, tsfindinode, to translate that into the file path. You need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make, then run ./tsfindinode -i For the IO trace analysis there is an older tool: /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README). Hope that helps a bit. Mit freundlichen Grüßen / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Geschäftsführung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like an ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu.
The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes are being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From luis.bolinches at fi.ibm.com Thu Nov 12 13:19:05 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 12 Nov 2020 13:19:05 +0000 Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From jyyum at kr.ibm.com Thu Nov 12 14:10:17 2020 From: jyyum at kr.ibm.com (Jae Yoon Yum) Date: Thu, 12 Nov 2020 14:10:17 +0000 Subject: [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14713274163322.png Type: image/png Size: 262 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14713274163323.jpg Type: image/jpeg Size: 2457 bytes Desc: not available URL: From Eric.Wendel at ibm.com Thu Nov 12 15:43:46 2020 From: Eric.Wendel at ibm.com (Eric Wendel - Eric.Wendel@ibm.com) Date: Thu, 12 Nov 2020 15:43:46 +0000 Subject: [gpfsug-discuss] Problems reading emails to the mailing list Message-ID: <31233620a4324240885aed7ad18a729a@ibm.com> Hi Folks, As you are no doubt aware, Lotus Notes and its ecosystem are virtually extinct. For those of us who have moved on to more modern email clients (including an increasing number of IBMers like me), the email links we receive from SSUG (for example 'OF0433B7F4.580A7B75-ON0025861E.004DD432-0025861E.004DD8A4 at notes.na.collabserv.com') are useless because they can only be read if you have the Notes client installed. This is especially problematic for Linux users as the Linux client for Notes is discontinued. It would be very helpful if the SSUG could move to a modern email platform.
Thanks, Eric Wendel eric.wendel at ibm.com -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: Thursday, November 12, 2020 8:10 AM To: gpfsug-discuss at spectrumscale.org Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 106, Issue 8 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Poor client performance with high cpu usage of mmfsd process (Luis Bolinches) 2. Question about the Clearing Spectrum Scale GUI event (Jae Yoon Yum) ---------------------------------------------------------------------- Message: 1 Date: Thu, 12 Nov 2020 13:19:05 +0000 From: "Luis Bolinches" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Thu, 12 Nov 2020 14:10:17 +0000 From: "Jae Yoon Yum" To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14713274163322.png Type: image/png Size: 262 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.14713274163323.jpg Type: image/jpeg Size: 2457 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 106, Issue 8 ********************************************** From stefan.roth at de.ibm.com Thu Nov 12 17:13:38 2020 From: stefan.roth at de.ibm.com (Stefan Roth) Date: Thu, 12 Nov 2020 18:13:38 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Question_about_the_Clearing_Spectrum_S?= =?utf-8?q?cale_GUI=09event?= In-Reply-To: References: Message-ID: Hello Jay, as long as those errors are still shown by "mmhealth node show" CLI command, they will again appear in the GUI. In the GUI events table you can show an "Event Type" column which is hidden by default. Events that have event type "Notice" can be cleared by the "Mark as Read" action. Events that have event type "State" can not be cleared by the "Mark as Read" action. They have to disappear by solving the problem. If a problem is solved the error should disappear from "mmhealth node show" and after that it will disappear from the GUI as well. 
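To illustrate, this is the kind of CLI cross-check that can be run next to the GUI (just a sketch, not an official procedure; output layout and event names vary between releases):

# health state per component on this node - the state the GUI events view reflects
mmhealth node show --verbose

# the same view across the whole cluster
mmhealth cluster show

# history of events recorded on this node
mmhealth node eventlog

As long as an unhealthy state is still reported there, the corresponding entry will come back in the GUI; the GUI-side reset mentioned elsewhere in this thread (/usr/lpp/mmfs/gui/cli/lshealth --reset) only clears what the GUI itself has recorded, as far as I know.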
Mit freundlichen Gr??en / Kind regards Stefan Roth Spectrum Scale Developement Phone: +49 162 4159934 IBM Deutschland Research & Development GmbH Email: stefan.roth at de.ibm.com Am Weiher 24 65451 Kelsterbach IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Jae Yoon Yum" To: gpfsug-discuss at spectrumscale.org Date: 12.11.2020 15:10 Subject: [EXTERNAL] [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Team, I hope you all stay safe from COVID 19, One of my client wants to clear their ?ERROR? events on the Scale GUI. As you know, there is ?mark as read? for ?warning? messages but there isn?t for ?ERROR?. (In fact, the ?mark as read? button is exist but it does not work.) So I sent him to run this command on cli. /usr/lpp/mmfs/gui/cli/lshealth --reset On my test VM, all of the error messages has been cleared when I run the command?. But, for the client?s system, client said that ?All of the error / warning messages had been appeared again include the one which I had delete by clicking ?mark as read?.? Does anyone who has similar experience like this? and How Could I solve this problem? Or, Is there any way to clear the event one by one? * I sent the same message to the Slack 'scale-help' channel. Thanks. Jay. Best Regards, JaeYoon(Jay) IBM Korea, Three IFC, Yum 10 Gukjegeumyung-ro, Yeongdeungpo-gu, IBM Systems Seoul, Korea Hardware, Storage Technical Sales Mobile : +82-10-4995-4814 07326 e-mail: jyyum at kr.ibm.com ? ??? ??? ??, ??? ??, ???? ???? ???? ?????. ??? ??? ??? ??? ????, ????? ??? ?? ?????,??IBM ??? ? ?? ??? ????, ????? ???? ????. (If you don't wish to receive e-mail from sender, please send e-mail directly. For IBM e-mail, please click here). ??? ???? ??,??, ?? ??? ???????(??: 02-3781-7800,? ?? mktg at kr.ibm.com )? ?? ?? ???? ?? ???? ? ????. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1E506389.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1E764757.gif Type: image/gif Size: 262 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1E982001.jpg Type: image/jpeg Size: 2457 bytes Desc: not available URL: From arc at b4restore.com Thu Nov 12 17:33:01 2020 From: arc at b4restore.com (=?utf-8?B?QW5kaSBOw7hyIENocmlzdGlhbnNlbg==?=) Date: Thu, 12 Nov 2020 17:33:01 +0000 Subject: [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event In-Reply-To: References: Message-ID: Hi Jay, First of you need to make sure your system is actually healthy. Events that are not fixed will reappear. I have had a lot of ?stale? 
entries happening over the last years and more often than not ?/usr/lpp/mmfs/gui/cli/lshealth ?reset? clears the entries if they are not actual faults.. As Stefan says if the errors/warnings are shown in ?mmhealth node show or mmhealth cluster show? they will reappear as they should. (I have sometimes seen stale entries there aswell) When I have encountered stale entries which wasn?t cleared with ?lshealth ?reset? I could clear them with ?mmsysmoncontrol restart?. I think I actually run that command maybe once or twice every month because of stale entries in the GUI og mmhealth itself.. don?t know why they happen but they seem to appear more frequently for me atleast.. I have high hopes for the 5.1.0.0/5.1.0.1 release as I have heard there should be some new things for the GUI as well.. not sure what they are yet though 😊 Hope this helps. Cheers A. Christiansen Fra: gpfsug-discuss-bounces at spectrumscale.org P? vegne af Stefan Roth Sendt: Thursday, November 12, 2020 6:14 PM Til: gpfsug main discussion list Emne: Re: [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event Hello Jay, as long as those errors are still shown by "mmhealth node show" CLI command, they will again appear in the GUI. In the GUI events table you can show an "Event Type" column which is hidden by default. Events that have event type "Notice" can be cleared by the "Mark as Read" action. Events that have event type "State" can not be cleared by the "Mark as Read" action. They have to disappear by solving the problem. If a problem is solved the error should disappear from "mmhealth node show" and after that it will disappear from the GUI as well. Mit freundlichen Gr??en / Kind regards Stefan Roth Spectrum Scale Developement ________________________________ Phone: +49 162 4159934 IBM Deutschland Research & Development GmbH [cid:image002.gif at 01D6B922.3FE99E70] Email: stefan.roth at de.ibm.com Am Weiher 24 65451 Kelsterbach ________________________________ IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 [cid:image003.gif at 01D6B922.3FE99E70]"Jae Yoon Yum" ---12.11.2020 15:10:35---Hi Team, I hope you all stay safe from COVID 19, One of my client wants to clear their ?ERROR? ev From: "Jae Yoon Yum" > To: gpfsug-discuss at spectrumscale.org Date: 12.11.2020 15:10 Subject: [EXTERNAL] [gpfsug-discuss] Question about the Clearing Spectrum Scale GUI event Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Team, I hope you all stay safe from COVID 19, One of my client wants to clear their ?ERROR? events on the Scale GUI. As you know, there is ?mark as read? for ?warning? messages but there isn?t for ?ERROR?. (In fact, the ?mark as read? button is exist but it does not work.) So I sent him to run this command on cli. /usr/lpp/mmfs/gui/cli/lshealth --reset On my test VM, all of the error messages has been cleared when I run the command?. But, for the client?s system, client said that ?All of the error / warning messages had been appeared again include the one which I had delete by clicking ?mark as read?.? Does anyone who has similar experience like this? and How Could I solve this problem? Or, Is there any way to clear the event one by one? * I sent the same message to the Slack 'scale-help' channel. Thanks. Jay. 
Best Regards, JaeYoon(Jay) Yum IBM Korea, Three IFC, [cid:image005.jpg at 01D6B922.3FE99E70] 10 Gukjegeumyung-ro, Yeongdeungpo-gu, IBM Systems Hardware, Storage Technical Sales Seoul, Korea Mobile : +82-10-4995-4814 07326 e-mail: jyyum at kr.ibm.com ? ??? ??? ??, ??? ??, ???? ???? ???? ?????. ??? ??? ??? ??? ????, ????? ??? ?? ?????,??IBM ??? ??? ??? ????, ????? ???? ????. (If you don't wish to receive e-mail from sender, please send e-mail directly. For IBM e-mail, please click here). ??? ???? ??,??, ?? ??? ???????(??: 02-3781-7800,??? mktg at kr.ibm.com )? ?? ?? ???? ?? ???? ? ????. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.gif Type: image/gif Size: 1851 bytes Desc: image002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 105 bytes Desc: image003.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.jpg Type: image/jpeg Size: 2457 bytes Desc: image005.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.png Type: image/png Size: 166 bytes Desc: image006.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 616 bytes Desc: image007.png URL: From Kamil.Czauz at Squarepoint-Capital.com Fri Nov 13 02:33:17 2020 From: Kamil.Czauz at Squarepoint-Capital.com (Czauz, Kamil) Date: Fri, 13 Nov 2020 02:33:17 +0000 Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process In-Reply-To: References: Message-ID: Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? The 'lookup' operations are taking a hell of a long time, what does that mean? 
Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 
61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 
0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. 
you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. 
Thank you for your cooperation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Fri Nov 13 09:21:17 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Fri, 13 Nov 2020 10:21:17 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Poor_client_performance_with_high_cpu_?= =?utf-8?q?usage=09of=09mmfsd_process?= In-Reply-To: References: Message-ID: Hi, Kamil, looks your tracefile setting has been too low: all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here means you effectively captured a period of about 5ms only ... you can't see much from that. I'd assumed the default trace file size would be sufficient here but it doesn't seem to. try running with something like mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N . However, if you say "no major waiter" - how many waiters did you see at any time? what kind of waiters were the oldest, how long they'd waited? it could indeed well be that some job is just creating a killer workload. The very short cyle time of the trace points, OTOH, to high activity, OTOH the trace file setting appears quite low (trace=io doesnt' collect many trace infos, just basic IO stuff). If I might ask: what version of GPFS are you running? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 03:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? 
The 'lookup' operations are taking a hell of a long time, what does that mean? Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 
0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 
2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. 
you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. 
Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From UWEFALKE at de.ibm.com Fri Nov 13 09:37:04 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Fri, 13 Nov 2020 10:37:04 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Poor_client_performance_with_high_cpu_?= =?utf-8?q?usage=09of=09mmfsd_process?= In-Reply-To: References: Message-ID: Hi Kamil, in my mail just a few minutes ago I'd overlooked that the buffer size in your trace was indeed 128M (I suppose the trace file is adapting that size if not set in particular). That is very strange, even under high load, the trace should then capture some longer time than 10 secs, and , most of all, it should contain much more activities than just these few you had. That is very mysterious. I am out of ideas for the moment, and a bit short of time to dig here. To check your tracing, you could run a trace like before but when everything is normal and check that out - you should see many records, the trcsum.awk should list just a small portion of unfinished ops at the end, ... If that is fine, then the tracing itself is affected by your crritical condition (never experienced that before - rather GPFS grinds to a halt than the trace is abandoned), and that might well be worth a support ticket. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Uwe Falke/Germany/IBM To: gpfsug main discussion list Date: 13/11/2020 10:21 Subject: Re: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, looks your tracefile setting has been too low: all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here means you effectively captured a period of about 5ms only ... you can't see much from that. I'd assumed the default trace file size would be sufficient here but it doesn't seem to. try running with something like mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N . However, if you say "no major waiter" - how many waiters did you see at any time? what kind of waiters were the oldest, how long they'd waited? it could indeed well be that some job is just creating a killer workload. The very short cyle time of the trace points, OTOH, to high activity, OTOH the trace file setting appears quite low (trace=io doesnt' collect many trace infos, just basic IO stuff). If I might ask: what version of GPFS are you running? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 
7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 03:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? The 'lookup' operations are taking a hell of a long time, what does that mean? 
Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 
61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 
0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. 
you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. 
Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kamil.Czauz at Squarepoint-Capital.com Fri Nov 13 13:31:21 2020 From: Kamil.Czauz at Squarepoint-Capital.com (Czauz, Kamil) Date: Fri, 13 Nov 2020 13:31:21 +0000 Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process In-Reply-To: References: Message-ID: Hi Uwe - Regarding your previous message - waiters were coming / going with just 1-2 waiters when I ran the mmdiag command, with very low wait times (<0.01s). We are running version 4.2.3 I did another capture today while the client is functioning normally and this was the header result: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 25.996957 seconds and 67592121252 cycles Measured cycle count update rate to be 2600001271 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Fri Nov 13 08:20:01.800558000 2020 (TOD 1605273601.800558, cycles 20807897445779444) daemon trace enabled Fri Nov 13 08:20:01.910017000 2020 (TOD 1605273601.910017, cycles 20807897730372442) all streams included Fri Nov 13 08:20:26.423085049 2020 (TOD 1605273626.423085, cycles 20807961464381068) <---- useful part of trace extends from here trace quiesced Fri Nov 13 08:20:27.797515000 2020 (TOD 1605273627.000797, cycles 20807965037900696) <---- to here Approximate number of times the trace buffer was filled: 14.631 Still a very small capture (1.3s), but the trsum.awk output was not filled with lookup commands / large lookup times. Can you help debug what those long lookup operations mean? 
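Two quick looks that can help answer that, sketched under a couple of assumptions: the gpfs_i_lookup / gpfs_v_lookup record names are only a guess echoed later in this thread, and the one-liner over the summary assumes the trsum.awk column layout shown in these captures.

    TRCRPT=/tmp/mmfs/trcrpt.latest   # illustrative path to the ASCII trace report
    TRSUM=trsum.out                  # output of trsum.awk for that report

    # Which lookup trace records appear, and for which threads/inodes?
    grep -E 'gpfs_i_lookup|gpfs_v_lookup' "$TRCRPT" | head -20

    # Rank the VFS operations in the trsum.awk summary by total time, largest first:
    awk '/^Operations stats/ {on=1}
         /^User thread stats/ {on=0}
         on && $2 ~ /^[0-9]+\.[0-9]+$/ {printf "%14.9f  %s\n", $2, $1}' "$TRSUM" |
      sort -rn | head

Neither step replaces trsum.awk; the ranking only confirms which VFS operation dominates, and the grep against the raw report narrows down which lookups are the slow ones.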
Unfinished operations: 27967 ***************** pagein ************** 1.362382116 27967 ***************** readpage ************** 1.362381516 139130 1.362448448 ********* Unfinished IO: buffer/disk 3002F670000 20:107498951168^\archive_data_16 104686 1.362022068 ********* Unfinished IO: buffer/disk 50011878000 1:47169618944^\archive_data_1 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFE 341710 1.362423815 ********* Unfinished IO: buffer/disk 20022218000 19:107498951680^\archive_data_15 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFF 139150 1.361122006 ********* Unfinished IO: buffer/disk 50012018000 2:47169622016^\archive_data_2 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\00000000FFFFFFFF 95782 1.361112791 ********* Unfinished IO: buffer/disk 40016300000 20:107498950656^\archive_data_16 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\00000000FFFFFFFF 271076 1.361579585 ********* Unfinished IO: buffer/disk 20023DB8000 4:47169606656^\archive_data_4 341676 1.362018599 ********* Unfinished IO: buffer/disk 40038140000 5:47169614336^\archive_data_5 139150 1.361131599 MSG FSnd: nsdMsgReadExt msg_id 2930654492 Sduration 13292.382 + us 341676 1.362027104 MSG FSnd: nsdMsgReadExt msg_id 2930654495 Sduration 12396.877 + us 95782 1.361124739 MSG FSnd: nsdMsgReadExt msg_id 2930654491 Sduration 13299.242 + us 271076 1.361587653 MSG FSnd: nsdMsgReadExt msg_id 2930654493 Sduration 12836.328 + us 92182 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 341710 1.362429643 MSG FSnd: nsdMsgReadExt msg_id 2930654497 Sduration 11994.338 + us 341662 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 139130 1.362458376 MSG FSnd: nsdMsgReadExt msg_id 2930654498 Sduration 11965.605 + us 104686 1.362028772 MSG FSnd: nsdMsgReadExt msg_id 2930654496 Sduration 12395.209 + us 412373 0.775676657 MSG FRep: nsdMsgReadExt msg_id 304915249 Rduration 598747.324 us Rlen 262144 Hduration 598752.112 + us 341770 0.589739579 MSG FRep: nsdMsgReadExt msg_id 338079050 Rduration 784684.402 us Rlen 4 Hduration 784692.651 + us 143315 0.536252844 MSG FRep: nsdMsgReadExt msg_id 631945522 Rduration 838171.137 us Rlen 233472 Hduration 838174.299 + us 341878 0.134331812 MSG FRep: nsdMsgReadExt msg_id 338079023 Rduration 1240092.169 us Rlen 262144 Hduration 1240094.403 + us 175478 0.587353287 MSG FRep: nsdMsgReadExt msg_id 338079047 Rduration 787070.694 us Rlen 262144 Hduration 787073.990 + us 139558 0.633517347 MSG FRep: nsdMsgReadExt msg_id 631945538 Rduration 740906.634 us Rlen 102400 Hduration 740910.172 + us 143308 0.958832110 MSG FRep: nsdMsgReadExt msg_id 631945542 Rduration 415591.871 us Rlen 262144 Hduration 415597.056 + us Elapsed trace time: 1.374423981 seconds Elapsed trace time from first VFS call to last: 1.374423980 Time idle between VFS calls: 0.001603738 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs readpage 1.151660085 1874 614.546 rdwr 0.431456904 581 742.611 read_inode2 0.001180648 934 1.264 follow_link 0.000029502 7 
4.215 getattr 0.000048413 9 5.379 revalidate 0.000007080 67 0.106 pagein 1.149699537 1877 612.520 create 0.007664829 9 851.648 open 0.001032657 19 54.350 unlink 0.002563726 14 183.123 delete_inode 0.000764598 826 0.926 lookup 0.312847947 953 328.277 setattr 0.020651226 824 25.062 permission 0.000015018 1 15.018 rename 0.000529023 4 132.256 release 0.001613800 22 73.355 getxattr 0.000030494 6 5.082 mmap 0.000054767 1 54.767 llseek 0.000001130 4 0.283 readdir 0.000033947 2 16.973 removexattr 0.002119736 820 2.585 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 42625 0.000000138 0.000031017 0.44% 99.56% 3 42378 0.000586959 0.011596801 4.82% 95.18% 32 42627 0.000000272 0.000013421 1.99% 98.01% 2 42641 0.003284590 0.012593594 20.69% 79.31% 35 42628 0.001522335 0.000002748 99.82% 0.18% 2 25464 0.003462795 0.500281914 0.69% 99.31% 12 301420 0.000016711 0.052848218 0.03% 99.97% 38 95103 0.000000544 0.000000000 100.00% 0.00% 1 145858 0.000000659 0.000794896 0.08% 99.92% 2 42221 0.000011484 0.000039445 22.55% 77.45% 5 371718 0.000000707 0.001805425 0.04% 99.96% 2 95109 0.000000880 0.008998763 0.01% 99.99% 2 95337 0.000010330 0.503057866 0.00% 100.00% 8 42700 0.002442175 0.012504429 16.34% 83.66% 35 189680 0.003466450 0.500128627 0.69% 99.31% 9 42681 0.006685396 0.000391575 94.47% 5.53% 16 42702 0.000048203 0.000000500 98.97% 1.03% 2 42703 0.000033280 0.140102087 0.02% 99.98% 9 224423 0.000000195 0.000000000 100.00% 0.00% 1 42706 0.000541098 0.000014713 97.35% 2.65% 3 106275 0.000000456 0.000000000 100.00% 0.00% 1 42721 0.000372857 0.000000000 100.00% 0.00% 1 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Friday, November 13, 2020 4:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi Kamil, in my mail just a few minutes ago I'd overlooked that the buffer size in your trace was indeed 128M (I suppose the trace file is adapting that size if not set in particular). That is very strange, even under high load, the trace should then capture some longer time than 10 secs, and , most of all, it should contain much more activities than just these few you had. That is very mysterious. I am out of ideas for the moment, and a bit short of time to dig here. To check your tracing, you could run a trace like before but when everything is normal and check that out - you should see many records, the trcsum.awk should list just a small portion of unfinished ops at the end, ... If that is fine, then the tracing itself is affected by your crritical condition (never experienced that before - rather GPFS grinds to a halt than the trace is abandoned), and that might well be worth a support ticket. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 
7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Uwe Falke/Germany/IBM To: gpfsug main discussion list Date: 13/11/2020 10:21 Subject: Re: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, looks your tracefile setting has been too low: all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here means you effectively captured a period of about 5ms only ... you can't see much from that. I'd assumed the default trace file size would be sufficient here but it doesn't seem to. try running with something like mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N . However, if you say "no major waiter" - how many waiters did you see at any time? what kind of waiters were the oldest, how long they'd waited? it could indeed well be that some job is just creating a killer workload. The very short cyle time of the trace points, OTOH, to high activity, OTOH the trace file setting appears quite low (trace=io doesnt' collect many trace infos, just basic IO stuff). If I might ask: what version of GPFS are you running? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 03:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. 
The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? The 'lookup' operations are taking a hell of a long time, what does that mean? Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 
32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 
2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. 
Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. 
We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Fri Nov 13 13:38:48 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 13 Nov 2020 13:38:48 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Poor_client_performance_with_high_cpu?= =?utf-8?q?=09usage=09of=09mmfsd_process?= In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Nov 13 21:11:16 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 13 Nov 2020 13:11:16 -0800 Subject: [gpfsug-discuss] REMINDER - SC20 Sessions - Monday Nov. 16 and Wednesday Nov. 18 Message-ID: <7B85E526-88D4-44AE-B034-4EC5A61E524C@lbl.gov> Hi all, A Reminder to attend and also submit any panel questions for the Wednesday session. So far, there are 3 questions around these topics: 1) excessive prefetch when reading small fractions of many large files 2) improved the integration between TSM and GPFS 3) number of security vulnerabilities in GPFS, the GUI, ESS, or something else related Bring on your tough questions and make it interesting. 
Cheers,
Kristy

--- original email ---
The Spectrum Scale User Group will be hosting two 90 minute sessions at SC20 this year and we hope you can join us. The first one is: "Storage for AI" and will be held Monday, Nov. 16th, from 11:00-12:30 EST, and the second one is "What's new in Spectrum Scale 5.1?" and will be held Wednesday, Nov. 18th from 11:00-12:30 EST. Please see the calendar at https://www.spectrumscaleug.org/eventslist/2020-11/ and register by clicking on a session on the calendar and then the "Please register here to join the session" link.

Best,
Kristy

Kristy Kallback-Rose
Senior HPC Storage Systems Analyst
National Energy Research Scientific Computing Center
Lawrence Berkeley National Laboratory

From UWEFALKE at de.ibm.com  Mon Nov 16 13:45:57 2020
From: UWEFALKE at de.ibm.com (Uwe Falke)
Date: Mon, 16 Nov 2020 14:45:57 +0100
Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process
In-Reply-To:
References:
Message-ID:

Hi, while the other nodes can well block the local one, as Frederick suggests, there should at least be something visible locally waiting for these other nodes. Looking at all waiters might be a good thing, but this case looks strange in other ways. Mind the statement that there are almost no local waiters and that none of them gets older than 10 ms. I am no developer nor do I have the code, so don't expect too much. Can you tell what lookups you see (check in the trcrpt file; the records could be like gpfs_i_lookup or gpfs_v_lookup)? Lookups are metadata ops; do you have a separate pool for your metadata? How is that pool set up (down to the physical block devices)? Your trsum output revealed 36 lookups, each one on average taking >30 ms. That is a lot (albeit the respective waiters won't show up at first glance as suspicious ...). So, which waiters did you see (hope you saved them; if not, do it next time)? What are the node you see this on, and the whole cluster, used for? What is the MaxFilesToCache setting (for that node and for others)? What HW is that, and how big are your nodes (memory, CPU)? To check the unreasonably short trace capture time: how large are the trcrpt files you obtain?

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefalke at de.ibm.com

IBM Services
IBM Data Privacy Statement
IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122
We are running version 4.2.3 I did another capture today while the client is functioning normally and this was the header result: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 25.996957 seconds and 67592121252 cycles Measured cycle count update rate to be 2600001271 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Fri Nov 13 08:20:01.800558000 2020 (TOD 1605273601.800558, cycles 20807897445779444) daemon trace enabled Fri Nov 13 08:20:01.910017000 2020 (TOD 1605273601.910017, cycles 20807897730372442) all streams included Fri Nov 13 08:20:26.423085049 2020 (TOD 1605273626.423085, cycles 20807961464381068) <---- useful part of trace extends from here trace quiesced Fri Nov 13 08:20:27.797515000 2020 (TOD 1605273627.000797, cycles 20807965037900696) <---- to here Approximate number of times the trace buffer was filled: 14.631 Still a very small capture (1.3s), but the trsum.awk output was not filled with lookup commands / large lookup times. Can you help debug what those long lookup operations mean? Unfinished operations: 27967 ***************** pagein ************** 1.362382116 27967 ***************** readpage ************** 1.362381516 139130 1.362448448 ********* Unfinished IO: buffer/disk 3002F670000 20:107498951168^\archive_data_16 104686 1.362022068 ********* Unfinished IO: buffer/disk 50011878000 1:47169618944^\archive_data_1 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFE 341710 1.362423815 ********* Unfinished IO: buffer/disk 20022218000 19:107498951680^\archive_data_15 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFF 139150 1.361122006 ********* Unfinished IO: buffer/disk 50012018000 2:47169622016^\archive_data_2 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\00000000FFFFFFFF 95782 1.361112791 ********* Unfinished IO: buffer/disk 40016300000 20:107498950656^\archive_data_16 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\00000000FFFFFFFF 271076 1.361579585 ********* Unfinished IO: buffer/disk 20023DB8000 4:47169606656^\archive_data_4 341676 1.362018599 ********* Unfinished IO: buffer/disk 40038140000 5:47169614336^\archive_data_5 139150 1.361131599 MSG FSnd: nsdMsgReadExt msg_id 2930654492 Sduration 13292.382 + us 341676 1.362027104 MSG FSnd: nsdMsgReadExt msg_id 2930654495 Sduration 12396.877 + us 95782 1.361124739 MSG FSnd: nsdMsgReadExt msg_id 2930654491 Sduration 13299.242 + us 271076 1.361587653 MSG FSnd: nsdMsgReadExt msg_id 2930654493 Sduration 12836.328 + us 92182 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 341710 1.362429643 MSG FSnd: nsdMsgReadExt msg_id 2930654497 Sduration 11994.338 + us 341662 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 139130 1.362458376 MSG FSnd: nsdMsgReadExt msg_id 2930654498 
Sduration 11965.605 + us 104686 1.362028772 MSG FSnd: nsdMsgReadExt msg_id 2930654496 Sduration 12395.209 + us 412373 0.775676657 MSG FRep: nsdMsgReadExt msg_id 304915249 Rduration 598747.324 us Rlen 262144 Hduration 598752.112 + us 341770 0.589739579 MSG FRep: nsdMsgReadExt msg_id 338079050 Rduration 784684.402 us Rlen 4 Hduration 784692.651 + us 143315 0.536252844 MSG FRep: nsdMsgReadExt msg_id 631945522 Rduration 838171.137 us Rlen 233472 Hduration 838174.299 + us 341878 0.134331812 MSG FRep: nsdMsgReadExt msg_id 338079023 Rduration 1240092.169 us Rlen 262144 Hduration 1240094.403 + us 175478 0.587353287 MSG FRep: nsdMsgReadExt msg_id 338079047 Rduration 787070.694 us Rlen 262144 Hduration 787073.990 + us 139558 0.633517347 MSG FRep: nsdMsgReadExt msg_id 631945538 Rduration 740906.634 us Rlen 102400 Hduration 740910.172 + us 143308 0.958832110 MSG FRep: nsdMsgReadExt msg_id 631945542 Rduration 415591.871 us Rlen 262144 Hduration 415597.056 + us Elapsed trace time: 1.374423981 seconds Elapsed trace time from first VFS call to last: 1.374423980 Time idle between VFS calls: 0.001603738 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs readpage 1.151660085 1874 614.546 rdwr 0.431456904 581 742.611 read_inode2 0.001180648 934 1.264 follow_link 0.000029502 7 4.215 getattr 0.000048413 9 5.379 revalidate 0.000007080 67 0.106 pagein 1.149699537 1877 612.520 create 0.007664829 9 851.648 open 0.001032657 19 54.350 unlink 0.002563726 14 183.123 delete_inode 0.000764598 826 0.926 lookup 0.312847947 953 328.277 setattr 0.020651226 824 25.062 permission 0.000015018 1 15.018 rename 0.000529023 4 132.256 release 0.001613800 22 73.355 getxattr 0.000030494 6 5.082 mmap 0.000054767 1 54.767 llseek 0.000001130 4 0.283 readdir 0.000033947 2 16.973 removexattr 0.002119736 820 2.585 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 42625 0.000000138 0.000031017 0.44% 99.56% 3 42378 0.000586959 0.011596801 4.82% 95.18% 32 42627 0.000000272 0.000013421 1.99% 98.01% 2 42641 0.003284590 0.012593594 20.69% 79.31% 35 42628 0.001522335 0.000002748 99.82% 0.18% 2 25464 0.003462795 0.500281914 0.69% 99.31% 12 301420 0.000016711 0.052848218 0.03% 99.97% 38 95103 0.000000544 0.000000000 100.00% 0.00% 1 145858 0.000000659 0.000794896 0.08% 99.92% 2 42221 0.000011484 0.000039445 22.55% 77.45% 5 371718 0.000000707 0.001805425 0.04% 99.96% 2 95109 0.000000880 0.008998763 0.01% 99.99% 2 95337 0.000010330 0.503057866 0.00% 100.00% 8 42700 0.002442175 0.012504429 16.34% 83.66% 35 189680 0.003466450 0.500128627 0.69% 99.31% 9 42681 0.006685396 0.000391575 94.47% 5.53% 16 42702 0.000048203 0.000000500 98.97% 1.03% 2 42703 0.000033280 0.140102087 0.02% 99.98% 9 224423 0.000000195 0.000000000 100.00% 0.00% 1 42706 0.000541098 0.000014713 97.35% 2.65% 3 106275 0.000000456 0.000000000 100.00% 0.00% 1 42721 0.000372857 0.000000000 100.00% 0.00% 1 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Friday, November 13, 2020 4:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi Kamil, in my mail just a few minutes ago I'd overlooked that the buffer size in your trace was indeed 128M (I suppose the trace file is adapting that size if not set in particular). That is very strange, even under high load, the trace should then capture some longer time than 10 secs, and , most of all, it should contain much more activities than just these few you had. 
That is very mysterious. I am out of ideas for the moment, and a bit short of time to dig here. To check your tracing, you could run a trace like before but when everything is normal and check that out - you should see many records, the trcsum.awk should list just a small portion of unfinished ops at the end, ... If that is fine, then the tracing itself is affected by your crritical condition (never experienced that before - rather GPFS grinds to a halt than the trace is abandoned), and that might well be worth a support ticket. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Uwe Falke/Germany/IBM To: gpfsug main discussion list Date: 13/11/2020 10:21 Subject: Re: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, looks your tracefile setting has been too low: all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here means you effectively captured a period of about 5ms only ... you can't see much from that. I'd assumed the default trace file size would be sufficient here but it doesn't seem to. try running with something like mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N . However, if you say "no major waiter" - how many waiters did you see at any time? what kind of waiters were the oldest, how long they'd waited? it could indeed well be that some job is just creating a killer workload. The very short cyle time of the trace points, OTOH, to high activity, OTOH the trace file setting appears quite low (trace=io doesnt' collect many trace infos, just basic IO stuff). If I might ask: what version of GPFS are you running? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 03:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. 
The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? The 'lookup' operations are taking a hell of a long time, what does that mean? Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 
32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 
2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. 
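A minimal sketch of that first look on the hanging client, using only the commands above (the output file name is illustrative):

    OUT=/tmp/gpfs-firstlook.$(date +%s)
    {
      echo "== recent GPFS log ==";   tail -n 100 /var/adm/ras/mmfs.log.latest
      echo "== current waiters ==";   /usr/lpp/mmfs/bin/mmdiag --waiters
      echo "== recent IO history =="; /usr/lpp/mmfs/bin/mmdiag --iohist
    } > "$OUT" 2>&1
    echo "collected in $OUT"
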
Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. 
We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From andi at christiansen.xxx Mon Nov 16 19:44:14 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Mon, 16 Nov 2020 20:44:14 +0100 (CET) Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Message-ID: <1388247256.209171.1605555854969@privateemail.com> Hi all, i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... if anyone have any ideas they are welcome! 
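For concreteness, the fan-out we run today is roughly of this shape, one rsync per top-level directory of the NFS-mounted export (paths, mount points and the per-node job count here are purely illustrative):

    SRC=/mnt/isilon_export    # Isilon NFS export mounted on this Scale node
    DST=/gpfs/fs1/target      # destination on the Scale file system
    # one rsync per top-level directory, at most 8 in flight on this node
    find "$SRC" -mindepth 1 -maxdepth 1 -type d -print0 |
      xargs -0 -P8 -I{} rsync -aHAX --numeric-ids {} "$DST"/
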
Thanks in advance Andi Christiansen -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Nov 16 21:44:30 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 16 Nov 2020 21:44:30 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Migrate/syncronize_data_from_Isilon_to?= =?utf-8?q?_Scale_over=09NFS=3F?= In-Reply-To: <1388247256.209171.1605555854969@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Mon Nov 16 21:58:19 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Mon, 16 Nov 2020 13:58:19 -0800 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <1388247256.209171.1605555854969@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: <20201116215819.wda6nophekamzs3v@thargelion> When we did a similar (though larger, at ~2.5PB) migration, we used rsync as well, but ran one rsync process per Isilon node, and made sure the NFS clients were hitting separate Isilon nodes for their reads. We also didn't have more than one rsync process running per client, as the Linux NFS client (at least in CentOS 6) was terrible when it came to concurrent access. Whatever method you end up using, I can guarantee you will be much happier once you are on GPFS. :) On Mon, Nov 16, 2020 at 08:44:14PM +0100, Andi Christiansen wrote: > Hi all, > > i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... > > as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. > > does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... > > > if anyone have any ideas they are welcome! > > > Thanks in advance > Andi Christiansen > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From jonathan.buzzard at strath.ac.uk Mon Nov 16 22:58:49 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 16 Nov 2020 22:58:49 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <1388247256.209171.1605555854969@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: <4de1fa02-a074-0901-cf12-31be9e843f5f@strath.ac.uk> On 16/11/2020 19:44, Andi Christiansen wrote: > Hi all, > > i have got a case where a customer wants 700TB migrated from isilon to > Scale and the only way for him is exporting the same directory on NFS > from two different nodes... > > as of now we are using multiple rsync processes on different parts of > folders within the main directory. this is really slow and will take > forever.. right now 14 rsync processes spread across 3 nodes fetching > from 2.. > > does anyone know of a way to speed it up? 
right now we see from 1Gbit to > 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit > from scale nodes and 20Gbits from isilon so we should be able to reach > just under 20Gbit... > > > if anyone have any ideas they are welcome! > My biggest recommendation when doing this is to use a sqlite database to keep track of what is going on. The main issue is that you are almost certainly going to need to do more than one rsync pass unless your source Isilon system has no user activity, and with 700TB to move that seems unlikely. Typically you do an initial rsync to move the bulk of the data while the users are still live, then shutdown user access to the source system and do the final rsync which hopefully has a significantly smaller amount of data to actually move. So this is what I have done on a number of occasions now. I create a very simple sqlite DB with a list of source and destination folders and a status code. Initially the status code is set to -1. Then I have a perl script which looks at the sqlite DB, picks a row with a status code of -1, and sets the status code to -2, aka that directory is in progress. It then proceeds to run the rsync and when it finishes it updates the status code to the exit code of the rsync process. As long as all the rsync processes have access to the same copy of the sqlite DB (simplest to put it on either the source or destination file system) then all is good. You can fire off multiple rsync's on multiple nodes and they will all keep churning away till there is no more work to be done. The advantage is you can easily interrogate the DB to find out the state of play. That is how many of your transfers have completed, how many are yet to be done, which ones are currently being transferred etc. without logging onto multiple nodes. *MOST* importantly you can see if any of the rsync's had an error, by simply looking for status codes greater than zero. I cannot stress how important this is. Noting that if the source is still active you will see errors down to files being deleted on the source file system before rsync has a chance to copy them. However this has a specific exit code (24) so is easy to spot and not worry about. Finally it is also very simple to set the status codes to -1 again and set the process away again. So the final run is easier to do. If you want to mail me off list I can dig out a copy of the perl code I used if your interested. There are several version as I have tended to tailor to each transfer. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Mon Nov 16 23:12:47 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 16 Nov 2020 23:12:47 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <20201116215819.wda6nophekamzs3v@thargelion> References: <1388247256.209171.1605555854969@privateemail.com> <20201116215819.wda6nophekamzs3v@thargelion> Message-ID: <8d4d2987-77dd-e3e1-1c98-a635f1b96ddd@strath.ac.uk> On 16/11/2020 21:58, Skylar Thompson wrote: > When we did a similar (though larger, at ~2.5PB) migration, we used rsync > as well, but ran one rsync process per Isilon node, and made sure the NFS > clients were hitting separate Isilon nodes for their reads. 
We also didn't > have more than one rsync process running per client, as the Linux NFS > client (at least in CentOS 6) was terrible when it came to concurrent access. > The million dollar question IMHO is the number of files and their sizes. Basically if you have a million 1KB files to move it is going to take much longer than a 100 1GB files. That is the overhead of dealing with each file is a real bitch and kills your attainable transfer speed stone dead. One option I have used in the past is to use your last backup and restore to the new system, then rsync in the changes. That way you don't impact the source file system which is live. Another option I have used is to inform users in advance that data will be transferred based on a metric of how many files and how much data they have. So the less data and fewer files the quicker you will get access to the new system once access to the old system is turned off. It is amazing how much users clear up junk under this scenario. Last time I did this a single user went from over 17 million files to 11 thousand! In total many many TB of data just vanished from the system (around half of the data when puff) as users actually got around to some house keeping LOL. Moving less data and files is always less painful. > Whatever method you end up using, I can guarantee you will be much happier > once you are on GPFS. :) > Goes without saying :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From UWEFALKE at de.ibm.com Tue Nov 17 08:50:56 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 17 Nov 2020 09:50:56 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Migrate/syncronize_data_from_Isilon_to?= =?utf-8?q?_Scale_over=09NFS=3F?= In-Reply-To: <1388247256.209171.1605555854969@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: Hi Andi, what about leaving NFS completeley out and using rsync (multiple rsyncs in parallel, of course) directly between your source and target servers? I am not sure how many TCP connections (suppose it is NFS4) in parallel are opened between client and server, using a 2x bonded interface well requires at least two. That combined with the DB approach suggested by Jonathan to control the activity of the rsync streams would be my best guess. If you have many small files, the overhead might still kill you. Tarring them up into larger aggregates for transfer would help a lot, but then you must be sure they won't change or you need to implement your own version control for that class of files. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 16/11/2020 20:44 Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... 
as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... if anyone have any ideas they are welcome! Thanks in advance Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From UWEFALKE at de.ibm.com Tue Nov 17 08:57:07 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 17 Nov 2020 09:57:07 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Migrate/syncronize_data_from_Isilon_to?= =?utf-8?q?_Scale_over=09NFS=3F?= In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: Hi, Andi, sorry I just took your 20Gbit for the sign of 2x10Gbps bons, but it is over two nodes, so no bonding. But still, I'd expect to open several TCP connections in parallel per source-target pair (like with several rsyncs per source node) would bear an advantage (and still I thing NFS doesn't do that, but I can be wrong). If more nodes have access to the Isilon data they could also participate (and don't need NFS exports for that). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Uwe Falke/Germany/IBM To: gpfsug main discussion list Date: 17/11/2020 09:50 Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Hi Andi, what about leaving NFS completeley out and using rsync (multiple rsyncs in parallel, of course) directly between your source and target servers? I am not sure how many TCP connections (suppose it is NFS4) in parallel are opened between client and server, using a 2x bonded interface well requires at least two. That combined with the DB approach suggested by Jonathan to control the activity of the rsync streams would be my best guess. If you have many small files, the overhead might still kill you. Tarring them up into larger aggregates for transfer would help a lot, but then you must be sure they won't change or you need to implement your own version control for that class of files. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 16/11/2020 20:44 Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? 
Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... if anyone have any ideas they are welcome! Thanks in advance Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From andi at christiansen.xxx Tue Nov 17 11:51:58 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 17 Nov 2020 12:51:58 +0100 (CET) Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: <616234716.258600.1605613918767@privateemail.com> Hi all, thanks for all the information, there was some interesting things amount it.. I kept on going with rsync and ended up making a file with all top level user directories and splitting them into chunks of 347 per rsync session(total 42000 ish folders). yesterday we had only 14 sessions with 3000 folders in each and that was too much work for one rsync session.. i divided them out among all GPFS nodes to have them fetch an area each and actually doing that 3 times on each node and that has now boosted the bandwidth usage from 3Gbit to around 16Gbit in total.. all nodes have been seing doing work above 7Gbit individual which is actually near to what i was expecting without any modifications to the NFS server or TCP tuning.. CPU is around 30-50% on each server and mostly below or around 30% so it seems like it could have handled abit more sessions.. Small files are really a killer but with all 96+ sessions we have now its not often all sessions are handling small files at the same time so we have an average of about 10-12Gbit bandwidth usage. Thanks all! ill keep you in mind if for some reason we see it slowing down again but for now i think we will try to see if it will go the last mile with a bit more sessions on each :) Best Regards Andi Christiansen > On 11/17/2020 9:57 AM Uwe Falke wrote: > > > Hi, Andi, sorry I just took your 20Gbit for the sign of 2x10Gbps bons, but > it is over two nodes, so no bonding. But still, I'd expect to open several > TCP connections in parallel per source-target pair (like with several > rsyncs per source node) would bear an advantage (and still I thing NFS > doesn't do that, but I can be wrong). > If more nodes have access to the Isilon data they could also participate > (and don't need NFS exports for that). > > Mit freundlichen Gr??en / Kind regards > > Dr. Uwe Falke > IT Specialist > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > Services > +49 175 575 2877 Mobile > Rathausstr. 
7, 09111 Chemnitz, Germany > uwefalke at de.ibm.com > > IBM Services > > IBM Data Privacy Statement > > IBM Deutschland Business & Technology Services GmbH > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > Sitz der Gesellschaft: Ehningen > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > From: Uwe Falke/Germany/IBM > To: gpfsug main discussion list > Date: 17/11/2020 09:50 > Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data > from Isilon to Scale over NFS? > > > Hi Andi, > > what about leaving NFS completeley out and using rsync (multiple rsyncs > in parallel, of course) directly between your source and target servers? > I am not sure how many TCP connections (suppose it is NFS4) in parallel > are opened between client and server, using a 2x bonded interface well > requires at least two. That combined with the DB approach suggested by > Jonathan to control the activity of the rsync streams would be my best > guess. > If you have many small files, the overhead might still kill you. Tarring > them up into larger aggregates for transfer would help a lot, but then you > must be sure they won't change or you need to implement your own version > control for that class of files. > > Mit freundlichen Gr??en / Kind regards > > Dr. Uwe Falke > IT Specialist > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > Services > +49 175 575 2877 Mobile > Rathausstr. 7, 09111 Chemnitz, Germany > uwefalke at de.ibm.com > > IBM Services > > IBM Data Privacy Statement > > IBM Deutschland Business & Technology Services GmbH > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > Sitz der Gesellschaft: Ehningen > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > > Date: 16/11/2020 20:44 > Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from > Isilon to Scale over NFS? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi all, > > i have got a case where a customer wants 700TB migrated from isilon to > Scale and the only way for him is exporting the same directory on NFS from > two different nodes... > > as of now we are using multiple rsync processes on different parts of > folders within the main directory. this is really slow and will take > forever.. right now 14 rsync processes spread across 3 nodes fetching from > 2.. > > does anyone know of a way to speed it up? right now we see from 1Gbit to > 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from > scale nodes and 20Gbits from isilon so we should be able to reach just > under 20Gbit... > > > if anyone have any ideas they are welcome! > > > Thanks in advance > Andi Christiansen _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From janfrode at tanso.net Tue Nov 17 12:07:30 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 17 Nov 2020 13:07:30 +0100 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <616234716.258600.1605613918767@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> Message-ID: Nice to see it working well! But, what about ACLs? 
Does you rsync pull in all needed metadata, or do you also need to sync ACLs ? Any plans for how to solve that ? On Tue, Nov 17, 2020 at 12:52 PM Andi Christiansen wrote: > Hi all, > > thanks for all the information, there was some interesting things amount > it.. > > I kept on going with rsync and ended up making a file with all top level > user directories and splitting them into chunks of 347 per rsync > session(total 42000 ish folders). yesterday we had only 14 sessions with > 3000 folders in each and that was too much work for one rsync session.. > > i divided them out among all GPFS nodes to have them fetch an area each > and actually doing that 3 times on each node and that has now boosted the > bandwidth usage from 3Gbit to around 16Gbit in total.. > > all nodes have been seing doing work above 7Gbit individual which is > actually near to what i was expecting without any modifications to the NFS > server or TCP tuning.. > > CPU is around 30-50% on each server and mostly below or around 30% so it > seems like it could have handled abit more sessions.. > > Small files are really a killer but with all 96+ sessions we have now its > not often all sessions are handling small files at the same time so we have > an average of about 10-12Gbit bandwidth usage. > > Thanks all! ill keep you in mind if for some reason we see it slowing down > again but for now i think we will try to see if it will go the last mile > with a bit more sessions on each :) > > Best Regards > Andi Christiansen > > > On 11/17/2020 9:57 AM Uwe Falke wrote: > > > > > > Hi, Andi, sorry I just took your 20Gbit for the sign of 2x10Gbps bons, > but > > it is over two nodes, so no bonding. But still, I'd expect to open > several > > TCP connections in parallel per source-target pair (like with several > > rsyncs per source node) would bear an advantage (and still I thing NFS > > doesn't do that, but I can be wrong). > > If more nodes have access to the Isilon data they could also participate > > (and don't need NFS exports for that). > > > > Mit freundlichen Gr??en / Kind regards > > > > Dr. Uwe Falke > > IT Specialist > > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > > Services > > +49 175 575 2877 Mobile > > Rathausstr. 7, 09111 Chemnitz, Germany > > uwefalke at de.ibm.com > > > > IBM Services > > > > IBM Data Privacy Statement > > > > IBM Deutschland Business & Technology Services GmbH > > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > > Sitz der Gesellschaft: Ehningen > > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > > > > > From: Uwe Falke/Germany/IBM > > To: gpfsug main discussion list > > Date: 17/11/2020 09:50 > > Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data > > from Isilon to Scale over NFS? > > > > > > Hi Andi, > > > > what about leaving NFS completeley out and using rsync (multiple rsyncs > > in parallel, of course) directly between your source and target servers? > > I am not sure how many TCP connections (suppose it is NFS4) in parallel > > are opened between client and server, using a 2x bonded interface well > > requires at least two. That combined with the DB approach suggested by > > Jonathan to control the activity of the rsync streams would be my best > > guess. > > If you have many small files, the overhead might still kill you. Tarring > > them up into larger aggregates for transfer would help a lot, but then > you > > must be sure they won't change or you need to implement your own version > > control for that class of files. 
> > > > Mit freundlichen Gr??en / Kind regards > > > > Dr. Uwe Falke > > IT Specialist > > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > > Services > > +49 175 575 2877 Mobile > > Rathausstr. 7, 09111 Chemnitz, Germany > > uwefalke at de.ibm.com > > > > IBM Services > > > > IBM Data Privacy Statement > > > > IBM Deutschland Business & Technology Services GmbH > > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > > Sitz der Gesellschaft: Ehningen > > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > > > > > > > From: Andi Christiansen > > To: "gpfsug-discuss at spectrumscale.org" > > > > Date: 16/11/2020 20:44 > > Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from > > Isilon to Scale over NFS? > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > Hi all, > > > > i have got a case where a customer wants 700TB migrated from isilon to > > Scale and the only way for him is exporting the same directory on NFS > from > > two different nodes... > > > > as of now we are using multiple rsync processes on different parts of > > folders within the main directory. this is really slow and will take > > forever.. right now 14 rsync processes spread across 3 nodes fetching > from > > 2.. > > > > does anyone know of a way to speed it up? right now we see from 1Gbit to > > 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit > from > > scale nodes and 20Gbits from isilon so we should be able to reach just > > under 20Gbit... > > > > > > if anyone have any ideas they are welcome! > > > > > > Thanks in advance > > Andi Christiansen _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Tue Nov 17 12:24:22 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 17 Nov 2020 13:24:22 +0100 (CET) Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> Message-ID: <1023406427.259407.1605615862969@privateemail.com> Hi Jan, We are syncing ACLs, groups, owners and timestamps aswell :) /Andi Christiansen > On 11/17/2020 1:07 PM Jan-Frode Myklebust wrote: > > > Nice to see it working well! > > But, what about ACLs? Does you rsync pull in all needed metadata, or do you also need to sync ACLs ? Any plans for how to solve that ? > > On Tue, Nov 17, 2020 at 12:52 PM Andi Christiansen wrote: > > > > Hi all, > > > > thanks for all the information, there was some interesting things amount it.. > > > > I kept on going with rsync and ended up making a file with all top level user directories and splitting them into chunks of 347 per rsync session(total 42000 ish folders). yesterday we had only 14 sessions with 3000 folders in each and that was too much work for one rsync session.. 
> > > > i divided them out among all GPFS nodes to have them fetch an area each and actually doing that 3 times on each node and that has now boosted the bandwidth usage from 3Gbit to around 16Gbit in total.. > > > > all nodes have been seing doing work above 7Gbit individual which is actually near to what i was expecting without any modifications to the NFS server or TCP tuning.. > > > > CPU is around 30-50% on each server and mostly below or around 30% so it seems like it could have handled abit more sessions.. > > > > Small files are really a killer but with all 96+ sessions we have now its not often all sessions are handling small files at the same time so we have an average of about 10-12Gbit bandwidth usage. > > > > Thanks all! ill keep you in mind if for some reason we see it slowing down again but for now i think we will try to see if it will go the last mile with a bit more sessions on each :) > > > > Best Regards > > Andi Christiansen > > > > > On 11/17/2020 9:57 AM Uwe Falke wrote: > > > > > > > > > Hi, Andi, sorry I just took your 20Gbit for the sign of 2x10Gbps bons, but > > > it is over two nodes, so no bonding. But still, I'd expect to open several > > > TCP connections in parallel per source-target pair (like with several > > > rsyncs per source node) would bear an advantage (and still I thing NFS > > > doesn't do that, but I can be wrong). > > > If more nodes have access to the Isilon data they could also participate > > > (and don't need NFS exports for that). > > > > > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Uwe Falke > > > IT Specialist > > > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > > > Services > > > +49 175 575 2877 Mobile > > > Rathausstr. 7, 09111 Chemnitz, Germany > > > uwefalke at de.ibm.com mailto:uwefalke at de.ibm.com > > > > > > IBM Services > > > > > > IBM Data Privacy Statement > > > > > > IBM Deutschland Business & Technology Services GmbH > > > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > > > Sitz der Gesellschaft: Ehningen > > > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > > > > > > > > > From: Uwe Falke/Germany/IBM > > > To: gpfsug main discussion list > > > Date: 17/11/2020 09:50 > > > Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data > > > from Isilon to Scale over NFS? > > > > > > > > > Hi Andi, > > > > > > what about leaving NFS completeley out and using rsync (multiple rsyncs > > > in parallel, of course) directly between your source and target servers? > > > I am not sure how many TCP connections (suppose it is NFS4) in parallel > > > are opened between client and server, using a 2x bonded interface well > > > requires at least two. That combined with the DB approach suggested by > > > Jonathan to control the activity of the rsync streams would be my best > > > guess. > > > If you have many small files, the overhead might still kill you. Tarring > > > them up into larger aggregates for transfer would help a lot, but then you > > > must be sure they won't change or you need to implement your own version > > > control for that class of files. > > > > > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Uwe Falke > > > IT Specialist > > > Hybrid Cloud Infrastructure / Technology Consulting & Implementation > > > Services > > > +49 175 575 2877 Mobile > > > Rathausstr. 
7, 09111 Chemnitz, Germany > > > uwefalke at de.ibm.com mailto:uwefalke at de.ibm.com > > > > > > IBM Services > > > > > > IBM Data Privacy Statement > > > > > > IBM Deutschland Business & Technology Services GmbH > > > Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl > > > Sitz der Gesellschaft: Ehningen > > > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > > > > > > > > > > > > From: Andi Christiansen > > > To: "gpfsug-discuss at spectrumscale.org mailto:gpfsug-discuss at spectrumscale.org " > > > > > > Date: 16/11/2020 20:44 > > > Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from > > > Isilon to Scale over NFS? > > > Sent by: gpfsug-discuss-bounces at spectrumscale.org mailto:gpfsug-discuss-bounces at spectrumscale.org > > > > > > > > > > > > Hi all, > > > > > > i have got a case where a customer wants 700TB migrated from isilon to > > > Scale and the only way for him is exporting the same directory on NFS from > > > two different nodes... > > > > > > as of now we are using multiple rsync processes on different parts of > > > folders within the main directory. this is really slow and will take > > > forever.. right now 14 rsync processes spread across 3 nodes fetching from > > > 2.. > > > > > > does anyone know of a way to speed it up? right now we see from 1Gbit to > > > 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from > > > scale nodes and 20Gbits from isilon so we should be able to reach just > > > under 20Gbit... > > > > > > > > > if anyone have any ideas they are welcome! > > > > > > > > > Thanks in advance > > > Andi Christiansen _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss athttp://spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss athttp://spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss athttp://spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Nov 17 13:53:43 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 17 Nov 2020 13:53:43 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <616234716.258600.1605613918767@privateemail.com> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> Message-ID: <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> On 17/11/2020 11:51, Andi Christiansen wrote: > Hi all, > > thanks for all the information, there was some interesting things > amount it.. > > I kept on going with rsync and ended up making a file with all top > level user directories and splitting them into chunks of 347 per > rsync session(total 42000 ish folders). yesterday we had only 14 > sessions with 3000 folders in each and that was too much work for one > rsync session.. 
Unless you use something similar to my DB suggestion it is almost inevitable that some of those rsync sessions are going to have issues and you will have no way to track it or even know it has happened unless you do a single final giant catchup/check rsync. I should add that a copy of the sqlite DB is cover your backside protection when a user pops up claiming that you failed to transfer one of their vitally important files six months down the line and the old system is turned off and scrapped. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From skylar2 at uw.edu Tue Nov 17 14:59:43 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 17 Nov 2020 06:59:43 -0800 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> Message-ID: <20201117145943.5cxyfpfyrk7udmn4@thargelion> On Tue, Nov 17, 2020 at 01:53:43PM +0000, Jonathan Buzzard wrote: > On 17/11/2020 11:51, Andi Christiansen wrote: > > Hi all, > > > > thanks for all the information, there was some interesting things > > amount it.. > > > > I kept on going with rsync and ended up making a file with all top > > level user directories and splitting them into chunks of 347 per > > rsync session(total 42000 ish folders). yesterday we had only 14 > > sessions with 3000 folders in each and that was too much work for one > > rsync session.. > > Unless you use something similar to my DB suggestion it is almost inevitable > that some of those rsync sessions are going to have issues and you will have > no way to track it or even know it has happened unless you do a single final > giant catchup/check rsync. > > I should add that a copy of the sqlite DB is cover your backside protection > when a user pops up claiming that you failed to transfer one of their > vitally important files six months down the line and the old system is > turned off and scrapped. That's not a bad idea, and I like it more than the method I setup where we captured the output of find from both sides of the transfer and preserved it for posterity, but obviously did require a hard-stop date on the source. Fortunately, we seem committed to GPFS so it might be we never have to do another bulk transfer outside of the filesystem... -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From S.J.Thompson at bham.ac.uk Tue Nov 17 15:55:41 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 17 Nov 2020 15:55:41 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <20201117145943.5cxyfpfyrk7udmn4@thargelion> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> <20201117145943.5cxyfpfyrk7udmn4@thargelion> Message-ID: <55E3401C-2F59-4B47-A176-CDF7BCACBE2E@bham.ac.uk> > Fortunately, we seem committed to GPFS so it might be we never have to do > another bulk transfer outside of the filesystem... Until you want to move a v3 or v4 created file-system to v5 block sizes __ I hopes we won't be doing that sort of thing again... 
Simon From jonathan.buzzard at strath.ac.uk Tue Nov 17 19:45:29 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 17 Nov 2020 19:45:29 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <55E3401C-2F59-4B47-A176-CDF7BCACBE2E@bham.ac.uk> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> <20201117145943.5cxyfpfyrk7udmn4@thargelion> <55E3401C-2F59-4B47-A176-CDF7BCACBE2E@bham.ac.uk> Message-ID: <1a1be12b-a4f2-f2b3-4cdf-e34bc5eace24@strath.ac.uk> On 17/11/2020 15:55, Simon Thompson wrote: > >> Fortunately, we seem committed to GPFS so it might be we never have to do >> another bulk transfer outside of the filesystem... > > Until you want to move a v3 or v4 created file-system to v5 block sizes __ You forget the v2 to v3 for more than two billion files switch. Either that or you where not using it back then. Then there was the v3.2 if you ever want to mount it on Windows. > > I hopes we won't be doing that sort of thing again... > Yep, going to be recycling my scripts in the coming week for a v4 to v5 with capacity upgrade on our DSS-G. That basically involves a trashing of the file system and a restore from backup. Going to be doing the your data will be restored based on a metric of how many files and how much data you have ploy again :-) I too hope that will be the last time I have to do anything similar but my experience of the last couple of decades says that is likely to be a forlorn hope :-( I speculate that one day the 10,000 file set limit will be lifted, but only if you reformat your file system... JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From andi at christiansen.xxx Tue Nov 17 20:40:39 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 17 Nov 2020 21:40:39 +0100 (CET) Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> References: <1388247256.209171.1605555854969@privateemail.com> <616234716.258600.1605613918767@privateemail.com> <12435be3-3eb4-00b9-5939-e83eb1649168@strath.ac.uk> Message-ID: <82434297.276248.1605645639435@privateemail.com> Hi Jonathan, yes you are correct! but we plan to resync this once or twice every week for the next 3-4months to be sure everything is as it should be. Right now we are focused on getting them synced up and then we will run scheduled resyncs/checks once or twice a week depending on the data growth :) Thanks Andi Christiansen > On 11/17/2020 2:53 PM Jonathan Buzzard wrote: > > > On 17/11/2020 11:51, Andi Christiansen wrote: > > Hi all, > > > > thanks for all the information, there was some interesting things > > amount it.. > > > > I kept on going with rsync and ended up making a file with all top > > level user directories and splitting them into chunks of 347 per > > rsync session(total 42000 ish folders). yesterday we had only 14 > > sessions with 3000 folders in each and that was too much work for one > > rsync session.. > > Unless you use something similar to my DB suggestion it is almost > inevitable that some of those rsync sessions are going to have issues > and you will have no way to track it or even know it has happened unless > you do a single final giant catchup/check rsync. 
> > I should add that a copy of the sqlite DB is cover your backside > protection when a user pops up claiming that you failed to transfer one > of their vitally important files six months down the line and the old > system is turned off and scrapped. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chris.schlipalius at pawsey.org.au Tue Nov 17 23:17:18 2020 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Wed, 18 Nov 2020 07:17:18 +0800 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Message-ID: <578BE691-DEE4-43AC-97D2-546AC406E14A@pawsey.org.au> So at my last job we used to rsync data between isilons across campus, and isilon to Windows File Cluster (and back). I recommend using dry run to generate a list of files and then use this to run with rysnc. This allows you also to be able to break up the transfer into batches, and check if files have changed before sync (say if your isilon files are not RO. Also ensure you have a recent version of rsync that preserves extended attributes and check your ACLS. A dry run example: https://unix.stackexchange.com/a/261372 I always felt more comfortable having a list of files before a sync?. Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Supercomputing Platforms, Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Nov 18 11:48:52 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 18 Nov 2020 11:48:52 +0000 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <578BE691-DEE4-43AC-97D2-546AC406E14A@pawsey.org.au> References: <578BE691-DEE4-43AC-97D2-546AC406E14A@pawsey.org.au> Message-ID: <7983810e-f51c-8cf7-a750-5c3285870bd4@strath.ac.uk> On 17/11/2020 23:17, Chris Schlipalius wrote: > So at my last job we used to rsync data between isilons across campus, > and isilon to Windows File Cluster (and back). > > I recommend using dry run to generate a list of files and then use this > to run with rysnc. > > This allows you also to be able to break up the transfer into batches, > and check if files have changed before sync (say if your isilon files > are not RO. > > Also ensure you have a recent version of rsync that preserves extended > attributes and check your ACLS. > > A dry run example: > > https://unix.stackexchange.com/a/261372 > > I always felt more comfortable having a list of files before a sync?. > I would counsel in the strongest possible terms against that approach. Basically you have to be assured that none of your file names have "wacky" characters in them, because handling "wacky" characters in file names is exceedingly difficult. I cannot stress how hard it is and the above example does not handle all "wacky" characters in file names. So what do I mean by "wacky" characters. 
Well remember a file name can have just about anything in it on Linux with the exception of '/', and users especially when using a GUI, and even more so if they are Mac users can and do use what I will call "wacky" characters in their file names. The obvious ones are spaces, but it's not just ASCII 0x20, but tabs too. Then there is the use of the wildcard characters, especially '?' but also '*'. Not too difficult to handle you might say. Right now deal with a file name with a newline character in it :-) Don't ask me how or why you even do that but let me assure you that I have seen them on more than one occasion. And now your dry run list is broken... Not only that if you have a few hundred million files to move a list just becomes unwieldy anyway. One thing I didn't mention is that I would run anything with in a screen (or tmux if that is your poison) and turn on logging. For those interested I am in the process of cleaning up the script a bit and will post it somewhere in due course. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From andi at christiansen.xxx Wed Nov 18 11:54:47 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 18 Nov 2020 12:54:47 +0100 (CET) Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <7983810e-f51c-8cf7-a750-5c3285870bd4@strath.ac.uk> References: <578BE691-DEE4-43AC-97D2-546AC406E14A@pawsey.org.au> <7983810e-f51c-8cf7-a750-5c3285870bd4@strath.ac.uk> Message-ID: <1947408989.293430.1605700487095@privateemail.com> Hi Jonathan, i would be very interested in seeing your scripts when they are posted. Let me know where to get them! Thanks a bunch! Andi Christiansen > On 11/18/2020 12:48 PM Jonathan Buzzard wrote: > > > On 17/11/2020 23:17, Chris Schlipalius wrote: > > So at my last job we used to rsync data between isilons across campus, > > and isilon to Windows File Cluster (and back). > > > > I recommend using dry run to generate a list of files and then use this > > to run with rysnc. > > > > This allows you also to be able to break up the transfer into batches, > > and check if files have changed before sync (say if your isilon files > > are not RO. > > > > Also ensure you have a recent version of rsync that preserves extended > > attributes and check your ACLS. > > > > A dry run example: > > > > https://unix.stackexchange.com/a/261372 > > > > I always felt more comfortable having a list of files before a sync?. > > > > I would counsel in the strongest possible terms against that approach. > > Basically you have to be assured that none of your file names have > "wacky" characters in them, because handling "wacky" characters in file > names is exceedingly difficult. I cannot stress how hard it is and the > above example does not handle all "wacky" characters in file names. > > So what do I mean by "wacky" characters. Well remember a file name can > have just about anything in it on Linux with the exception of '/', and > users especially when using a GUI, and even more so if they are Mac > users can and do use what I will call "wacky" characters in their file > names. > > The obvious ones are spaces, but it's not just ASCII 0x20, but tabs too. > Then there is the use of the wildcard characters, especially '?' but > also '*'. > > Not too difficult to handle you might say. 
Right now deal with a file > name with a newline character in it :-) Don't ask me how or why you even > do that but let me assure you that I have seen them on more than one > occasion. And now your dry run list is broken... > > Not only that if you have a few hundred million files to move a list > just becomes unwieldy anyway. > > One thing I didn't mention is that I would run anything with in a screen > (or tmux if that is your poison) and turn on logging. > > For those interested I am in the process of cleaning up the script a bit > and will post it somewhere in due course. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From cal.sawyer at framestore.com Wed Nov 18 12:18:57 2020 From: cal.sawyer at framestore.com (Cal Sawyer) Date: Wed, 18 Nov 2020 12:18:57 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 106, Issue 21 In-Reply-To: References: Message-ID: Hello Not a Scale user per se (we run a 3rdparty offshoot of Scale). In a past life managing Nexenta with OpenSolaris DR storage, I used nc/netcat for bulk data sync, which is far more efficient than rsync. With a bit of planning and analysis of directory structure on the target, nc runs could be parallelised as well, although not quite in the same way as running rsync via parallels. Of course, nc has to be available on Isilon but i have no experience with that platform. The only caveat in using nc is the amount of change to the target data as copying progresses (is the target datastore static or still seeing changes?). nc has to be followed with rsync to apply any changes and/or verify the integrity of the bulk copy. https://nakkaya.com/2009/04/15/using-netcat-for-file-transfers/ Are your Isilon and Scale systems located in the same network space? I'd also suggest that if possible, add a quad-port 10GbE (or larger: 25/100GbE) NIC to your servers to gain a wider data path and conduct your copy operations on those interfaces regards [image: Framestore] Cal Sawyer ? Senior Systems Engineer London ? New York ? Los Angeles ? Chicago ? Montr?al ? Mumbai 28 Chancery Lane London WC2A 1LB [T] +44 (0)20 7344 8000 W3W: warm.soil.patio On Wed, 18 Nov 2020 at 12:00, wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Migrate/syncronize data from Isilon to Scale over NFS? > (Chris Schlipalius) > 2. Re: Migrate/syncronize data from Isilon to Scale over NFS? > (Jonathan Buzzard) > 3. Re: Migrate/syncronize data from Isilon to Scale over NFS? > (Andi Christiansen) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 18 Nov 2020 07:17:18 +0800 > From: Chris Schlipalius > To: > Subject: Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to > Scale over NFS? 
> Message-ID: <578BE691-DEE4-43AC-97D2-546AC406E14A at pawsey.org.au> > Content-Type: text/plain; charset="utf-8" > > So at my last job we used to rsync data between isilons across campus, and > isilon to Windows File Cluster (and back). > > I recommend using dry run to generate a list of files and then use this to > run with rysnc. > > This allows you also to be able to break up the transfer into batches, and > check if files have changed before sync (say if your isilon files are not > RO. > > Also ensure you have a recent version of rsync that preserves extended > attributes and check your ACLS. > > > > A dry run example: > > https://unix.stackexchange.com/a/261372 > > > > I always felt more comfortable having a list of files before a sync?. > > > > > > > > Regards, > > Chris Schlipalius > > > > Team Lead, Data Storage Infrastructure, Supercomputing Platforms, Pawsey > Supercomputing Centre (CSIRO) > > 1 Bryce Avenue > > Kensington WA 6151 > > Australia > > > > Tel +61 8 6436 8815 > > Email chris.schlipalius at pawsey.org.au > > Web www.pawsey.org.au > > > > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20201118/c99c2fb1/attachment-0001.html > > > > ------------------------------ > > Message: 2 > Date: Wed, 18 Nov 2020 11:48:52 +0000 > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to > Scale over NFS? > Message-ID: <7983810e-f51c-8cf7-a750-5c3285870bd4 at strath.ac.uk> > Content-Type: text/plain; charset=utf-8; format=flowed > > On 17/11/2020 23:17, Chris Schlipalius wrote: > > So at my last job we used to rsync data between isilons across campus, > > and isilon to Windows File Cluster (and back). > > > > I recommend using dry run to generate a list of files and then use this > > to run with rysnc. > > > > This allows you also to be able to break up the transfer into batches, > > and check if files have changed before sync (say if your isilon files > > are not RO. > > > > Also ensure you have a recent version of rsync that preserves extended > > attributes and check your ACLS. > > > > A dry run example: > > > > https://unix.stackexchange.com/a/261372 > > > > I always felt more comfortable having a list of files before a sync?. > > > > I would counsel in the strongest possible terms against that approach. > > Basically you have to be assured that none of your file names have > "wacky" characters in them, because handling "wacky" characters in file > names is exceedingly difficult. I cannot stress how hard it is and the > above example does not handle all "wacky" characters in file names. > > So what do I mean by "wacky" characters. Well remember a file name can > have just about anything in it on Linux with the exception of '/', and > users especially when using a GUI, and even more so if they are Mac > users can and do use what I will call "wacky" characters in their file > names. > > The obvious ones are spaces, but it's not just ASCII 0x20, but tabs too. > Then there is the use of the wildcard characters, especially '?' but > also '*'. > > Not too difficult to handle you might say. Right now deal with a file > name with a newline character in it :-) Don't ask me how or why you even > do that but let me assure you that I have seen them on more than one > occasion. And now your dry run list is broken... 
> > Not only that if you have a few hundred million files to move a list > just becomes unwieldy anyway. > > One thing I didn't mention is that I would run anything with in a screen > (or tmux if that is your poison) and turn on logging. > > For those interested I am in the process of cleaning up the script a bit > and will post it somewhere in due course. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > ------------------------------ > > Message: 3 > Date: Wed, 18 Nov 2020 12:54:47 +0100 (CET) > From: Andi Christiansen > To: gpfsug main discussion list , > Jonathan Buzzard > Subject: Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to > Scale over NFS? > Message-ID: <1947408989.293430.1605700487095 at privateemail.com> > Content-Type: text/plain; charset=UTF-8 > > Hi Jonathan, > > i would be very interested in seeing your scripts when they are posted. > Let me know where to get them! > > Thanks a bunch! > Andi Christiansen > > > On 11/18/2020 12:48 PM Jonathan Buzzard > wrote: > > > > > > On 17/11/2020 23:17, Chris Schlipalius wrote: > > > So at my last job we used to rsync data between isilons across campus, > > > and isilon to Windows File Cluster (and back). > > > > > > I recommend using dry run to generate a list of files and then use > this > > > to run with rysnc. > > > > > > This allows you also to be able to break up the transfer into batches, > > > and check if files have changed before sync (say if your isilon files > > > are not RO. > > > > > > Also ensure you have a recent version of rsync that preserves extended > > > attributes and check your ACLS. > > > > > > A dry run example: > > > > > > https://unix.stackexchange.com/a/261372 > > > > > > I always felt more comfortable having a list of files before a sync?. > > > > > > > I would counsel in the strongest possible terms against that approach. > > > > Basically you have to be assured that none of your file names have > > "wacky" characters in them, because handling "wacky" characters in file > > names is exceedingly difficult. I cannot stress how hard it is and the > > above example does not handle all "wacky" characters in file names. > > > > So what do I mean by "wacky" characters. Well remember a file name can > > have just about anything in it on Linux with the exception of '/', and > > users especially when using a GUI, and even more so if they are Mac > > users can and do use what I will call "wacky" characters in their file > > names. > > > > The obvious ones are spaces, but it's not just ASCII 0x20, but tabs too. > > Then there is the use of the wildcard characters, especially '?' but > > also '*'. > > > > Not too difficult to handle you might say. Right now deal with a file > > name with a newline character in it :-) Don't ask me how or why you even > > do that but let me assure you that I have seen them on more than one > > occasion. And now your dry run list is broken... > > > > Not only that if you have a few hundred million files to move a list > > just becomes unwieldy anyway. > > > > One thing I didn't mention is that I would run anything with in a screen > > (or tmux if that is your poison) and turn on logging. > > > > For those interested I am in the process of cleaning up the script a bit > > and will post it somewhere in due course. > > > > > > JAB. > > > > -- > > Jonathan A. Buzzard Tel: +44141-5483420 > > HPC System Administrator, ARCHIE-WeSt. 
> > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 106, Issue 21 > *********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Nov 18 23:05:40 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Wed, 18 Nov 2020 18:05:40 -0500 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: <7983810e-f51c-8cf7-a750-5c3285870bd4@strath.ac.uk> References: <578BE691-DEE4-43AC-97D2-546AC406E14A@pawsey.org.au> <7983810e-f51c-8cf7-a750-5c3285870bd4@strath.ac.uk> Message-ID: <39863.1605740740@turing-police> On Wed, 18 Nov 2020 11:48:52 +0000, Jonathan Buzzard said: > So what do I mean by "wacky" characters. Well remember a file name can > have just about anything in it on Linux with the exception of '/', and You want to see some fireworks? At least at one time, it was possible to use a file system debugger that's all too trusting of hexadecimal input and create a directory entry of '../'. Let's just say that fs/namei.c was also far too trusting, and fsck was more than happy to make *different* errors than the kernel was.... > The obvious ones are spaces, but it's not just ASCII 0x20, but tabs too. > Then there is the use of the wildcard characters, especially '?' but > also '*'. Don't forget ESC, CR, LF, backticks, forward ticks, semicolons, and pretty much anything else that will give a shell indigestion. SQL isn't the only thing prone to injection attacks.. :) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From chris.schlipalius at pawsey.org.au Wed Nov 18 23:57:26 2020 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Thu, 19 Nov 2020 07:57:26 +0800 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: References: Message-ID: <6288DF78-A9DF-4BE9-B166-4478EF8C2A20@pawsey.org.au> ? I would counsel in the strongest possible terms against that approach. ? Basically you have to be assured that none of your file names have "wacky" characters in them, because handling "wacky" characters in file ? names is exceedingly difficult. I cannot stress how hard it is and the above example does not handle all "wacky" characters in file names. Well that?s indeed another kettle of fish if you have irregular/special naming of files, no I didn?t cover that and if you have millions of files, yes a list would be unwieldy, then I would be tarring up dirs. before moving? and then untarring on GPFS ?or breaking up the list into sets or sub lists. If you have these wacky types of file names well there are fixes as in the rsync manpages? yes not easy but possible.. Ie 1. -s, --protect-args 2. As per usual you can escape the spaces, or substitute for spaces. rsync -avuz user at server1.com:"${remote_path// /\\ }" . 3. Single quote the file name and path inside double quotes. ? 
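A NUL-terminated file list sidesteps the quoting game altogether, since the names never pass through a shell. A rough sketch, with /mnt/isilon, /gpfs/projects and /tmp/filelist.nul as placeholder names:

    cd /mnt/isilon
    find . -print0 > /tmp/filelist.nul      # NUL-separated, survives spaces, quotes and newlines
    rsync -aH --from0 --files-from=/tmp/filelist.nul . /gpfs/projects/
    # add -A/-X for ACLs and xattrs if the NFS mount actually exposes them

The -s/--protect-args option above is still worth having when a remote path has to go on the rsync command line itself.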
One thing I didn't mention is that I would run anything with in a screen (or tmux if that is your poison) and turn on logging. Absolutely agree? ? For those interested I am in the process of cleaning up the script a bit and will post it somewhere in due course. ? JAB. Would be interesting to see?. I?ve also had success on GPFS with DCP and possibly this would be another option Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Supercomputing Platforms, Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.caubet at psi.ch Thu Nov 19 15:34:39 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 19 Nov 2020 15:34:39 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Message-ID: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> Hi, I have a filesystem holding many projects (i.e., mounted under /projects), each project is managed with filesets. I have a new big project which should be placed on a separate filesystem (blocksize, replication policy, etc. will be different, and subprojects of it will be managed with filesets). Ideally, this filesystem should be mounted in /projects/newproject. Technically, mounting a filesystem on top of an existing filesystem should be possible, but, is this discouraged for any reason? How GPFS would behave with that and is there a technical reason for avoiding this setup? Another alternative would be independent mount point + symlink, but I really would prefer to avoid symlinks. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Nov 19 15:49:30 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 19 Nov 2020 15:49:30 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> Message-ID: <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> On 19/11/2020 15:34, Caubet Serrabou Marc (PSI) wrote: > Hi, > > > I have a filesystem holding many projects (i.e., mounted under > /projects), each project is managed with filesets. > > I have a new big project which should be placed on a separate filesystem > (blocksize, replication policy, etc. will be different, and subprojects > of it will be managed with filesets). Ideally, this filesystem should be > mounted in /projects/newproject. > > > Technically, mounting a filesystem on top of an existing filesystem > should be possible, but, is this discouraged for any reason? How GPFS > would behave with that and is there a technical reason for avoiding this > setup? > > Another alternative would be independent mount point + symlink, but I > really would prefer to avoid symlinks. This has all the hallmarks of either a Windows admin or a newbie Linux/Unix admin :-) Simply put /projects is mounted on top of whatever file system is providing the root file system in the first place LOL. Linux/Unix and/or GPFS does not give a monkeys about mounting another file system *ANYWHERE* in it period because there is no other way of doing it. 
JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From spectrumscale at kiranghag.com Thu Nov 19 16:40:47 2020 From: spectrumscale at kiranghag.com (KG) Date: Thu, 19 Nov 2020 22:10:47 +0530 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> Message-ID: You can also set mount priority on filesystems so that gpfs can try to mount them in order...parent first On Thu, Nov 19, 2020, 21:19 Jonathan Buzzard wrote: > On 19/11/2020 15:34, Caubet Serrabou Marc (PSI) wrote: > > Hi, > > > > > > I have a filesystem holding many projects (i.e., mounted under > > /projects), each project is managed with filesets. > > > > I have a new big project which should be placed on a separate filesystem > > (blocksize, replication policy, etc. will be different, and subprojects > > of it will be managed with filesets). Ideally, this filesystem should be > > mounted in /projects/newproject. > > > > > > Technically, mounting a filesystem on top of an existing filesystem > > should be possible, but, is this discouraged for any reason? How GPFS > > would behave with that and is there a technical reason for avoiding this > > setup? > > > > Another alternative would be independent mount point + symlink, but I > > really would prefer to avoid symlinks. > > This has all the hallmarks of either a Windows admin or a newbie > Linux/Unix admin :-) > > Simply put /projects is mounted on top of whatever file system is > providing the root file system in the first place LOL. > > Linux/Unix and/or GPFS does not give a monkeys about mounting another > file system *ANYWHERE* in it period because there is no other way of > doing it. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Nov 19 16:42:07 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 19 Nov 2020 16:42:07 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> Message-ID: <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> If it is a remote cluster mount from your clients (hopefully!), you might want to look at priority to order mounting of the file-systems. I don?t know what would happen if the overmounted file-system went away, you would likely want to test. Simon From: on behalf of "marc.caubet at psi.ch" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 November 2020 at 15:39 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Hi, I have a filesystem holding many projects (i.e., mounted under /projects), each project is managed with filesets. I have a new big project which should be placed on a separate filesystem (blocksize, replication policy, etc. will be different, and subprojects of it will be managed with filesets). 
Ideally, this filesystem should be mounted in /projects/newproject. Technically, mounting a filesystem on top of an existing filesystem should be possible, but, is this discouraged for any reason? How GPFS would behave with that and is there a technical reason for avoiding this setup? Another alternative would be independent mount point + symlink, but I really would prefer to avoid symlinks. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.caubet at psi.ch Thu Nov 19 16:48:07 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 19 Nov 2020 16:48:07 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch>, <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> Message-ID: Hi Jonathan, thanks for sharing your opinions. In the sentence "Technically, mounting a filesystem on top of an existing filesystem should be possible" , I guess I was referring to that... I was concerned about other technical reasons, such like how would this would affect GPFS policies, or how to properly proceed with proper mounting, or any other technical reasons to consider. For the GPFS policies, I usually applied some of the existing GPFS policies based on directories, but after checking I realized that one can manage via device (never used policies in that way, at least for the simple but necessary use cases I have on the existing filesystems). Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard Sent: Thursday, November 19, 2020 4:49:30 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem On 19/11/2020 15:34, Caubet Serrabou Marc (PSI) wrote: > Hi, > > > I have a filesystem holding many projects (i.e., mounted under > /projects), each project is managed with filesets. > > I have a new big project which should be placed on a separate filesystem > (blocksize, replication policy, etc. will be different, and subprojects > of it will be managed with filesets). Ideally, this filesystem should be > mounted in /projects/newproject. > > > Technically, mounting a filesystem on top of an existing filesystem > should be possible, but, is this discouraged for any reason? How GPFS > would behave with that and is there a technical reason for avoiding this > setup? > > Another alternative would be independent mount point + symlink, but I > really would prefer to avoid symlinks. This has all the hallmarks of either a Windows admin or a newbie Linux/Unix admin :-) Simply put /projects is mounted on top of whatever file system is providing the root file system in the first place LOL. 
Linux/Unix and/or GPFS does not give a monkeys about mounting another file system *ANYWHERE* in it period because there is no other way of doing it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.caubet at psi.ch Thu Nov 19 17:01:37 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 19 Nov 2020 17:01:37 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch>, <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> Message-ID: <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> Hi Simon, that's a very good point, thanks a lot :) I have it remotely mounted on a client cluster, so I will consider priorities when mounting the filesystems with remote cluster mount. That's very useful. Also, as far as I saw, same approach can be also applied to local mounts (via mmchfs) during daemon startup with the same option --mount-priority. Thanks a lot for the hints, these are very useful. I'll test that. Cheers, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: Thursday, November 19, 2020 5:42:07 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem If it is a remote cluster mount from your clients (hopefully!), you might want to look at priority to order mounting of the file-systems. I don?t know what would happen if the overmounted file-system went away, you would likely want to test. Simon From: on behalf of "marc.caubet at psi.ch" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 November 2020 at 15:39 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Hi, I have a filesystem holding many projects (i.e., mounted under /projects), each project is managed with filesets. I have a new big project which should be placed on a separate filesystem (blocksize, replication policy, etc. will be different, and subprojects of it will be managed with filesets). Ideally, this filesystem should be mounted in /projects/newproject. Technically, mounting a filesystem on top of an existing filesystem should be possible, but, is this discouraged for any reason? How GPFS would behave with that and is there a technical reason for avoiding this setup? Another alternative would be independent mount point + symlink, but I really would prefer to avoid symlinks. 
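For reference, a rough sketch of what that ordering could look like (untested; projectsfs and newprojectfs are made-up device names, and this goes from memory that file systems with lower non-zero priority values are mounted before higher ones, with zero meaning no priority):

    mmchfs projectsfs    --mount-priority 1    # parent, mounted at /projects
    mmchfs newprojectfs  --mount-priority 2    # child, mounted at /projects/newproject
    mmlsfs projectsfs                          # lists all attributes, including the mount priority

Priorities only influence the order of the automatic mounts; what happens if the parent file system later goes away still needs testing, as Simon says.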
Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Thu Nov 19 17:34:05 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Thu, 19 Nov 2020 18:34:05 +0100 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> Message-ID: I would not mount a GPFS filesystem within a GPFS filesystem. Technically it should work, but I?d expect it to cause surprises if ever the lower filesystem experienced problems. Alone, a filesystem might recover automatically by remounting. But if there?s another filesystem mounted within, I expect it will be a problem.. Much better to use symlinks. -jf tor. 19. nov. 2020 kl. 18:01 skrev Caubet Serrabou Marc (PSI) < marc.caubet at psi.ch>: > Hi Simon, > > > that's a very good point, thanks a lot :) I have it remotely mounted on a > client cluster, so I will consider priorities when mounting the filesystems > with remote cluster mount. That's very useful. > > Also, as far as I saw, same approach can be also applied to local mounts > (via mmchfs) during daemon startup with the same option --mount-priority. > > > Thanks a lot for the hints, these are very useful. I'll test that. > > > Cheers, > > Marc > _________________________________________________________ > Paul Scherrer Institut > High Performance Computing & Emerging Technologies > Marc Caubet Serrabou > Building/Room: OHSA/014 > Forschungsstrasse, 111 > 5232 Villigen PSI > Switzerland > > Telephone: +41 56 310 46 67 > E-Mail: marc.caubet at psi.ch > ------------------------------ > *From:* gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Simon Thompson < > S.J.Thompson at bham.ac.uk> > *Sent:* Thursday, November 19, 2020 5:42:07 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Mounting filesystem on top of an existing > filesystem > > > If it is a remote cluster mount from your clients (hopefully!), you might > want to look at priority to order mounting of the file-systems. I don?t > know what would happen if the overmounted file-system went away, you would > likely want to test. > > > > Simon > > > > *From: * on behalf of " > marc.caubet at psi.ch" > *Reply to: *"gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > *Date: *Thursday, 19 November 2020 at 15:39 > *To: *"gpfsug-discuss at spectrumscale.org" > > *Subject: *[gpfsug-discuss] Mounting filesystem on top of an existing > filesystem > > > > Hi, > > > > I have a filesystem holding many projects (i.e., mounted under /projects), > each project is managed with filesets. > > I have a new big project which should be placed on a separate filesystem > (blocksize, replication policy, etc. will be different, and subprojects of > it will be managed with filesets). Ideally, this filesystem should be > mounted in /projects/newproject. > > > > Technically, mounting a filesystem on top of an existing filesystem should > be possible, but, is this discouraged for any reason? 
How GPFS would behave > with that and is there a technical reason for avoiding this setup? > > Another alternative would be independent mount point + symlink, but I > really would prefer to avoid symlinks. > > > > Thanks a lot, > > Marc > > _________________________________________________________ > Paul Scherrer Institut > High Performance Computing & Emerging Technologies > Marc Caubet Serrabou > Building/Room: OHSA/014 > > Forschungsstrasse, 111 > > 5232 Villigen PSI > Switzerland > > Telephone: +41 56 310 46 67 > E-Mail: marc.caubet at psi.ch > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Thu Nov 19 17:38:07 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 19 Nov 2020 09:38:07 -0800 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> Message-ID: <20201119173807.kormirvbweqs3un6@thargelion> Agreed, not sure how the GPFS tools would react. An alternative to symlinks would be bind mounts, if for some reason a tool doesn't behave properly with a symlink in the path. On Thu, Nov 19, 2020 at 06:34:05PM +0100, Jan-Frode Myklebust wrote: > I would not mount a GPFS filesystem within a GPFS filesystem. Technically > it should work, but I???d expect it to cause surprises if ever the lower > filesystem experienced problems. Alone, a filesystem might recover > automatically by remounting. But if there???s another filesystem mounted > within, I expect it will be a problem.. > > Much better to use symlinks. > > > > -jf > > tor. 19. nov. 2020 kl. 18:01 skrev Caubet Serrabou Marc (PSI) < > marc.caubet at psi.ch>: > > > Hi Simon, > > > > > > that's a very good point, thanks a lot :) I have it remotely mounted on a > > client cluster, so I will consider priorities when mounting the filesystems > > with remote cluster mount. That's very useful. > > > > Also, as far as I saw, same approach can be also applied to local mounts > > (via mmchfs) during daemon startup with the same option --mount-priority. > > > > > > Thanks a lot for the hints, these are very useful. I'll test that. > > > > > > Cheers, > > > > Marc > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > Forschungsstrasse, 111 > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > ------------------------------ > > *From:* gpfsug-discuss-bounces at spectrumscale.org < > > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Simon Thompson < > > S.J.Thompson at bham.ac.uk> > > *Sent:* Thursday, November 19, 2020 5:42:07 PM > > *To:* gpfsug main discussion list > > *Subject:* Re: [gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > If it is a remote cluster mount from your clients (hopefully!), you might > > want to look at priority to order mounting of the file-systems. I don???t > > know what would happen if the overmounted file-system went away, you would > > likely want to test. 
> > > > > > > > Simon > > > > > > > > *From: * on behalf of " > > marc.caubet at psi.ch" > > *Reply to: *"gpfsug-discuss at spectrumscale.org" < > > gpfsug-discuss at spectrumscale.org> > > *Date: *Thursday, 19 November 2020 at 15:39 > > *To: *"gpfsug-discuss at spectrumscale.org" > > > > *Subject: *[gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > > > Hi, > > > > > > > > I have a filesystem holding many projects (i.e., mounted under /projects), > > each project is managed with filesets. > > > > I have a new big project which should be placed on a separate filesystem > > (blocksize, replication policy, etc. will be different, and subprojects of > > it will be managed with filesets). Ideally, this filesystem should be > > mounted in /projects/newproject. > > > > > > > > Technically, mounting a filesystem on top of an existing filesystem should > > be possible, but, is this discouraged for any reason? How GPFS would behave > > with that and is there a technical reason for avoiding this setup? > > > > Another alternative would be independent mount point + symlink, but I > > really would prefer to avoid symlinks. > > > > > > > > Thanks a lot, > > > > Marc > > > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > > > Forschungsstrasse, 111 > > > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From jonathan.buzzard at strath.ac.uk Thu Nov 19 18:08:13 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 19 Nov 2020 18:08:13 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> Message-ID: On 19/11/2020 17:34, Jan-Frode Myklebust wrote: > > I would not mount a GPFS filesystem within a GPFS filesystem. > Technically it should work, but I?d expect it to cause surprises if ever > the lower filesystem experienced problems. Alone, a filesystem might > recover automatically by remounting. But if there?s another filesystem > mounted within, I expect it will be a problem.. > > Much better to use symlinks. > Think about that for a minute... I guess if you are worried about /projects going away (which would suggest something really bad has happened anyway) would be to mount the GPFS file system that is currently holding /projects somewhere else and then bind mount everything into /projects At this point I would note that bind mounts are much better than symlinks which suck for this sort of application. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From jonathan.buzzard at strath.ac.uk Thu Nov 19 18:12:03 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 19 Nov 2020 18:12:03 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> Message-ID: <2f789d09-3704-2d41-ef2a-953de178dce2@strath.ac.uk> On 19/11/2020 16:40, KG wrote: > You can also set mount priority on filesystems so that gpfs can try to > mount them in order...parent first > One of the things that systemd brings to the table https://github.com/systemd/systemd/commit/3519d230c8bafe834b2dac26ace49fcfba139823 JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From marc.caubet at psi.ch Thu Nov 19 18:13:08 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 19 Nov 2020 18:13:08 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <20201119173807.kormirvbweqs3un6@thargelion> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> , <20201119173807.kormirvbweqs3un6@thargelion> Message-ID: <0963457f2dfd418eabf8e1681ef2f801@psi.ch> Hi all, thanks a lot for your comments. Agreed, I better avoid it for now. I was concerned about how GPFS would behave in such case. For production I will take the safe route, but, just out of curiosity, I'll give it a try on a couple of test filesystems. Thanks a lot for your help, it was very helpful, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Skylar Thompson Sent: Thursday, November 19, 2020 6:38:07 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Agreed, not sure how the GPFS tools would react. An alternative to symlinks would be bind mounts, if for some reason a tool doesn't behave properly with a symlink in the path. On Thu, Nov 19, 2020 at 06:34:05PM +0100, Jan-Frode Myklebust wrote: > I would not mount a GPFS filesystem within a GPFS filesystem. Technically > it should work, but I???d expect it to cause surprises if ever the lower > filesystem experienced problems. Alone, a filesystem might recover > automatically by remounting. But if there???s another filesystem mounted > within, I expect it will be a problem.. > > Much better to use symlinks. > > > > -jf > > tor. 19. nov. 2020 kl. 18:01 skrev Caubet Serrabou Marc (PSI) < > marc.caubet at psi.ch>: > > > Hi Simon, > > > > > > that's a very good point, thanks a lot :) I have it remotely mounted on a > > client cluster, so I will consider priorities when mounting the filesystems > > with remote cluster mount. That's very useful. > > > > Also, as far as I saw, same approach can be also applied to local mounts > > (via mmchfs) during daemon startup with the same option --mount-priority. > > > > > > Thanks a lot for the hints, these are very useful. I'll test that. 
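For completeness, a sketch of the bind-mount variant mentioned above (the device name newprojectfs and all paths are placeholders): let GPFS mount the new file system at its own independent mount point and graft it into /projects with a bind mount, rather than stacking one GPFS mount inside another:

    mmmount newprojectfs                       # assumed default mount point /gpfs/newproject
    mkdir -p /projects/newproject
    mount --bind /gpfs/newproject /projects/newproject

    # persistent variant, one line in /etc/fstab:
    # /gpfs/newproject   /projects/newproject   none   bind   0 0

That keeps /projects itself on the parent file system while the new project is grafted in on each client that needs it.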
> > > > > > Cheers, > > > > Marc > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > Forschungsstrasse, 111 > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > ------------------------------ > > *From:* gpfsug-discuss-bounces at spectrumscale.org < > > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Simon Thompson < > > S.J.Thompson at bham.ac.uk> > > *Sent:* Thursday, November 19, 2020 5:42:07 PM > > *To:* gpfsug main discussion list > > *Subject:* Re: [gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > If it is a remote cluster mount from your clients (hopefully!), you might > > want to look at priority to order mounting of the file-systems. I don???t > > know what would happen if the overmounted file-system went away, you would > > likely want to test. > > > > > > > > Simon > > > > > > > > *From: * on behalf of " > > marc.caubet at psi.ch" > > *Reply to: *"gpfsug-discuss at spectrumscale.org" < > > gpfsug-discuss at spectrumscale.org> > > *Date: *Thursday, 19 November 2020 at 15:39 > > *To: *"gpfsug-discuss at spectrumscale.org" > > > > *Subject: *[gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > > > Hi, > > > > > > > > I have a filesystem holding many projects (i.e., mounted under /projects), > > each project is managed with filesets. > > > > I have a new big project which should be placed on a separate filesystem > > (blocksize, replication policy, etc. will be different, and subprojects of > > it will be managed with filesets). Ideally, this filesystem should be > > mounted in /projects/newproject. > > > > > > > > Technically, mounting a filesystem on top of an existing filesystem should > > be possible, but, is this discouraged for any reason? How GPFS would behave > > with that and is there a technical reason for avoiding this setup? > > > > Another alternative would be independent mount point + symlink, but I > > really would prefer to avoid symlinks. > > > > > > > > Thanks a lot, > > > > Marc > > > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > > > Forschungsstrasse, 111 > > > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Thu Nov 19 18:32:39 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 19 Nov 2020 18:32:39 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <0963457f2dfd418eabf8e1681ef2f801@psi.ch> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> <20201119173807.kormirvbweqs3un6@thargelion> <0963457f2dfd418eabf8e1681ef2f801@psi.ch> Message-ID: <5b8edf06-a4ab-a39e-5a02-86fd7565b90a@strath.ac.uk> On 19/11/2020 18:13, Caubet Serrabou Marc (PSI) wrote: > > Hi all, > > > thanks a lot for your comments. Agreed, I?better avoid it for now. I was > concerned about how GPFS would behave in such case. For production I > will take the safe route, but, just out of curiosity, I'll give it a try > on a couple of test filesystems. > Don't use symlinks there is a range of applications that will break and you will confuse the hell out of your users as the fact you are not under /projects/new but /random/new is not hidden. Besides which if the symlink goes away because /projects goes away then it is all a bust anyway. If you are worried about /projects going away then the best plan is to mount the GPFS file systems somewhere else and then bind mount the directories into /projects on all the machines where they are mounted. GPFS is quite happy with this. We bind mount /gpfs/users into /users and /gpfs/software into /opt/software by default. In the past I have bind mounted random paths for every user (hundred plus) into /home JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From novosirj at rutgers.edu Thu Nov 19 18:34:09 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 19 Nov 2020 18:34:09 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> Message-ID: > On Nov 19, 2020, at 10:49 AM, Jonathan Buzzard wrote: > > On 19/11/2020 15:34, Caubet Serrabou Marc (PSI) wrote: >> Hi, >> I have a filesystem holding many projects (i.e., mounted under /projects), each project is managed with filesets. >> I have a new big project which should be placed on a separate filesystem (blocksize, replication policy, etc. will be different, and subprojects of it will be managed with filesets). Ideally, this filesystem should be mounted in /projects/newproject. >> Technically, mounting a filesystem on top of an existing filesystem should be possible, but, is this discouraged for any reason? How GPFS would behave with that and is there a technical reason for avoiding this setup? >> Another alternative would be independent mount point + symlink, but I really would prefer to avoid symlinks. > > This has all the hallmarks of either a Windows admin or a newbie Linux/Unix admin :-) > > Simply put /projects is mounted on top of whatever file system is providing the root file system in the first place LOL. > > Linux/Unix and/or GPFS does not give a monkeys about mounting another file system *ANYWHERE* in it period because there is no other way of doing it. Some others have said, but I disagree. It wasn?t that long ago that GPFS acted really screwy with systemd because it did something in a way other than Linux expected. 
As it is now, their devices are not /dev/whatever or server:/wherever like just about every other filesystem type. Not unreasonable to believe it would ?act funny? compared to other FS. I like GPFS a lot, but this is not one of my favorite characteristics of it. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From UWEFALKE at de.ibm.com Thu Nov 19 19:18:41 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 19 Nov 2020 20:18:41 +0100 Subject: [gpfsug-discuss] =?utf-8?q?Mounting_filesystem_on_top_of_an_exist?= =?utf-8?q?ing=09filesystem?= In-Reply-To: References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch><0b1c1d7f-580c-b5f4-1f36-351130adf412@strath.ac.uk> Message-ID: Just the risk your parent system dies which will block your access to the child file system mounted on a mount point within. If that is not bothering , go ahead mount stacks . As for the symling though : it is also gone if the parent dies :-). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: KG To: gpfsug main discussion list Date: 19/11/2020 17:41 Subject: [EXTERNAL] Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Sent by: gpfsug-discuss-bounces at spectrumscale.org You can also set mount priority on filesystems so that gpfs can try to mount them in order...parent first On Thu, Nov 19, 2020, 21:19 Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: On 19/11/2020 15:34, Caubet Serrabou Marc (PSI) wrote: > Hi, > > > I have a filesystem holding many projects (i.e., mounted under > /projects), each project is managed with filesets. > > I have a new big project which should be placed on a separate filesystem > (blocksize, replication policy, etc. will be different, and subprojects > of it will be managed with filesets). Ideally, this filesystem should be > mounted in /projects/newproject. > > > Technically, mounting a filesystem on top of an existing filesystem > should be possible, but, is this discouraged for any reason? How GPFS > would behave with that and is there a technical reason for avoiding this > setup? > > Another alternative would be independent mount point + symlink, but I > really would prefer to avoid symlinks. This has all the hallmarks of either a Windows admin or a newbie Linux/Unix admin :-) Simply put /projects is mounted on top of whatever file system is providing the root file system in the first place LOL. Linux/Unix and/or GPFS does not give a monkeys about mounting another file system *ANYWHERE* in it period because there is no other way of doing it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Thu Nov 19 19:37:52 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 19 Nov 2020 19:37:52 +0000 Subject: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem In-Reply-To: <0963457f2dfd418eabf8e1681ef2f801@psi.ch> References: <15ad70a37e3e4ec7a5907480a9318b93@psi.ch> <2593D6A2-D0D9-4DB1-8A27-5163B8BF34A6@bham.ac.uk> <22fcf84afa274cfa8aaa8fc4d4be62bb@psi.ch> <20201119173807.kormirvbweqs3un6@thargelion> <0963457f2dfd418eabf8e1681ef2f801@psi.ch> Message-ID: <738D41AC-6A07-453E-A2D1-C1882BE52EDC@bham.ac.uk> My understanding was that this was perfectly acceptable in a GPFS system. i.e. mounting parts of file-systems in others. It has been suggested to us as a way of using different vendor GPFS systems (e.g. an ESS with someone elses) as a way of working round the licensing rules about ESS and anything else, but still giving a single user ?name space?. We didn?t go that route, and of course I might have misunderstood what was being suggested. Simon From: on behalf of "marc.caubet at psi.ch" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 November 2020 at 18:13 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Hi all, thanks a lot for your comments. Agreed, I better avoid it for now. I was concerned about how GPFS would behave in such case. For production I will take the safe route, but, just out of curiosity, I'll give it a try on a couple of test filesystems. Thanks a lot for your help, it was very helpful, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Skylar Thompson Sent: Thursday, November 19, 2020 6:38:07 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Mounting filesystem on top of an existing filesystem Agreed, not sure how the GPFS tools would react. An alternative to symlinks would be bind mounts, if for some reason a tool doesn't behave properly with a symlink in the path. On Thu, Nov 19, 2020 at 06:34:05PM +0100, Jan-Frode Myklebust wrote: > I would not mount a GPFS filesystem within a GPFS filesystem. Technically > it should work, but I???d expect it to cause surprises if ever the lower > filesystem experienced problems. Alone, a filesystem might recover > automatically by remounting. But if there???s another filesystem mounted > within, I expect it will be a problem.. > > Much better to use symlinks. > > > > -jf > > tor. 19. nov. 2020 kl. 18:01 skrev Caubet Serrabou Marc (PSI) < > marc.caubet at psi.ch>: > > > Hi Simon, > > > > > > that's a very good point, thanks a lot :) I have it remotely mounted on a > > client cluster, so I will consider priorities when mounting the filesystems > > with remote cluster mount. That's very useful. 
> > > > Also, as far as I saw, same approach can be also applied to local mounts > > (via mmchfs) during daemon startup with the same option --mount-priority. > > > > > > Thanks a lot for the hints, these are very useful. I'll test that. > > > > > > Cheers, > > > > Marc > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > Forschungsstrasse, 111 > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > ------------------------------ > > *From:* gpfsug-discuss-bounces at spectrumscale.org < > > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Simon Thompson < > > S.J.Thompson at bham.ac.uk> > > *Sent:* Thursday, November 19, 2020 5:42:07 PM > > *To:* gpfsug main discussion list > > *Subject:* Re: [gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > If it is a remote cluster mount from your clients (hopefully!), you might > > want to look at priority to order mounting of the file-systems. I don???t > > know what would happen if the overmounted file-system went away, you would > > likely want to test. > > > > > > > > Simon > > > > > > > > *From: * on behalf of " > > marc.caubet at psi.ch" > > *Reply to: *"gpfsug-discuss at spectrumscale.org" < > > gpfsug-discuss at spectrumscale.org> > > *Date: *Thursday, 19 November 2020 at 15:39 > > *To: *"gpfsug-discuss at spectrumscale.org" > > > > *Subject: *[gpfsug-discuss] Mounting filesystem on top of an existing > > filesystem > > > > > > > > Hi, > > > > > > > > I have a filesystem holding many projects (i.e., mounted under /projects), > > each project is managed with filesets. > > > > I have a new big project which should be placed on a separate filesystem > > (blocksize, replication policy, etc. will be different, and subprojects of > > it will be managed with filesets). Ideally, this filesystem should be > > mounted in /projects/newproject. > > > > > > > > Technically, mounting a filesystem on top of an existing filesystem should > > be possible, but, is this discouraged for any reason? How GPFS would behave > > with that and is there a technical reason for avoiding this setup? > > > > Another alternative would be independent mount point + symlink, but I > > really would prefer to avoid symlinks. 
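(For reference, the --mount-priority approach mentioned further up amounts to something like the following; the filesystem device names "projects" and "newproject" are only placeholders, and lower non-zero priorities are mounted first:)

/usr/lpp/mmfs/bin/mmchfs projects --mount-priority 1      # parent filesystem, mounted first at daemon startup
/usr/lpp/mmfs/bin/mmchfs newproject --mount-priority 2    # filesystem mounted at /projects/newproject, mounted after the parent
/usr/lpp/mmfs/bin/mmlsfs newproject                       # the full attribute listing includes the mount priority

That way the parent is up before GPFS tries to mount the filesystem sitting inside it.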
> > > > > > > > Thanks a lot, > > > > Marc > > > > _________________________________________________________ > > Paul Scherrer Institut > > High Performance Computing & Emerging Technologies > > Marc Caubet Serrabou > > Building/Room: OHSA/014 > > > > Forschungsstrasse, 111 > > > > 5232 Villigen PSI > > Switzerland > > > > Telephone: +41 56 310 46 67 > > E-Mail: marc.caubet at psi.ch > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kamil.Czauz at Squarepoint-Capital.com Fri Nov 20 19:13:41 2020 From: Kamil.Czauz at Squarepoint-Capital.com (Czauz, Kamil) Date: Fri, 20 Nov 2020 19:13:41 +0000 Subject: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process In-Reply-To: References: Message-ID: Here is the output of waiters on 2 hosts that were having the issue today: HOST 1 [2020-11-20 09:07:53 root at nyzls149m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters === mmdiag: waiters === Waiting 0.0035 sec since 09:08:07, monitored, thread 135497 FileBlockReadFetchHandlerThread: on ThCond 0x7F615C152468 (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.64.44.180 Waiting 0.0036 sec since 09:08:07, monitored, thread 139228 PrefetchWorkerThread: on ThCond 0x7F627000D5D8 (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.64.44.181 [2020-11-20 09:08:07 root at nyzls149m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters === mmdiag: waiters === HOST 2 [2020-11-20 09:08:49 root at nyzls150m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters === mmdiag: waiters === Waiting 0.0034 sec since 09:08:50, monitored, thread 345318 SharedHashTabFetchHandlerThread: on ThCond 0x7F049C001F08 (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.64.44.133 [2020-11-20 09:08:50 root at nyzls150m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters === mmdiag: waiters === [2020-11-20 09:08:52 root at nyzls150m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters === mmdiag: waiters === You can see the waiters go from 0 to 1-2 , but they are hardly blocking. Yes there are separate pools for metadata for all of the filesystems here. 
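If it helps, the pool layout can be listed with the standard commands below (the filesystem name fs1 is just a placeholder for ours):

/usr/lpp/mmfs/bin/mmlsdisk fs1    # per-disk view: holds metadata / holds data, storage pool, status, availability
/usr/lpp/mmfs/bin/mmdf fs1        # capacity and free space per storage pool, metadata and data shown separately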
I did another trace today when the problem was happening - this time I was able to get a longer trace using the following command: /usr/lpp/mmfs/bin/mmtracectl --start --trace=io --trace-file-size=512M --tracedev-write-mode=blocking --tracedev-buffer-size=64M -N nyzls149m This is what the trsum output looks like: Elapsed trace time: 62.412092000 seconds Elapsed trace time from first VFS call to last: 62.412091999 Time idle between VFS calls: 0.002913000 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs readpage 0.003487000 9 387.444 rdwr 0.273721000 183 1495.743 read_inode2 0.007304000 325 22.474 follow_link 0.013952000 58 240.552 pagein 0.025974000 66 393.545 getattr 0.002792000 26 107.385 revalidate 0.009406000 2172 4.331 create 66.194479000 3 22064826.333 open 1.725505000 88 19608.011 unlink 18.685099000 1 18685099.000 setattr 0.011627000 14 830.500 lookup 2379.215514000 502 4739473.135 delete_inode 0.015553000 328 47.418 rename 98.099073000 5 19619814.600 release 0.050574000 89 568.247 permission 0.007454000 73 102.110 getxattr 0.002346000 32 73.312 statfs 0.000081000 6 13.500 mmap 0.049809000 18 2767.167 removexattr 0.000827000 14 59.071 llseek 0.000441000 47 9.383 readdir 0.002667000 34 78.441 Ops 4093 Secs 62.409178999 Ops/Sec 65.583 MaxFilesToCache is set to 12000 : [common] maxFilesToCache 12000 I only see gpfs_i_lookup in the tracefile, no gpfs_v_lookups # grep gpfs_i_lookup trcrpt.2020-11-20_09.20.38.283986.nyzls149m |wc -l 1097 They mostly look like this - 62.346560 238895 TRACE_VNODE: gpfs_i_lookup exit: new inode 0xFFFF922178971A40 iNum 21980113 (0x14F63D1) cnP 0xFFFF922178971C88 retP 0x0 code 0 rc 0 62.346955 238895 TRACE_VNODE: gpfs_i_lookup enter: diP 0xFFFF91A8A4991E00 dentryP 0xFFFF92C545A93500 name '20170323.txt' d_flags 0x80 d_count 1 unhashed 1 62.367701 218442 TRACE_VNODE: gpfs_i_lookup exit: new inode 0xFFFF922071300000 iNum 29629892 (0x1C41DC4) cnP 0xFFFF922071300248 retP 0x0 code 0 rc 0 62.367734 218444 TRACE_VNODE: gpfs_i_lookup enter: diP 0xFFFF9193CF457800 dentryP 0xFFFF9229527A89C0 name 'node.py' d_flags 0x80 d_count 1 unhashed 1 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Monday, November 16, 2020 8:46 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, while the other nodes can well block the local one, as Frederick suggests, there should at least be something visible locally waiting for these other nodes. Looking at all waiters might be a good thing, but this case looks strange in other ways. Mind statement there are almost no local waiters and none of them gets older than 10 ms. I am no developer nor do I have the code, so don't expect too much. Can you tell what lookups you see (check in the trcrpt file, could be like gpfs_i_lookup or gpfs_v_lookup)? Lookups are metadata ops, do you have a separate pool for your metadata? How is that pool set up (doen to the physical block devices)? Your trcsum down revealed 36 lookups, each one on avg taking >30ms. That is a lot (albeit the respective waiters won't show up at first glance as suspicious ...). So, which waiters did you see (hope you saved them, if not, do it next time). What are the node you see this on and the whole cluster used for? What is the MaxFilesToCache setting (for that node and for others)? what HW is that, how big are your nodes (memory,CPU)? 
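For reference, the MaxFilesToCache value can be checked quickly with:

/usr/lpp/mmfs/bin/mmlsconfig maxFilesToCache                   # configured value, including any per-node overrides
/usr/lpp/mmfs/bin/mmdiag --config | grep -i maxFilesToCache    # value actually in effect on the local node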
To check the unreasonably short trace capture time: how large are the trcrpt files you obtain? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 14:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - Regarding your previous message - waiters were coming / going with just 1-2 waiters when I ran the mmdiag command, with very low wait times (<0.01s). We are running version 4.2.3 I did another capture today while the client is functioning normally and this was the header result: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 25.996957 seconds and 67592121252 cycles Measured cycle count update rate to be 2600001271 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Fri Nov 13 08:20:01.800558000 2020 (TOD 1605273601.800558, cycles 20807897445779444) daemon trace enabled Fri Nov 13 08:20:01.910017000 2020 (TOD 1605273601.910017, cycles 20807897730372442) all streams included Fri Nov 13 08:20:26.423085049 2020 (TOD 1605273626.423085, cycles 20807961464381068) <---- useful part of trace extends from here trace quiesced Fri Nov 13 08:20:27.797515000 2020 (TOD 1605273627.000797, cycles 20807965037900696) <---- to here Approximate number of times the trace buffer was filled: 14.631 Still a very small capture (1.3s), but the trsum.awk output was not filled with lookup commands / large lookup times. Can you help debug what those long lookup operations mean? 
Unfinished operations: 27967 ***************** pagein ************** 1.362382116 27967 ***************** readpage ************** 1.362381516 139130 1.362448448 ********* Unfinished IO: buffer/disk 3002F670000 20:107498951168^\archive_data_16 104686 1.362022068 ********* Unfinished IO: buffer/disk 50011878000 1:47169618944^\archive_data_1 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFE 341710 1.362423815 ********* Unfinished IO: buffer/disk 20022218000 19:107498951680^\archive_data_15 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 18:3452986648^\FFFFFFFF 139150 1.361122006 ********* Unfinished IO: buffer/disk 50012018000 2:47169622016^\archive_data_2 0 0.000000000 ********* Unfinished IO: buffer/disk 5003CEB8000 4:23073390592^\00000000FFFFFFFF 95782 1.361112791 ********* Unfinished IO: buffer/disk 40016300000 20:107498950656^\archive_data_16 0 0.000000000 ********* Unfinished IO: buffer/disk 2000EE78000 5:47631127040^\00000000FFFFFFFF 271076 1.361579585 ********* Unfinished IO: buffer/disk 20023DB8000 4:47169606656^\archive_data_4 341676 1.362018599 ********* Unfinished IO: buffer/disk 40038140000 5:47169614336^\archive_data_5 139150 1.361131599 MSG FSnd: nsdMsgReadExt msg_id 2930654492 Sduration 13292.382 + us 341676 1.362027104 MSG FSnd: nsdMsgReadExt msg_id 2930654495 Sduration 12396.877 + us 95782 1.361124739 MSG FSnd: nsdMsgReadExt msg_id 2930654491 Sduration 13299.242 + us 271076 1.361587653 MSG FSnd: nsdMsgReadExt msg_id 2930654493 Sduration 12836.328 + us 92182 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 341710 1.362429643 MSG FSnd: nsdMsgReadExt msg_id 2930654497 Sduration 11994.338 + us 341662 0.000000000 MSG FSnd: msg_id 0 Sduration 0.000 + us 139130 1.362458376 MSG FSnd: nsdMsgReadExt msg_id 2930654498 Sduration 11965.605 + us 104686 1.362028772 MSG FSnd: nsdMsgReadExt msg_id 2930654496 Sduration 12395.209 + us 412373 0.775676657 MSG FRep: nsdMsgReadExt msg_id 304915249 Rduration 598747.324 us Rlen 262144 Hduration 598752.112 + us 341770 0.589739579 MSG FRep: nsdMsgReadExt msg_id 338079050 Rduration 784684.402 us Rlen 4 Hduration 784692.651 + us 143315 0.536252844 MSG FRep: nsdMsgReadExt msg_id 631945522 Rduration 838171.137 us Rlen 233472 Hduration 838174.299 + us 341878 0.134331812 MSG FRep: nsdMsgReadExt msg_id 338079023 Rduration 1240092.169 us Rlen 262144 Hduration 1240094.403 + us 175478 0.587353287 MSG FRep: nsdMsgReadExt msg_id 338079047 Rduration 787070.694 us Rlen 262144 Hduration 787073.990 + us 139558 0.633517347 MSG FRep: nsdMsgReadExt msg_id 631945538 Rduration 740906.634 us Rlen 102400 Hduration 740910.172 + us 143308 0.958832110 MSG FRep: nsdMsgReadExt msg_id 631945542 Rduration 415591.871 us Rlen 262144 Hduration 415597.056 + us Elapsed trace time: 1.374423981 seconds Elapsed trace time from first VFS call to last: 1.374423980 Time idle between VFS calls: 0.001603738 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs readpage 1.151660085 1874 614.546 rdwr 0.431456904 581 742.611 read_inode2 0.001180648 934 1.264 follow_link 0.000029502 7 
4.215 getattr 0.000048413 9 5.379 revalidate 0.000007080 67 0.106 pagein 1.149699537 1877 612.520 create 0.007664829 9 851.648 open 0.001032657 19 54.350 unlink 0.002563726 14 183.123 delete_inode 0.000764598 826 0.926 lookup 0.312847947 953 328.277 setattr 0.020651226 824 25.062 permission 0.000015018 1 15.018 rename 0.000529023 4 132.256 release 0.001613800 22 73.355 getxattr 0.000030494 6 5.082 mmap 0.000054767 1 54.767 llseek 0.000001130 4 0.283 readdir 0.000033947 2 16.973 removexattr 0.002119736 820 2.585 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 42625 0.000000138 0.000031017 0.44% 99.56% 3 42378 0.000586959 0.011596801 4.82% 95.18% 32 42627 0.000000272 0.000013421 1.99% 98.01% 2 42641 0.003284590 0.012593594 20.69% 79.31% 35 42628 0.001522335 0.000002748 99.82% 0.18% 2 25464 0.003462795 0.500281914 0.69% 99.31% 12 301420 0.000016711 0.052848218 0.03% 99.97% 38 95103 0.000000544 0.000000000 100.00% 0.00% 1 145858 0.000000659 0.000794896 0.08% 99.92% 2 42221 0.000011484 0.000039445 22.55% 77.45% 5 371718 0.000000707 0.001805425 0.04% 99.96% 2 95109 0.000000880 0.008998763 0.01% 99.99% 2 95337 0.000010330 0.503057866 0.00% 100.00% 8 42700 0.002442175 0.012504429 16.34% 83.66% 35 189680 0.003466450 0.500128627 0.69% 99.31% 9 42681 0.006685396 0.000391575 94.47% 5.53% 16 42702 0.000048203 0.000000500 98.97% 1.03% 2 42703 0.000033280 0.140102087 0.02% 99.98% 9 224423 0.000000195 0.000000000 100.00% 0.00% 1 42706 0.000541098 0.000014713 97.35% 2.65% 3 106275 0.000000456 0.000000000 100.00% 0.00% 1 42721 0.000372857 0.000000000 100.00% 0.00% 1 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Friday, November 13, 2020 4:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi Kamil, in my mail just a few minutes ago I'd overlooked that the buffer size in your trace was indeed 128M (I suppose the trace file is adapting that size if not set in particular). That is very strange, even under high load, the trace should then capture some longer time than 10 secs, and , most of all, it should contain much more activities than just these few you had. That is very mysterious. I am out of ideas for the moment, and a bit short of time to dig here. To check your tracing, you could run a trace like before but when everything is normal and check that out - you should see many records, the trcsum.awk should list just a small portion of unfinished ops at the end, ... If that is fine, then the tracing itself is affected by your crritical condition (never experienced that before - rather GPFS grinds to a halt than the trace is abandoned), and that might well be worth a support ticket. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 
7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Uwe Falke/Germany/IBM To: gpfsug main discussion list Date: 13/11/2020 10:21 Subject: Re: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, looks your tracefile setting has been too low: all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here means you effectively captured a period of about 5ms only ... you can't see much from that. I'd assumed the default trace file size would be sufficient here but it doesn't seem to. try running with something like mmtracectl --start --trace-file-size=512M --trace=io --tracedev-write-mode=overwrite -N . However, if you say "no major waiter" - how many waiters did you see at any time? what kind of waiters were the oldest, how long they'd waited? it could indeed well be that some job is just creating a killer workload. The very short cyle time of the trace points, OTOH, to high activity, OTOH the trace file setting appears quite low (trace=io doesnt' collect many trace infos, just basic IO stuff). If I might ask: what version of GPFS are you running? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: gpfsug main discussion list Date: 13/11/2020 03:33 Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe - I hit the issue again today, no major waiters, and nothing useful from the iohist report. Nothing interesting in the logs either. I was able to get a trace today while the issue was happening. I took 2 traces a few min apart. 
The beginning of the traces look something like this: Overwrite trace parameters: buffer size: 134217728 64 kernel trace streams, indices 0-63 (selected by low bits of processor ID) 128 daemon trace streams, indices 64-191 (selected by low bits of thread ID) Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles Measured cycle count update rate to be 2599997559 per second <---- using this value OS reported cycle count update rate as 2599999000 per second Trace milestones: kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220) daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152) all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here Approximate number of times the trace buffer was filled: 553.529 Here is the output of trsum.awk details=0 I'm not quite sure what to make of it, can you help me decipher it? The 'lookup' operations are taking a hell of a long time, what does that mean? Capture 1 Unfinished operations: 21234 ***************** lookup ************** 0.165851604 290020 ***************** lookup ************** 0.151032241 302757 ***************** lookup ************** 0.168723402 301677 ***************** lookup ************** 0.070016530 230983 ***************** lookup ************** 0.127699082 21233 ***************** lookup ************** 0.060357257 309046 ***************** lookup ************** 0.157124551 301643 ***************** lookup ************** 0.165543982 304042 ***************** lookup ************** 0.172513838 167794 ***************** lookup ************** 0.056056815 189680 ***************** lookup ************** 0.062022237 362216 ***************** lookup ************** 0.072063619 406314 ***************** lookup ************** 0.114121838 167776 ***************** lookup ************** 0.114899642 303016 ***************** lookup ************** 0.144491120 290021 ***************** lookup ************** 0.142311603 167762 ***************** lookup ************** 0.144240366 248530 ***************** lookup ************** 0.168728131 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF 0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF Elapsed trace time: 0.182617894 seconds Elapsed trace time from first VFS call to last: 0.182617893 Time idle between VFS calls: 0.000006317 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.012021696 35 343.477 read_inode2 0.000100787 43 2.344 follow_link 0.000050609 8 6.326 pagein 0.000097806 10 9.781 revalidate 0.000010884 156 0.070 open 0.001001824 18 55.657 lookup 1.152449696 36 
32012.492 delete_inode 0.000036816 38 0.969 permission 0.000080574 14 5.755 release 0.000470096 18 26.116 mmap 0.000340095 9 37.788 llseek 0.000001903 9 0.211 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 221919 0.000000244 0.050064080 0.00% 100.00% 4 167794 0.000011891 0.000069707 14.57% 85.43% 4 309046 0.147664569 0.000074663 99.95% 0.05% 9 349767 0.000000070 0.000000000 100.00% 0.00% 1 301677 0.017638372 0.048741086 26.57% 73.43% 12 84407 0.000010448 0.000016977 38.10% 61.90% 3 406314 0.000002279 0.000122367 1.83% 98.17% 7 25464 0.043270937 0.000006200 99.99% 0.01% 2 362216 0.000005617 0.000017498 24.30% 75.70% 2 379982 0.000000626 0.000000000 100.00% 0.00% 1 230983 0.123947465 0.000056796 99.95% 0.05% 6 21233 0.047877661 0.004887113 90.74% 9.26% 17 302757 0.154486003 0.010695642 93.52% 6.48% 24 248530 0.000006763 0.000035442 16.02% 83.98% 3 303016 0.014678039 0.000013098 99.91% 0.09% 2 301643 0.088025575 0.054036566 61.96% 38.04% 33 3339 0.000034997 0.178199426 0.02% 99.98% 35 21234 0.164240073 0.000262711 99.84% 0.16% 39 167762 0.000011886 0.000041865 22.11% 77.89% 3 336006 0.000001246 0.100519562 0.00% 100.00% 16 304042 0.121322325 0.019218406 86.33% 13.67% 33 301644 0.054325242 0.087715613 38.25% 61.75% 37 301680 0.000015005 0.020838281 0.07% 99.93% 9 290020 0.147713357 0.000121422 99.92% 0.08% 19 290021 0.000476072 0.000085833 84.72% 15.28% 10 44777 0.040819757 0.000010957 99.97% 0.03% 3 189680 0.000000044 0.000002376 1.82% 98.18% 1 241759 0.000000698 0.000000000 100.00% 0.00% 1 184839 0.000001621 0.150341986 0.00% 100.00% 28 362220 0.000010818 0.000020949 34.05% 65.95% 2 104687 0.000000495 0.000000000 100.00% 0.00% 1 # total App-read/write = 45 Average duration = 0.000269322 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 34 0.755556 0.755556 34 0 32889 0 0.001000 10 0.222222 0.977778 10 0 108136 0 0.004000 1 0.022222 1.000000 1 0 8 0 # max concurrant App-read/write = 2 # conc count % %ile 1 38 0.844444 0.844444 2 7 0.155556 1.000000 Capture 2 Unfinished operations: 335096 ***************** lookup ************** 0.289127895 334691 ***************** lookup ************** 0.225380797 362246 ***************** lookup ************** 0.052106493 334694 ***************** lookup ************** 0.048567769 362220 ***************** lookup ************** 0.054825580 333972 ***************** lookup ************** 0.275355791 406314 ***************** lookup ************** 0.283219905 334686 ***************** lookup ************** 0.285973208 289606 ***************** lookup ************** 0.064608288 21233 ***************** lookup ************** 0.074923689 189680 ***************** lookup ************** 0.089702578 335100 ***************** lookup ************** 0.151553955 334685 ***************** lookup ************** 0.117808430 167700 ***************** lookup ************** 0.119441314 336813 ***************** lookup ************** 0.120572137 334684 ***************** lookup ************** 0.124718126 21234 ***************** lookup ************** 0.131124745 84407 ***************** lookup ************** 0.132442945 334696 ***************** lookup ************** 0.140938740 335094 ***************** lookup ************** 0.201637910 167735 ***************** lookup ************** 0.164059859 334687 ***************** lookup ************** 0.252930745 334695 ***************** lookup ************** 0.278037098 341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5 341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 
2894690129 Sduration 199.688 + us 100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us Elapsed trace time: 0.292021772 seconds Elapsed trace time from first VFS call to last: 0.292021771 Time idle between VFS calls: 0.001436519 seconds Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs rdwr 0.000831801 4 207.950 read_inode2 0.000082347 31 2.656 pagein 0.000033905 3 11.302 revalidate 0.000013109 156 0.084 open 0.000237969 22 10.817 lookup 1.233407280 10 123340.728 delete_inode 0.000013877 33 0.421 permission 0.000046486 8 5.811 release 0.000172456 21 8.212 mmap 0.000064411 2 32.206 llseek 0.000000391 2 0.196 readdir 0.000213657 36 5.935 User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops 335094 0.053506265 0.000170270 99.68% 0.32% 16 167700 0.000008522 0.000027547 23.63% 76.37% 2 167776 0.000008293 0.000019462 29.88% 70.12% 2 334684 0.000023562 0.000160872 12.78% 87.22% 8 349767 0.000000467 0.250029787 0.00% 100.00% 5 84407 0.000000230 0.000017947 1.27% 98.73% 2 334685 0.000028543 0.000094147 23.26% 76.74% 8 406314 0.221755229 0.000009720 100.00% 0.00% 2 334694 0.000024913 0.000125229 16.59% 83.41% 10 335096 0.254359005 0.000240785 99.91% 0.09% 18 334695 0.000028966 0.000127823 18.47% 81.53% 10 334686 0.223770082 0.000267271 99.88% 0.12% 24 334687 0.000031265 0.000132905 19.04% 80.96% 9 334696 0.000033808 0.000131131 20.50% 79.50% 9 129075 0.000000102 0.000000000 100.00% 0.00% 1 341842 0.000000318 0.000000000 100.00% 0.00% 1 335100 0.059518133 0.000287934 99.52% 0.48% 19 224423 0.000000471 0.000000000 100.00% 0.00% 1 336812 0.000042720 0.000193294 18.10% 81.90% 10 21233 0.000556984 0.000083399 86.98% 13.02% 11 289606 0.000000088 0.000018043 0.49% 99.51% 2 362246 0.014440188 0.000046516 99.68% 0.32% 4 21234 0.000524848 0.000162353 76.37% 23.63% 13 336813 0.000046426 0.000175666 20.90% 79.10% 9 3339 0.000011816 0.272396876 0.00% 100.00% 29 341818 0.000000778 0.000000000 100.00% 0.00% 1 167735 0.000007866 0.000049468 13.72% 86.28% 3 175480 0.000000278 0.000000000 100.00% 0.00% 1 336006 0.000001170 0.250020470 0.00% 100.00% 16 44777 0.000000367 0.250149757 0.00% 100.00% 6 189680 0.000002717 0.000006518 29.42% 70.58% 1 184839 0.000003001 0.250144214 0.00% 100.00% 35 145858 0.000000687 0.000000000 100.00% 0.00% 1 333972 0.218656404 0.000043897 99.98% 0.02% 4 334691 0.187695040 0.000295117 99.84% 0.16% 25 # total App-read/write = 7 Average duration = 0.000123672 sec # time(sec) count % %ile read write avgBytesR avgBytesW 0.000500 7 1.000000 1.000000 7 0 1172 0 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Uwe Falke Sent: Wednesday, November 11, 2020 8:57 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Hi, Kamil, I suppose you'd rather not see such an issue than pursue the ugly work-around to kill off processes. In such situations, the first looks should be for the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and for the current waiters (that is the list of currently waiting threads) on the hanging client. -> /var/adm/ras/mmfs.log.latest mmdiag --waiters That might give you a first idea what is taking long and which components are involved. Also, mmdiag --iohist shows you the last IOs and some stats (service time, size) for them. 
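For example, on the affected client (the paths below are the defaults of a standard install):

tail -n 50 /var/adm/ras/mmfs.log.latest      # recent GPFS log entries on that node
/usr/lpp/mmfs/bin/mmdiag --waiters           # threads currently waiting, with age and reason
/usr/lpp/mmfs/bin/mmdiag --iohist            # recent IOs with size and service time
/usr/lpp/mmfs/bin/mmlsmgr                    # shows the file system manager nodes, worth repeating the checks there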
Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is used which might slow down things, for example). GPFS has a nice tracing which you can configure or just run the default trace. Running a dedicated (low-level) io trace can be achieved by mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N then, when the issue is seen, stop the trace by mmtracectl --stop -N Do not wait to stop the trace once you've seen the issue, the trace file cyclically overwrites its output. If the issue lasts some time you could also start the trace while you see it, run the trace for say 20 secs and stop again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). There you should see lines with FIO which carry the inode of the related file after the "tag" keyword. example: 0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150 -> inode is 248415 there is a utility , tsfindinode, to translate that into the file path. you need to build this first if not yet done: cd /usr/lpp/mmfs/samples/util ; make , then run ./tsfindinode -i For the IO trace analysis there is an older tool : /usr/lpp/mmfs/samples/debugtools/trsum.awk. Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README) Hope that halps a bit. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Czauz, Kamil" To: "gpfsug-discuss at spectrumscale.org" Date: 11/11/2020 23:36 Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process Sent by: gpfsug-discuss-bounces at spectrumscale.org We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount, even something simple like a ls could take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also the mmfsd process at this point is spinning at something like 300-500% cpu. The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one. My question is really about finding the offending process easier. Is there something similar to iotop or a trace that I can enable that can tell me what files/processes and being heavily used by the mmfsd process on the client? -Kamil Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. 
We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website http://www.squarepoint-capital.com. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 

URL: From hooft at natlab.research.philips.com Sat Nov 21 00:37:01 2020 From: hooft at natlab.research.philips.com (Peter van Hooft) Date: Sat, 21 Nov 2020 01:37:01 +0100 Subject: [gpfsug-discuss] mmchdisk /dev/fs start -a progress Message-ID: <20201121003701.GA32509@pc67340132.natlab.research.philips.com> Hello, Is it possible to find out the progress of the 'mmchdisk /dev/fs start -a' command when the controlling terminal had been lost? We can see the task running on the fs manager node with 'mmdiag --commands' with attributes 'hold PIT/disk waitTime 0' We are starting to worry the mmchdisk is taking too long, and see continuously waiters like Waiting 3.1946 sec since 01:28:23, ignored, thread 22092 TSCHDISKCmdThread: on ThCond 0x180267573D0 (SGManagementMgrDataCondvar), reason 'waiting for stripe group to recover' Thanks for any hints. Peter van Hooft Philips Research From jonathan.buzzard at strath.ac.uk Sat Nov 21 10:13:42 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 21 Nov 2020 10:13:42 +0000 Subject: [gpfsug-discuss] mmchdisk /dev/fs start -a progress In-Reply-To: <20201121003701.GA32509@pc67340132.natlab.research.philips.com> References: <20201121003701.GA32509@pc67340132.natlab.research.philips.com> Message-ID: On 21/11/2020 00:37, Peter van Hooft wrote: > > Hello, > > Is it possible to find out the progress of the 'mmchdisk /dev/fs start -a' > command when the controlling terminal had been lost? > I don't think so. You are lucky it is still running > We can see the task running on the fs manager node with 'mmdiag --commands' with > attributes 'hold PIT/disk waitTime 0' > We are starting to worry the mmchdisk is taking too long, and see continuously waiters like > Waiting 3.1946 sec since 01:28:23, ignored, thread 22092 TSCHDISKCmdThread: on ThCond 0x180267573D0 (SGManagementMgrDataCondvar), reason 'waiting for stripe group to recover' > > Thanks for any hints. > Not that this is going to help this time, but it is why you should *ALWAYS* without exception run these sorts of commands within a screen/tmux session so when you loose the connection to the server you can just reconnect and pick it up again. This is introductory system administration 101. No critical or long running command should ever be dependant on a remote controlling terminal. If you can't run them locally then run them in a screen or tmux session. There are plenty of good howto's for both screen and tmux on the internet. Depending on which distribution you use I would note that RedHat have very annoyingly and for completely specious reasons removed screen from RHEL8 and left tmux. So if you are starting from scratch tmux is the one to learn :-( JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From robert.horton at icr.ac.uk Mon Nov 23 15:06:05 2020 From: robert.horton at icr.ac.uk (Robert Horton) Date: Mon, 23 Nov 2020 15:06:05 +0000 Subject: [gpfsug-discuss] AFM experiences? Message-ID: Hi all, We're thinking about deploying AFM and would be interested in hearing from anyone who has used it in anger - particularly independent writer. Our scenario is we have a relatively large but slow (mainly because it is stretched over two sites with a 10G link) cluster for long/medium- term storage and a smaller but faster cluster for scratch storage in our HPC system. 
What we're thinking of doing is using some/all of the scratch capacity as an IW cache of some/all of the main cluster, the idea to reduce the need for people to manually move data between the two. It seems to generally work as expected in a small test environment, although we have a few concerns: - Quota management on the home cluster - we need a way of ensuring people don't write data to the cache which can't be accomodated on home. Probably not insurmountable but needs a bit of thought... - It seems inodes on the cache only get freed when they are deleted on the cache cluster - not if they get deleted from the home cluster or when the blocks are evicted from the cache. Does this become an issue in time? If anyone has done anything similar I'd be interested to hear how you got on. It would be intresting to know if you created a cache fileset for each home fileset or just one for the whole lot, as well as any other pearls of wisdom you may have to offer. Thanks! Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. From novosirj at rutgers.edu Mon Nov 23 15:30:47 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Mon, 23 Nov 2020 15:30:47 +0000 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: References: Message-ID: <440D8981-58C7-4CF1-BF06-BC11F27A47BC@rutgers.edu> We use it similar to how you describe it. We now run 5.0.4.1 on the client side (I mean actual client nodes, not the home or cache clusters). Before that, we had reliability problems (failure to cache libraries of programs that were executing, etc.). The storage clusters in our case are 5.0.3-2.3. We also got bit by the quotas thing. You have to set them the same on both sides, or you will have problems. It seems a little silly that they are not kept in sync by GPFS, but that?s how it is. If memory serves, the result looked like an AFM failure (queue not being cleared), but it turned out to be that the files just could not be written at the home cluster because the user was over quota there. I also think I?ve seen load average increase due to this sort of thing, but I may be mixing that up with another problem scenario. We monitor via Nagios which I believe monitors using mmafmctl commands. Really can?t think of a single time, apart from the other day, where the queue backed up. The instance the other day only lasted a few minutes (if you suddenly create many small files, like installing new software, it may not catch up instantly). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Nov 23, 2020, at 10:19, Robert Horton wrote: ?Hi all, We're thinking about deploying AFM and would be interested in hearing from anyone who has used it in anger - particularly independent writer. Our scenario is we have a relatively large but slow (mainly because it is stretched over two sites with a 10G link) cluster for long/medium- term storage and a smaller but faster cluster for scratch storage in our HPC system. What we're thinking of doing is using some/all of the scratch capacity as an IW cache of some/all of the main cluster, the idea to reduce the need for people to manually move data between the two. It seems to generally work as expected in a small test environment, although we have a few concerns: - Quota management on the home cluster - we need a way of ensuring people don't write data to the cache which can't be accomodated on home. Probably not insurmountable but needs a bit of thought... - It seems inodes on the cache only get freed when they are deleted on the cache cluster - not if they get deleted from the home cluster or when the blocks are evicted from the cache. Does this become an issue in time? If anyone has done anything similar I'd be interested to hear how you got on. It would be intresting to know if you created a cache fileset for each home fileset or just one for the whole lot, as well as any other pearls of wisdom you may have to offer. Thanks! Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From dean.flanders at fmi.ch Mon Nov 23 17:58:12 2020 From: dean.flanders at fmi.ch (Flanders, Dean) Date: Mon, 23 Nov 2020 17:58:12 +0000 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: <440D8981-58C7-4CF1-BF06-BC11F27A47BC@rutgers.edu> References: <440D8981-58C7-4CF1-BF06-BC11F27A47BC@rutgers.edu> Message-ID: Hello Rob, We looked at AFM years ago for DR, but after reading the bug reports, we avoided it, and also have had seen a case where it had to be removed from one customer, so we have kept things simple. Now looking again a few years later there are still issues, IBM Spectrum Scale Active File Management (AFM) issues which may result in undetected data corruption, and that was just my first google hit. We have kept it simple, and use a parallel rsync process with policy engine and can hit wire speed for copying of millions of small files in order to have isolation between the sites at GB/s. I am not saying it is bad, just that it needs an appropriate risk/reward ratio to implement as it increases overall complexity. 
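Very roughly, that looks like the sketch below (the rule name, paths, remote target and split factor are made up for illustration, and the exact columns of the generated list file should be checked before relying on the path extraction):

cat > /tmp/list.rules <<'EOF'
RULE 'listall' LIST 'allfiles'
EOF
# let the policy engine enumerate the files instead of walking the directory tree
/usr/lpp/mmfs/bin/mmapplypolicy /gpfs/home -P /tmp/list.rules -I defer -f /tmp/sync
# the generated list (typically /tmp/sync.list.allfiles) has bookkeeping fields before the path;
# strip them, split the result into 8 chunks and run 8 rsyncs in parallel
sed 's/^.* -- //' /tmp/sync.list.allfiles | split -n l/8 -d - /tmp/sync.part.
ls /tmp/sync.part.* | xargs -P 8 -I{} rsync -a --files-from={} / remote:/gpfs/backup/

With the lists pre-built by the policy engine the rsync processes spend their time copying rather than scanning, which is where the wire-speed numbers come from.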
Kind regards, Dean From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Monday, November 23, 2020 4:31 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM experiences? We use it similar to how you describe it. We now run 5.0.4.1 on the client side (I mean actual client nodes, not the home or cache clusters). Before that, we had reliability problems (failure to cache libraries of programs that were executing, etc.). The storage clusters in our case are 5.0.3-2.3. We also got bit by the quotas thing. You have to set them the same on both sides, or you will have problems. It seems a little silly that they are not kept in sync by GPFS, but that?s how it is. If memory serves, the result looked like an AFM failure (queue not being cleared), but it turned out to be that the files just could not be written at the home cluster because the user was over quota there. I also think I?ve seen load average increase due to this sort of thing, but I may be mixing that up with another problem scenario. We monitor via Nagios which I believe monitors using mmafmctl commands. Really can?t think of a single time, apart from the other day, where the queue backed up. The instance the other day only lasted a few minutes (if you suddenly create many small files, like installing new software, it may not catch up instantly). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Nov 23, 2020, at 10:19, Robert Horton > wrote: ?Hi all, We're thinking about deploying AFM and would be interested in hearing from anyone who has used it in anger - particularly independent writer. Our scenario is we have a relatively large but slow (mainly because it is stretched over two sites with a 10G link) cluster for long/medium- term storage and a smaller but faster cluster for scratch storage in our HPC system. What we're thinking of doing is using some/all of the scratch capacity as an IW cache of some/all of the main cluster, the idea to reduce the need for people to manually move data between the two. It seems to generally work as expected in a small test environment, although we have a few concerns: - Quota management on the home cluster - we need a way of ensuring people don't write data to the cache which can't be accomodated on home. Probably not insurmountable but needs a bit of thought... - It seems inodes on the cache only get freed when they are deleted on the cache cluster - not if they get deleted from the home cluster or when the blocks are evicted from the cache. Does this become an issue in time? If anyone has done anything similar I'd be interested to hear how you got on. It would be intresting to know if you created a cache fileset for each home fileset or just one for the whole lot, as well as any other pearls of wisdom you may have to offer. Thanks! Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 
534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Mon Nov 23 21:54:39 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 23 Nov 2020 21:54:39 +0000 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: Message-ID: Rob, Talk to Jake Carroll from the University of Queensland, he has done a number of presentations at Scale User Groups of UQ?s MeDiCI data fabric which is based on Spectrum Scale and does very aggressive use of AFM. Their use of AFM is not only on campus, but to remote Storage clusters between 30km and 1500km away from their Home cluster. They have also tested AFM between Australia, Japan, and USA Sent from my iPhone > On 24 Nov 2020, at 01:20, Robert Horton wrote: > > ?Hi all, > > We're thinking about deploying AFM and would be interested in hearing > from anyone who has used it in anger - particularly independent writer. > > Our scenario is we have a relatively large but slow (mainly because it > is stretched over two sites with a 10G link) cluster for long/medium- > term storage and a smaller but faster cluster for scratch storage in > our HPC system. What we're thinking of doing is using some/all of the > scratch capacity as an IW cache of some/all of the main cluster, the > idea to reduce the need for people to manually move data between the > two. > > It seems to generally work as expected in a small test environment, > although we have a few concerns: > > - Quota management on the home cluster - we need a way of ensuring > people don't write data to the cache which can't be accomodated on > home. Probably not insurmountable but needs a bit of thought... > > - It seems inodes on the cache only get freed when they are deleted on > the cache cluster - not if they get deleted from the home cluster or > when the blocks are evicted from the cache. Does this become an issue > in time? > > If anyone has done anything similar I'd be interested to hear how you > got on. It would be intresting to know if you created a cache fileset > for each home fileset or just one for the whole lot, as well as any > other pearls of wisdom you may have to offer. > > Thanks! > Rob > > -- > Robert Horton | Research Data Storage Lead > The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB > T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | > Twitter @ICR_London > Facebook: www.facebook.com/theinstituteofcancerresearch > > The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. > > This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Mon Nov 23 23:14:08 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Mon, 23 Nov 2020 23:14:08 +0000 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: References: Message-ID: <2C7317A6-B9DF-450A-92A6-AE156396204A@rutgers.edu> Ours are about 50 and 100 km from the home cluster, but it?s over 100Gb fiber. > On Nov 23, 2020, at 4:54 PM, Andrew Beattie wrote: > > Rob, > > Talk to Jake Carroll from the University of Queensland, he has done a number of presentations at Scale User Groups of UQ?s MeDiCI data fabric which is based on Spectrum Scale and does very aggressive use of AFM. > > Their use of AFM is not only on campus, but to remote Storage clusters between 30km and 1500km away from their Home cluster. They have also tested AFM between Australia, Japan, and USA > > Sent from my iPhone > > > On 24 Nov 2020, at 01:20, Robert Horton wrote: > > > > ?Hi all, > > > > We're thinking about deploying AFM and would be interested in hearing > > from anyone who has used it in anger - particularly independent writer. > > > > Our scenario is we have a relatively large but slow (mainly because it > > is stretched over two sites with a 10G link) cluster for long/medium- > > term storage and a smaller but faster cluster for scratch storage in > > our HPC system. What we're thinking of doing is using some/all of the > > scratch capacity as an IW cache of some/all of the main cluster, the > > idea to reduce the need for people to manually move data between the > > two. > > > > It seems to generally work as expected in a small test environment, > > although we have a few concerns: > > > > - Quota management on the home cluster - we need a way of ensuring > > people don't write data to the cache which can't be accomodated on > > home. Probably not insurmountable but needs a bit of thought... > > > > - It seems inodes on the cache only get freed when they are deleted on > > the cache cluster - not if they get deleted from the home cluster or > > when the blocks are evicted from the cache. Does this become an issue > > in time? > > > > If anyone has done anything similar I'd be interested to hear how you > > got on. It would be intresting to know if you created a cache fileset > > for each home fileset or just one for the whole lot, as well as any > > other pearls of wisdom you may have to offer. > > > > Thanks! > > Rob > > > > -- > > Robert Horton | Research Data Storage Lead > > The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB > > T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | > > Twitter @ICR_London > > Facebook: www.facebook.com/theinstituteofcancerresearch > > > > The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. > > > > This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. 
-- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From vpuvvada at in.ibm.com Tue Nov 24 02:32:01 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 24 Nov 2020 08:02:01 +0530 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: References: Message-ID: >- Quota management on the home cluster - we need a way of ensuring >people don't write data to the cache which can't be accomodated on >home. Probably not insurmountable but needs a bit of thought... You could set same quotas between cache and home clusters. AFM does not support replication of filesystem metadata like quotas, fileset configuration etc... >- It seems inodes on the cache only get freed when they are deleted on >the cache cluster - not if they get deleted from the home cluster or >when the blocks are evicted from the cache. Does this become an issue >in time? AFM periodically revalidates with home cluster. If the files/dirs were already deleted at home cluster, AFM moves them to /.ptrash directory at cache cluster during the revalidation. These files can be removed manually by user or auto eviction process. If the .ptrash directory is not cleaned up on time, it might result into quota issues at cache cluster. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 11/23/2020 08:51 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM experiences? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We're thinking about deploying AFM and would be interested in hearing from anyone who has used it in anger - particularly independent writer. Our scenario is we have a relatively large but slow (mainly because it is stretched over two sites with a 10G link) cluster for long/medium- term storage and a smaller but faster cluster for scratch storage in our HPC system. What we're thinking of doing is using some/all of the scratch capacity as an IW cache of some/all of the main cluster, the idea to reduce the need for people to manually move data between the two. It seems to generally work as expected in a small test environment, although we have a few concerns: - Quota management on the home cluster - we need a way of ensuring people don't write data to the cache which can't be accomodated on home. Probably not insurmountable but needs a bit of thought... - It seems inodes on the cache only get freed when they are deleted on the cache cluster - not if they get deleted from the home cluster or when the blocks are evicted from the cache. Does this become an issue in time? If anyone has done anything similar I'd be interested to hear how you got on. It would be intresting to know if you created a cache fileset for each home fileset or just one for the whole lot, as well as any other pearls of wisdom you may have to offer. Thanks! Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. 
This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Tue Nov 24 02:37:18 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 24 Nov 2020 08:07:18 +0530 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: References: <440D8981-58C7-4CF1-BF06-BC11F27A47BC@rutgers.edu> Message-ID: Dean, This is one of the corner case which is associated with sparse files at the home cluster. You could try with latest versions of scale, AFM indepedent-writer mode have many performance/functional improvements in newer releases. ~Venkat (vpuvvada at in.ibm.com) From: "Flanders, Dean" To: gpfsug main discussion list Date: 11/23/2020 11:44 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] AFM experiences? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Rob, We looked at AFM years ago for DR, but after reading the bug reports, we avoided it, and also have had seen a case where it had to be removed from one customer, so we have kept things simple. Now looking again a few years later there are still issues, IBM Spectrum Scale Active File Management (AFM) issues which may result in undetected data corruption, and that was just my first google hit. We have kept it simple, and use a parallel rsync process with policy engine and can hit wire speed for copying of millions of small files in order to have isolation between the sites at GB/s. I am not saying it is bad, just that it needs an appropriate risk/reward ratio to implement as it increases overall complexity. Kind regards, Dean From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Monday, November 23, 2020 4:31 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM experiences? We use it similar to how you describe it. We now run 5.0.4.1 on the client side (I mean actual client nodes, not the home or cache clusters). Before that, we had reliability problems (failure to cache libraries of programs that were executing, etc.). The storage clusters in our case are 5.0.3-2.3. We also got bit by the quotas thing. You have to set them the same on both sides, or you will have problems. It seems a little silly that they are not kept in sync by GPFS, but that?s how it is. If memory serves, the result looked like an AFM failure (queue not being cleared), but it turned out to be that the files just could not be written at the home cluster because the user was over quota there. I also think I?ve seen load average increase due to this sort of thing, but I may be mixing that up with another problem scenario. We monitor via Nagios which I believe monitors using mmafmctl commands. Really can?t think of a single time, apart from the other day, where the queue backed up. The instance the other day only lasted a few minutes (if you suddenly create many small files, like installing new software, it may not catch up instantly). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Nov 23, 2020, at 10:19, Robert Horton wrote: ?Hi all, We're thinking about deploying AFM and would be interested in hearing from anyone who has used it in anger - particularly independent writer. Our scenario is we have a relatively large but slow (mainly because it is stretched over two sites with a 10G link) cluster for long/medium- term storage and a smaller but faster cluster for scratch storage in our HPC system. What we're thinking of doing is using some/all of the scratch capacity as an IW cache of some/all of the main cluster, the idea to reduce the need for people to manually move data between the two. It seems to generally work as expected in a small test environment, although we have a few concerns: - Quota management on the home cluster - we need a way of ensuring people don't write data to the cache which can't be accomodated on home. Probably not insurmountable but needs a bit of thought... - It seems inodes on the cache only get freed when they are deleted on the cache cluster - not if they get deleted from the home cluster or when the blocks are evicted from the cache. Does this become an issue in time? If anyone has done anything similar I'd be interested to hear how you got on. It would be intresting to know if you created a cache fileset for each home fileset or just one for the whole lot, as well as any other pearls of wisdom you may have to offer. Thanks! Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Tue Nov 24 02:41:21 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 24 Nov 2020 08:11:21 +0530 Subject: [gpfsug-discuss] =?utf-8?q?Migrate/syncronize_data_from_Isilon_to?= =?utf-8?q?_Scale_over=09NFS=3F?= In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: AFM provides near zero downtime for migration. As of today, AFM migration does not support ACLs or other EAs migration from non scale (GPFS) source. 
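For anyone following the same path, a minimal sketch of the cache-side setup is below. The file system, fileset, and export names are invented, and the right AFM mode and prefetch options for a given release should be confirmed against the migration documentation linked underneath:

    # Create an AFM fileset whose target is the NFS export on the source system,
    # link it into the namespace, then pull data across ahead of the cutover.
    mmcrfileset fs1 migr1 --inode-space new -p afmTarget=nfs://source-nfs/ifs/export1,afmMode=ro
    mmlinkfileset fs1 migr1 -J /gpfs/fs1/migr1
    mmafmctl fs1 prefetch -j migr1 --directory /gpfs/fs1/migr1

ACLs and other extended attributes then need a separate pass after (or during) the cutover, as noted above.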
https://www.ibm.com/support/knowledgecenter/STXKQY_5.1.0/com.ibm.spectrum.scale.v5r10.doc/bl1ins_uc_migrationusingafmmigrationenhancements.htm ~Venkat (vpuvvada at in.ibm.com) From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 11/17/2020 03:14 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Sent by: gpfsug-discuss-bounces at spectrumscale.org Have you considered using the AFM feature of Spectrum Scale? I doubt it will provide any speed improvement but it would allow for data to be accessed as it was being migrated. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Andi Christiansen Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Date: Mon, Nov 16, 2020 2:44 PM Hi all, i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... if anyone have any ideas they are welcome! Thanks in advance Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From luke.raimbach at googlemail.com Tue Nov 24 12:16:55 2020 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Tue, 24 Nov 2020 12:16:55 +0000 Subject: [gpfsug-discuss] AFM experiences? In-Reply-To: References: Message-ID: Hi Rob, Some things to think about from experiences a year or so ago... If you intend to perform any HPC workload (writing / updating / deleting files) inside a cache, then appropriately specified gateway nodes will be your friend: 1. When creating, updating or deleting files in the cache, each operation requires acknowledgement from the gateway handling that particular cache, before returning ACK to the application. This will add a latency overhead to the workload - if your storage is IB connected to the compute cluster and using verbsRdmaSend for example, this will increase your happiness. Connecting low-spec gateway nodes over 10GbE with the expectation that they will "drain down" over time was a sore learning experience in the early days of AFM for me. 2. AFM queues can quickly eat up memory. I think around 350bytes of memory is consumed for each operation in the AFM queue, so if you have huge file churn inside a cache then the queue will grow very quickly. If you run out of memory, the node dies and you enter cache recovery when it comes back up (or another node takes over). 
This can end up cycling the node as it tries to revalidate a cache and keep up with any other queues. Get more memory! I've not used AFM for a while now and I think the latter enormity has some mitigation against create / delete cycles (i.e. the create operation is expunged from the queue instead of two operations being played back to the home). I expect IBM experts will tell you more about those improvements. Also, several smaller caches are better than one large one (parallel execution of queues helps utilise the available bandwidth and you have a better failover spread if you have multiple gateways, for example). Independent Writer mode comes with some small danger (user error or impatience mainly) inasmuch as whoever updates a file last will win; e.g. home user A writes a file, then cache user B updates the file after reading it and tells user A the update is complete, when really the gateway queue is long and the change is waiting to go back home. User A uses the file expecting the changes are made, then updates it with some results. Meanwhile the AFM queue drains down and user B's change arrives after user A has completed their changes. The interim version of the file user B modified will persist at home and user A's latest changes are lost. Some careful thought about workflow (or good user training about eventual consistency) will save some potential misery on this front. Hope this helps, Luke On Mon, 23 Nov 2020 at 15:19, Robert Horton wrote: > Hi all, > > We're thinking about deploying AFM and would be interested in hearing > from anyone who has used it in anger - particularly independent writer. > > Our scenario is we have a relatively large but slow (mainly because it > is stretched over two sites with a 10G link) cluster for long/medium- > term storage and a smaller but faster cluster for scratch storage in > our HPC system. What we're thinking of doing is using some/all of the > scratch capacity as an IW cache of some/all of the main cluster, the > idea to reduce the need for people to manually move data between the > two. > > It seems to generally work as expected in a small test environment, > although we have a few concerns: > > - Quota management on the home cluster - we need a way of ensuring > people don't write data to the cache which can't be accomodated on > home. Probably not insurmountable but needs a bit of thought... > > - It seems inodes on the cache only get freed when they are deleted on > the cache cluster - not if they get deleted from the home cluster or > when the blocks are evicted from the cache. Does this become an issue > in time? > > If anyone has done anything similar I'd be interested to hear how you > got on. It would be intresting to know if you created a cache fileset > for each home fileset or just one for the whole lot, as well as any > other pearls of wisdom you may have to offer. > > Thanks! > Rob > > -- > Robert Horton | Research Data Storage Lead > The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB > T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | > Twitter @ICR_London > Facebook: www.facebook.com/theinstituteofcancerresearch > > The Institute of Cancer Research: Royal Cancer Hospital, a charitable > Company Limited by Guarantee, Registered in England under Company No. > 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. > > This e-mail message is confidential and for use by the addressee only. 
If > the message is received by anyone other than the addressee, please return > the message to the sender by replying to it and then delete the message > from your computer and network. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yeep at robust.my Tue Nov 24 14:09:34 2020 From: yeep at robust.my (T.A. Yeep) Date: Tue, 24 Nov 2020 22:09:34 +0800 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: Hi Venkat, If ACLs and other EAs migration from non scale is not supported by AFM, is there any 3rd party tool that could complement that when paired with AFM? On Tue, Nov 24, 2020 at 10:41 AM Venkateswara R Puvvada wrote: > AFM provides near zero downtime for migration. As of today, AFM > migration does not support ACLs or other EAs migration from non scale > (GPFS) source. > > > https://www.ibm.com/support/knowledgecenter/STXKQY_5.1.0/com.ibm.spectrum.scale.v5r10.doc/bl1ins_uc_migrationusingafmmigrationenhancements.htm > > ~Venkat (vpuvvada at in.ibm.com) > > > > From: "Frederick Stock" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 11/17/2020 03:14 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Migrate/syncronize data > from Isilon to Scale over NFS? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Have you considered using the AFM feature of Spectrum Scale? I doubt it > will provide any speed improvement but it would allow for data to be > accessed as it was being migrated. > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > ----- Original message ----- > From: Andi Christiansen > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon > to Scale over NFS? > Date: Mon, Nov 16, 2020 2:44 PM > > Hi all, > > i have got a case where a customer wants 700TB migrated from isilon to > Scale and the only way for him is exporting the same directory on NFS from > two different nodes... > > as of now we are using multiple rsync processes on different parts of > folders within the main directory. this is really slow and will take > forever.. right now 14 rsync processes spread across 3 nodes fetching from > 2.. > > does anyone know of a way to speed it up? right now we see from 1Gbit to > 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from > scale nodes and 20Gbits from isilon so we should be able to reach just > under 20Gbit... > > > if anyone have any ideas they are welcome! > > > Thanks in advance > Andi Christiansen > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Best regards *T.A. 
Yeep*Mobile: +6-016-719 8506 | Tel: +6-03-7628 0526 | www.robusthpc.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Tue Nov 24 09:39:47 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Tue, 24 Nov 2020 09:39:47 +0000 Subject: [gpfsug-discuss] SSUG::Digital with CIUK Message-ID: <> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: meeting.ics Type: text/calendar Size: 2623 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 3499622 bytes Desc: not available URL: From prasad.surampudi at theatsgroup.com Tue Nov 24 16:05:19 2020 From: prasad.surampudi at theatsgroup.com (Prasad Surampudi) Date: Tue, 24 Nov 2020 16:05:19 +0000 Subject: [gpfsug-discuss] mmhealth reports fserrinvalid errors on CNFS servers Message-ID: We are seeing fserrinvalid error on couple of filesystems in Spectrum Scale cluster. These errors are reported but mmhealth only couple of nodes (CNFS servers) in the cluster, but mmhealth on other nodes shows no issues. Any idea what this error means? And why its reported on CNFS servers and not on other nodes? What need to be done to fix this issue? sudo /usr/lpp/mmfs/bin/mmhealth node show FILESYSTEM -v Node name: cnfs05-gpfs Component Status Reasons ------------------------------------------------------------------- FILESYSTEM DEGRADED fserrinvalid(vol) argus HEALTHY - dytech HEALTHY - enlnt_E HEALTHY - enlnt_Es HEALTHY - haaforfs HEALTHY - haaforfs2 HEALTHY - historical HEALTHY - prcfs HEALTHY - qmtfs HEALTHY - research HEALTHY - research2 HEALTHY - schon_raw HEALTHY - uhdb_vol1 HEALTHY - vol DEGRADED fserrinvalid(vol) Event Parameter Severity Event Message ---------------------------------------------------------------------------------------------------------- fserrinvalid vol ERROR FS=vol,ErrNo=1124,Unknown error=0464000000010000000180A108BC000079B4000000000000003400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 -------------- next part -------------- An HTML attachment was scrubbed... URL: From NSCHULD at de.ibm.com Tue Nov 24 16:44:35 2020 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Tue, 24 Nov 2020 17:44:35 +0100 Subject: [gpfsug-discuss] =?utf-8?q?mmhealth_reports_fserrinvalid_errors_o?= =?utf-8?q?n_CNFS=09servers?= In-Reply-To: References: Message-ID: To get an explanation for any event one can ask the system: # mmhealth event show fserrinvalid Event Name: fserrinvalid Event ID: 999338 Description: Unrecognized FSSTRUCT error received. Check documentation Cause: A filesystem corruption detected User Action: Check error message for details and the mmfs.log.latest log for further details. See the topic Checking and repairing a file system in the IBM Spectrum Scale documentation: Administering. Managing file systems. If the file system is severely damaged, the best course of action is to follow the procedures in section: Additional information to collect for file system corruption or MMFS_FSSTRUCT errors Severity: ERROR State: DEGRADED The event is triggered by a callback which may not fire on all nodes, that is why only a subset of nodes have the information. 
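Before resolving it, it can help to confirm whether the underlying FSSTRUCT records are still being written on the nodes that raised the event; a rough check along these lines (mmdsh is the helper shipped with Scale, but any parallel shell will do, and the grep patterns are only examples):

    # Look for the raw FSSTRUCT entries behind the event on a reporting node
    grep -i fsstruct /var/adm/ras/mmfs.log.latest
    # Fan the health query out to see which nodes still flag the file system
    /usr/lpp/mmfs/bin/mmdsh -N all "/usr/lpp/mmfs/bin/mmhealth node show FILESYSTEM | grep -i fserr"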
Depending on the version of scale the procedure to remove the event varies: For newer release please use # mmhealth event resolve Missing arguments. Usage: mmhealth event resolve {EventName} [Identifier] For older releases it is described here: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_fsstruc.htm mmsysmonc event filesystem fsstruct_fixed Mit freundlichen Gr??en / Kind regards Norbert Schuld M925:IBM Spectrum Scale Software Development Phone: +49-160 70 70 335 IBM Deutschland Research & Development GmbH Email: nschuld at de.ibm.com Am Weiher 24 65451 Kelsterbach Knowing is not enough; we must apply. Willing is not enough; we must do. IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Prasad Surampudi To: "gpfsug-discuss at spectrumscale.org" Date: 24.11.2020 17:05 Subject: [EXTERNAL] [gpfsug-discuss] mmhealth reports fserrinvalid errors on CNFS servers Sent by: gpfsug-discuss-bounces at spectrumscale.org We are seeing fserrinvalid error on couple of filesystems in Spectrum Scale cluster. These errors are reported but mmhealth only couple of nodes (CNFS servers) in the cluster, but mmhealth on other nodes shows no issues. Any idea what this error means? And why its reported on CNFS servers and not on other nodes? What need to be done to fix this issue? sudo /usr/lpp/mmfs/bin/mmhealth node show FILESYSTEM -v Node name: cnfs05-gpfs Component Status Reasons ------------------------------------------------------------------- FILESYSTEM DEGRADED fserrinvalid(vol) argus HEALTHY - dytech HEALTHY - enlnt_E HEALTHY - enlnt_Es HEALTHY - haaforfs HEALTHY - haaforfs2 HEALTHY - historical HEALTHY - prcfs HEALTHY - qmtfs HEALTHY - research HEALTHY - research2 HEALTHY - schon_raw HEALTHY - uhdb_vol1 HEALTHY - vol DEGRADED fserrinvalid(vol) Event Parameter Severity Event Message ---------------------------------------------------------------------------------------------------------- fserrinvalid vol ERROR FS=vol,ErrNo=1124,Unknown error=0464000000010000000180A108BC000079B4000000000000003400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1D963707.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jake.carroll at uq.edu.au Wed Nov 25 21:29:24 2020 From: jake.carroll at uq.edu.au (Jake Carroll) Date: Wed, 25 Nov 2020 21:29:24 +0000 Subject: [gpfsug-discuss] IB routers in ESS configuration + 3 different subnets - valid config? Message-ID: Hi. I am just in the process of sanity-checking a potential future configuration. 
Let's say I have an ESS 5000 and an ESS 3000 placed on the data centre floor to form the basis of a new scratch array. Let's then suppose that I have three existing supercomputers in that same location. Each of those supercomputers has a separate IB subnet and their networks are unrelated to each other, IB-wise. My understanding is that it is valid and possible to use MLNX EDR IB *routers* in order to be able to transport NSD communications back and forth across those separate subnets, back to the ESS (which lives on its own unique subnet). So at this point, I've got four unique subnets - one for the ESS, one for each super. As I understand it, there is an upper limit of *SIX* unique subnets on those EDR IB routers. As I understand it - for IPoIB transport, I'd also need some "gateway" boxes more or less - essentially some decent servers which I put EDR/HDR cards in as dog legs that act as an IPoIB gateway interface to each subnet. I appreciate that there is devil in the detail - but what I'm asking is if it is valid to "route" NSD with IB Routers (not switches) this way to separate subnets. Colleagues at IBM have all said "yeah....should work....we've not done it....but should be fine?" Colleagues at Mellanox (uhhh...nvidia...) say "Yes, this is valid and does exactly as the IB Router should and there is nothing unusual about this". If someone has experience doing this or could call out any oddity/weirdness/gotchas, I'd be very appreciative. I'm fairly sure this is all very low risk - but given nobody locally could tell me "Yeah, all certified and valid!" I'd like the wisdom of the wider crowd. Thank you. --jc -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Nov 27 11:46:05 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 27 Nov 2020 17:16:05 +0530 Subject: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? In-Reply-To: References: <1388247256.209171.1605555854969@privateemail.com> Message-ID: Hi Yeep, >If ACLs and other EAs migration from non scale is not supported by AFM, is there any 3rd party tool that could complement that when paired with AFM? rsync can be used to just fix metadata like ACLs and EAs. AFM does not revalidate the files with source system if rsync changes the ACLs on them. So ACLs can only be fixed after or during the cutover. ACL inheritance may be used by setting on ACLs on required parent dirs upfront if this option is sufficient, there was an user who migrated to scale using this method. ~Venkat (vpuvvada at in.ibm.com) From: "T.A. Yeep" To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Date: 11/24/2020 07:40 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Venkat, If ACLs and other EAs migration from non scale is not supported by AFM, is there any 3rd party tool that could complement that when paired with AFM? On Tue, Nov 24, 2020 at 10:41 AM Venkateswara R Puvvada < vpuvvada at in.ibm.com> wrote: AFM provides near zero downtime for migration. As of today, AFM migration does not support ACLs or other EAs migration from non scale (GPFS) source. 
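A rough sketch of that metadata-only pass is below; the paths are invented, and it assumes the source ACLs are exposed as POSIX ACLs and xattrs on the mount being read, which is worth verifying on a small subtree first:

    # Re-apply ACLs and extended attributes onto files AFM has already migrated.
    # --existing stops rsync from creating anything new; -A/-X carry ACLs and xattrs.
    rsync -aAX --existing /mnt/source_export/ /gpfs/fs1/migr1/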
https://www.ibm.com/support/knowledgecenter/STXKQY_5.1.0/com.ibm.spectrum.scale.v5r10.doc/bl1ins_uc_migrationusingafmmigrationenhancements.htm ~Venkat (vpuvvada at in.ibm.com) From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 11/17/2020 03:14 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Sent by: gpfsug-discuss-bounces at spectrumscale.org Have you considered using the AFM feature of Spectrum Scale? I doubt it will provide any speed improvement but it would allow for data to be accessed as it was being migrated. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Andi Christiansen Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] Migrate/syncronize data from Isilon to Scale over NFS? Date: Mon, Nov 16, 2020 2:44 PM Hi all, i have got a case where a customer wants 700TB migrated from isilon to Scale and the only way for him is exporting the same directory on NFS from two different nodes... as of now we are using multiple rsync processes on different parts of folders within the main directory. this is really slow and will take forever.. right now 14 rsync processes spread across 3 nodes fetching from 2.. does anyone know of a way to speed it up? right now we see from 1Gbit to 3Gbit if we are lucky(total bandwidth) and there is a total of 30Gbit from scale nodes and 20Gbits from isilon so we should be able to reach just under 20Gbit... if anyone have any ideas they are welcome! Thanks in advance Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Best regards T.A. Yeep Mobile: +6-016-719 8506 | Tel: +6-03-7628 0526 | www.robusthpc.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Mon Nov 30 13:49:12 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Mon, 30 Nov 2020 13:49:12 +0000 Subject: [gpfsug-discuss] Licensing costs for data lakes (SSUG follow-up) Message-ID: I am seeking some help on a topic I know many of you care deeply about: licensing costs I am trying to gather some more information about a request that has come up a couple of times, pricing for ?data lakes?. I would like to understand better what people are looking for here. - Is it as simple as ?much steeper discounts for very large deployments?? Or is a ?data lake? something specific, e.g. a large deployment that is not performance/latency sensitive; a storage pool that is [primarily] HDD; a tier that has specific read/write patterns such as moving entire large datasets in or out; or something else? 
Bear in mind that if we have special licensing for data lakes, we need a rigorous definition so that both you and we know whether your use of that licensing is compliant. Nobody likes ambiguity in licensing! - Are you expecting pricing to get very flat/discounting to get steep for large deployments? Or a different price tier/structure for ?data lakes? if we can rigorously define what one means? Do you agree or disagree with the proposition that if you keep adding storage hardware/capacity, that the software licensing cost should rise in proportion (even if that proportion is much smaller for a ?data lake? than for a performance tier)? - Feel free to be creative and imaginative. For example, would you be interested in a low-cost pricing model for storage that is an AFM Home and is _only_ accessed by using AFM to move data in and out of an AFM Cache (probably on the performance tier)? This would be conceptually similar to the way you can now (5.1) use AFM-Object to park data in a cheap object store. - Also feel free to answer questions I didn?t ask? If you prefer to discuss this in Slack rather than email, I started a discussion there a little while ago (please thread your comments!): https://ssug-poweraiug.slack.com/archives/CEVVCEE8M/p1605815075188800 Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1545794140] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From david_johnson at brown.edu Mon Nov 30 21:41:30 2020 From: david_johnson at brown.edu (David Johnson) Date: Mon, 30 Nov 2020 16:41:30 -0500 Subject: [gpfsug-discuss] internal details on GPFS inode expansion Message-ID: When GPFS needs to add inodes to the filesystem, it seems to pre-create about 4 million of them. Judging by the logs, it seems it only takes a few (13 maybe) seconds to do this. However we are suspecting that this might only be to request the additional inodes and that there is some background activity for some time afterwards. Would someone who has knowledge of the actual internals be willing to confirm or deny this, and if there is background activity, is it on all nodes in the cluster, NSD nodes, "default worker nodes"? Thanks, -- ddj Dave Johnson ddj at brown.edu