<span style=" font-size:12pt;font-family:Arial">Thanks for the information.

 Since the waiters were captured on one of the IO servers, the threads
waiting for IO should be waiting for actual IO requests to the storage.
 IO operations that take seconds to complete generally indicate that your
storage is not working optimally.  We would expect IOs to complete in
sub-second time, that is, within some number of milliseconds.</span><br><br><span style=" font-size:12pt;font-family:Arial">You are using a record

size of 16M yet you stated the file system block size is 1M.  Is that

really what you wanted to test?  Also, you have included the -fsync

option to gpfsperf which will impact the results.</span><br><br><span style=" font-size:12pt;font-family:Arial">Have you considered

using the nsdperf program instead of the gpfsperf program?  You can

find nsdperf in the samples/net directory.</span><br><br><span style=" font-size:12pt;font-family:Arial">One last thing I noticed

was in the configuration of your management node.  It showed the following.</span><br><br><span style=" font-size:10pt;font-family:Tahoma">[merlindssmgt01,dssg]<br>prefetchPct 20<br>nsdRAIDTracks 128k<br>nsdMaxWorkerThreads 3k<br>nsdMinWorkerThreads 3k</span><br><br><span style=" font-size:12pt;font-family:Arial">To my understanding,
the management node has no direct access to the storage; that is, any IO
requests to the file system from the management node go through the IO
nodes.  That being true, GPFS will not make use of NSD worker threads
on the management node.  As you can see, your configuration is creating
3K NSD worker threads and none will be used, so you might want to consider
changing that value to 1.  It will not change your performance numbers,
but it should free up a bit of memory on the management node.</span><br><br><span style=" font-size:10pt;font-family:sans-serif">Regards, The Spectrum

Scale (GPFS) team<br><br>------------------------------------------------------------------------------------------------------------------<br>If you feel that your question can benefit other users of  Spectrum

Scale (GPFS), then please post it to the public IBM developerWorks Forum

at <a href="https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479"><span style=" font-size:10pt;font-family:sans-serif">https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479</a><span style=" font-size:10pt;font-family:sans-serif">.

<br><br>If your query concerns a potential software error in Spectrum Scale (GPFS)

and you have an IBM software maintenance contract please contact  1-800-237-5511

in the United States or your local IBM Service Center in other countries.

<br><br>The forum is informally monitored as time permits and should not be used

for priority messages to the Spectrum Scale (GPFS) team.</span><br><br><br><br><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif">From:

       </span><span style=" font-size:9pt;font-family:sans-serif">"Caubet

Serrabou Marc (PSI)" <marc.caubet@psi.ch></span><br><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif">To:

       </span><span style=" font-size:9pt;font-family:sans-serif">gpfsug

main discussion list <gpfsug-discuss@spectrumscale.org></span><br><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif">Cc:

       </span><span style=" font-size:9pt;font-family:sans-serif">"gpfsug-discuss-bounces@spectrumscale.org"

<gpfsug-discuss-bounces@spectrumscale.org></span><br><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif">Date:

       </span><span style=" font-size:9pt;font-family:sans-serif">04/18/2019

01:45 PM</span><br><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif">Subject:

       </span><span style=" font-size:9pt;font-family:sans-serif">Re:

[gpfsug-discuss] Performance problems + (MultiThreadWorkInstanceCond),

reason 'waiting for helper threads'</span><br><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif">Sent

by:        </span><span style=" font-size:9pt;font-family:sans-serif">gpfsug-discuss-bounces@spectrumscale.org</span><br><hr noshade><br><br><br><span style=" font-size:10pt;font-family:Tahoma">Hi,</span><br><br><span style=" font-size:10pt;font-family:Tahoma">thanks a lot. About

the requested information:</span><br><br><span style=" font-size:10pt;font-family:Tahoma">* Waiters were captured

with the command 'mmdiag --waiters', which was run on one of the

IO (NSD) nodes.</span><br><span style=" font-size:10pt;font-family:Tahoma">* The connection between
the storage and client clusters is Infiniband EDR. The GPFS client cluster
has 3 chassis, each with 24 blades and an unmanaged EDR switch (24 ports
for the blades, 12 external); currently 10 external EDR ports are
connected for external connectivity. The GPFS storage cluster has 2 IO
nodes (as mentioned in the previous e-mail, DSS G240), and each IO node
has 4 x EDR ports connected. Regarding the Infiniband connectivity, my
network contains 2 top-level managed EDR switches configured with up/down
routing, which connect the unmanaged switches from the chassis and the 2
managed Infiniband switches for the storage (for redundancy).</span><br><br><span style=" font-size:10pt;font-family:Tahoma">Whenever needed I

can go through a PMR if this would ease debugging, no problem for me. I was
wondering about the meaning of "waiting for helper threads" and
what could be the reason for that.</span><br><br><span style=" font-size:10pt;font-family:Tahoma">Thanks a lot for your

help and best regards,</span><br><span style=" font-size:10pt;font-family:Tahoma">Marc    

           </span><br><span style=" font-size:10pt;font-family:Tahoma">_________________________________________<br>Paul Scherrer Institut <br>High Performance Computing<br>Marc Caubet Serrabou<br>Building/Room: WHGA/019A</span><br><span style=" font-size:10pt;font-family:Tahoma">Forschungsstrasse,

111</span><br><span style=" font-size:10pt;font-family:Tahoma">5232 Villigen PSI<br>Switzerland<br><br>Telephone: +41 56 310 46 67<br>E-Mail: marc.caubet@psi.ch</span><br><hr><br><span style=" font-size:10pt;font-family:Tahoma"><b>From:</b> gpfsug-discuss-bounces@spectrumscale.org

[gpfsug-discuss-bounces@spectrumscale.org] on behalf of IBM Spectrum Scale

[scale@us.ibm.com]<b><br>Sent:</b> Thursday, April 18, 2019 5:54 PM<b><br>To:</b> gpfsug main discussion list<b><br>Cc:</b> gpfsug-discuss-bounces@spectrumscale.org<b><br>Subject:</b> Re: [gpfsug-discuss] Performance problems + (MultiThreadWorkInstanceCond),

reason 'waiting for helper threads'</span><span style=" font-size:12pt;font-family:Times New Roman"><br></span><br><span style=" font-size:12pt;font-family:Arial">We can try to provide

some guidance on what you are seeing, but generally, for a true analysis
of performance issues, customers should contact IBM Lab Based Services (LBS).

 We need some additional information to understand what is happening.</span><ul><li><span style=" font-size:12pt;font-family:Arial">On which node did you

collect the waiters and what command did you run to capture the data?</span><li><span style=" font-size:12pt;font-family:Arial">What is the network

connection between the remote cluster and the storage cluster?</span></ul><span style=" font-size:10pt;font-family:sans-serif"><br>Regards, The Spectrum Scale (GPFS) team</span><span style=" font-size:12pt;font-family:Times New Roman"><br><br><br></span><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif"><br>From:        </span><span style=" font-size:9pt;font-family:sans-serif">"Caubet

Serrabou Marc (PSI)" <marc.caubet@psi.ch></span><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif"><br>To:        </span><span style=" font-size:9pt;font-family:sans-serif">gpfsug

main discussion list <gpfsug-discuss@spectrumscale.org></span><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif"><br>Date:        </span><span style=" font-size:9pt;font-family:sans-serif">04/18/2019

11:41 AM</span><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif"><br>Subject:        </span><span style=" font-size:9pt;font-family:sans-serif">[gpfsug-discuss]

Performance problems + (MultiThreadWorkInstanceCond), reason 'waiting for

helper threads'</span><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif"><br>Sent by:        </span><span style=" font-size:9pt;font-family:sans-serif">gpfsug-discuss-bounces@spectrumscale.org</span><span style=" font-size:12pt;font-family:Times New Roman"><br></span><hr noshade><span style=" font-size:12pt;font-family:Times New Roman"><br><br></span><span style=" font-size:10pt;font-family:Tahoma"><br>Hi all,</span><span style=" font-size:12pt;font-family:Times New Roman"><br></span><span style=" font-size:10pt;font-family:Tahoma"><br>I would like to have some hints about the following problem:</span><span style=" font-size:12pt;font-family:Times New Roman"><br></span><span style=" font-size:10pt;font-family:Tahoma"><br>Waiting 26.6431 sec since 17:18:32, ignored, thread 38298 NSPDDiscoveryRunQueueThread:

on ThCond 0x7FC98EB6A2B8 (MultiThreadWorkInstanceCond), reason 'waiting

for helper threads'<br>Waiting 2.7969 sec since 17:18:55, monitored, thread 39736 NSDThread: for

I/O completion<br>Waiting 2.8024 sec since 17:18:55, monitored, thread 39580 NSDThread: for

I/O completion<br>Waiting 3.0435 sec since 17:18:55, monitored, thread 39448 NSDThread: for

I/O completion</span><span style=" font-size:12pt;font-family:Times New Roman"><br></span><span style=" font-size:10pt;font-family:Tahoma"><br>I am testing a new GPFS cluster (GPFS cluster client with computing nodes

remotely mounting the Storage GPFS Cluster) and I am running 65 gpfsperf

commands (1 command per client in parallel) as follows:</span><span style=" font-size:12pt;font-family:Times New Roman"><br></span><span style=" font-size:10pt;font-family:Tahoma"><br>/usr/lpp/mmfs/samples/perf/gpfsperf create seq /gpfs/home/caubet_m/gpfsperf/$(hostname).txt

-fsync -n 24g -r 16m -th 8 </span><span style=" font-size:12pt;font-family:Times New Roman"><br></span><span style=" font-size:10pt;font-family:Tahoma"><br>I am unable to reach more than 6.5GBps (Lenovo DSS G240 GPFS 5.0.2-1, on

a test 'home' filesystem with 1MB blocksize and subblocks of 8KB).
After several seconds I see many waiters for I/O completion (up to 5 seconds)<br>and also the 'waiting for helper threads' message shown above. Can somebody
explain the meaning of this message? How could I improve that?</span><span style=" font-size:12pt;font-family:Times New Roman"><br></span><span style=" font-size:10pt;font-family:Tahoma"><br>Current config in the storage cluster is:</span><span style=" font-size:12pt;font-family:Times New Roman"><br></span><span style=" font-size:10pt;font-family:Tahoma"><br>[root@merlindssio02 ~]# mmlsconfig <br>Configuration data for cluster merlin.psi.ch:<br>---------------------------------------------<br>clusterName merlin.psi.ch<br>clusterId 1511090979434548295<br>autoload no<br>dmapiFileHandleSize 32<br>minReleaseLevel 5.0.2.0<br>ccrEnabled yes<br>nsdRAIDFirmwareDirectory /opt/lenovo/dss/firmware<br>cipherList AUTHONLY<br>maxblocksize 16m<br>[merlindssmgt01]<br>ignorePrefetchLUNCount yes<br>[common]<br>pagepool 4096M<br>[merlindssio01,merlindssio02]<br>pagepool 270089M<br>[merlindssmgt01,dssg]<br>pagepool 57684M<br>maxBufferDescs 2m<br>numaMemoryInterleave yes<br>[common]<br>prefetchPct 50<br>[merlindssmgt01,dssg]<br>prefetchPct 20<br>nsdRAIDTracks 128k<br>nsdMaxWorkerThreads 3k<br>nsdMinWorkerThreads 3k<br>nsdRAIDSmallThreadRatio 2<br>nsdRAIDThreadsPerQueue 16<br>nsdClientCksumTypeLocal ck64<br>nsdClientCksumTypeRemote ck64<br>nsdRAIDFlusherFWLogHighWatermarkMB 1000<br>nsdRAIDBlockDeviceMaxSectorsKB 0<br>nsdRAIDBlockDeviceNrRequests 0<br>nsdRAIDBlockDeviceQueueDepth 0<br>nsdRAIDBlockDeviceScheduler off<br>nsdRAIDMaxPdiskQueueDepth 128<br>nsdMultiQueue 512<br>verbsRdma enable<br>verbsPorts mlx5_0/1 mlx5_1/1<br>verbsRdmaSend yes<br>scatterBufferSize 256K<br>maxFilesToCache 128k<br>maxMBpS 40000<br>workerThreads 1024<br>nspdQueues 64<br>[common]<br>subnets 192.168.196.0/merlin-hpc.psi.ch;merlin.psi.ch<br>adminMode central<br><br>File systems in cluster merlin.psi.ch:<br>--------------------------------------<br>/dev/home<br>/dev/t16M128K<br>/dev/t16M16K<br>/dev/t1M8K<br>/dev/t4M16K<br>/dev/t4M32K<br>/dev/test</span><span style=" font-size:12pt;font-family:Times New 
Roman"><br></span><span style=" font-size:10pt;font-family:Tahoma"><br>And for the computing cluster:</span><span style=" font-size:12pt;font-family:Times New Roman"><br></span><span style=" font-size:10pt;font-family:Tahoma"><br>[root@merlin-c-001 ~]# mmlsconfig <br>Configuration data for cluster merlin-hpc.psi.ch:<br>-------------------------------------------------<br>clusterName merlin-hpc.psi.ch<br>clusterId 14097036579263601931<br>autoload yes<br>dmapiFileHandleSize 32<br>minReleaseLevel 5.0.2.0<br>ccrEnabled yes<br>cipherList AUTHONLY<br>maxblocksize 16M<br>numaMemoryInterleave yes<br>maxFilesToCache 128k<br>maxMBpS 20000<br>workerThreads 1024<br>verbsRdma enable<br>verbsPorts mlx5_0/1<br>verbsRdmaSend yes<br>scatterBufferSize 256K<br>ignorePrefetchLUNCount yes<br>nsdClientCksumTypeLocal ck64<br>nsdClientCksumTypeRemote ck64<br>pagepool 32G<br>subnets 192.168.196.0/merlin-hpc.psi.ch;merlin.psi.ch<br>adminMode central<br><br>File systems in cluster merlin-hpc.psi.ch:<br>------------------------------------------<br>(none)</span><span style=" font-size:12pt;font-family:Times New Roman"><br></span><span style=" font-size:10pt;font-family:Tahoma"><br>Thanks a lot and best regards,<br>Marc                <br>_________________________________________<br>Paul Scherrer Institut <br>High Performance Computing<br>Marc Caubet Serrabou<br>Building/Room: WHGA/019A<br>Forschungsstrasse, 111<br>5232 Villigen PSI<br>Switzerland<br><br>Telephone: +41 56 310 46 67<br>E-Mail: marc.caubet@psi.ch</span><span style=" font-size:10pt;font-family:Times New Roman">_______________________________________________<br>gpfsug-discuss mailing list<br>gpfsug-discuss at spectrumscale.org</span><span style=" font-size:12pt;color:blue;font-family:Times New Roman"><u><br></u></span><a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" target="_blank"><span style=" font-size:10pt;color:blue;font-family:Times New 
Roman"><u>http://gpfsug.org/mailman/listinfo/gpfsug-discuss</u></span></a><span style=" font-size:12pt;font-family:Times New Roman"><br><br><br></span><br><br><BR>
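<span style=" font-size:12pt;font-family:Arial">The waiter lines quoted in this thread can be triaged mechanically. Below is a minimal Python sketch, not part of GPFS, that scans text saved from 'mmdiag --waiters' and lists the threads that have been waiting for I/O completion for one second or longer, following the advice that IOs should complete in sub-second time. The function name slow_io_waiters, the one-second default threshold, and the line pattern (inferred only from the four waiter lines in the original post) are assumptions made for illustration; other releases may format waiters differently.</span><br><br>

```python
import re

# Pattern inferred from the waiter lines quoted in this thread (an assumption,
# not a documented output format).
WAITER_RE = re.compile(
    r"Waiting (\d+(?:\.\d+)?) sec since (\S+), (\w+), thread (\d+) ([^:]+): (.*)"
)


def slow_io_waiters(text, threshold_sec=1.0):
    """Return (seconds, thread_id, thread_name) tuples, longest wait first,
    for waiters whose reason mentions I/O completion and whose wait time
    is at or above threshold_sec."""
    hits = []
    for line in text.splitlines():
        match = WAITER_RE.search(line)
        if match is None:
            continue  # not a waiter line
        seconds, _since, _state, thread_id, name, reason = match.groups()
        if "I/O completion" in reason and float(seconds) >= threshold_sec:
            hits.append((float(seconds), int(thread_id), name))
    return sorted(hits, reverse=True)


# The four waiters from the original post, each joined onto a single line.
SAMPLE = "\n".join([
    "Waiting 26.6431 sec since 17:18:32, ignored, thread 38298 "
    "NSPDDiscoveryRunQueueThread: on ThCond 0x7FC98EB6A2B8 "
    "(MultiThreadWorkInstanceCond), reason 'waiting for helper threads'",
    "Waiting 2.7969 sec since 17:18:55, monitored, thread 39736 NSDThread: "
    "for I/O completion",
    "Waiting 2.8024 sec since 17:18:55, monitored, thread 39580 NSDThread: "
    "for I/O completion",
    "Waiting 3.0435 sec since 17:18:55, monitored, thread 39448 NSDThread: "
    "for I/O completion",
])

if __name__ == "__main__":
    for seconds, thread_id, name in slow_io_waiters(SAMPLE):
        print(f"{seconds:.4f} s  thread {thread_id}  {name}")
```

<span style=" font-size:12pt;font-family:Arial">With the sample above, the three NSDThread waiters are reported and the 'waiting for helper threads' waiter is skipped, because its reason does not mention I/O completion. The threshold can be raised or lowered to match whatever latency is considered acceptable for the storage under test.</span><br><br>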