<div dir="ltr">You know I hadn't seen the original message when I commented earlier, sorry about that.<div><br></div><div>It may sound dumb but have you tried tuning downwards as Zdenek says? Try the storage on a single RAID group and a 1MB block size, and then move up from there. We're on practically ancient infrastructure compared to what you have and get throughput equivalent.</div><div><br></div><div><ul><li>Try returning the page pool to 8GB maybe you're prefetching TOO much data (this effect happens on Oracle when it is given too much memory everything slows to a crawl as it tries to keep every DB in memory).</li><li>Define your FS on a single RAID group, see what your max performance is (are you sure you're not wrapping any disk/controller access with so many platters being engaged adding latency?)</li><li>Tune your block size to 1MB. (the large block size could add latency depending on how the storage is handling that chunk)</li></ul></div><div>Once you've got a baseline on a more normal configuration you can scale up one variable at a time and see what gives better or worse performance. At IBM labs we were able to achieve 2.5GB/s on a single thread and 18GB/s to an ESS on a single Linux VM and our GPFS configuration was more or less baseline and with 1MB block size if I remember right.</div><div><br></div><div>I'm building on Zdenek's answer as it's likely the step I would take in this instance, look for areas where the scale of your configuration could be introducing latencies.</div><div><br></div><div>Alec</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Feb 9, 2024 at 12:21 AM Zdenek Salvet <<a href="mailto:salvet@ics.muni.cz">salvet@ics.muni.cz</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thu, Feb 08, 2024 at 02:59:15PM +0000, Michal Hruška wrote:<br>
<div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Feb 9, 2024 at 12:21 AM Zdenek Salvet <<a href="mailto:salvet@ics.muni.cz">salvet@ics.muni.cz</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thu, Feb 08, 2024 at 02:59:15PM +0000, Michal Hruška wrote:<br>
> @Uwe<br>
> Using iohist, we found out that GPFS is overloading one dm-device (it took about 500ms to finish I/Os). We replaced the "problematic" dm-device with a new one (as we have enough drives to play with), but the overloading issue just jumped to another dm-device.<br>
> We believe this behaviour is caused by GPFS, but we are unable to locate the root cause of it.<br>
<br>
Hello,<br>
this behaviour could be caused by an asymmetry in the data paths<br>
of your storage; a relatively small imbalance can make the request queue<br>
of a slightly slower disk grow seemingly out of proportion.<br>
<br>
In general, I think you need to scale your GPFS parameters down, not up,<br>
in order to force better write clustering and achieve the top speed<br>
of rotational disks, unless the array controllers use huge cache memory.<br>
If you can change your benchmark workload, try synchronous writes<br>
(dd oflag=dsync ...).<br>
<br>
Best regards,<br>
Zdenek Salvet <a href="mailto:salvet@ics.muni.cz" target="_blank">salvet@ics.muni.cz</a> <br>
Institute of Computer Science of Masaryk University, Brno, Czech Republic<br>
and CESNET, z.s.p.o., Prague, Czech Republic<br>
Phone: ++420-549 49 6534 Fax: ++420-541 212 747<br>
----------------------------------------------------------------------------<br>
Teamwork is essential -- it allows you to blame someone else.<br>
<br>
<br>
</blockquote></div>
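<div><br></div><div>And for completeness, Zdenek's synchronous-write test spelled out as a command might look like the sketch below; the path /gpfs/fs1/testfile is a placeholder, and mmdiag --iohist is one way to re-check per-device latency after each tuning change:</div><div><br></div><pre>
# Synchronous sequential writes: each 1M write is flushed to stable storage before the next
dd if=/dev/zero of=/gpfs/fs1/testfile bs=1M count=4096 oflag=dsync

# Re-check recent per-device I/O latency after each change
mmdiag --iohist
</pre>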