[gpfsug-discuss] sequential I/O write - performance

Alec anacreo at gmail.com
Sat Feb 10 19:14:22 GMT 2024


You know, I hadn't seen the original message when I commented earlier;
sorry about that.

It may sound dumb, but have you tried tuning downwards as Zdenek says?  Try
the storage on a single RAID group with a 1MB block size, and then move up
from there.  We're on practically ancient infrastructure compared to what
you have and get equivalent throughput.


   - Try returning the page pool to 8GB; maybe you're prefetching TOO much
   data (the same effect happens on Oracle when it is given too much memory:
   everything slows to a crawl as it tries to keep every DB in memory).
   - Define your FS on a single RAID group and see what your maximum
   performance is (are you sure that engaging so many platters isn't adding
   latency to every disk/controller access?).
   - Tune your block size to 1MB (the large block size could add latency
   depending on how the storage handles each chunk). A rough command sketch
   of this baseline follows the list.
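
Purely as an illustration, the baseline above might look something like
this on the command line; the filesystem name, stanza file, and mount are
placeholders, so substitute your own:

    # Drop the page pool back to 8GB (with -i this takes effect right away
    # on recent Scale levels; otherwise it applies at the next mmstartup)
    mmchconfig pagepool=8G -i

    # Create a test filesystem on a single NSD backed by one RAID group,
    # using a 1MB block size ("fs_test" and the stanza file are made up)
    mmcrfs fs_test -F /tmp/nsd_single.stanza -B 1M -A no
    mmmount fs_test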

Once you've got a baseline on a more normal configuration you can scale up
one variable at a time and see what gives better or worse performance.  At
IBM labs we were able to achieve 2.5GB/s on a single thread and 18GB/s to
an ESS from a single Linux VM, and our GPFS configuration was more or less
baseline, with a 1MB block size if I remember right.
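
For the single-thread check at each step, a dd run along these lines is
probably enough (the path and sizes are just examples):

    # Single-threaded sequential write, 1MB I/Os, bypassing the page cache
    dd if=/dev/zero of=/gpfs/fs_test/ddtest bs=1M count=16384 oflag=direct

    # Same test with synchronous writes, as Zdenek suggests below
    dd if=/dev/zero of=/gpfs/fs_test/ddtest bs=1M count=16384 oflag=dsync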

I'm building on Zdenek's answer as it's likely the step I would take in
this instance: look for areas where the scale of your configuration could
be introducing latencies.
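
While the benchmark runs, something like the following should show where
the latency is piling up (the iostat interval is arbitrary):

    # Recent I/O history from the GPFS side; look for NSDs with outsized
    # completion times like the ~500ms ones Michal reported
    mmdiag --iohist

    # Correlate with the Linux view of the dm-devices
    iostat -xm 10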

Alec

On Fri, Feb 9, 2024 at 12:21 AM Zdenek Salvet <salvet at ics.muni.cz> wrote:

> On Thu, Feb 08, 2024 at 02:59:15PM +0000, Michal Hruška wrote:
> > @Uwe
> > Using iohist we found out that GPFS is overloading one dm-device (it
> > took about 500ms to finish IOs). We replaced the "problematic" dm-device
> > (as we have enough drives to play with) with a new one, but the
> > overloading issue just jumped to another dm-device.
> > We believe that this behaviour is caused by GPFS, but we are unable
> > to locate the root cause of it.
>
> Hello,
> this behaviour could be caused by an asymmetry in the data paths
> of your storage; a relatively small imbalance can make the request queue
> of a slightly slower disk grow seemingly out of proportion.
>
> In general, I think you need to scale your GPFS parameters down, not up,
> in order to force better write clustering and achieve top speed
> of rotational disks unless array controllers use huge cache memory.
> If you can change your benchmark workload, try synchronous writes
> (dd oflag=dsync ...).
>
> Best regards,
> Zdenek Salvet
> salvet at ics.muni.cz
> Institute of Computer Science of Masaryk University, Brno, Czech Republic
> and CESNET, z.s.p.o., Prague, Czech Republic
> Phone: ++420-549 49 6534                           Fax: ++420-541 212 747
>
> ----------------------------------------------------------------------------
>       Teamwork is essential -- it allows you to blame someone else.
>
>

