[gpfsug-discuss] IO sizes

Alex Chekholko alex at calicolabs.com
Wed Feb 23 18:39:07 GMT 2022


Hi,

Metadata I/Os will always be smaller than the usual data block size, right?
Which version of GPFS?

Regards,
Alex

On Wed, Feb 23, 2022 at 10:26 AM Uwe Falke <uwe.falke at kit.edu> wrote:

> Dear all,
>
> sorry for asking a question which seems not directly GPFS related:
>
> In a setup with 4 NSD servers (old-style, with storage controllers in
> the back end), 12 clients and 10 Seagate storage systems, I see in
> benchmark tests that just one of the NSD servers sends smaller IO
> requests to the storage than the other 3 (that is, both reads and
> writes are smaller).
>
> The NSD servers form 2 pairs; each pair is connected to 5 Seagate boxes
> (one server to the A controllers, the other one to the B controllers of
> the respective Seagates).
>
> All 4 NSD servers are set up similarly:
>
> kernel: 3.10.0-1160.el7.x86_64 #1 SMP
>
> HBA: Broadcom / LSI Fusion-MPT 12GSAS/PCIe Secure SAS38xx
>
> driver : mpt3sas 31.100.01.00
>
> max_sectors_kb=8192 (max_hw_sectors_kb=16383, not 16384, as limited by
> mpt3sas) for all sd devices and all multipath (dm) devices built on top.
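>
> A minimal sketch of how one can dump those queue limits for all sd and
> dm devices (the glob patterns are just the generic sysfs ones; adjust
> if needed):
>
>     # print max_sectors_kb / max_hw_sectors_kb / scheduler per block device
>     import glob
>
>     for qdir in sorted(glob.glob("/sys/block/sd*/queue")
>                        + glob.glob("/sys/block/dm-*/queue")):
>         dev = qdir.split("/")[3]            # e.g. "sdb" or "dm-3"
>         vals = {}
>         for attr in ("max_sectors_kb", "max_hw_sectors_kb", "scheduler"):
>             with open(qdir + "/" + attr) as f:
>                 vals[attr] = f.read().strip()
>         print(dev, vals)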
>
> scheduler: deadline
>
> multipath (we actually have 3 paths to each volume, so there is some
> asymmetry, but that should not affect the IOs, should it? And if it
> did, we would see the same effect in both pairs of NSD servers, but we
> do not).
>
> All 10 storage systems are also configured the same way (2 disk groups /
> pools / declustered arrays, one managed by ctrl A, one by ctrl B, and
> 8 volumes out of each; that makes 2 x 8 x 10 = 160 NSDs altogether).
>
>
> The GPFS block size is 8 MiB; according to the IO history
> (mmdiag --iohist) we see clean IO requests of 16384 disk blocks
> (i.e. 8192 KiB) from GPFS.
>
> The first question I have - but that is not my main one: I see, both
> in iostat and on the storage systems, that the IO requests are
> typically about 4 MiB, not 8 MiB as I'd expect from the above settings
> (max_sectors_kb is really in terms of KiB, not sectors, cf.
> https://www.kernel.org/doc/Documentation/block/queue-sysfs.txt).
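>
> Just to illustrate what I mean (the 4096 KiB cap is an assumption
> chosen to match what iostat shows, not something I have confirmed
> anywhere in sysfs):
>
>     # how an 8192 KiB GPFS IO would be split for a given request-size cap
>     def split_io(io_kib=8192, cap_kib=8192):
>         n, rest = divmod(io_kib, cap_kib)
>         return [cap_kib] * n + ([rest] if rest else [])
>
>     print(split_io(8192, 8192))   # [8192]        - what I would expect
>     print(split_io(8192, 4096))   # [4096, 4096]  - roughly what iostat shows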
>
> But what puzzles me even more: one of the servers assembles even
> smaller IOs, mostly varying between 3.2 MiB and 3.6 MiB - both for
> reads and writes ... I just cannot see why.
>
> I have to suspect that, when writing to the storage, this causes
> incomplete stripe writes on our erasure-coded volumes (8+2p), as long
> as the controller is not able to re-coalesce the data properly - and it
> seems it cannot do so completely, at least.
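>
> For the stripe-write concern, a rough back-of-the-envelope check (the
> 512 KiB strip size is only an assumption here; the real value would
> have to be taken from the Seagate pool configuration):
>
>     # data payload of one full stripe on an 8+2p volume
>     data_disks = 8
>     strip_kib = 512                              # assumption, not verified
>     full_stripe_kib = data_disks * strip_kib     # 4096 KiB = 4 MiB of data
>     for io_mib in (3.2, 3.6, 4.0):
>         frac = io_mib * 1024 / full_stripe_kib
>         print("%.1f MiB IO covers %.2f full stripes" % (io_mib, frac))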
>
>
> If any of you have seen this already and/or know a potential
> explanation, I'd be glad to learn about it.
>
>
> And if some of you wonder: yes, I (was) moved away from IBM and am now
> at KIT.
>
> Many thanks in advance
>
> Uwe
>
>
> --
> Karlsruhe Institute of Technology (KIT)
> Steinbuch Centre for Computing (SCC)
> Scientific Data Management (SDM)
>
> Uwe Falke
>
> Hermann-von-Helmholtz-Platz 1, Building 442, Room 187
> D-76344 Eggenstein-Leopoldshafen
>
> Tel: +49 721 608 28024
> Email: uwe.falke at kit.edu
> www.scc.kit.edu
>
> Registered office:
> Kaiserstraße 12, 76131 Karlsruhe, Germany
>
> KIT – The Research University in the Helmholtz Association
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>