[gpfsug-discuss] Slow performance on writes when using direct io

Olaf Weiser olaf.weiser at de.ibm.com
Tue Mar 12 12:12:24 GMT 2024


Just for completeness, let's use this thread to document an undocumented parameter that should not be used any more:

disableDIO=yes does NOT disable DIO 😉

The parameter's name is a bit misleading. Direct IO is a hint from an application to bypass the cache. However, application programmers mostly expect the IO to be safely on disk when it is acknowledged.
Strictly speaking, the O_SYNC flag exists to guarantee exactly that for writes; still, it is pretty common to treat O_DIRECT as a synonym for it. More details here: https://man7.org/linux/man-pages/man2/open.2.html
GPFS handles O_DIRECT accordingly, following that programmer expectation.
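To make that distinction concrete, here is a minimal C sketch (my illustration, not part of any GPFS code; the file name is made up) of how an application requests direct vs. synchronous writes via open(2):

    /* minimal sketch: O_DIRECT vs. O_DIRECT|O_SYNC (Linux, glibc) */
    #define _GNU_SOURCE            /* needed for O_DIRECT */
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* O_DIRECT alone: a hint to bypass the page cache; by itself it
         * does NOT guarantee the data is on stable storage at ack time */
        int fd_direct = open("datafile", O_WRONLY | O_CREAT | O_DIRECT, 0644);

        /* O_DIRECT | O_SYNC: each write is acknowledged only after data
         * (and required metadata) reached stable storage -- the behaviour
         * many programmers implicitly expect from "direct" IO */
        int fd_sync = open("datafile", O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0644);

        close(fd_direct);
        close(fd_sync);
        return 0;
    }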

So in GPFS, direct IO (O_DIRECT, bypassing the caches) can benefit from a so-called optimized IO path, which takes advantage of Linux AIO.
However, there are multiple situations where you can NOT write directly, e.g. if the IO is not aligned, or if you append to a file and there is no block yet (no disk address allocated to that part of the file). In those cases the IO cannot be processed in the optimized direct-IO path.
[[Some other aspects: data that is accessed without caching gets no prefetching 😉 and, more relevant here, you need different (more efficient) tokens than for buffered writes.]]
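For illustration, a minimal sketch of an O_DIRECT write that satisfies the alignment requirement (the 4096-byte alignment here is an assumption; the real requirement depends on the file system and the device):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define ALIGN 4096

    int main(void)
    {
        int fd = open("datafile", O_WRONLY | O_CREAT | O_DIRECT, 0644);

        void *buf;
        posix_memalign(&buf, ALIGN, ALIGN);  /* buffer address aligned */
        memset(buf, 'x', ALIGN);

        /* offset and length are multiples of ALIGN as well -> eligible for
         * the optimized direct path; an unaligned offset or length forces
         * the file system to fall back to a buffered code path */
        pwrite(fd, buf, ALIGN, 0);

        free(buf);
        close(fd);
        return 0;
    }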

Let's say an application appends to a file (similar to creating a file and writing to it). Usually direct IOs are rather small. As long as you write small IOs into an existing block, direct IO is fine - but once you fill the last block completely with data, you need to allocate a new block. You can't write DIRECT_IO to nowhere 😉 So we need to allocate a new block (as any other file system would).
For GPFS this means we have to leave the optimized (direct) IO path and allocate the new block, which requires allocating the corresponding buffers first.
Once that is done, we finally sync the data from the "direct" IO before acknowledging it, to honor the semantics expected by the application programmer who used O_DIRECT.
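A small sketch of that append pattern (the 4 MiB block size is only an assumption for illustration; substitute the real file system block size): roughly every BLOCK/IO_SIZE-th write crosses a block boundary and triggers the allocation detour described above.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define IO_SIZE (64 * 1024)        /* small direct write            */
    #define BLOCK   (4 * 1024 * 1024)  /* assumed GPFS data block size  */

    int main(void)
    {
        int fd = open("appendfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        void *buf;
        posix_memalign(&buf, 4096, IO_SIZE);
        memset(buf, 'x', IO_SIZE);

        off_t off = 0;
        for (int i = 0; i < 2 * BLOCK / IO_SIZE; i++, off += IO_SIZE) {
            /* each time 'off' crosses a BLOCK boundary the target block has
             * no disk address yet: the file system must leave the direct
             * path, allocate the block via buffers, then sync the data
             * before acknowledging the write */
            pwrite(fd, buf, IO_SIZE, off);
        }

        free(buf);
        close(fd);
        return 0;
    }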

For some workloads, which mostly create/write and append files, this happens on every block, i.e. very frequently, causing GPFS to switch in and out of the optimized IO path, which also means changing tokens.
To avoid that, disableDIO was introduced a very long time ago as a quick and efficient workaround.

In the meantime there is a heuristic in our code that automatically detects such cases, so this disableDIO parameter should NOT be used any more.

Since GPFS 5.0.x we have dioSmallSeqWriteBatching=yes. PLEASE use this parameter. By default, the optimization kicks in when we see three AIO/DIO writes that are no larger than 64 KiB each and no more than one write length apart.
If you know that your application does larger DIO writes, let us know and open an SF ticket; there are further options.
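For reference, a minimal libaio sketch (my illustration, not IBM code) of the pattern that heuristic is aimed at: a burst of sequential AIO/DIO writes of at most 64 KiB each at consecutive offsets (build with gcc ... -laio):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <libaio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define IO_SIZE (64 * 1024)   /* <= 64 KiB per write    */
    #define N_IOS   8             /* short sequential burst */

    int main(void)
    {
        int fd = open("appendfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);

        io_context_t ctx = 0;
        io_setup(N_IOS, &ctx);

        void *buf;
        posix_memalign(&buf, 4096, IO_SIZE);
        memset(buf, 'x', IO_SIZE);

        struct iocb cbs[N_IOS];
        struct iocb *list[N_IOS];
        for (int i = 0; i < N_IOS; i++) {
            /* consecutive offsets, each write no more than one write length
             * apart -> the shape the batching optimization looks for */
            io_prep_pwrite(&cbs[i], fd, buf, IO_SIZE, (long long)i * IO_SIZE);
            list[i] = &cbs[i];
        }
        io_submit(ctx, N_IOS, list);

        struct io_event events[N_IOS];
        io_getevents(ctx, N_IOS, N_IOS, events, NULL);

        io_destroy(ctx);
        free(buf);
        close(fd);
        return 0;
    }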


BACK to the original question 🙂 you may consider:
--preallocating blocks for the file(s) (see the sketch after this list)
--double-checking for active snapshots (copy-on-write for direct IO is very expensive)
--adjusting your block size / RAID config to lower write amplification
--checking the network RTT for token traffic !!!
--trying to avoid an HDD-based backend, as its number of IOPS is very limited
Last but not least: talk to your application programmers 😉 and ask whether they really need what they programmed 😉
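For the first item, a minimal sketch of preallocating with posix_fallocate(3) before doing direct appends (file name and the 1 GiB size are just examples):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("appendfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);

        /* reserve 1 GiB of blocks now, so later O_DIRECT appends find the
         * disk addresses already allocated and can stay on the direct path */
        posix_fallocate(fd, 0, 1024LL * 1024 * 1024);

        close(fd);
        return 0;
    }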

________________________________
From: gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> on behalf of Uwe Falke <uwe.falke at kit.edu>
Sent: Tuesday, 12 March 2024 12:21
To: gpfsug-discuss at gpfsug.org <gpfsug-discuss at gpfsug.org>
Subject: [EXTERNAL] Re: [gpfsug-discuss] Slow performance on writes when using direct io


Just thinking: an application should do direct IO for a good reason, and only then. "Forcing DIO" is probably not the right thing to do - rather check why an app does DIO and either change the app's behaviour, if that is reasonable, or maybe use a special pool for it backed by mirrored SSDs or the like.

BTW, the ESS has a nice mechanism to handle small IOs (also direct ones, I suppose) quickly by buffering them on flash/NVRAM, where the data is considered persistently stored, so the IO requests complete quickly.

Uwe


On 12.03.24 11:59, Peter Hruška wrote:
Hello,

Direct writes are problematic for both writes and rewrites. Rewrites alone are another issue we have noticed.
Since indirect (direct=0) workloads are fine, it seems the easiest solution would be to force indirect IO operations for all workloads. However, we didn't find such a possibility.


--


S přáním pěkného dne / Best regards

Mgr. Peter Hruška
IT specialist

M Computers s.r.o.
Úlehlova 3100/10, 628 00 Brno-Líšeň (map<https://mapy.cz/s/gafufehufe>)
T:+420 515 538 136
E: peter.hruska at mcomputers.cz<mailto:peter.hruska at mcomputers.cz>

www.mcomputers.cz<http://www.mcomputers.cz/>
www.lenovoshop.cz<http://www.lenovoshop.cz/>



On Tue, 2024-03-12 at 09:59 +0100, Zdenek Salvet wrote:
EXTERNAL SENDER


On Mon, Mar 11, 2024 at 01:21:32PM +0000, Peter Hruška wrote:
We encountered a problem with write performance on GPFS when the application uses direct IO access. To simulate the issue it is enough to run fio with the option direct=1. The performance drop is quite dramatic - 250 MiB/s vs. 2955 MiB/s. We've tried to instruct GPFS to ignore direct IO by using "disableDIO=yes". The directive didn't have any effect. Is there any way to make GPFS ignore direct IO requests and use caching for everything?

Hello,
did you use pre-allocated file(s) (was it a re-write)?
libaio traffic is not really asynchronous with respect to the necessary metadata
operations (allocating new space and writing allocation structures to disk)
in most Linux filesystems, and I guess this case is not heavily optimized
in GPFS either (the dioSmallSeqWriteBatching feature may help a little, but
I think it targets a different scenario).

Best regards,
Zdenek Salvet                                              salvet at ics.muni.cz<mailto:salvet at ics.muni.cz>
Institute of Computer Science of Masaryk University, Brno, Czech Republic
and CESNET, z.s.p.o., Prague, Czech Republic
Phone: ++420-549 49 6534                           Fax: ++420-541 212 747
----------------------------------------------------------------------------
      Teamwork is essential -- it allows you to blame someone else.




--
Karlsruhe Institute of Technology (KIT)
Scientific Computing Centre (SCC)
Scientific Data Management (SDM)

Uwe Falke

Hermann-von-Helmholtz-Platz 1, Building 442, Room 187
D-76344 Eggenstein-Leopoldshafen

Tel: +49 721 608 28024
Email: uwe.falke at kit.edu<mailto:uwe.falke at kit.edu>
www.scc.kit.edu<http://www.scc.kit.edu>

Registered office:
Kaiserstraße 12, 76131 Karlsruhe, Germany

KIT – The Research University in the Helmholtz Association


