[gpfsug-discuss] sequential I/O write - performance

dale mac macthev at gmail.com
Thu Feb 8 21:32:17 GMT 2024


Michal,

I think you need to revise your testing method. Let me explain.

Based on my understanding:

3 FE servers and one storage system

~4500 MiB/s from 8 RAID groups using XFS (one XFS filesystem per RAID group) and a parallel fio test.
One GPFS filesystem (fs0) across all 8 RAID groups, where you observed performance drop to ~3300 MiB/s.

The test you are running compares a non-clustered filesystem against a clustered one.

XFS:

8 XFS filesystems.
Each FS has its own array and independent metadata, not shared between nodes.
Each array sees a sequential I/O stream, so it can aggregate I/Os and prefetch on write.
No lock traffic between nodes.
You didn't mention whether the fio runs were on one node or spread across the three nodes and their filesystems; a rough sketch of the run I am assuming is below.
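
For reference, this is roughly the kind of per-filesystem parallel run I am assuming for the XFS case. It is a minimal sketch only; mount points, sizes and queue depths are placeholders to adjust for your setup:

  # one sequential-write job per XFS mount, all eight started in parallel
  for i in 1 2 3 4 5 6 7 8; do
    fio --name=xfs$i --filename=/mnt/xfs$i/testfile \
        --rw=write --bs=2m --direct=1 --ioengine=libaio \
        --iodepth=16 --size=100g &
  done
  wait

Whether those eight jobs ran on one FE server or were spread across all three changes what the arrays see, so it matters for the comparison.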

Clustered (GPFS):

1 filesystem (fs0) in this case.
Parallel filesystem with shared metadata and shared access.
Lock and metadata traffic across nodes.
GPFS stripes across the NSDs, 8 in this case.
Each fio stream becomes a less sequential stream at the array level.
The LBAs are spread out, causing the array to work harder.
The array logic will not see this as sequential and will deliver much lower performance from a sequential point of view.
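
One way to picture this: with a 2 MiB block size and 8 NSDs, GPFS places consecutive blocks of a file on different NSDs in round-robin order, so each array sees only a slice of any one file's stream, interleaved with whatever the other fio jobs are writing at the same time. That arrival pattern looks far less sequential to the array's aggregation and prefetch logic than eight dedicated XFS streams do.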
 


What to do:

Try:
8 GPFS filesystems with your fio test, mirroring the XFS test:
1 FS on 1 array with 1 matching fio job (then multiply the result by 8); a rough sketch is below.
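
A rough sketch of that per-array variant, assuming one NSD per RAID-group LUN. Device paths, NSD and node names, the mount point and the block size are placeholders, not your actual configuration:

  # nsd1.stanza (repeat with nsd2..nsd8 for the other arrays)
  %nsd:
    device=/dev/dm-1
    nsd=nsd1
    servers=fe1,fe2,fe3
    usage=dataAndMetadata

  mmcrnsd -F nsd1.stanza
  mmcrfs fs1 -F nsd1.stanza -B 2M -T /gpfs/fs1
  mmmount fs1 -a

  # matching single sequential-write stream against that one filesystem
  fio --name=fs1 --filename=/gpfs/fs1/testfile --rw=write --bs=2m \
      --direct=1 --ioengine=libaio --iodepth=16 --size=100g

If the sum of the eight single-array results comes back close to the XFS numbers, the gap is in the striping and locking layer rather than in the arrays themselves.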



PS: You haven't mentioned the type of array used. Sometimes the following is important.

Disable prefetch at the array. Array prefetch can sometimes overwork the backend by fetching data that is never used, causing extra I/O and cache displacement. I.e. GPFS already prefetches aggressively, which triggers the array to prefetch even further ahead, and that extra data goes unused.

Dale

> On 9 Feb 2024, at 6:30 am, Alec <anacreo at gmail.com> wrote:
> 
> This won't affect your current issue, but if you're doing a lot of large sequential I/O you may want to consider setting prefetchPct to 40 to 60 percent instead of the default 20%. In our environment that has a measurable impact, but we have a lot less RAM than you do in the pagepool (8G versus 64G).
> 
> Also do you have a dedicated meta pool?  If not that could be a source of contention.  Highly recommend a small pinnable LUN or two as a dedicated meta pool.
> 
> Alec
> 
> On Thu, Feb 8, 2024, 7:01 AM Michal Hruška <Michal.Hruska at mcomputers.cz <mailto:Michal.Hruska at mcomputers.cz>> wrote:
>> @Aaron
>> 
>> Yes, I can confirm that 2MB blocks are transferred over.
>> 
>> 
>> @ Jan-Frode
>> 
>> We tried to change multiple parameters, but if you know the best combination for sequential IO, please let me know.
>> 
>>  
>> 
>> #mmlsconfig
>> autoload no
>> dmapiFileHandleSize 32
>> minReleaseLevel 5.1.9.0
>> tscCmdAllowRemoteConnections no
>> ccrEnabled yes
>> cipherList AUTHONLY
>> sdrNotifyAuthEnabled yes
>> pagepool 64G
>> maxblocksize 16384K
>> maxMBpS 40000
>> maxReceiverThreads 32
>> nsdMaxWorkerThreads 512
>> nsdMinWorkerThreads 8
>> nsdMultiQueue 256
>> nsdSmallThreadRatio 0
>> nsdThreadsPerQueue 3
>> prefetchAggressiveness 2
>> adminMode central
>>
>> /dev/fs0
>> 
>> 
>> @Uwe
>> 
>> Using iohist we found out that GPFS is overloading one dm-device (it took about 500 ms to finish I/Os). We replaced the "problematic" dm-device (as we have enough drives to play with) with a new one, but the overloading issue just jumped to another dm-device.
>> We believe that this behaviour is caused by GPFS, but we are unable to locate the root cause of it.
>> 
>>  
>> 
>> Best,
>> Michal
>> 
>>  
>> 
