[gpfsug-discuss] sequential I/O write - performance

YARON DANIEL YARD at il.ibm.com
Wed Feb 14 18:47:20 GMT 2024


Hi Michal,

How are you ?

Can you also tell:

  1.  How many LUNs are allocated to the GPFS file system from the storage (minimum is 16)?
  2.  What block size is defined in the GPFS file system?
  3.  How many pools do you have in the file system?
  4.  Were all tests run on a single server?
  (A short sketch of commands for collecting these details follows below.)
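These details can usually be gathered with the standard GPFS commands sketched below (only a sketch; the file system name fs0 is taken from the mmlsconfig output later in this thread, not from confirmed output):

# block size, storage pools and other file system attributes
mmlsfs fs0
# NSD-to-device (LUN) mapping as seen by each server
mmlsnsd -M
# capacity and usage per disk and per storage pool
mmdf fs0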
Regards



Yaron Daniel
94 Em Ha'Moshavot Rd
Storage and Cloud Consultant
Petach Tiqva, 49527
Technology Services
IBM Technology Lifecycle Service
Israel


Phone:
+972-3-916-5672

Fax:
+972-3-916-5672


Mobile:
+972-52-8395593


e-mail:
yard at il.ibm.com<mailto:yard at il.ibm.com>


Webex:            https://ibm.webex.com/meet/yard
IBM Israel



From: gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> On Behalf Of Michal Hruška
Sent: Wednesday, 14 February 2024 20:29
To: gpfsug-discuss at gpfsug.org
Subject: [EXTERNAL] Re: [gpfsug-discuss] sequential I/O write - performance

Dear friends,

Thank you all for your time and thoughts/ideas!
The main goal of sharing our test results comparing XFS and GPFS was to show that the storage subsystem can do better when the I/O is issued differently. We were not trying to compare XFS and GPFS directly; we understand that there will be some performance drop with GPFS (compared to “raw” performance), but we are surprised that it is as large as ~20-25%.

We changed several of the suggested parameters but saw no performance gain, so we continued troubleshooting with different configurations.
To explain what we tried, I need to describe our environment in a bit more detail:
The underlying storage system is an IBM FS7300 (384 GB cache per controller). There are 8 DRAIDs (8+2+1); each DRAID has its own pool, and each pool contains one volume (LUN). Each of the three FE servers is connected directly to this storage via two 32 Gb FC links. The three client servers and the FE servers are connected to a LAN switch over 100 GbE.
Test results (metadata are located on an NVMe SSD DRAID):

  1.  We used a second, identical storage system for testing but got almost the same results as with the first one. In iohist we can see that one LUN (dm-device) is probably overloaded, as its I/O time is high – from 300 to 500 ms.
  2.  With both storage systems combined into one big GPFS file system, only one dm-device is slow at any given time (according to iohist output), but the “problematic” dm-device changes over time.
  3.  During our tests we also tried a synchronous fio test, but we observed a significant performance drop (a sketch of such a job follows this list).
  4.  We compared single-LUN performance of GPFS against XFS from a single server: 435 MB/s for GPFS vs. 485 MB/s for XFS. That drop is not so significant, but when we added more LUNs to the comparison the drop became much more painful.
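For reference, a sequential-write fio run of the kind described above could look like the sketch below; the exact job file is not reproduced in the thread, so the target directory, size and job count are illustrative placeholders (only the 2 MB block size matches what is mentioned later in the thread):

# asynchronous sequential write, 2 MiB blocks, direct I/O
fio --name=seqwrite --directory=/gpfs/fs0/fio --rw=write --bs=2m \
    --size=100g --direct=1 --ioengine=libaio --iodepth=16 --numjobs=4
# for the synchronous variant mentioned in point 3, switch the engine:
#   --ioengine=sync --iodepth=1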
For this testing “session” we were able to gather data by Storage Insights to check storage performance:

  1.  There is no problematic HDD – the worst latency seen is 42 ms across all 176 drives in the two storage systems; the average latency is 15 ms.
  2.  CPU usage was at most 25%.
  3.  “Problematic” DRAID latency – the average is 16 ms, the worst is 430 ms. I cannot tell whether there was a similar latency peak during the XFS tests, but I think not (or not as bad), since XFS performs better than GPFS.
  4.  During our tests the write cache for all pools was fully allocated, for both the XFS and the GPFS tests. This is the expected state, as the cache is much faster than the HDDs and should help organize writes before they are forwarded to the RAID groups.

Do you see any other possible problems we may have missed?
I do not want to leave this unfinished, but I am out of ideas. 😊

Best,
Michal
From: Michal Hruška
Sent: Thursday, February 8, 2024 3:59 PM
To: 'gpfsug-discuss at gpfsug.org' <gpfsug-discuss at gpfsug.org<mailto:gpfsug-discuss at gpfsug.org>>
Subject: Re: [gpfsug-discuss] sequential I/O write - performance

@Aaron
Yes, I can confirm that 2 MB blocks are transferred.
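One way to double-check the transfer size on the NSD servers (a suggestion, not something reported in the thread) is to watch the average request size of the dm-devices while the test runs:

# look at the dm-* rows; the request-size column is avgrq-sz (512-byte sectors)
# on older sysstat versions or areq-sz (kB) on newer ones
iostat -x 1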
@ Jan-Frode
We tried changing multiple parameters; if you know the best combination for sequential I/O, please let me know.

#mmlsconfig
autoload no
dmapiFileHandleSize 32
minReleaseLevel 5.1.9.0
tscCmdAllowRemoteConnections no
ccrEnabled yes
cipherList AUTHONLY
sdrNotifyAuthEnabled yes
pagepool 64G
maxblocksize 16384K
maxMBpS 40000
maxReceiverThreads 32
nsdMaxWorkerThreads 512
nsdMinWorkerThreads 8
nsdMultiQueue 256
nsdSmallThreadRatio 0
nsdThreadsPerQueue 3
prefetchAggressiveness 2
adminMode central

/dev/fs0
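In case it helps with further tuning experiments: parameters of this kind are changed with mmchconfig. The values below are purely illustrative placeholders, not recommendations from this thread, and have to be sized for the actual servers; some parameters may also need a daemon restart to take full effect.

# example only - adjust the values to the hardware and verify with mmlsconfig
mmchconfig nsdMaxWorkerThreads=1024,nsdThreadsPerQueue=8,nsdSmallThreadRatio=1 -i
mmchconfig maxMBpS=40000,pagepool=64G -i
mmlsconfig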
@Uwe
Using iohist we found out that GPFS is overloading one dm-device (it took about 500 ms to finish its I/Os). We replaced the “problematic” dm-device with a new one (we have enough drives to play with), but the overloading issue just jumped to another dm-device.
We believe this behaviour is caused by GPFS, but we are unable to locate its root cause.
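A quick way to see which device the slow I/Os land on is to sort the I/O history by service time. This is only a sketch; the field number used for the time column is an assumption and may need adjusting to the mmdiag --iohist layout of your release:

# dump the recent I/O history on an NSD server and show the slowest entries
# (field 6 is assumed to be the I/O time in ms - check the header line first)
mmdiag --iohist | sort -k6,6 -n -r | head -20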

Best,
Michal


