[gpfsug-discuss] Long IO waiters and IBM Storwize V5030
Uwe Falke
UWEFALKE at de.ibm.com
Fri May 28 21:04:19 BST 2021
Hi, an odd prefetch strategy would affect read performance, but the write
latency is reported to be even worse ...
Have you checked what the actual IO performance of the V5k box is under
that load, and how it compares to its nominal performance and to that
of its disks?
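E.g. (a rough sketch; the device names are placeholders, and iostat
assumes the sysstat package is installed):

    # On an NSD server: recent GPFS IO history incl. per-IO latency
    mmdiag --iohist

    # Device-level service times on the same host (device names are placeholders)
    iostat -xm 5 /dev/dm-0 /dev/dm-1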
How is the storage organised? How many LUNs/NSDs, what RAID code (the V5k
cannot do declustered RAID, can it?), any thin provisioning or other
gimmicks in the game? What IO sizes?
Tons of things to look at; a sketch of commands to gather this is below.
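For a first pass, something like this on the GPFS and Storwize CLIs
(a sketch; "gpfs0" is a placeholder filesystem name):

    # GPFS side: NSD-to-device mapping and disk states
    mmlsnsd -X
    mmlsdisk gpfs0

    # Storwize side: array, mdisk and volume (vdisk) layout
    lsarray
    lsmdisk
    lsvdisk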
Mit freundlichen Grüßen / Kind regards
Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation
Services
+49 175 575 2877 Mobile
Rochlitzer Str. 19, 09111 Chemnitz, Germany
uwefalke at de.ibm.com
IBM Services
IBM Data Privacy Statement
IBM Deutschland Business & Technology Services GmbH
Management (Geschäftsführung): Sven Schooss, Stefan Hierl
Registered office: Ehningen
Commercial register: Amtsgericht Stuttgart, HRB 17122
From: Jan-Frode Myklebust <janfrode at tanso.net>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: 28/05/2021 19:50
Subject: [EXTERNAL] Re: [gpfsug-discuss] Long IO waiters and IBM
Storwize V5030
Sent by: gpfsug-discuss-bounces at spectrumscale.org
One thing to check: Storwize/SVC code will *always* guess wrong on
prefetching for GPFS. You can see this as much higher read data
throughput on the mdisks than on the vdisks in the web UI. To fix it,
disable cache prefetching with "chsystem -cache_prefetch off".
Since this is a global setting, you should probably only change it if the
system is used exclusively for GPFS.
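A sketch of how one might verify and apply this (the lssystem attribute
name is an assumption; check your code level):

    # Show the current prefetch setting (attribute name is an assumption)
    lssystem | grep -i prefetch

    # Disable prefetching globally, per the suggestion above
    chsystem -cache_prefetch off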
-jf
On Fri, May 28, 2021 at 5:58 PM Saula, Oluwasijibomi <oluwasijibomi.saula at ndsu.edu> wrote:
Hi Folks,
So, we are experiencing some very long IO waiters in our GPFS cluster:
# mmdiag --waiters
=== mmdiag: waiters ===
Waiting 17.3823 sec since 10:41:01, monitored, thread 21761 NSDThread: for I/O completion
Waiting 16.6140 sec since 10:41:02, monitored, thread 21730 NSDThread: for I/O completion
Waiting 15.3004 sec since 10:41:03, monitored, thread 21763 NSDThread: for I/O completion
Waiting 15.2013 sec since 10:41:03, monitored, thread 22175
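For anyone reproducing this, a rough one-liner to keep watching the
longest waiters (a sketch; the sort key assumes the output format above):

    # Refresh the five longest waiters every 5 seconds
    while sleep 5; do mmdiag --waiters | sort -rn -k2 | head -5; done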
However, GPFS support is pointing to our IBM Storwize V5030 disk system as
the source of the latency. Unfortunately, we don't have paid support for
the system, so we are polling the list for anyone who might be able to assist.
Does anyone by chance have any experience with IBM Storwize V5030 or
possess a problem determination guide for the V5030?
We've briefly reviewed the V5030 management portal, but we still haven't
identified a cause for the increased latencies (i.e. read ~129 ms, write
~198 ms).
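For reference, the CLI equivalent of what we could sample on the V5030
(a sketch; the exact stat names may vary by firmware level):

    # Current performance stats; the *_ms fields cover read/write latency
    lssystemstats | grep _ms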
Granted, we have some heavy client workloads, yet we only see this drastic
drop in performance every couple of months, presumably when IO demand
peaks.
Any assistance would be much appreciated.
Thanks,
Oluwasijibomi (Siji) Saula
HPC Systems Administrator / Information Technology
Research 2 Building 220B / Fargo ND 58108-6050
p: 701.231.7749 / www.ndsu.edu
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss