[gpfsug-discuss] Long IO waiters and IBM Storwize V5030

Uwe Falke UWEFALKE at de.ibm.com
Fri May 28 21:04:19 BST 2021


Hi, an odd prefetch strategy would affect read performance, but the write 
latency is claimed to be even worse ...
Have you simply checked what the actual IO performance of the V5k box 
under that load is, and how it compares to its nominal performance and to 
that of its disks?
How is the storage organised? How many LUNs/NSDs? What RAID code (the V5k 
cannot do declustered RAID, can it)? Any thin provisioning or other 
gimmicks in the game?
What IO sizes?
Tons of things to look at.
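
Off the top of my head, a rough sketch of commands that would gather most 
of that (gpfs0 is a placeholder for your filesystem device, and the V5030 
commands assume ssh access to its CLI; adjust names to your environment):

  # GPFS side: NSD/LUN layout and IO sizes
  mmlsnsd -M           # NSD-to-device mapping per server
  mmlsdisk gpfs0 -L    # disk, failure group and pool layout
  mmlsfs gpfs0 -B      # filesystem block size (the expected full IO size)

  # V5030 side: pools, RAID code, thin provisioning
  lsmdiskgrp           # pools, incl. real vs. virtual capacity
  lsarray              # RAID level and drive count per array
  lsvdisk              # per-vdisk flags (space-efficient, compressed)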

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation 
Services
+49 175 575 2877 Mobile
Rochlitzer Str. 19, 09111 Chemnitz, Germany
uwefalke at de.ibm.com

IBM Services

IBM Data Privacy Statement

IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122



From:   Jan-Frode Myklebust <janfrode at tanso.net>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   28/05/2021 19:50
Subject:        [EXTERNAL] Re: [gpfsug-discuss] Long IO waiters and IBM 
Storwize V5030
Sent by:        gpfsug-discuss-bounces at spectrumscale.org




One thing to check: Storwize/SVC code will *always* guess wrong on 
prefetching for GPFS. You can see this as a much higher read data 
throughput on the mdisks than on the vdisks in the web UI. To fix it, 
disable cache_prefetch with "chsystem -cache_prefetch off".

Since this is a global setting, you should probably only set it if the 
system is used exclusively for GPFS.
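
To check from the CLI before and after (the stat names are as I recall 
them, so verify them on your code level):

  # compare recent vdisk vs. mdisk read throughput (MB/s)
  lssystemstats | grep -E 'vdisk_r_mb|mdisk_r_mb'

  # if mdisk reads run much higher than vdisk reads, prefetch is
  # reading data nobody asked for; disable it globally:
  chsystem -cache_prefetch off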


   -jf

On Fri, May 28, 2021 at 5:58 PM Saula, Oluwasijibomi <
oluwasijibomi.saula at ndsu.edu> wrote:
Hi Folks,

So, we are experiencing some very long IO waiters in our GPFS cluster:

#  mmdiag --waiters 

=== mmdiag: waiters ===
Waiting 17.3823 sec since 10:41:01, monitored, thread 21761 NSDThread: for I/O completion
Waiting 16.6140 sec since 10:41:02, monitored, thread 21730 NSDThread: for I/O completion
Waiting 15.3004 sec since 10:41:03, monitored, thread 21763 NSDThread: for I/O completion
Waiting 15.2013 sec since 10:41:03, monitored, thread 22175

However, GPFS support is pointing to our IBM Storwize V5030 disk system as 
the source of the latency. Unfortunately, we don't have paid support for 
the system, so we are polling the list for anyone who might be able to 
assist.
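
(For context, mmdiag --iohist on an NSD server lists each recent IO with 
its size and service time; consistently long service times there would 
point at the back end rather than at GPFS itself:)

  # recent IO history, with per-IO size and service time in ms
  mmdiag --iohist | head -30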

Does anyone by chance have any experience with IBM Storwize V5030 or 
possess a problem determination guide for the V5030?

We've briefly reviewed the V5030 management portal, but we still haven't 
identified a cause for the increased latencies (read ~129 ms, write 
~198 ms).

Granted, we have some heavy client workloads, yet this drastic drop in 
performance only seems to occur every couple of months, probably 
exacerbated by peaks in IO demand.
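
(In case it helps the diagnosis, a simple loop one could run to capture 
the next episode; a rough sketch, with the log path and interval 
arbitrary:)

  # snapshot waiters and recent IO history once a minute
  while true; do
      date
      mmdiag --waiters
      mmdiag --iohist | tail -20
      sleep 60
  done >> /tmp/gpfs-io-trace.log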

Any assistance would be much appreciated.


Thanks,

Oluwasijibomi (Siji) Saula
HPC Systems Administrator  /  Information Technology
 
Research 2 Building 220B / Fargo ND 58108-6050
p: 701.231.7749 / www.ndsu.edu
 



_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss