[gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN

Aaron Knister aaron.knister at gmail.com
Fri Jun 12 14:25:15 BST 2020


I would double-check the CPU frequency scaling settings on your NSD servers (cpupower frequency-info) and look at the governor. You'll want it to be the performance governor. If it isn't, what can happen is that the CPUs scale back their clock rate, which hurts RDMA performance. Running the I/O test on the NSD servers themselves may have been enough to kick the processors up to a higher frequency, which afforded you good performance.
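
As a quick sketch (assuming the cpupower utility from the kernel-tools
package is installed; the sysfs paths are the stock CentOS 7 cpufreq locations):

  # show the driver, the current governor and the frequency limits
  cpupower frequency-info

  # list the governor in use on every core
  cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c

  # switch all cores to the performance governor (not persistent across reboots)
  cpupower frequency-set -g performance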

Sent from my iPhone

> On Jun 12, 2020, at 00:19, Luis Bolinches <luis.bolinches at fi.ibm.com> wrote:
> 
> 
> Hi
>  
> the block size used for writes increases the IOPS on those cards, which
> might already be at their limit, so I would not dismiss it: whether or not
> lowering the write IOPS has a positive effect on reads, it is a smoking gun
> that needs to be addressed. My experience of ignoring those is not a
> positive one.
>  
> In regards to this HW, I would love to see a baseline at RAW. Run FIO (or
> any other tool that is not dd) on the RAW devices (not Scale) to see what
> each drive can actually do, AND then all the drives at the same time. We
> have seen RAID controllers brought to their knees even on reads when
> parallel access to many drives is pushed through the RAID controller. That
> is why we had to create a tool to get KPIs for ECE, but it can be applied
> here as a way to see what the system can do. I would build numbers for RAW
> before I start looking into any filesystem numbers.
>  
> You can use whatever tool you like, but this one is just a FIO frontend
> that will do what I mention above:
> https://github.com/IBM/SpectrumScale_ECE_STORAGE_READINESS. If you can, I
> would also do the write part, as reads are only part of the story, and you
> need to understand what the HW can do (+1 to the Lego comment before)
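> 
> Just as an illustration (this is not the readiness tool itself, and the
> device names below are placeholders), a raw sequential-read baseline with
> FIO could look like:
> 
>   # one drive, sequential 1M reads, page cache bypassed, read-only as a safety net
>   fio --name=raw-read --readonly --filename=/dev/sdb --rw=read --bs=1M \
>       --direct=1 --ioengine=libaio --iodepth=16 --runtime=60 --time_based \
>       --group_reporting
> 
>   # then several drives at once, to see where the RAID controller saturates
>   fio --name=raw-read-all --readonly --filename=/dev/sdb:/dev/sdc:/dev/sdd \
>       --rw=read --bs=1M --direct=1 --ioengine=libaio --iodepth=16 \
>       --runtime=60 --time_based --group_reporting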
> --
> Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions
> Luis Bolinches
> Consultant IT Specialist
> IBM Spectrum Scale development
> ESS & client adoption teams
> Mobile Phone: +358503112585
>  
> https://www.youracclaim.com/user/luis-bolinches
>  
> Ab IBM Finland Oy
> Laajalahdentie 23
> 00330 Helsinki
> Uusimaa - Finland
> 
> "If you always give you will always have" --  Anonymous
>  
>  
>  
> ----- Original message -----
> From: "Uwe Falke" <UWEFALKE at de.ibm.com>
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Cc: gpfsug-discuss-bounces at spectrumscale.org, Agostino Funel <agostino.funel at enea.it>
> Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN
> Date: Thu, Jun 11, 2020 23:42
>  
> Hi Giovanni, how do the waiters look on your clients when reading?
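> 
> For example, something like the following, run on a client while the read
> test is in flight, would show them:
> 
>   mmdiag --waiters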
> 
> 
> Mit freundlichen Grüßen / Kind regards
> 
> Dr. Uwe Falke
> IT Specialist
> Global Technology Services / Project Services Delivery / High Performance
> Computing
> +49 175 575 2877 Mobile
> Rathausstr. 7, 09111 Chemnitz, Germany
> uwefalke at de.ibm.com
> 
> IBM Services
> 
> IBM Data Privacy Statement
> 
> IBM Deutschland Business & Technology Services GmbH
> Geschäftsführung: Dr. Thomas Wolter, Sven Schooss
> Sitz der Gesellschaft: Ehningen
> Registergericht: Amtsgericht Stuttgart, HRB 17122
> 
> 
> 
> From:   Giovanni Bracco <giovanni.bracco at enea.it>
> To:     gpfsug-discuss at spectrumscale.org
> Cc:     Agostino Funel <agostino.funel at enea.it>
> Date:   05/06/2020 14:22
> Subject:        [EXTERNAL] [gpfsug-discuss] very low read performance in
> simple spectrum scale/gpfs cluster with a storage-server SAN
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> 
> 
> 
> In our lab we have received two storage servers, Supermicro
> SSG-6049P-E1CR24L, with 24 HDs each (9 TB SAS3) and an Avago 3108 RAID
> controller (2 GB cache). Before putting them into production for other
> purposes we have set up a small GPFS test cluster to verify whether they
> can be used as storage (our GPFS production cluster has licenses based on
> the NSD server sockets, so it would be interesting to expand the storage
> size just by adding storage servers to an InfiniBand-based SAN, without
> changing the number of NSD servers).
> 
> The test cluster consists of:
> 
> 1) two NSD servers (IBM x3550 M2), each with a dual-port QDR TrueScale IB HCA.
> 2) a Mellanox FDR switch used as the SAN switch
> 3) a TrueScale QDR switch as the GPFS cluster switch
> 4) two GPFS clients (Supermicro AMD nodes), with one QDR port each.
> 
> All the nodes run CentOS 7.7.
> 
> On each storage server a RAID 6 volume of 11 disks, 80 TB, has been
> configured and is exported via InfiniBand as an iSCSI target, so that both
> volumes appear as devices accessed by the srp_daemon on the NSD servers,
> where multipath (not really necessary in this case) has been configured
> for these two LIO-ORG devices.
> 
> GPFS version 5.0.4-0 has been installed and RDMA has been properly
> configured.
> 
> Two NSD disks have been created and a GPFS file system has been configured.
> 
> Very simple tests have been performed using lmdd serial write/read.
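> 
> As a sketch only (file name, size and options are illustrative, not the
> exact commands used), such a serial write/read with lmbench's lmdd looks
> roughly like:
> 
>   # serial write of ~100 GB in 1 MB records, fsync before the result is printed
>   lmdd if=internal of=/test/file100g bs=1m count=102400 fsync=1
> 
>   # serial read of the same file back
>   lmdd if=/test/file100g of=internal bs=1m count=102400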
> 
> 1) storage-server local performance: before configuring the RAID 6 volume
> as an NSD disk, a local XFS file system was created, and the lmdd write/read
> performance for a 100 GB file was verified to be about 1 GB/s.
> 
> 2) once the GPFS cluster had been created, write/read tests were performed
> directly from one NSD server at a time:
> 
> write performance 2 GB/s, read performance 1 GB/s, for a 100 GB file
> 
> By checking with iostat, it was observed that in this case the I/O involved
> only the NSD server where the test was performed, so writing achieved double
> the base performance, while reading matched the performance of a local file
> system; this seems correct. The values are stable when the test is repeated.
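> 
> For example, the per-device view on both NSD servers can be watched with
> iostat from sysstat:
> 
>   # extended statistics, throughput in MB, refreshed every 2 seconds
>   iostat -xm 2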
> 
> 3) when the same test is performed from the GPFS clients, the lmdd results
> for a 100 GB file are:
> 
> write - 900 MB/s and stable; not too bad, but half of what is seen from
> the NSD servers.
> 
> read - 30 MB/s to 300 MB/s: very low and unstable values.
> 
> No tuning of any kind has been applied to any of the systems involved;
> only default values are used.
> 
> Any suggestion to explain the very bad read performance from a GPFS
> client?
> 
> Giovanni
> 
> here are the configuration of the virtual drive on the storage-server
> and the file system configuration in GPFS
> 
> 
> Virtual drive
> ==============
> 
> Virtual Drive: 2 (Target Id: 2)
> Name                :
> RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
> Size                : 81.856 TB
> Sector Size         : 512
> Is VD emulated      : Yes
> Parity Size         : 18.190 TB
> State               : Optimal
> Strip Size          : 256 KB
> Number Of Drives    : 11
> Span Depth          : 1
> Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if
> Bad BBU
> Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if
> Bad BBU
> Default Access Policy: Read/Write
> Current Access Policy: Read/Write
> Disk Cache Policy   : Disabled
> 
> 
> GPFS file system from mmlsfs
> ============================
> 
> mmlsfs vsd_gexp2
> flag                value                    description
> ------------------- ------------------------
> -----------------------------------
>   -f                 8192                     Minimum fragment
> (subblock) size in bytes
>   -i                 4096                     Inode size in bytes
>   -I                 32768                    Indirect block size in bytes
>   -m                 1                        Default number of metadata
> replicas
>   -M                 2                        Maximum number of metadata
> replicas
>   -r                 1                        Default number of data
> replicas
>   -R                 2                        Maximum number of data
> replicas
>   -j                 cluster                  Block allocation type
>   -D                 nfs4                     File locking semantics in
> effect
>   -k                 all                      ACL semantics in effect
>   -n                 512                      Estimated number of nodes
> that will mount file system
>   -B                 1048576                  Block size
>   -Q                 user;group;fileset       Quotas accounting enabled
>                      user;group;fileset       Quotas enforced
>                      none                     Default quotas enabled
>   --perfileset-quota No                       Per-fileset quota
> enforcement
>   --filesetdf        No                       Fileset df enabled?
>   -V                 22.00 (5.0.4.0)          File system version
>   --create-time      Fri Apr  3 19:26:27 2020 File system creation time
>   -z                 No                       Is DMAPI enabled?
>   -L                 33554432                 Logfile size
>   -E                 Yes                      Exact mtime mount option
>   -S                 relatime                 Suppress atime mount option
>   -K                 whenpossible             Strict replica allocation
> option
>   --fastea           Yes                      Fast external attributes
> enabled?
>   --encryption       No                       Encryption enabled?
>   --inode-limit      134217728                Maximum number of inodes
>   --log-replicas     0                        Number of log replicas
>   --is4KAligned      Yes                      is4KAligned?
>   --rapid-repair     Yes                      rapidRepair enabled?
>   --write-cache-threshold 0                   HAWC Threshold (max 65536)
>   --subblocks-per-full-block 128              Number of subblocks per
> full block
>   -P                 system                   Disk storage pools in file
> system
>   --file-audit-log   No                       File Audit Logging enabled?
>   --maintenance-mode No                       Maintenance Mode enabled?
>   -d                 nsdfs4lun2;nsdfs5lun2    Disks in file system
>   -A                 yes                      Automatic mount option
>   -o                 none                     Additional mount options
>   -T                 /gexp2                   Default mount point
>   --mount-priority   0                        Mount priority
> 
> 
> --
> Giovanni Bracco
> phone  +39 351 8804788
> E-mail  giovanni.bracco at enea.it
> WWW
> http://www.afs.enea.it/bracco 
> 
> 
> 
> ==================================================
> 
> This e-mail and any attachments is confidential and may contain privileged
> information intended for the addressee(s) only.
> Dissemination, copying, printing or use by anybody else is unauthorised
> (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent amendments and GDPR UE
> 2016/679).
> If you are not the intended recipient, please delete this message and any
> attachments and advise the sender by return e-mail. Thanks.
> 
> ==================================================
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss 
> 
> 
> 
> 
> 
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss 
>  
>  
> 
> Ellei edellä ole toisin mainittu: / Unless stated otherwise above:
> Oy IBM Finland Ab
> PL 265, 00101 Helsinki, Finland
> Business ID, Y-tunnus: 0195876-3 
> Registered in Finland
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss