[gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN

Jan-Frode Myklebust janfrode at tanso.net
Fri Jun 5 13:58:39 BST 2020


It could maybe be interesting to drop the NSD servers, and let all nodes
access the storage directly via SRP?
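
In that setup the clients run srp_daemon themselves, the LUNs show up as
local block devices, and GPFS uses the local path instead of shipping the
I/O over the NSD protocol (the useNSDserver mount option controls the
fallback behaviour). A rough sketch, assuming the srptools/rdma-core
packages on CentOS 7; package, service and device names may differ on your
systems:

# on each client: discover and log in to the SRP targets on the fabric
yum install -y srptools
systemctl enable --now srp_daemon

# check that the LUNs and the multipath maps are visible
lsscsi
multipath -ll

# optional /var/mmfs/etc/nsddevices user exit (see
# /usr/lpp/mmfs/samples/nsddevices.sample) so GPFS probes only these devices
cat > /var/mmfs/etc/nsddevices <<'EOF'
#!/bin/ksh
# print "device dtype" pairs, device names relative to /dev; dmm = dm-multipath
for dev in /dev/mapper/mpath*
do
  echo "${dev#/dev/} dmm"
done
return 0   # per the sample script: 0 = skip the built-in device discovery
EOF
chmod +x /var/mmfs/etc/nsddevices

The NSD definitions themselves would not need to change; GPFS prefers a
local path to an NSD when it finds one.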

Maybe turn off readahead, since it can cause performance degradation when
GPFS reads 1 MB blocks scattered on the NSDs, so that the read-ahead always
reads too much. This might be the cause of the slow reads seen; maybe it
also overflows when reading from both NSD servers at the same time?
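
If that read-ahead is the RAID controller's (the virtual drive below shows
"Current Cache Policy: ... ReadAhead ..."), it can be switched off per
virtual drive. A rough sketch with StorCLI or MegaCli; the controller and
VD numbers are examples and the exact syntax depends on the tool version:

# show the current cache policy of the virtual drives
storcli64 /c0/vall show all | grep -i cache

# disable controller read-ahead on virtual drive 2 (the GPFS LUN here)
storcli64 /c0/v2 set rdcache=NoRA

# MegaCli equivalent, if that is what is installed
MegaCli64 -LDSetProp -NORA -L2 -a0

GPFS does its own prefetching into the pagepool anyway, so there is little
to gain from read-ahead at the controller level.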


Plus.. it's always nice to give a bit more pagepool to the clients than the
default. I would prefer to start with 4 GB.
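
For example, something like this on the two clients (node names here are
placeholders):

# 4 GiB pagepool on the client nodes; -i makes the change immediate and permanent
mmchconfig pagepool=4G -i -N client1,client2

# verify
mmlsconfig pagepool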



  -jf

On Fri, 5 Jun 2020 at 14:22, Giovanni Bracco <giovanni.bracco at enea.it> wrote:

> In our lab we have received two storage servers, Supermicro
> SSG-6049P-E1CR24L, with 24 HDs each (9 TB SAS3) and an Avago 3108 RAID
> controller (2 GB cache). Before putting them into production for other
> purposes we have set up a small GPFS test cluster to verify whether they
> can be used as storage (our GPFS production cluster has licenses based
> on the NSD server sockets, so it would be interesting to expand the
> storage size just by adding storage servers to an InfiniBand-based SAN,
> without changing the number of NSD servers).
>
> The test cluster consists of:
>
> 1) two NSD servers (IBM x3550M2), each with a dual-port QDR InfiniBand TrueScale HCA
> 2) a Mellanox FDR switch used as a SAN switch
> 3) a TrueScale QDR switch as the GPFS cluster switch
> 4) two GPFS clients (Supermicro AMD nodes), one QDR port each.
>
> All the nodes run CentOS 7.7.
>
> On each storage server a RAID 6 volume of 11 disks, 80 TB, has been
> configured and is exported over InfiniBand as an SRP target (LIO), so
> that both volumes appear as devices accessed by the srp_daemon on the NSD
> servers, where multipath (not really necessary in this case) has been
> configured for these two LIO-ORG devices.
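>
> For reference, the multipath setup for the two LIO-ORG devices only needs
> a small /etc/multipath.conf, roughly like the following (the options shown
> are illustrative, not necessarily the exact file used):
>
> defaults {
>     user_friendly_names yes
> }
> devices {
>     device {
>         vendor  "LIO-ORG"
>         product ".*"
>         path_grouping_policy multibus
>         path_checker tur
>         failback immediate
>         no_path_retry 12
>     }
> }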
>
> GPFS version 5.0.4-0 has been installed and RDMA has been properly
> configured.
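>
> (The RDMA part is essentially the standard verbs setup, roughly along
> these lines; the verbsPorts value depends on the HCA device names, e.g.
> qib0 for the TrueScale cards:)
>
> mmchconfig verbsRdma=enable
> mmchconfig verbsPorts="qib0/1"
> mmlsconfig verbsRdma verbsPorts
> # "VERBS RDMA started" then shows up in /var/adm/ras/mmfs.log.latest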
>
> Two NSD disks have been created and a GPFS file system has been configured.
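>
> (For reference, the NSDs and the file system were created with a stanza
> file and mmcrnsd/mmcrfs roughly as follows; the device paths, server
> hostnames and failure groups here are placeholders:)
>
> # nsd.stanza
> %nsd: device=/dev/mapper/mpatha nsd=nsdfs4lun2 servers=nsdsrv1,nsdsrv2 usage=dataAndMetadata failureGroup=1 pool=system
> %nsd: device=/dev/mapper/mpathb nsd=nsdfs5lun2 servers=nsdsrv2,nsdsrv1 usage=dataAndMetadata failureGroup=2 pool=system
>
> mmcrnsd -F nsd.stanza
> mmcrfs vsd_gexp2 -F nsd.stanza -B 1M -T /gexp2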
>
> Very simple tests have been performed using lmdd serial write/read.
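>
> (The lmdd runs are of this kind, 1 MB records over a 100 GB file; the
> exact options may differ slightly:)
>
> # sequential write of a 100 GB file
> lmdd of=/gexp2/testfile bs=1m count=102400 fsync=1
> # sequential read of the same file
> lmdd if=/gexp2/testfile of=internal bs=1m count=102400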
>
> 1) storage-server local performance: before configuring the RAID 6 volume
> as an NSD disk, a local XFS file system was created on it and lmdd
> write/read performance for a 100 GB file was verified to be about 1 GB/s
>
> 2) once the GPFS cluster had been created, write/read tests were
> performed directly from one NSD server at a time:
>
> write performance 2 GB/s, read performance 1 GB/s for a 100 GB file
>
> By checking with iostat, it was observed that the I/O in this case
> involved only the NSD server where the test was performed, so when
> writing, double the base performance was obtained, while reading gave
> the same performance as on a local file system, which seems correct.
> Values are stable when the test is repeated.
>
> 3) when the same test is performed from the GPFS clients, the lmdd
> results for a 100 GB file are:
>
> write - 900 MB/s and stable; not too bad, but half of what is seen from
> the NSD servers.
>
> read - 30 MB/s to 300 MB/s: very low and unstable values
>
> No tuning of any kind has been done on any of the systems involved;
> only default values are used.
>
> Any suggestions to explain the very bad read performance from a GPFS
> client?
>
> Giovanni
>
> Here are the configuration of the virtual drive on the storage server
> and the file system configuration in GPFS:
>
>
> Virtual drive
> ==============
>
> Virtual Drive: 2 (Target Id: 2)
> Name                :
> RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
> Size                : 81.856 TB
> Sector Size         : 512
> Is VD emulated      : Yes
> Parity Size         : 18.190 TB
> State               : Optimal
> Strip Size          : 256 KB
> Number Of Drives    : 11
> Span Depth          : 1
> Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if
> Bad BBU
> Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if
> Bad BBU
> Default Access Policy: Read/Write
> Current Access Policy: Read/Write
> Disk Cache Policy   : Disabled
>
>
> GPFS file system from mmlsfs
> ============================
>
> mmlsfs vsd_gexp2
> flag                value                    description
> ------------------- ------------------------ -----------------------------------
>  -f                 8192                     Minimum fragment (subblock) size in bytes
>  -i                 4096                     Inode size in bytes
>  -I                 32768                    Indirect block size in bytes
>  -m                 1                        Default number of metadata replicas
>  -M                 2                        Maximum number of metadata replicas
>  -r                 1                        Default number of data replicas
>  -R                 2                        Maximum number of data replicas
>  -j                 cluster                  Block allocation type
>  -D                 nfs4                     File locking semantics in effect
>  -k                 all                      ACL semantics in effect
>  -n                 512                      Estimated number of nodes that will mount file system
>  -B                 1048576                  Block size
>  -Q                 user;group;fileset       Quotas accounting enabled
>                     user;group;fileset       Quotas enforced
>                     none                     Default quotas enabled
>  --perfileset-quota No                       Per-fileset quota enforcement
>  --filesetdf        No                       Fileset df enabled?
>  -V                 22.00 (5.0.4.0)          File system version
>  --create-time      Fri Apr  3 19:26:27 2020 File system creation time
>  -z                 No                       Is DMAPI enabled?
>  -L                 33554432                 Logfile size
>  -E                 Yes                      Exact mtime mount option
>  -S                 relatime                 Suppress atime mount option
>  -K                 whenpossible             Strict replica allocation option
>  --fastea           Yes                      Fast external attributes enabled?
>  --encryption       No                       Encryption enabled?
>  --inode-limit      134217728                Maximum number of inodes
>  --log-replicas     0                        Number of log replicas
>  --is4KAligned      Yes                      is4KAligned?
>  --rapid-repair     Yes                      rapidRepair enabled?
>  --write-cache-threshold 0                   HAWC Threshold (max 65536)
>  --subblocks-per-full-block 128              Number of subblocks per full block
>  -P                 system                   Disk storage pools in file system
>  --file-audit-log   No                       File Audit Logging enabled?
>  --maintenance-mode No                       Maintenance Mode enabled?
>  -d                 nsdfs4lun2;nsdfs5lun2    Disks in file system
>  -A                 yes                      Automatic mount option
>  -o                 none                     Additional mount options
>  -T                 /gexp2                   Default mount point
>  --mount-priority   0                        Mount priority
>
>
> --
> Giovanni Bracco
> phone  +39 351 8804788
> E-mail  giovanni.bracco at enea.it
> WWW http://www.afs.enea.it/bracco
>
>
>
>

