[gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN
Giovanni Bracco
giovanni.bracco at enea.it
Fri Jun 5 14:53:23 BST 2020
Answers inline in the text below.
On 05/06/20 14:58, Jan-Frode Myklebust wrote:
>
> Could maybe be interesting to drop the NSD servers, and let all nodes
> access the storage via srp ?
No, we cannot: the production clusters' fabric is a mix of a QDR-based
cluster and an OPA-based cluster, and the NSD servers provide the
service to both.
>
> Maybe turn off readahead, since it can cause performance degradation
> when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead
> always reads too much. This might be the cause of the slow read seen —
> maybe you’ll also overflow it if reading from both NSD-servers at the
> same time?
I have switched readahead off and this produced a small (~10%)
increase in performance when reading from an NSD server, but no change
in the bad behaviour for the GPFS clients.
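One way to switch read-ahead off at the block-device level is sketched below; the device names are illustrative, and whether this matches exactly what was done here is an assumption:

```shell
# On each NSD server: disable kernel read-ahead on the SRP-backed
# multipath devices (names are illustrative, check `multipath -ll`)
blockdev --setra 0 /dev/mapper/mpatha
blockdev --setra 0 /dev/mapper/mpathb

# Verify: prints the read-ahead value in 512-byte sectors (0 = off)
blockdev --getra /dev/mapper/mpatha
```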
>
>
> Plus.. it’s always nice to give a bit more pagepool to the clients than
> the default.. I would prefer to start with 4 GB.
we'll also do that and we'll let you know!
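A sketch of that pagepool change, using standard mmchconfig syntax; the node names are illustrative:

```shell
# Raise the GPFS pagepool on the client nodes to 4 GB
mmchconfig pagepool=4G -N client1,client2

# Restart GPFS on those nodes so the new value takes effect
mmshutdown -N client1,client2
mmstartup -N client1,client2

# Confirm the setting
mmlsconfig pagepool
```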
Giovanni
>
>
>
> -jf
>
>     On Fri, 5 Jun 2020 at 14:22, Giovanni Bracco
>     <giovanni.bracco at enea.it> wrote:
>
>     In our lab we have received two storage-servers, Supermicro
>     SSG-6049P-E1CR24L, each with 24 HDs (9 TB SAS3) and an Avago 3108
>     RAID controller (2 GB cache). Before putting them into production
>     for other purposes we have set up a small GPFS test cluster to
>     verify whether they can be used as storage (our GPFS production
>     cluster is licensed per NSD-server socket, so it would be
>     interesting to expand the storage just by adding storage-servers
>     to an InfiniBand-based SAN, without changing the number of NSD
>     servers).
>
> The test cluster consists of:
>
>     1) two NSD servers (IBM x3550M2), each with a dual-port IB QDR
>     Truescale HCA.
> 2) a Mellanox FDR switch used as a SAN switch
> 3) a Truescale QDR switch as GPFS cluster switch
>     4) two GPFS clients (Supermicro AMD nodes), one QDR port each.
>
> All the nodes run CentOS 7.7.
>
>     On each storage-server a RAID 6 volume of 11 disks (80 TB) has
>     been configured and exported over InfiniBand via LIO as an SRP
>     target, so that both volumes appear as devices discovered by the
>     srp_daemon on the NSD servers, where multipath (not really
>     necessary in this case) has been configured for these two LIO-ORG
>     devices.
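The SRP/multipath plumbing on the NSD servers can be sketched roughly as follows; the HCA name and port number are illustrative:

```shell
# Discover SRP targets on the IB fabric and log in to them
# (-e executes target additions, -o runs one discovery pass and exits)
srp_daemon -e -o -i mlx4_0 -p 1

# Check that both LIO-ORG LUNs are visible and multipathed
lsscsi | grep LIO-ORG
multipath -ll
```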
>
>     GPFS version 5.0.4-0 has been installed and RDMA has been
>     properly configured.
>
>     Two NSDs have been created and a GPFS file system has been
>     configured.
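For completeness, NSD creation in Scale 5.x uses stanza files; a minimal sketch is below. The NSD names are taken from the mmlsfs output in this message, but the device paths, server names and failure groups are illustrative:

```shell
# Stanza file describing one NSD per storage-server LUN
cat > nsd.stanza <<'EOF'
%nsd: device=/dev/mapper/mpatha
  nsd=nsdfs4lun2
  servers=nsdserver1,nsdserver2
  usage=dataAndMetadata
  failureGroup=1
%nsd: device=/dev/mapper/mpathb
  nsd=nsdfs5lun2
  servers=nsdserver2,nsdserver1
  usage=dataAndMetadata
  failureGroup=2
EOF

# Create the NSDs and the file system (1 MiB blocks, mounted on /gexp2)
mmcrnsd -F nsd.stanza
mmcrfs gexp2 -F nsd.stanza -B 1M -T /gexp2
```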
>
> Very simple tests have been performed using lmdd serial write/read.
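The lmdd invocations were roughly of this shape (lmdd is part of the lmbench suite and takes dd-like arguments; the paths are illustrative):

```shell
# Sequential write of a 100 GB file; if=internal generates the data
# in memory, fsync=1 flushes before the rate is reported
lmdd if=internal of=/gexp2/testfile bs=1m count=102400 fsync=1

# Sequential read of the same file back
lmdd if=/gexp2/testfile of=internal bs=1m count=102400
```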
>
>     1) storage-server local performance: before configuring the RAID6
>     volume as an NSD, a local xfs file system was created and lmdd
>     write/read performance for a 100 GB file was verified to be about
>     1 GB/s.
>
>     2) once the GPFS cluster had been created, write/read tests were
>     performed directly on one NSD server at a time:
>
> write performance 2 GB/s, read performance 1 GB/s for 100 GB file
>
>     Checking with iostat showed that the I/O in this case involved
>     only the NSD server where the test was performed: when writing,
>     double the base performance was obtained (the data is striped over
>     both arrays through the SAN), while reading gave the same
>     performance as the local file system, which seems correct. Values
>     are stable when the test is repeated.
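As a back-of-the-envelope check on those numbers (the per-array figure is the one measured in the local xfs test above):

```python
# Sanity check of the NSD-server write figure: one server writing over
# the SAN stripes GPFS blocks across both RAID6 arrays.
per_array_gbps = 1.0   # measured local xfs write/read rate, GB/s
arrays = 2             # two storage-servers, one NSD each

expected_write_gbps = per_array_gbps * arrays
print(expected_write_gbps)  # 2.0, matching the measured 2 GB/s write
```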
>
>     3) when the same test is performed from the GPFS clients, the
>     lmdd results for a 100 GB file are:
>
>     write - 900 MB/s and stable; not too bad, but half of what is
>     seen from the NSD servers.
>
> read - 30 MB/s to 300 MB/s: very low and unstable values
>
>     No tuning of any kind has been applied to any of the systems
>     involved; everything is at default values.
>
> Any suggestion to explain the very bad read performance from a GPFS
> client?
>
> Giovanni
>
>     Here is the configuration of the virtual drive on the
>     storage-server and the file system configuration in GPFS:
>
>
> Virtual drive
> ==============
>
> Virtual Drive: 2 (Target Id: 2)
> Name :
> RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
> Size : 81.856 TB
> Sector Size : 512
> Is VD emulated : Yes
> Parity Size : 18.190 TB
> State : Optimal
> Strip Size : 256 KB
> Number Of Drives : 11
> Span Depth : 1
>     Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
>     Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
> Default Access Policy: Read/Write
> Current Access Policy: Read/Write
> Disk Cache Policy : Disabled
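One observation worth making on the controller settings above: with a 256 KB strip and 11 drives in RAID6 (9 data + 2 parity), the full data stripe does not divide evenly into the 1 MiB GPFS block size. This is only an observation on the numbers, not a confirmed cause of the slow reads:

```python
# Compare the RAID6 full-stripe width with the GPFS block size
strip_kib = 256                 # controller "Strip Size"
drives = 11                     # "Number Of Drives"
parity_drives = 2               # RAID6
data_drives = drives - parity_drives

full_stripe_kib = strip_kib * data_drives
gpfs_block_kib = 1024           # file system block size (-B 1048576)

print(full_stripe_kib)                   # 2304 KiB per full stripe
print(full_stripe_kib % gpfs_block_kib)  # 256: a 1 MiB block is not stripe-aligned
```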
>
>
> GPFS file system from mmlsfs
> ============================
>
>     mmlsfs vsd_gexp2
>     flag                       value                    description
>     ------------------------- ------------------------ -----------------------------------
>      -f                        8192                     Minimum fragment (subblock) size in bytes
>      -i                        4096                     Inode size in bytes
>      -I                        32768                    Indirect block size in bytes
>      -m                        1                        Default number of metadata replicas
>      -M                        2                        Maximum number of metadata replicas
>      -r                        1                        Default number of data replicas
>      -R                        2                        Maximum number of data replicas
>      -j                        cluster                  Block allocation type
>      -D                        nfs4                     File locking semantics in effect
>      -k                        all                      ACL semantics in effect
>      -n                        512                      Estimated number of nodes that will mount file system
>      -B                        1048576                  Block size
>      -Q                        user;group;fileset       Quotas accounting enabled
>                                user;group;fileset       Quotas enforced
>                                none                     Default quotas enabled
>      --perfileset-quota        No                       Per-fileset quota enforcement
>      --filesetdf               No                       Fileset df enabled?
>      -V                        22.00 (5.0.4.0)          File system version
>      --create-time             Fri Apr 3 19:26:27 2020  File system creation time
>      -z                        No                       Is DMAPI enabled?
>      -L                        33554432                 Logfile size
>      -E                        Yes                      Exact mtime mount option
>      -S                        relatime                 Suppress atime mount option
>      -K                        whenpossible             Strict replica allocation option
>      --fastea                  Yes                      Fast external attributes enabled?
>      --encryption              No                       Encryption enabled?
>      --inode-limit             134217728                Maximum number of inodes
>      --log-replicas            0                        Number of log replicas
>      --is4KAligned             Yes                      is4KAligned?
>      --rapid-repair            Yes                      rapidRepair enabled?
>      --write-cache-threshold   0                        HAWC Threshold (max 65536)
>      --subblocks-per-full-block 128                     Number of subblocks per full block
>      -P                        system                   Disk storage pools in file system
>      --file-audit-log          No                       File Audit Logging enabled?
>      --maintenance-mode        No                       Maintenance Mode enabled?
>      -d                        nsdfs4lun2;nsdfs5lun2    Disks in file system
>      -A                        yes                      Automatic mount option
>      -o                        none                     Additional mount options
>      -T                        /gexp2                   Default mount point
>      --mount-priority          0                        Mount priority
>
>
> --
> Giovanni Bracco
> phone +39 351 8804788
>     E-mail giovanni.bracco at enea.it
> WWW http://www.afs.enea.it/bracco
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
>     gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
--
Giovanni Bracco
phone +39 351 8804788
E-mail giovanni.bracco at enea.it
WWW http://www.afs.enea.it/bracco