[gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN

Jan-Frode Myklebust janfrode at tanso.net
Thu Jun 11 11:13:36 BST 2020


On Thu, Jun 11, 2020 at 9:53 AM Giovanni Bracco <giovanni.bracco at enea.it>
wrote:

>
> >
> > You could potentially still do SRP from the QDR nodes, and go via NSD for
> > your Omni-Path nodes. Going via NSD seems like a bit of a pointless indirection.
>
> not really: both clusters, the 400 OPA nodes and the 300 QDR nodes, share
> the same data lake in Spectrum Scale/GPFS, so the NSD servers provide the
> flexibility of the setup.
>

Maybe there's something I don't understand, but couldn't you use the
NSD servers to serve your OPA nodes, and SRP directly for your 300 QDR
nodes?


> At this moment this is the output of mmlsconfig
>
> # mmlsconfig
> Configuration data for cluster GPFSEXP.portici.enea.it:
> -------------------------------------------------------
> clusterName GPFSEXP.portici.enea.it
> clusterId 13274694257874519577
> autoload no
> dmapiFileHandleSize 32
> minReleaseLevel 5.0.4.0
> ccrEnabled yes
> cipherList AUTHONLY
> verbsRdma enable
> verbsPorts qib0/1
> [cresco-gpfq7,cresco-gpfq8]
> verbsPorts qib0/2
> [common]
> pagepool 4G
> adminMode central
>
> File systems in cluster GPFSEXP.portici.enea.it:
> ------------------------------------------------
> /dev/vsd_gexp2
> /dev/vsd_gexp3
>
>

So, a trivial, close-to-default config -- I assume the same for the client
cluster.

I would correct maxMBpS -- set it to something reasonable for your
hardware -- and enable verbsRdmaSend=yes and ignorePrefetchLUNCount=yes.
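
Something along these lines with mmchconfig (the maxMBpS value below is
only an illustration -- size it to roughly the bandwidth you expect per
node; some of these settings only take effect after GPFS is restarted
on the affected nodes):

  # illustrative values, adjust to your hardware
  mmchconfig maxMBpS=6000
  mmchconfig verbsRdmaSend=yes
  mmchconfig ignorePrefetchLUNCount=yes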



>
> >
> >
> > 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size.
> > When you write one GPFS block, less than half a RAID stripe is written,
> > which means you need to read back some data to calculate the new parities.
> > I would prefer a 4 MB block size, and maybe also a change to 8+p+q, so
> > that one GPFS block is a multiple of a full 2 MB stripe.
> >
> >
> >     -jf
>
> we have now added another file system based on 2 NSDs on RAID6 8+p+q,
> keeping the 1 MB block size just so as not to change too many things at
> the same time, but there is no substantial change in the very low read
> performance, which is still on the order of 50 MB/s, while write
> performance is 1000 MB/s.
>
> Any other suggestion is welcomed!
>
>

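To spell out the stripe arithmetic behind that suggestion:

  9+p+q, 256 KB strips: full stripe = 9 x 256 KB = 2304 KB, so a 1 MB
  GPFS block covers less than half a stripe and every block write
  forces parity to be recomputed from data read back off the disks.

  8+p+q, 256 KB strips: full stripe = 8 x 256 KB = 2048 KB = 2 MB, so
  a 4 MB GPFS block maps onto exactly two full stripes and can be
  written without reading anything back.

Note that this only pays off once the block size is raised to 4 MB as
well -- with the 1 MB block size kept, a block still covers only half a
stripe on the new 8+p+q file system.
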
Maybe rule out the storage, and check if you get proper throughput from
nsdperf?

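For nsdperf, roughly the following -- from memory, the exact commands
are documented in the comments at the top of nsdperf.C, which ships in
/usr/lpp/mmfs/samples/net/:

  # build the tool once on each node involved
  cd /usr/lpp/mmfs/samples/net
  g++ -O2 -o nsdperf nsdperf.C -lpthread -lrt

  # run it in server mode on the nodes you want to test against
  ./nsdperf -s

  # then drive the test interactively from another node
  ./nsdperf
    server <nsd-server-node> ...
    client <client-node> ...
    test

If I remember right there is also an "rdma" toggle in the interactive
mode, so you can compare plain TCP against the verbs path.
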
Maybe also benchmark using "gpfsperf" instead of "lmdd", and show your
full settings -- so that we can see that the benchmark is sane :-)
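
For gpfsperf, something along these lines (it ships as source in
/usr/lpp/mmfs/samples/perf/ and needs a quick "make" there; the file
name below is just a placeholder, and the size should be well beyond
your 4G pagepool so you are not measuring cache):

  cd /usr/lpp/mmfs/samples/perf && make

  # write a test file, then read it back sequentially
  ./gpfsperf create seq /<fs-mountpoint>/testfile -n 64g -r 1m -th 4
  ./gpfsperf read   seq /<fs-mountpoint>/testfile -n 64g -r 1m -th 4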



  -jf