<div dir="ltr"><div dir="ltr">On Thu, Jun 11, 2020 at 9:53 AM Giovanni Bracco <<a href="mailto:giovanni.bracco@enea.it">giovanni.bracco@enea.it</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

> <br>

> You could potentially still do SRP from the QDR nodes, and go via NSD for your <br>

> Omni-Path nodes. Going via NSD seems like a somewhat pointless indirection.<br>

<br>

Not really: both clusters, the 400 OPA nodes and the 300 QDR nodes, share <br>

the same data lake in Spectrum Scale/GPFS, so the NSD servers support the <br>

flexibility of the setup.<br></blockquote><div><br></div><div>Maybe there's something I don't understand, but couldn't you use the NSD servers to serve your <br></div><div>OPA nodes, and then use SRP directly for your 300 QDR nodes?<br></div><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

At the moment, this is the output of mmlsconfig:<br>

<br>

# mmlsconfig<br>

Configuration data for cluster GPFSEXP.portici.enea.it:<br>

-------------------------------------------------------<br>

clusterName GPFSEXP.portici.enea.it<br>

clusterId 13274694257874519577<br>

autoload no<br>

dmapiFileHandleSize 32<br>

minReleaseLevel 5.0.4.0<br>

ccrEnabled yes<br>

cipherList AUTHONLY<br>

verbsRdma enable<br>

verbsPorts qib0/1<br>

[cresco-gpfq7,cresco-gpfq8]<br>

verbsPorts qib0/2<br>

[common]<br>

pagepool 4G<br>

adminMode central<br>

<br>

File systems in cluster GPFSEXP.portici.enea.it:<br>

------------------------------------------------<br>

/dev/vsd_gexp2<br>

/dev/vsd_gexp3<br>

<br></blockquote><div><br></div><div><br></div><div>So, a trivial, close-to-default config. I assume the same holds for the client cluster.<br></div><div><br></div><div>I would correct maxMBpS (set it to something reasonable), and set verbsRdmaSend=yes and <br></div><div>ignorePrefetchLUNCount=yes. <br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> <br>

> <br>

> A 1 MB block size is a bit bad for your 9+p+q RAID with a 256 KB strip size. <br>

> When you write one GPFS block, less than half a RAID stripe is written, <br>

> which means you need to read back some data to calculate the new parities. <br>

> I would prefer a 4 MB block size, and maybe also change to 8+p+q so that <br>

> one GPFS block is a multiple of a full 2 MB stripe.<br>

> <br>

> <br>

>     -jf<br>

<br>

We have now added another file system based on 2 NSDs on RAID6 8+p+q, <br>

keeping the 1 MB block size so as not to change too many things at the <br>

same time, but there is no substantial change in the very low read <br>

performance, which is still of the order of 50 MB/s, while write performance is 1000 MB/s.<br>

<br>

Any other suggestions are welcome!<br>

<br></blockquote><div><br></div><div><br></div><div>Maybe rule out the storage, and check whether you get proper network throughput from nsdperf?</div><div><br></div><div>Maybe also benchmark using "gpfsperf" instead of "lmdd", and show your full settings, so that</div><div>we can see that the benchmark is sane :-)<br></div><div><br></div><div><br></div><div><br></div><div>  -jf<br></div></div></div>
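<div><br></div><div>A minimal sketch of how the settings suggested above (maxMBpS, verbsRdmaSend, ignorePrefetchLUNCount) could be applied with mmchconfig. The maxMBpS value is a hypothetical placeholder, since the thread only asks for "something reasonable"; size it to the bandwidth a single node can actually drive. As I understand the documentation, ignorePrefetchLUNCount=yes makes clients size prefetch and write-behind from maxMBpS instead of from the number of LUNs, which matters for a file system built on only two large RAID6 NSDs. Some attributes may only take effect after GPFS is restarted on the affected nodes, so check the mmchconfig documentation before running this in production.<br></div>

```shell
# Hypothetical placeholder: the thread only says "something reasonable".
# Set it to roughly the MB/s a single node can actually move.
MAXMBPS=10000

# Run in the storage cluster (GPFSEXP); repeat in the client cluster, whose
# configuration is assumed to be similarly close to default.
mmchconfig maxMBpS=${MAXMBPS},verbsRdmaSend=yes,ignorePrefetchLUNCount=yes

# Confirm the new values.
mmlsconfig maxMBpS
mmlsconfig verbsRdmaSend
mmlsconfig ignorePrefetchLUNCount
```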
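<div><br></div><div>The stripe arithmetic in the quoted block size advice can be checked with a small POSIX shell sketch. stripe_check is a hypothetical helper, not a GPFS tool; it assumes the 256 KB strip size mentioned in the thread, counts only the data disks (p and q hold parity), and assumes blocks are stripe-aligned. A block size that is an exact multiple of the full-stripe width lets the RAID controller write whole stripes; anything else leaves partial stripes whose parity needs data read back.<br></div>

```shell
# stripe_check is a hypothetical helper, not part of GPFS.
# Usage: stripe_check DATA_DISKS STRIP_KB BLOCK_KB
# Sizes are in KB (1 MB = 1024 KB); blocks are assumed stripe-aligned.
stripe_check() {
  data_disks=$1   # the "N" in an N+p+q RAID6 layout (parity disks excluded)
  strip_kb=$2     # strip size written to each disk
  block_kb=$3     # GPFS file system block size
  stripe_kb=$((data_disks * strip_kb))   # data width of one full stripe

  # Only a whole multiple of the stripe width avoids partial-stripe writes.
  if [ $((block_kb % stripe_kb)) -eq 0 ]; then
    verdict="full-stripe writes"
  else
    verdict="partial-stripe writes (parity needs data read back)"
  fi
  echo "${data_disks}+p+q, block ${block_kb} KB: stripe ${stripe_kb} KB -> ${verdict}"
}

stripe_check 9 256 1024   # existing layout: 1 MB blocks on 9+p+q
stripe_check 8 256 1024   # new file system: still 1 MB blocks, now 8+p+q
stripe_check 8 256 4096   # suggested: 4 MB blocks on 8+p+q
```

<div>With a 256 KB strip, 9+p+q gives a 2304 KB stripe, so a 1 MB block covers less than half of it; 8+p+q gives a 2 MB stripe, which a 4 MB block fills exactly twice. The read-back penalty concerns writes, so this check says nothing by itself about the 50 MB/s reads.<br></div>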