<html><body><p><font size="2">8 data disks x 256K = 2MB full stripe, which does not align with your 1MB block size.<br></font><font size="2"><br></font><font size="2">RAID 6 is already not the best option for writes. I would look into using block sizes that are multiples of 2MB.<br></font><font size="2"><br></font><font size="2">--<br></font><font size="2">Cheers<br></font><font size="2"><br></font><font size="2">> On 11. Jun 2020, at 17.07, Giovanni Bracco &lt;giovanni.bracco@enea.it&gt; wrote:<br></font><font size="2">> <br></font><font size="2">> 256K<br></font><font size="2">> <br></font><font size="2">> Giovanni<br></font><font size="2">> <br></font><font size="2">>> On 11/06/20 10:01, Luis Bolinches wrote:<br></font><font size="2">>> On that RAID 6, what is the logical RAID block size? 128K, 256K, other?<br></font><font size="2">>> --<br></font><font size="2">>> Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions<br></font><font size="2">>> Luis Bolinches<br></font><font size="2">>> Consultant IT Specialist<br></font><font size="2">>> IBM Spectrum Scale development<br></font><font size="2">>> ESS &amp; client adoption teams<br></font><font size="2">>> Mobile Phone: +358503112585<br></font><font size="2">>> *https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youracclaim.com_user_luis-2Dbolinches-2A&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=0sBbPyJrNuU4BjRb4Cv2f8Z0ot7MiVpqshdkyAHqiuE&e= <br></font><font size="2">>> Ab IBM Finland Oy<br></font><font size="2">>> Laajalahdentie 23<br></font><font size="2">>> 00330 Helsinki<br></font><font size="2">>> Uusimaa - Finland<br></font><font size="2">>> <br></font><font size="2">>> *"If you always give you will always have" -- Anonymous*<br></font><font size="2">>> <br></font><font size="2">>> ----- Original message -----<br></font><font size="2">>> From: Giovanni Bracco &lt;giovanni.bracco@enea.it&gt;<br></font><font size="2">>> Sent by: 
gpfsug-discuss-bounces@spectrumscale.org<br></font><font size="2">>> To: Jan-Frode Myklebust &lt;janfrode@tanso.net&gt;, gpfsug main discussion list &lt;gpfsug-discuss@spectrumscale.org&gt;<br></font><font size="2">>> Cc: Agostino Funel &lt;agostino.funel@enea.it&gt;<br></font><font size="2">>> Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN<br></font><font size="2">>> Date: Thu, Jun 11, 2020 10:53<br></font><font size="2">>> Comments and updates in the text:<br></font><font size="2">>> <br></font><font size="2">>>> On 05/06/20 19:02, Jan-Frode Myklebust wrote:<br></font><font size="2">>>> Fri, 5 Jun 2020 at 15:53, Giovanni Bracco &lt;giovanni.bracco@enea.it &lt;<a href="mailto:giovanni.bracco@enea.it">mailto:giovanni.bracco@enea.it</a>&gt;&gt; wrote:<br></font><font size="2">>>> <br></font><font size="2">>>> Answers in the text.<br></font><font size="2">>>> <br></font><font size="2">>>>> On 05/06/20 14:58, Jan-Frode Myklebust wrote:<br></font><font size="2">>>> ><br></font><font size="2">>>> > Could it be interesting to drop the NSD servers and let all nodes access the storage via SRP?<br></font><font size="2">>>> <br></font><font size="2">>>> No, we cannot: the production cluster fabric is a mix of a QDR-based cluster and an OPA-based cluster, and the NSD nodes provide the service to both.<br></font><font size="2">>>> <br></font><font size="2">>>> You could potentially still use SRP from the QDR nodes, and go via NSD for your Omni-Path nodes. 
Going via NSD seems like a somewhat pointless indirection.<br></font><font size="2">>> <br></font><font size="2">>> Not really: both clusters, the 400 OPA nodes and the 300 QDR nodes, share the same data lake in Spectrum Scale/GPFS, so the NSD servers provide the flexibility the setup needs.<br></font><font size="2">>> <br></font><font size="2">>> The NSD servers use an IB SAN fabric (Mellanox FDR switch), to which 3 different generations of DDN storage are currently connected: 9900/QDR, 7700/FDR and 7990/EDR. The idea was to be able to add some less expensive storage, to be used when performance is not the first priority.<br></font><font size="2">>> <br></font><font size="2">>>> <br></font><font size="2">>>> ><br></font><font size="2">>>> > Maybe turn off readahead, since it can cause performance degradation when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead always reads too much. 
This might be the cause of the slow reads you are seeing. Maybe you’ll also overflow it if reading from both NSD servers at the same time?<br></font><font size="2">>>> <br></font><font size="2">>>> I have switched readahead off, and this produced a small (~10%) increase in performance when reading from an NSD server, but no change in the bad behaviour of the GPFS clients.<br></font><font size="2">>>> <br></font><font size="2">>>> ><br></font><font size="2">>>> > Plus, it’s always nice to give the clients a bit more pagepool than the default. I would prefer to start with 4 GB.<br></font><font size="2">>>> <br></font><font size="2">>>> We'll do that too and let you know!<br></font><font size="2">>>> <br></font><font size="2">>>> Could you show your mmlsconfig? You should likely set maxMBpS to indicate what kind of throughput a client can do (it affects GPFS readahead/writebehind). 
I would typically also increase workerThreads on your NSD servers.<br></font><font size="2">>> <br></font><font size="2">>> At the moment, this is the output of mmlsconfig:<br></font><font size="2">>> <br></font><font size="2">>> # mmlsconfig<br></font><font size="2">>> Configuration data for cluster GPFSEXP.portici.enea.it:<br></font><font size="2">>> -------------------------------------------------------<br></font><font size="2">>> clusterName GPFSEXP.portici.enea.it<br></font><font size="2">>> clusterId 13274694257874519577<br></font><font size="2">>> autoload no<br></font><font size="2">>> dmapiFileHandleSize 32<br></font><font size="2">>> minReleaseLevel 5.0.4.0<br></font><font size="2">>> ccrEnabled yes<br></font><font size="2">>> cipherList AUTHONLY<br></font><font size="2">>> verbsRdma enable<br></font><font size="2">>> verbsPorts qib0/1<br></font><font size="2">>> [cresco-gpfq7,cresco-gpfq8]<br></font><font size="2">>> verbsPorts qib0/2<br></font><font size="2">>> [common]<br></font><font size="2">>> pagepool 4G<br></font><font size="2">>> adminMode central<br></font><font size="2">>> <br></font><font size="2">>> File systems in cluster GPFSEXP.portici.enea.it:<br></font><font size="2">>> ------------------------------------------------<br></font><font size="2">>> /dev/vsd_gexp2<br></font><font size="2">>> /dev/vsd_gexp3<br></font><font size="2">>> <br></font><font size="2">>>> <br></font><font size="2">>>> A 1 MB block size is a bit bad for your 9+p+q RAID with a 256 KB strip size. When you write one GPFS block, less than half a RAID stripe is written, which means you need to read back some data to calculate the new parities.<br></font><font size="2">>>> I would prefer a 4 MB block size, and maybe also change to 
8+p+q so that one GPFS block is a multiple of a full 2 MB stripe.<br></font><font size="2">>>> <br></font><font size="2">>>> -jf<br></font><font size="2">>> <br></font><font size="2">>> We have now added another file system based on 2 NSDs on RAID 6 8+p+q, keeping the 1MB block size so as not to change too many things at the same time, but there is no substantial change in the very low read performance, which is still of the order of 50 MB/s, while write performance is 1000 MB/s.<br></font><font size="2">>> <br></font><font size="2">>> Any other suggestions are welcome!<br></font><font size="2">>> <br></font><font size="2">>> Giovanni<br></font><font size="2">>> <br></font><font size="2">>> --<br></font><font size="2">>> Giovanni Bracco<br></font><font size="2">>> phone +39 351 8804788<br></font><font size="2">>> E-mail giovanni.bracco@enea.it<br></font><font size="2">>> WWW <a href="http://www.afs.enea.it/bracco">http://www.afs.enea.it/bracco</a> <br></font><font size="2">>> _______________________________________________<br></font><font size="2">>> gpfsug-discuss mailing list<br></font><font size="2">>> gpfsug-discuss at spectrumscale.org<br></font><font size="2">>> <a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a> <br></font><font size="2">>> <br></font><font size="2">>> Ellei edellä ole toisin mainittu: / Unless stated otherwise above:<br></font><font size="2">>> Oy IBM Finland Ab<br></font><font size="2">>> PL 265, 00101 Helsinki, Finland<br></font><font size="2">>> Business ID, Y-tunnus: 0195876-3<br></font><font size="2">>> Registered in Finland<br></font><font size="2">>> <br></font><font size="2">>> 
<br></font><font size="2">> <br></font><BR>
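<p><font size="2">The stripe arithmetic behind the advice at the top of this message (8 data disks x 256K = 2MB full stripe) and behind Jan-Frode's point about 1 MB blocks on 9+p+q can be checked with a few lines of Python. This is a minimal sketch, not an IBM or DDN tool. It assumes that each GPFS block is written to a single RAID 6 LUN and starts on a stripe boundary, so that parity can be computed without reading back old data only when the block covers a whole number of full data stripes (data disks x strip size). The function names are hypothetical, and controller caching can change the real-world picture.</font></p>
<pre>
```python
"""Check whether a file-system block size lines up with full RAID 6 stripes.

Sketch only: assumes each file-system block lands on one RAID 6 LUN and starts
on a stripe boundary, so parity can be computed without reading back old data
only when the block covers a whole number of full data stripes.
"""

KIB = 1024
MIB = 1024 * KIB
STRIP = 256 * KIB  # strip (segment) size per data disk, as reported in this thread


def full_stripe_bytes(data_disks: int, strip_bytes: int) -> int:
    """Data payload of one full stripe; the P and Q parity strips are not counted."""
    return data_disks * strip_bytes


def stripes_per_block(block_bytes: int, data_disks: int, strip_bytes: int) -> float:
    """Number of full stripes one block covers (fractional when misaligned)."""
    return block_bytes / full_stripe_bytes(data_disks, strip_bytes)


def is_full_stripe_aligned(block_bytes: int, data_disks: int, strip_bytes: int) -> bool:
    """True when the block is a positive whole multiple of the full stripe."""
    stripe = full_stripe_bytes(data_disks, strip_bytes)
    return block_bytes >= stripe and block_bytes % stripe == 0


if __name__ == "__main__":
    # (label, number of data disks, file-system block size)
    cases = [
        ("9+p+q, 1 MiB block", 9, 1 * MIB),
        ("9+p+q, 4 MiB block", 9, 4 * MIB),
        ("8+p+q, 1 MiB block", 8, 1 * MIB),
        ("8+p+q, 2 MiB block", 8, 2 * MIB),
        ("8+p+q, 4 MiB block", 8, 4 * MIB),
    ]
    for label, disks, block in cases:
        ratio = stripes_per_block(block, disks, STRIP)
        aligned = is_full_stripe_aligned(block, disks, STRIP)
        print(f"{label}: {ratio:.3f} stripes per block, full-stripe aligned: {aligned}")
```
</pre>
<p><font size="2">Run as a script, it reports 0.444 and 0.500 stripes per block for 1 MiB blocks on 9+p+q and 8+p+q (every block write is a partial-stripe write), 1.778 for a 4 MiB block on 9+p+q (still not aligned), and exactly 1.000 and 2.000 for 2 MiB and 4 MiB blocks on 8+p+q, which is why multiples of 2MB fit the 8+p+q layout.</font></p>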
</body></html>