<div class="iw_mail" dir="ltr" style="font-size: 13px;"><div><br></div><div>Dear Wei,</div><div><br></div><div>Not a lot of information to go on there... e.g. layout of the MPI processes on compute nodes, the interconnect and the GPFS settings... but the standout information appears to be:</div><div><br></div><div>"10X slower than local SSD, and nfs reexport of another gpfs filesystem"</div><div><br></div><div>"The per process IO is very slow, 4-5 MiB/s, while on ssd and nfs I got 20-40 MiB/s"</div><div><br></div><div>You also note 2GB/s performance for 4MB writes, and 1.7GB/s read. That is only 500 IOPS; I assume you'd see more with 4kB reads/writes.</div><div><br></div><div>I'd also note that 10x slower is kind of an intermediate number: it's bad but not totally unproductive.</div><div><br></div><div>I think the likely issues are going to be around the GPFS (client) config, although you might also be struggling with IOPS. The fact that the NFS re-export trick works (allowing O/S-level lazy caching and instant re-opening of files) suggests that total performance is not your issue. Upping the pagepool and/or maxStatCache etc. may just make all these issues go away.</div><div><br></div><div>If I picked out the right benchmark, then it is one with a 360 box size which is not too small... 
I don't know how many files comprise your particle set...</div><div><br></div><div>Regards,</div><div>Robert</div><div><br></div><div class="iw-signature">--<br><br>Dr Robert Esnouf<br><br>University Research Lecturer,<br>Director of Research Computing BDI,<br>Head of Research Computing Core WHG,<br>NDM Research Computing Strategy Officer<br><br>Main office:<br>Room 10/028, Wellcome Centre for Human Genetics,<br>Old Road Campus, Roosevelt Drive, Oxford OX3 7BN, UK<br><br>Emails:<br>robert@strubi.ox.ac.uk / robert@well.ox.ac.uk / robert.esnouf@bdi.ox.ac.uk<br><br>Tel: (+44)-1865-287783 (WHG); (+44)-1865-743689 (BDI)<br> </div><div><br></div><div style="font-size: 13px;font-family:Roboto, Tahoma, Helvetica, sans-serif;line-height:normal;" dir="LTR" class="iw-reply-block"><div style="margin:0;font-family:Roboto, Tahoma, Helvetica, sans-serif;font-size:13px;font-weight:300;line-height:150%;letter-spacing:normal;color:#333333;"><div style="display:none;margin:0;font-family:Roboto, Tahoma, Helvetica, sans-serif;font-size:13px;font-weight:300;line-height:150%;letter-spacing:normal;color:#333333;">----- Original Message -----</div><hr size="1" width="100%" style="width:100%;padding:0;margin:10px 0;color:#888888;background-color:#888888;border-color:#DDDDDD;">From: Guo, Wei (<a style="font-family: Helvetica, sans-serif; font-size: 12px; font-weight: 300; line-height: 150%; color: rgb(0, 136, 204); text-decoration: none;" href="mailto:Wei.Guo@STJUDE.ORG">Wei.Guo@STJUDE.ORG</a>)<br>Date: 08/08/19 23:19<br>To: <a style="font-family: Helvetica, sans-serif; font-size: 12px; font-weight: 300; line-height: 150%; color: rgb(0, 136, 204); text-decoration: none;" href="mailto:gpfsug-discuss@spectrumscale.org">gpfsug-discuss@spectrumscale.org</a>, <a style="font-family: Helvetica, sans-serif; font-size: 12px; font-weight: 300; line-height: 150%; color: rgb(0, 136, 204); text-decoration: none;" href="mailto:robert@strubi.ox.ac.uk">robert@strubi.ox.ac.uk</a>, <a style="font-family: 
Helvetica, sans-serif; font-size: 12px; font-weight: 300; line-height: 150%; color: rgb(0, 136, 204); text-decoration: none;" href="mailto:robert@well.ox.ac.uk">robert@well.ox.ac.uk</a>, <a style="font-family: Helvetica, sans-serif; font-size: 12px; font-weight: 300; line-height: 150%; color: rgb(0, 136, 204); text-decoration: none;" href="mailto:robert.esnouf@bid.ox.ac.uk">robert.esnouf@bid.ox.ac.uk</a><br>Subject: <span style="font-family:Helvetica, sans-serif;font-size:12px;font-weight:300;line-height:150%;color:#333;text-decoration:none;font-weight:bold;">[gpfsug-discuss] relion software using GPFS storage</span></div><br><div><div style="font-size:12pt;color:rgb(0,0,0);font-family:Calibri, Helvetica, sans-serif, EmojiFont, 'Apple Color Emoji', 'Segoe UI Emoji', NotoColorEmoji, 'Segoe UI Symbol', 'Android Emoji', EmojiSymbols;" id="webClient_divtagdefaultwrapper" dir="ltr"><p style="margin-top:0;margin-bottom:0;"><br></p><pre style="white-space:pre-wrap;">Hi, Robert and Michael, </pre><pre style="white-space:pre-wrap;"><br></pre><pre style="white-space:pre-wrap;"><br></pre><pre style="font-size:16px;white-space:pre-wrap;">What are the settings within relion for parallel file systems?</pre><br><pre style="white-space:pre-wrap;">Sorry to bump this old thread, but I don't see any further conversation, and I have not been able to join the mailing list recently due to </pre><pre style="white-space:pre-wrap;">the spectrumscale.org:10000 web server error. I used to be on this mailing list under my previous work email. </pre><pre style="white-space:pre-wrap;"><br></pre><pre style="white-space:pre-wrap;">The problem is that I also see that Relion 3 does not like GPFS. It is obscenely slow, slower than anything else: local SSD, NFS re-export of GPFS. </pre><pre style="white-space:pre-wrap;">I am using the standard benchmarks from the Relion 3 website. 
</pre><pre style="white-space:pre-wrap;"><br></pre><pre style="white-space:pre-wrap;">The mpirun -n 9 `which relion_refine_mpi` is 10X slower than local SSD, and nfs reexport of another gpfs filesystem. </pre><pre style="white-space:pre-wrap;">With the latter two I get results (1hr25min) close to the published results (1hr13min) on the same Intel Xeon Gold 6148 CPU @2.40GHz and 4 V100 GPU cards, with the same command. </pre><pre style="white-space:pre-wrap;">Running the same standard benchmark, it takes 15-20 min for one iteration; it should be &lt;1.7 mins. </pre><pre style="white-space:pre-wrap;">The per process IO is very slow, 4-5 MiB/s, while on ssd and nfs I got 20-40 MiB/s if watching the /proc/&lt;PID&gt;/io of the relion_refine processes. </pre><pre style="white-space:pre-wrap;"><br></pre><pre style="white-space:pre-wrap;">My gpfs client can see ~2GB/s when benchmarking with IOZONE; yes, only 2GB/s because it is a small system, 70? drives. </pre><pre style="white-space:pre-wrap;"><br></pre><div><div>Record Size 4096 kB</div><div>O_DIRECT feature enabled</div><div>File size set to 20971520 kB</div><div>Command line used: iozone -r 4m -I -t 16 -s 20g</div><div>Output is in kBytes/sec</div><div>Time Resolution = 0.000001 seconds.</div><div>Processor cache size set to 1024 kBytes.</div><div>Processor cache line size set to 32 bytes.</div><div>File stride size set to 17 * record size.</div><div>Throughput test with 16 processes</div><div>Each process writes a 20971520 kByte file in 4096 kByte records</div><div><br></div><div>Children see throughput for 16 initial writers = 1960218.38 kB/sec</div><div>Parent sees throughput for 16 initial writers = 1938463.07 kB/sec</div><div>Min throughput per process =  120415.66 kB/sec </div><div>Max throughput per process =  123652.07 kB/sec</div><div>Avg throughput per process =  122513.65 kB/sec</div><div>Min xfer = 20426752.00 kB</div><div><br></div><div>Children see throughput for 16 readers = 1700354.00 kB/sec</div><div>Parent sees 
throughput for 16 readers = 1700046.71 kB/sec</div><div>Min throughput per process =  104587.73 kB/sec </div><div>Max throughput per process =  108182.84 kB/sec</div><div>Avg throughput per process =  106272.12 kB/sec</div><div>Min xfer = 20275200.00 kB</div><div><br></div><br></div><div><br></div><pre style="white-space:pre-wrap;">The --no_parallel_disk_io is even worse. <span>--only_do_unfinished_movies does not help much. </span></pre><pre style="white-space:pre-wrap;"><br></pre><pre style="white-space:pre-wrap;">Please advise.</pre><pre style="white-space:pre-wrap;"><br></pre><pre style="white-space:pre-wrap;">Thanks</pre><pre style="white-space:pre-wrap;"><br></pre><pre style="white-space:pre-wrap;">Wei Guo</pre><pre style="white-space:pre-wrap;">Computational Engineer, </pre><pre style="white-space:pre-wrap;">St Jude Children's Research Hospital</pre><pre style="white-space:pre-wrap;">wei.guo@stjude.org</pre><pre style="white-space:pre-wrap;"><br></pre><pre style="white-space:pre-wrap;"><br></pre><pre style="white-space:pre-wrap;"><br class="Apple-interchange-newline"><br>Dear Michael,<br><br>There are settings within relion for parallel file systems, you should check they are enabled if you have SS underneath.<br><br>Otherwise, check which version of relion and then try to understand the problem that is being analysed a little more.<br><br>If the box size is very small and the internal symmetry low then the user may read 100,000s of small "picked particle" files for each iteration opening and closing the files each time.<br><br>I believe that relion3 has some facility for extracting these small particles from the larger raw images and that is more SS-friendly. 
Alternatively, the size of the set of picked particles is often only in the 50GB range and so staging to one or more local machines is quite feasible...<br><br>Hope one of those suggestions helps.<br>Regards,<br>Robert<br><br>--<br><br>Dr Robert Esnouf <br><br>University Research Lecturer, <br>Director of Research Computing BDI, <br>Head of Research Computing Core WHG, <br>NDM Research Computing Strategy Officer <br><br>Main office: <br>Room 10/028, Wellcome Centre for Human Genetics, <br>Old Road Campus, Roosevelt Drive, Oxford OX3 7BN, UK <br><br>Emails: <br><a rel="noreferrer noopener" id="webClient_LPlnk353953" href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" class="OWAAutoLink">robert at strubi.ox.ac.uk</a> / <a rel="noreferrer noopener" id="webClient_LPlnk507641" href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" class="OWAAutoLink">robert at well.ox.ac.uk</a> / <a rel="noreferrer noopener" id="webClient_LPlnk620873" href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" class="OWAAutoLink">robert.esnouf at bdi.ox.ac.uk</a> <br><br>Tel:   (+44)-1865-287783 (WHG); (+44)-1865-743689 (BDI)<br> <br><br>-----Original Message-----<br>From: "Michael Holliday" &lt;<a rel="noreferrer noopener" id="webClient_LPlnk955501" href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" class="OWAAutoLink">michael.holliday at crick.ac.uk</a>&gt;<br>To: <a rel="noreferrer noopener" id="webClient_LPlnk954355" href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" class="OWAAutoLink">gpfsug-discuss at spectrumscale.org</a><br>Date: 27/02/19 12:21<br>Subject: [gpfsug-discuss] relion software using GPFS storage<br><br><br>Hi All,<br> <br>We’ve recently had an issue where a job on our client GPFS cluster caused our main storage to go extremely slowly.   
The job was running relion using MPI (<a rel="noreferrer noopener" id="webClient_LPlnk261043" href="https://www2.mrc-lmb.cam.ac.uk/relion/index.php?title=Main_Page" class="OWAAutoLink">https://www2.mrc-lmb.cam.ac.uk/relion/index.php?title=Main_Page</a>)<br> <br>It caused waiters across the cluster, and caused the load to spike on the NSDs one at a time.  When the spike ended on one NSD, it immediately started on another. <br> <br>There were no obvious errors in the logs and the issues cleared immediately after the job was cancelled. <br> <br>Has anyone else seen any issues with relion using GPFS storage?<br> <br>Michael<br> <br>Michael Holliday RITTech MBCS<br>Senior HPC &amp; Research Data Systems Engineer | eMedLab Operations Team<br>Scientific Computing STP | The Francis Crick Institute<br>1, Midland Road | London | NW1 1AT | United Kingdom<br>Tel: 0203 796 3167<br> <br>The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT<br>_______________________________________________<br>gpfsug-discuss mailing list<br>gpfsug-discuss at spectrumscale.org<br><a rel="noreferrer noopener" id="webClient_LPlnk678306" href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" class="OWAAutoLink">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a></pre><div><br></div><br><p><br></p></div><br><hr><br>Email Disclaimer: <a>www.stjude.org/emaildisclaimer</a><br>Consultation Disclaimer: <a>www.stjude.org/consultationdisclaimer</a></div></div></div>