[gpfsug-discuss] relion software using GPFS storage

Robert Esnouf robert at strubi.ox.ac.uk
Fri Aug 9 14:29:14 BST 2019



Dear Wei,

Not a lot of information to go on there... e.g. the layout of the MPI processes on the compute nodes, the interconnect and the GPFS settings... but the standout information appears to be:

"10X slower than local SSD, and nfs reexport of another gpfs filesystem"

"The per process IO is very slow, 4-5 MiB/s, while on ssd and nfs I got 20-40 MiB/s"

You also note 2GB/s performance for 4MB writes, and 1.7GB/s for reads. That is only ~500 IOPS (2GB/s divided by 4MB records); I assume you'd see far more IOPS, at much lower bandwidth, with 4kB reads/writes.
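For example, rerunning iozone with small random records and ops/sec reporting would show the IOPS ceiling rather than the bandwidth ceiling (an illustrative invocation only, adjust sizes and process counts to your setup):

  # 4kB records, O_DIRECT, report results in operations/sec
  # -i 0 creates the files (sequential write), -i 2 then does random read/write
  iozone -r 4k -I -O -t 16 -s 1g -i 0 -i 2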

I'd also note that 10x slower is kind of an intermediate number; it's bad, but not totally unproductive.

I think the likely issues are going to be around the GPFS (client) config, although you might also be struggling with IOPS. The fact that the NFS re-export trick works (allowing OS-level lazy caching and instant re-opening of files) suggests that total throughput is not your issue. Upping the pagepool and/or maxStatCache etc. may just make all these issues go away.
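As a rough sketch of what I mean (values purely illustrative, and "relion_clients" is a placeholder for whatever node class or node list covers your client nodes):

  # check the current client cache settings
  mmlsconfig | grep -E 'pagepool|maxFilesToCache|maxStatCache'
  # raise the page pool and the file/stat caches on the RELION client nodes
  mmchconfig pagepool=16G,maxFilesToCache=200000,maxStatCache=100000 -N relion_clients
  # most of these only take effect after restarting GPFS on the affected nodes
  mmshutdown -N relion_clients && mmstartup -N relion_clients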

If I picked out the right benchmark, then it is one with a 360-pixel box size, which is not too small... I don't know how many files comprise your particle set...
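Something like the following would give a rough count, assuming the particle stacks are .mrcs files under a Particles directory (adjust the path and extension to your layout):

  find Particles -name '*.mrcs' | wc -l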

Regards,
Robert
--

Dr Robert Esnouf

University Research Lecturer,
Director of Research Computing BDI,
Head of Research Computing Core WHG,
NDM Research Computing Strategy Officer

Main office:
Room 10/028, Wellcome Centre for Human Genetics,
Old Road Campus, Roosevelt Drive, Oxford OX3 7BN, UK

Emails:
robert at strubi.ox.ac.uk / robert at well.ox.ac.uk / robert.esnouf at bdi.ox.ac.uk

Tel: (+44)-1865-287783 (WHG); (+44)-1865-743689 (BDI)
 
----- Original Message -----
From: Guo, Wei (Wei.Guo at STJUDE.ORG)
Date: 08/08/19 23:19
To: gpfsug-discuss at spectrumscale.org, robert at strubi.ox.ac.uk, robert at well.ox.ac.uk, robert.esnouf at bid.ox.ac.uk
Subject: [gpfsug-discuss] relion software using GPFS storage

Hi, Robert and Michael, 

What are the settings within relion for parallel file systems?


Sorry to bump this old thread, as I don't see any further conversation, and I have not been able to rejoin the mailing list recently due to the spectrumscale.org:10000 web server error. I used to be on this mailing list under my previous work email address.

The problem is that I also see Relion 3 does not like GPFS. It is obscenely slow, slower than anything else I have tried... local SSD, NFS re-export of GPFS.

I am using the standard benchmarks from the Relion 3 website.

The mpirun -n 9 `which relion_refine_mpi` run on GPFS is 10X slower than on local SSD or on an NFS re-export of another GPFS filesystem.

On the latter two I get results (1hr25min) close to the published results (1hr13min) on the same Intel Xeon Gold 6148 CPU @ 2.40GHz and 4 V100 GPU cards, with the same command.

Running the same standard benchmark on GPFS, one iteration takes 15-20 min when it should take <1.7 min.

The per-process IO is very slow, 4-5 MiB/s, while on SSD and NFS I get 20-40 MiB/s, judging from /proc/<PID>/io of the relion_refine processes.
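Roughly, I am sampling the cumulative read_bytes/write_bytes counters of one relion_refine process and taking the difference between samples, with something like:

  # print the IO counters of the first relion_refine process every 5 seconds;
  # the counters are cumulative, so the rate is the delta between samples
  watch -n 5 'cat /proc/$(pgrep relion_refine | head -1)/io'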

My GPFS client can see ~2GB/s when benchmarking with IOZONE (yes, only 2GB/s, because it is a small system with ~70 drives).

Record Size 4096 kB
O_DIRECT feature enabled
File size set to 20971520 kB
Command line used: iozone -r 4m -I -t 16 -s 20g
Output is in kBytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 kBytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Throughput test with 16 processes
Each process writes a 20971520 kByte file in 4096 kByte records

Children see throughput for 16 initial writers = 1960218.38 kB/sec
Parent sees throughput for 16 initial writers = 1938463.07 kB/sec
Min throughput per process =  120415.66 kB/sec 
Max throughput per process =  123652.07 kB/sec
Avg throughput per process =  122513.65 kB/sec
Min xfer = 20426752.00 kB

Children see throughput for 16 readers = 1700354.00 kB/sec
Parent sees throughput for 16 readers = 1700046.71 kB/sec
Min throughput per process =  104587.73 kB/sec 
Max throughput per process =  108182.84 kB/sec
Avg throughput per process =  106272.12 kB/sec
Min xfer = 20275200.00 kB

Using --no_parallel_disk_io is even worse, and --only_do_unfinished_movies does not help much.

Please advise.

Thanks

Wei Guo

Computational Engineer, 

St Jude Children's Research Hospital

wei.guo at stjude.org

Dear Michael,

There are settings within relion for parallel file systems; you should check they are enabled if you have SS (Spectrum Scale) underneath.
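As an illustration of the sort of options I mean (check relion_refine --help on your build; the values and paths below are placeholders, not recommendations):

  # IO-related relion_refine options worth checking:
  #   --dont_combine_weights_via_disc : exchange per-iteration weights over MPI rather than via files on shared storage
  #   --pool 100                      : pool more particle images together per read
  #   --scratch_dir /local/scratch    : copy particle stacks to fast node-local scratch before refinement
  RELION_IO_FLAGS="--dont_combine_weights_via_disc --pool 100 --scratch_dir /local/scratch"
  # then append $RELION_IO_FLAGS to the existing mpirun ... relion_refine_mpi command line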

Otherwise, check which version of relion is being used and then try to understand the problem that is being analysed a little more.

If the box size is very small and the internal symmetry is low, then the user may read 100,000s of small "picked particle" files for each iteration, opening and closing the files each time.

I believe that relion3 has some facility for extracting these small particles from the larger raw images, and that is more SS-friendly. Alternatively, the size of the set of picked particles is often only in the 50GB range, and so staging to one or more local machines is quite feasible...
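For instance, something as simple as the following before the run (paths are placeholders), or relion's own --scratch_dir option, which does much the same thing automatically:

  # stage the picked-particle stacks to node-local SSD and point the job at the local copy
  mkdir -p /local/ssd/$USER
  rsync -a /gpfs/project/run01/Particles/ /local/ssd/$USER/Particles/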

Hope one of those suggestions helps.
Regards,
Robert

--

Dr Robert Esnouf 

University Research Lecturer, 
Director of Research Computing BDI, 
Head of Research Computing Core WHG, 
NDM Research Computing Strategy Officer 

Main office: 
Room 10/028, Wellcome Centre for Human Genetics, 
Old Road Campus, Roosevelt Drive, Oxford OX3 7BN, UK 

Emails: 
robert at strubi.ox.ac.uk / robert at well.ox.ac.uk / robert.esnouf at bdi.ox.ac.uk 

Tel:   (+44)-1865-287783 (WHG); (+44)-1865-743689 (BDI)
 

-----Original Message-----
From: "Michael Holliday" <michael.holliday at crick.ac.uk>
To: gpfsug-discuss at spectrumscale.org
Date: 27/02/19 12:21
Subject: [gpfsug-discuss] relion software using GPFS storage


Hi All,
 
We’ve recently had an issue where a job on our client GPFS cluster caused our main storage to go extremely slowly.   The job was running relion using MPI (https://www2.mrc-lmb.cam.ac.uk/relion/index.php?title=Main_Page)
 
It caused waiters across the cluster, and caused the load to spike on the NSD servers one at a time.  When the spike ended on one NSD server, it immediately started on another. 
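(For context, "waiters" here means the long-running RPC waiters that GPFS itself reports, e.g. on a client or NSD server:)

  mmdiag --waiters    # list current RPC waiters
  mmdiag --iohist     # recent per-I/O history, to see which disks are being hammered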
 
There were no obvious errors in the logs and the issues cleared immediately after the job was cancelled. 
 
Has anyone else seen any issues with relion using GPFS storage?
 
Michael
 
Michael Holliday RITTech MBCS
Senior HPC & Research Data Systems Engineer | eMedLab Operations Team
Scientific Computing STP | The Francis Crick Institute
1, Midland Road | London | NW1 1AT | United Kingdom
Tel: 0203 796 3167
 
The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Email Disclaimer: www.stjude.org/emaildisclaimer
Consultation Disclaimer: www.stjude.org/consultationdisclaimer
