[gpfsug-discuss] alphafold and mmap performance
jon at well.ox.ac.uk
Tue Oct 19 19:12:34 BST 2021
Not that it answers Stuart's questions in any way, but we gave up on the same problem on a similar setup. We rescued an old fileserver off the scrapheap (RAID6 of 12 x 7.2k rpm SAS on a PERC H710P) and just served the reference data over NFS, which was good enough to keep the compute busy rather than stuck in cxiWaitEventWait. If there's significant demand for Alphafold, somebody's arm will be twisted for a new server with some NVMe. If I remember right, the reference data is ~2.3 TB, which rules out our usual approach of just reading the problematic files into a ramdisk first.
We are also interested in hearing how it might be usably served from GPFS.
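For datasets that do fit, the ramdisk staging approach can be sketched as below. This is a minimal Python illustration under stated assumptions, not anyone's production script: the function names `stage_to_ramdisk` and `total_size`, the `headroom` fraction, and the choice to check free space at the destination before copying are all hypothetical. The destination is assumed to be a tmpfs mount such as /dev/shm.

```python
import os
import shutil
from pathlib import Path


def total_size(paths):
    """Sum the sizes in bytes of the given files."""
    return sum(os.path.getsize(p) for p in paths)


def stage_to_ramdisk(paths, dest_dir, headroom=0.1):
    """Copy files into dest_dir (assumed to be a tmpfs/ramdisk) only if they fit.

    Hypothetical helper: headroom is the fraction of free space left unused.
    Files are copied flat into dest_dir, so source names must be unique.
    Returns the list of staged paths.
    """
    dest = Path(dest_dir)
    needed = total_size(paths)
    free = shutil.disk_usage(dest).free
    usable = int(free * (1 - headroom))
    if needed > usable:
        # e.g. ~2.3 TB of reference data will not fit in a typical node's RAM
        raise OSError(f"need {needed} bytes but only {usable} usable in {dest}")
    staged = []
    for p in paths:
        target = dest / Path(p).name
        shutil.copy2(p, target)  # sequential copy: large reads, not 4K faults
        staged.append(target)
    return staged
```

A call would look like `stage_to_ramdisk(["/gpfs/ref/example.ffdata"], "/dev/shm/ref")` (hypothetical paths). The upfront size check turns the "too big for a ramdisk" case into an immediate error instead of a half-filled tmpfs.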
Dr. Jonathan Diprose <jon at well.ox.ac.uk> Tel: 01865 287873
Research Computing Manager
Henry Wellcome Building for Genomic Medicine
Roosevelt Drive, Headington, Oxford OX3 7BN
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Stuart Barkley [stuartb at 4gh.net]
Sent: 19 October 2021 18:16
To: gpfsug-discuss at spectrumscale.org
Subject: [gpfsug-discuss] alphafold and mmap performance
Over the years there have been several discussions about performance
problems with mmap() on GPFS/Spectrum Scale.
We are currently having problems with mmap() performance on our
systems with new alphafold <https://github.com/deepmind/alphafold>
protein folding software. The symptoms look similar to the mmap()
problems we have had before.
The software component "hhblits" appears to mmap() a large file of
genomic data and then do random reads throughout the file. GPFS
appears to be doing 4K reads for each block, which limits performance.
The first run takes 20+ hours. Subsequent identical runs
complete in just 1-2 hours. After clearing the Linux caches
(echo 3 > /proc/sys/vm/drop_caches), the slow performance returns on
the next run.
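The access pattern described above (mmap a large file, then touch it at random offsets) can be approximated outside hhblits to compare cold-cache and warm-cache behaviour. This is a minimal Python sketch under assumptions: `random_page_reads` is a hypothetical helper, it reads only one byte per page, and it does not reproduce hhblits' real offsets or readahead interactions. It assumes a non-empty file and uses `mmap.PAGESIZE` (commonly 4096 bytes on x86-64 Linux) as the fault granularity.

```python
import mmap
import os
import random
import time

PAGE = mmap.PAGESIZE  # commonly 4096 bytes on x86-64 Linux


def random_page_reads(path, n_reads, seed=0):
    """Touch n_reads random pages of an mmap'd file, one byte per page.

    On a cold cache each touch of an uncached page causes a page fault,
    so this approximates many small, page-sized random reads.
    Returns (checksum, elapsed_seconds). The file must be non-empty.
    """
    size = os.path.getsize(path)
    n_pages = (size + PAGE - 1) // PAGE  # ceil, so the last partial page counts
    rng = random.Random(seed)  # fixed seed: identical offsets on every run
    checksum = 0
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = time.perf_counter()
        for _ in range(n_reads):
            page = rng.randrange(n_pages)
            checksum += mm[page * PAGE]  # indexing an mmap returns an int
        elapsed = time.perf_counter() - start
    return checksum, elapsed
```

Running it once, dropping caches as root, and running it again with the same seed should show whether the cold/warm gap seen with hhblits appears with this simpler pattern too.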
GPFS server: 4.2.3-5 on DDN hardware, CentOS 7.3
Default GPFS client: 4.2.3-22, CentOS 7.9
We have tried a number of things, including Spectrum Scale client
version 5.0.5-9, which should have Sven's recent mmap performance
improvements. Are those mmap improvements in the client code or
the server code?
Only now do I notice a suggestion:
mmchconfig prefetchAggressivenessRead=0 -i
I have not tried this. Would it be expected to change performance?
Would the pagepool size be a factor here?
I've never been lost; I was once bewildered for three days, but never lost!
-- Daniel Boone
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org