[gpfsug-discuss] alphafold and mmap performance

Stuart Barkley stuartb at 4gh.net
Thu Oct 21 00:19:51 BST 2021

Thanks Olaf, Jon and Fred.  Some more details below.

We may just need to wait on things to evolve (us getting Spectrum
Scale 5 installed, alphafold getting HPC specific improvements).  It
will also be driven by whether our users have a real need for
alphafold or are just enthusiastic due to the press releases.

On Tue, 19 Oct 2021 at 16:27 -0000, Olaf Weiser wrote:

> > [...] We have tried a number of things including Spectrum Scale
> > client version 5.0.5-9[...]

> in the client code or the server code?

Our main client code is 4.2.3-22 but I'm trying 5.0.5-9 on a test
client.  The server code is (very old) 4.2.3-5.

> there are multiple improvements going into the code, continuously.
> Since your versions 4.2.3 / 5.0.5, a lot of them are in the area of
> the NSD server/GNR (which is server based), and a lot of
> enhancements also went into the client part. Some are on both, such
> as RoCE, or using multiple TCP/IP sockets per communication pair,
> etc. All this influences your performance.

Thanks for the information.  Some of this sounds good.  We had upgrade
issues with DDN but we now have a license for Spectrum Scale 5.  It's
now mostly a matter of finding enough cycles to do the update.

> But I'd like to try to give you some answers to your specific Q -
> > Only now do I notice a suggestion:
> >     mmchconfig prefetchAggressivenessRead=0 -i
> > I did not use this.  Would a performance change be expected?

> YES;-)  .. this parameter should really help..

I'm trying this now with the 5.0 client.  Initial indications are that
there may be about a 50% performance improvement, but that is still
significantly lower than we had hoped for.

Using "mmdiag --iohist" we were seeing 750-900 8-sector reads per
second.  With prefetchAggressivenessRead=0 the 8-sector reads seem
about as frequent, but there are now often (5-10/second) reads of
100-2000 sectors in the mix.  A rough estimate is that the large reads
account for about the same amount of data as the 8-sector reads.
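For scale, my back-of-envelope arithmetic (assuming the standard
512-byte sector, so an 8-sector read is one 4 KiB page fault): 750-900
such reads per second is only about 3-3.7 MB/s from the small reads:

```python
SECTOR = 512  # bytes; an 8-sector read = 4096 B, one memory page

# Observed small-read rates from "mmdiag --iohist"
for reads_per_sec in (750, 900):
    bytes_per_sec = reads_per_sec * 8 * SECTOR
    print(f"{reads_per_sec} reads/s -> {bytes_per_sec / 1e6:.1f} MB/s")
```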

The number of large reads seems to be decreasing over time.

I don't know the specifics of the algorithm, but I imagine there is a
lot of jumping around in the data.  The early large reads may have
brought in the more commonly accessed regions, and now it is filling
in the less dense regions.  Just a thought.
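That access pattern is easy to reproduce in miniature.  A minimal
sketch (my assumption about what alphafold's mmap access looks like;
the scratch file just stands in for the ~1.5TB reference data): each
random touch of a mapped file faults in roughly one 4 KiB page, which
is exactly the stream of 8-sector reads iohist shows.

```python
import mmap
import os
import random
import tempfile

PAGE = 4096  # one page fault ~= one 8-sector (512 B) read

# Small scratch file standing in for the reference database.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(PAGE * 256))  # 1 MiB here; real file is ~1.5 TB
    path = f.name

total = 0
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0,
                                      access=mmap.ACCESS_READ) as mm:
    random.seed(0)
    for _ in range(100):
        off = random.randrange(256) * PAGE  # jump to a random page
        total += len(mm[off:off + PAGE])    # touch faults in ~one page

os.unlink(path)
print(total)
```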

> from the UG expert talk 2020 we shared some numbers/charts on it
> https://www.spectrumscaleug.org/event/ssugdigital-spectrum-scale-expert-talks-update-on-performance-enhancements-in-spectrum-scale/
> starting ~ 8:30 minutes /
> just 2 slides  ... let us know, if you need more information

Yes, I had looked at the slides but not listened to the talk, which
was a mistake.  There were some other interesting tidbits.  In
particular, if we can get this to work we may try a scheduler
prolog/epilog to change the parameter.  We can look at that after our
move from Grid Engine to Slurm, which requires cycles of its own.
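The prolog/epilog idea would amount to toggling the setting per job.
A dry-run sketch (printed rather than executed; the node name and the
script shape are my assumptions, though mmchconfig does accept -N to
target specific nodes and DEFAULT to restore a setting):

```python
# Hypothetical commands a Slurm prolog/epilog pair might run on the
# compute node; a real prolog would use the local hostname.
NODE = "node001"  # placeholder

prolog = f"mmchconfig prefetchAggressivenessRead=0 -i -N {NODE}"
epilog = f"mmchconfig prefetchAggressivenessRead=DEFAULT -i -N {NODE}"
print(prolog)
print(epilog)
```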

On Tue, 19 Oct 2021 at 14:12 -0000, Jon Diprose wrote:

> If I remember right, the reference data is ~2.3TB, ruling out our
> usual approach of just reading the problematic files into a ramdisk
> first.

We found the critical file is about 1.5TB and we are able to load that
into ramdisk on a 2TB system (but it doesn't have any GPUs).

We also have some old "spare" hardware that might be built as an NFS
appliance for this purpose.  I would prefer to see the ~10 year old
hardware die.

The alphafold application is one large monolith.  The first phase does
some large I/O and CPU intensive operations.  The second phase does
some GPU operations.  We would prefer to separate the non-GPU code
from the GPU code so we could have the GPU systems doing GPU stuff.
We do this quite effectively with some of our other GPU codes, which
have CPU-based pre/post-processing.
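The split we have in mind would look like a two-step Slurm chain
(script names and partition names are hypothetical; the
--dependency=afterok mechanism is standard sbatch):

```python
# Dry-run sketch: phase 1 (CPU/I-O heavy search) on a CPU partition,
# phase 2 (GPU inference) held until phase 1 completes successfully.
cpu_jobid = "12345"  # would come from `sbatch --parsable` on step 1

step1 = "sbatch --parsable -p cpu alphafold_search.sh"
step2 = f"sbatch --dependency=afterok:{cpu_jobid} -p gpu alphafold_model.sh"
print(step1)
print(step2)
```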

I've never been lost; I was once bewildered for three days, but never lost!
                                        --  Daniel Boone
