[gpfsug-discuss] SLURM scripts/policy for data movement into a flash pool?

Alex Chekholko alex at calicolabs.com
Wed Mar 6 17:13:18 GMT 2019


Hi,

I have tried this before and I would like to temper your expectations.

If you use a placement policy to let users write files directly into your
"small" pool (e.g. by directory or fileset), they will get ENOSPC when the
small pool fills up.  And they will be confused, because they can't see the
pool configuration; they just see a large filesystem with lots of free space.
I think there may now be an "overflow" mechanism, but it only helps for newly
created files, not when someone keeps writing into an existing file that
already lives in the full pool.
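
For what it's worth, the "overflow" behaviour I was thinking of is the LIMIT
clause on a placement rule: once the target pool passes the given occupancy,
the rule is skipped and the next rule applies.  A minimal sketch, assuming a
filesystem called gpfs0, pools called 'flash' and 'data', and a fileset
called 'hotdata' (all made-up names):

    /* placement.pol: placement rules only apply to newly created files */
    RULE 'hot'     SET POOL 'flash' LIMIT(90)   /* skip this rule once 'flash' is 90% full */
                   WHERE FILESET_NAME = 'hotdata'
    RULE 'default' SET POOL 'data'              /* everything else lands on the big pool */

    # install the placement rules
    mmchpolicy gpfs0 placement.pol

Even with LIMIT, appends to files already sitting in the full pool will still
fail, which is the confusing part for users.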

If you use a migration policy (even one based on file heat) it is still a
periodic, scheduled data movement, not something that happens "on the
fly".  Also, file heat itself is only recomputed at a configured interval,
so it always lags the actual access pattern.

If you use a migration policy to move data between pools, you may starve
users of I/O, which will also confuse them because things suddenly get slow.
I think there is now a QOS mechanism to throttle the migration traffic.  It
really depends on how much spare disk throughput you have; if your disks are
already churning, migrations will just slow everything down further.
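
The QOS knob I was thinking of is mmchqos: maintenance commands such as
mmapplypolicy get charged to a "maintenance" class that you can cap, while
normal user I/O stays in the "other" class.  Roughly (the IOPS figure is just
a placeholder you would have to tune for your hardware):

    # throttle policy scans/migrations without touching normal user I/O
    mmchqos gpfs0 --enable pool=*,maintenance=200IOPS,other=unlimited
    # watch what each class is actually consuming
    mmlsqos gpfs0 --seconds 60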

Think of it less as a cache layer and more as two separate storage
locations.  If a bunch of jobs want to read the same files from your big
pool, it's probably faster to just have them read from the big pool
directly, rather than have some kind of prologue job read the data from
the big pool, write it into the small pool, and then have the jobs read it
back from the small pool.
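
If you do go down the pre-flight road anyway, the mechanics would look
roughly like the sketch below.  Everything in it is illustrative: the pool
names and paths are invented, it needs root (so a slurmd prolog rather than
the user's sbatch script), and it is exactly the extra read-from-big,
write-to-small pass I am arguing against:

    #!/bin/bash
    # Prolog sketch: warm the job's working directory into the 'flash' pool.
    JOBDIR=$(scontrol show job "$SLURM_JOB_ID" | awk -F= '/WorkDir/{print $2}')
    POL=/tmp/warm.$SLURM_JOB_ID.pol
    printf "RULE 'warm' MIGRATE FROM POOL 'data' TO POOL 'flash' LIMIT(90)\n" > "$POL"
    # pointing mmapplypolicy at the directory scopes the scan to the job's data
    mmapplypolicy "$JOBDIR" -P "$POL" -I yes
    rm -f "$POL"

An epilog doing the reverse MIGRATE would give you the "cool down" half, at
the cost of yet another scan.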

Also, my experience was with pool ratios of roughly 10%/90%, whereas yours is
more like 2%/98%.  Mine were also write-heavy workloads (a typical university
environment with quickly growing capacity utilization).

Hope these anecdotes help.  Also, it could be that things work a bit
differently now in new versions.

Regards,
Alex


On Wed, Mar 6, 2019 at 3:13 AM Jake Carroll <jake.carroll at uq.edu.au> wrote:

> Hi Scale-folk.
>
> I have an IBM ESS GH14S building block currently configured for my HPC
> workloads.
>
> I've got about 1PB of /scratch filesystem configured on mechanical
> spindles via GNR, and about 20TB of SSD/flash sitting in a separate GNR
> filesystem at the moment. My intention is to eventually destroy that
> stand-alone flash filesystem and use storage pools coupled with GPFS policy
> to warm workloads up into that flash storage:
>
>
> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_storagepool.htm
>
> A little dated, but that kind of thing.
>
> Does anyone have any experience in this space, i.e. using flash storage
> inside a pool, with pre/post flight SLURM scripts that puppeteer GPFS policy
> to warm data up?
>
> I had a few ideas for policy construction around file size, file count, and
> file access intensity. Someone mentioned heat map construction and mmdiag
> --iohist to me the other day; I could use some background there.
>
> If anyone has any SLURM-specific integration tips for the scheduler, or
> pre/post flight bits for SBATCH, it would be very much appreciated.
>
> This array really does fly along and has surpassed my expectations - but I
> want to get the most out of it that I can for my users, and I think
> storage pool automation and good file placement management are going to be
> an important part of that.
>
> Thank you.
>
> -jc
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>