[gpfsug-discuss] Singularity + GPFS

Nathan Harper nathan.harper at cfms.org.uk
Thu Apr 26 17:00:56 BST 2018


We had an issue with a particular application writing its output in
parallel. Including gpfs.h seemed (I think) to fix the problem, but we may
also have had a clock-skew issue on the compute nodes at the same time, so
we aren't sure exactly which change fixed it.
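
In case it's useful context, 'building against GPFS' here means compiling
with the GPFS headers and linking libgpfs, so the application (or its
MPI-IO layer) can pass I/O hints straight to the filesystem. A minimal
sketch using the gpfs_fcntl() access-range hint is below; the struct and
field names follow the IBM samples, but please check them against the
gpfs_fcntl.h shipped with your release, and treat the path as a
placeholder:

/* Illustrative only: declare an access-range hint through the GPFS API.
 * Build with something like:  cc -o hint hint.c -lgpfs
 * Struct/field names follow the IBM gpfs_fcntl() samples; verify them
 * against the gpfs_fcntl.h on your own system. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <gpfs_fcntl.h>

int main(void)
{
    int fd = open("/gpfs/scratch/testfile", O_RDONLY);  /* placeholder path */
    if (fd < 0) { perror("open"); return 1; }

    struct {
        gpfsFcntlHeader_t hdr;
        gpfsAccessRange_t acc;
    } hint;

    hint.hdr.totalLength   = sizeof(hint);
    hint.hdr.fcntlVersion  = GPFS_FCNTL_CURRENT_VERSION;
    hint.hdr.fcntlReserved = 0;
    hint.acc.structLen     = sizeof(hint.acc);
    hint.acc.structType    = GPFS_ACCESS_RANGE;
    hint.acc.start         = 0;        /* offset this process will read */
    hint.acc.length        = 1 << 20;  /* how much of the file it will touch */
    hint.acc.isWrite       = 0;

    /* The hint is advisory, so a failure here is not fatal. */
    if (gpfs_fcntl(fd, &hint) != 0)
        perror("gpfs_fcntl");

    close(fd);
    return 0;
}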

My chaos monkeys aren't the ones that resist guidance; they're the ones
that will employ every tool at their disposal to improve performance. A lot
of our applications aren't doing MPI-IO, so my very capable parallel
filesystem sits idle while a single rank does all the reading and writing.
However, some applications will hit the filesystem much harder or exercise
less-used functionality, and I'm keen to make sure that works through
Singularity as well.
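
To make the single-rank point concrete, the contrast is between funnelling
all data through one rank with plain POSIX I/O and having every rank write
its own slice of a shared file collectively through MPI-IO. A rough sketch
of the latter, with the file path and sizes as placeholders:

/* Sketch: each MPI rank writes its own contiguous 1 MiB slice of one
 * shared file through a collective MPI-IO call, instead of shipping the
 * data to rank 0. Build with mpicc; the output path is a placeholder. */
#include <mpi.h>
#include <stdlib.h>

#define CHUNK (1 << 20)   /* bytes written per rank */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(CHUNK);
    for (int i = 0; i < CHUNK; i++)
        buf[i] = (char)rank;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/gpfs/scratch/output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: the MPI-IO layer (e.g. ROMIO, with its GPFS
     * driver if built in) can aggregate and align these requests. */
    MPI_Offset offset = (MPI_Offset)rank * CHUNK;
    MPI_File_write_at_all(fh, offset, buf, CHUNK, MPI_BYTE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}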

On 26 April 2018 at 16:31, David Johnson <david_johnson at brown.edu> wrote:

> Regarding MPI-IO, how do you mean “building the applications against
> GPFS”?
> We try to advise our users about things to avoid, but we have some
> poster-ready
> “chaos monkeys” as well, who resist guidance.  What apps do your users
> favor?
> Molpro is one of our heaviest apps right now.
> Thanks,
>  — ddj
>
>
> On Apr 26, 2018, at 11:25 AM, Nathan Harper <nathan.harper at cfms.org.uk>
> wrote:
>
> Happy to share on the list in case anyone else finds it useful:
>
> We use GPFS for home/scratch on our HPC clusters, supporting engineering
> applications, so 95+% of our jobs are multi-node MPI.   We have had some
> questions/concerns about GPFS+Singularity+MPI-IO, as we've had issues with
> GPFS+MPI-IO in the past that were solved by building the applications
> against GPFS.   If users start using Singularity containers, we can no
> longer guarantee how the contained applications have been built.
>
> I've got a small test system (2 NSD nodes, 6 compute nodes) to see if we
> can break it before we deploy onto our production systems.   Everything
> seems to be OK under synthetic benchmarks, but I've handed it over to one
> of my chaos-monkey users to let him do his worst.
>
> On 26 April 2018 at 15:53, Yugendra Guvvala <yguvvala at cambridgecomputer.com> wrote:
>
>> I am interested in learning about this too, so please include me if you
>> send a direct mail.
>>
>> Thanks,
>> Yugi
>>
>> On Apr 26, 2018, at 10:51 AM, Oesterlin, Robert <Robert.Oesterlin at nuance.com> wrote:
>>
>> Hi Lohit, Nathan
>>
>>
>>
>> Would you be willing to share some more details about your setup? We are
>> just getting started here and I would like to hear what your
>> configuration looks like. Direct email to me is fine, thanks.
>>
>>
>>
>> Bob Oesterlin
>>
>> Sr Principal Storage Engineer, Nuance
>>
>>
>>
>>
>>
>> From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of "valleru at cbio.mskcc.org" <valleru at cbio.mskcc.org>
>> Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>> Date: Thursday, April 26, 2018 at 9:45 AM
>> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>> Subject: [EXTERNAL] Re: [gpfsug-discuss] Singularity + GPFS
>>
>>
>>
>> We do run Singularity + GPFS on our production HPC clusters.
>>
>> Most of the time things are fine without any issues.
>>
>>
>>
>> However, I do see a significant performance loss when running some
>> applications in Singularity containers on GPFS.
>>
>>
>>
>> So far, the applications that show severe performance issues with
>> Singularity on GPFS seem to be the ones doing mmap I/O (deep learning
>> applications).
>>
>> When I run the same applications on bare metal, the GPFS I/O performance
>> is hugely different from what I see inside Singularity containers.
>>
>> I have yet to raise a PMR about this with IBM.
>>
>> I have not seen performance degradation for any other kind of I/O, but I
>> am not certain.
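
(A minimal sketch of the mmap read pattern in question, which can be timed
on bare metal and then inside a Singularity container using the same
binary; the path is a placeholder:)

/* Sketch: read a file through mmap, the access pattern described above.
 * Time the same binary on bare metal and inside the container. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    const char *path = "/gpfs/scratch/sample.dat";   /* placeholder */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Touch every page so each access has to be paged in from GPFS,
     * which is where the reported difference should show up. */
    long pagesize = sysconf(_SC_PAGESIZE);
    unsigned long sum = 0;
    for (off_t off = 0; off < st.st_size; off += pagesize)
        sum += (unsigned char)p[off];

    printf("checksum %lu over %lld bytes\n", sum, (long long)st.st_size);
    munmap(p, st.st_size);
    close(fd);
    return 0;
}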
>>
>>
>> Regards,
>> Lohit
>>
>>
>> On Apr 26, 2018, 10:35 AM -0400, Nathan Harper <nathan.harper at cfms.org.uk> wrote:
>>
>> We are running on a test system at the moment, and haven't run into any
>> issues yet, but so far it's only been 'hello world' and running FIO.
>>
>>
>>
>> I'm interested to hear about experience with MPI-IO within Singularity.
>>
>>
>>
>> On 26 April 2018 at 15:20, Oesterlin, Robert <Robert.Oesterlin at nuance.com>
>> wrote:
>>
>> Anyone (including IBM) doing any work in this area? I would appreciate
>> hearing from you.
>>
>>
>>
>> Bob Oesterlin
>>
>> Sr Principal Storage Engineer, Nuance
>>
>>
>>
>>
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>


-- 
*Nathan Harper* // IT Systems Lead

<https://cfms.org.uk/news-events/events/2018/july/farnborough-international-airshow-2018/>

*e: *nathan.harper at cfms.org.uk   *t*: 0117 906 1104  *m*:  0787 551 0891
*w: *www.cfms.org.uk
CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons
Green // Bristol // BS16 7FR

CFMS Services Ltd is registered in England and Wales No 05742022 - a
subsidiary of CFMS Ltd
CFMS Services Ltd registered office // 43 Queens Square // Bristol // BS1
4QP