[gpfsug-discuss] Singularity + GPFS

Vanessa Borcherding vborcher at linkedin.com
Thu Apr 26 19:59:38 BST 2018


Hi All,

In my previous life at Weill Cornell, I benchmarked Singularity pretty extensively for bioinformatics applications on a GPFS 4.2 cluster and saw virtually no overhead. However, I did not allow MPI jobs for those workloads, so that may be the key differentiator here. You may wish to reach out to Greg Kurtzer and his team too - they're super responsive on GitHub and have a Slack channel that you can join. His email address is gmkurtzer at gmail.com.


Vanessa

On 4/26/18, 9:01 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org" wrote:

    Send gpfsug-discuss mailing list submissions to
    	gpfsug-discuss at spectrumscale.org
    
    To subscribe or unsubscribe via the World Wide Web, visit
    	http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    or, via email, send a message with subject or body 'help' to
    	gpfsug-discuss-request at spectrumscale.org
    
    You can reach the person managing the list at
    	gpfsug-discuss-owner at spectrumscale.org
    
    When replying, please edit your Subject line so it is more specific
    than "Re: Contents of gpfsug-discuss digest..."
    
    
    Today's Topics:
    
       1. Re: Singularity + GPFS (Nathan Harper)
    
    
    ----------------------------------------------------------------------
    
    Message: 1
    Date: Thu, 26 Apr 2018 17:00:56 +0100
    From: Nathan Harper <nathan.harper at cfms.org.uk>
    To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
    Subject: Re: [gpfsug-discuss] Singularity + GPFS
    Message-ID:
    	<CAOQ_t5akG7YKU9NwyBJ4-K+7KM8Ub+3OS_+a7y=KJce6OciBGg at mail.gmail.com>
    Content-Type: text/plain; charset="utf-8"
    
    We had an issue with a particular application writing its output in
    parallel - (I think) including gpfs.h seemed to fix the problem, but we
    might also have had a clock-skew issue on the compute nodes at the same
    time, so we aren't sure exactly which change fixed it.
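
    For anyone who hasn't gone down this road, here is a minimal sketch of one
    thing "building against GPFS" can involve: passing an access-range hint
    through the GPFS fcntl interface (gpfs_fcntl.h, linked with -lgpfs). The
    file path and sizes are invented, and the struct fields should be checked
    against the headers shipped with your GPFS release.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <gpfs_fcntl.h>   /* shipped with the GPFS development packages */

    int main(void)
    {
        int fd = open("/gpfs/scratch/example.dat", O_RDONLY);  /* hypothetical path */
        if (fd < 0) { perror("open"); return 1; }

        /* Header plus one access-range hint, handed to GPFS in a single call. */
        struct {
            gpfsFcntlHeader_t hdr;
            gpfsAccessRange_t range;
        } hint;
        memset(&hint, 0, sizeof(hint));

        hint.hdr.totalLength  = sizeof(hint);
        hint.hdr.fcntlVersion = GPFS_FCNTL_CURRENT_VERSION;

        hint.range.structLen  = sizeof(hint.range);
        hint.range.structType = GPFS_ACCESS_RANGE;
        hint.range.start      = 0;                   /* offset we are about to read */
        hint.range.length     = 64LL * 1024 * 1024;  /* 64 MiB, arbitrary */
        hint.range.isWrite    = 0;                   /* read access */

        if (gpfs_fcntl(fd, &hint) != 0)
            perror("gpfs_fcntl");

        close(fd);
        return 0;
    }

    Something like "gcc hint.c -o hint -lgpfs" should build it on a node with
    the GPFS headers installed.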
    
    My chaos monkeys aren't those that resist guidance, but instead are the
    ones that will employ all the tools at their disposal to improve
    performance.  A lot of our applications aren't doing MPI-IO, so my very
    capable parallel filesystem is idling while a single rank is
    reading/writing.   However, some will hit the filesystem much harder or
    exercise less commonly used functionality, and I'm keen to make sure that
    works through Singularity as well.
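
    For concreteness, the kind of MPI-IO pattern in question looks something
    like the sketch below: every rank writes its own block of one shared file
    through a collective call, rather than funnelling all of the IO through a
    single rank. The path and block size are invented.

    #include <mpi.h>
    #include <stdlib.h>

    #define BLOCK (8 * 1024 * 1024)   /* 8 MiB per rank, arbitrary */

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char *buf = malloc(BLOCK);
        for (int i = 0; i < BLOCK; i++)
            buf[i] = (char)rank;

        /* One shared file on the parallel filesystem; hypothetical path. */
        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "/gpfs/scratch/shared.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Collective write: rank r owns bytes [r*BLOCK, (r+1)*BLOCK). */
        MPI_Offset offset = (MPI_Offset)rank * BLOCK;
        MPI_File_write_at_all(fh, offset, buf, BLOCK, MPI_BYTE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        free(buf);
        MPI_Finalize();
        return 0;
    }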
    
    On 26 April 2018 at 16:31, David Johnson <david_johnson at brown.edu> wrote:
    
    > Regarding MPI-IO, how do you mean "building the applications against
    > GPFS"?
    > We try to advise our users about things to avoid, but we have some
    > poster-ready
    > "chaos monkeys" as well, who resist guidance.  What apps do your users
    > favor?
    > Molpro is one of our heaviest apps right now.
    > Thanks,
    >  -- ddj
    >
    >
    > On Apr 26, 2018, at 11:25 AM, Nathan Harper <nathan.harper at cfms.org.uk>
    > wrote:
    >
    > Happy to share on the list in case anyone else finds it useful:
    >
    > We use GPFS for home/scratch on our HPC clusters, supporting engineering
    > applications, so 95+% of our jobs are multi-node MPI.   We have had some
    > questions/concerns about GPFS+Singularity+MPI-IO, as we've had issues with
    > GPFS+MPI-IO in the past that were solved by building the applications
    > against GPFS.   If users start using Singularity containers, we then can't
    > guarantee how the contained applications have been built.
    >
    > I've got a small test system (2 NSD nodes, 6 compute nodes) to see if we
    > can break it, before we deploy onto our production systems.   Everything
    > seems to be ok under synthetic benchmarks, but I've handed over to one of
    > my chaos monkey users to let him do his worst.
    >
    > On 26 April 2018 at 15:53, Yugendra Guvvala
    > <yguvvala at cambridgecomputer.com> wrote:
    >
    >> I am interested in learning about this too, so please include me if you
    >> send a direct mail.
    >>
    >> Thanks,
    >> Yugi
    >>
    >> On Apr 26, 2018, at 10:51 AM, Oesterlin, Robert <
    >> Robert.Oesterlin at nuance.com> wrote:
    >>
    >> Hi Lohit, Nathan
    >>
    >>
    >>
    >> Would you be willing to share some more details about your setup? We are
    >> just getting started here and I would like to hear about what your
    >> configuration looks like. Direct email to me is fine, thanks.
    >>
    >>
    >>
    >> Bob Oesterlin
    >>
    >> Sr Principal Storage Engineer, Nuance
    >>
    >>
    >>
    >>
    >>
    >> From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of
    >> "valleru at cbio.mskcc.org" <valleru at cbio.mskcc.org>
    >> Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
    >> Date: Thursday, April 26, 2018 at 9:45 AM
    >> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
    >> Subject: [EXTERNAL] Re: [gpfsug-discuss] Singularity + GPFS
    >>
    >>
    >>
    >> We do run Singularity + GPFS on our production HPC clusters.
    >>
    >> Most of the time things are fine without any issues.
    >>
    >>
    >>
    >> However, I do see a significant performance loss when running some
    >> applications in Singularity containers with GPFS.
    >>
    >> As of now, the applications that have severe performance issues with
    >> Singularity on GPFS seem to be affected because of "mmap IO" (deep
    >> learning applications).
    >>
    >> When I run the same application on bare metal, it shows a huge
    >> difference in GPFS IO compared to running in a Singularity container.
    >>
    >> I have yet to raise a PMR about this with IBM.
    >>
    >> I have not seen performance degradation for any other kind of IO, but I
    >> am not sure.
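    >>
    >> (By "mmap IO": the file is mapped and read through page faults rather
    >> than read() calls, so every touched page is served by the GPFS mmap
    >> path. The sketch below is invented for illustration - it is not the
    >> actual deep-learning application, and the path is made up.)
    >>
    >> #include <fcntl.h>
    >> #include <stdint.h>
    >> #include <stdio.h>
    >> #include <sys/mman.h>
    >> #include <sys/stat.h>
    >> #include <sys/types.h>
    >> #include <unistd.h>
    >>
    >> int main(void)
    >> {
    >>     int fd = open("/gpfs/datasets/train.bin", O_RDONLY);  /* hypothetical */
    >>     if (fd < 0) { perror("open"); return 1; }
    >>
    >>     struct stat st;
    >>     if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }
    >>
    >>     unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    >>     if (p == MAP_FAILED) { perror("mmap"); return 1; }
    >>
    >>     /* Touch one byte per page: each fault goes through the GPFS mmap path. */
    >>     uint64_t sum = 0;
    >>     for (off_t i = 0; i < st.st_size; i += 4096)
    >>         sum += p[i];
    >>
    >>     printf("checksum %llu\n", (unsigned long long)sum);
    >>     munmap(p, st.st_size);
    >>     close(fd);
    >>     return 0;
    >> }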
    >>
    >>
    >> Regards,
    >> Lohit
    >>
    >>
    >> On Apr 26, 2018, 10:35 AM -0400, Nathan Harper <nathan.harper at cfms.org.uk>,
    >> wrote:
    >>
    >> We are running on a test system at the moment, and haven't run into any
    >> issues yet, but so far it's only been 'hello world' and running FIO.
    >>
    >>
    >>
    >> I'm interested to hear about experience with MPI-IO within Singularity.
    >>
    >>
    >>
    >> On 26 April 2018 at 15:20, Oesterlin, Robert <Robert.Oesterlin at nuance.com>
    >> wrote:
    >>
    >> Anyone (including IBM) doing any work in this area? I would appreciate
    >> hearing from you.
    >>
    >>
    >>
    >> Bob Oesterlin
    >>
    >> Sr Principal Storage Engineer, Nuance
    >>
    >>
    >>
    >>
    >> _______________________________________________
    >> gpfsug-discuss mailing list
    >> gpfsug-discuss at spectrumscale.org
    >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    >>
    >>
    >>
    >>
    >>
    >> --
    >>
    >> Nathan Harper // IT Systems Lead
    >>
    >> e: nathan.harper at cfms.org.uk   t: 0117 906 1104   m: 0787 551 0891
    >> w: www.cfms.org.uk
    >>
    >> CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons
    >> Green // Bristol // BS16 7FR
    >>
    >> CFMS Services Ltd is registered in England and Wales No 05742022 - a
    >> subsidiary of CFMS Ltd
    >> CFMS Services Ltd registered office // 43 Queens Square // Bristol // BS1
    >> 4QP
    >>
    >> _______________________________________________
    >> gpfsug-discuss mailing list
    >> gpfsug-discuss at spectrumscale.org
    >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    >>
    >>
    >
    >
    > --
    > Nathan Harper // IT Systems Lead
    >
    > e: nathan.harper at cfms.org.uk   t: 0117 906 1104   m: 0787 551 0891
    > w: www.cfms.org.uk
    > CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons
    > Green // Bristol // BS16 7FR
    >
    > CFMS Services Ltd is registered in England and Wales No 05742022 - a
    > subsidiary of CFMS Ltd
    > CFMS Services Ltd registered office // 43 Queens Square // Bristol // BS1
    > 4QP
    > _______________________________________________
    > gpfsug-discuss mailing list
    > gpfsug-discuss at spectrumscale.org
    > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    >
    >
    >
    >
    
    
    -- 
    Nathan Harper // IT Systems Lead
    
    e: nathan.harper at cfms.org.uk   t: 0117 906 1104   m: 0787 551 0891
    w: www.cfms.org.uk
    CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons
    Green // Bristol // BS16 7FR
    
    CFMS Services Ltd is registered in England and Wales No 05742022 - a
    subsidiary of CFMS Ltd
    CFMS Services Ltd registered office // 43 Queens Square // Bristol // BS1
    4QP
    
    ------------------------------
    
    _______________________________________________
    gpfsug-discuss mailing list
    gpfsug-discuss at spectrumscale.org
    http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    
    
    End of gpfsug-discuss Digest, Vol 75, Issue 56
    **********************************************
    


