[gpfsug-discuss] mmsysmon.py revisited

Jonathon A Anderson jonathon.anderson at colorado.edu
Wed Jul 19 19:29:22 BST 2017


OPA behaves _significantly_ differently from Mellanox IB. OPA uses the host CPU for packet processing, whereas Mellanox IB uses a discrete ASIC on the HCA. As a result, in our experience OPA is much more sensitive to task placement and interrupts, because host CPU load competes with fabric I/O processing.
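
For anyone who wants to put a number on how much CPU time a monitoring process such as mmsysmon.py has accumulated on a compute node, here is a minimal sketch. It is an illustration only, not IBM tooling: it assumes a Linux /proc filesystem, and the function names are hypothetical. It reads the utime and stime fields (fields 14 and 15) of /proc/<pid>/stat and converts clock ticks to seconds.

```python
import os


def cpu_seconds_from_stat(stat_line, ticks_per_second):
    """Return user + system CPU seconds from one /proc/<pid>/stat line."""
    # The command name (field 2) is parenthesized and may contain spaces,
    # so split on the last ')' before counting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 (utime) and 15 (stime)
    # are at indexes 11 and 12.
    utime_ticks = int(rest[11])
    stime_ticks = int(rest[12])
    return (utime_ticks + stime_ticks) / ticks_per_second


def cpu_seconds_for_pid(pid):
    """Read /proc/<pid>/stat (Linux only) and return accumulated CPU seconds."""
    with open(f"/proc/{pid}/stat") as f:
        return cpu_seconds_from_stat(f.read(), os.sysconf("SC_CLK_TCK"))
```

Sampling cpu_seconds_for_pid() twice, some minutes apart, and dividing the difference by the elapsed wall-clock time gives an average CPU fraction. It does not capture jitter directly, only total consumption.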

~jonathon


On 7/19/17, 12:12 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of david_johnson at brown.edu" <gpfsug-discuss-bounces at spectrumscale.org on behalf of david_johnson at brown.edu> wrote:

    We have an FDR14 Mellanox fabric, probably with an interrupt load similar to OPA's.
    
      -- ddj
    Dave Johnson
    
    On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson <jonathon.anderson at colorado.edu> wrote:
    
    >> It might be a problem specific to your system environment or a wrong configuration; therefore, please get in contact with IBM support to analyze the root cause of the high usage.
    > 
    > I suspect it’s actually a result of frequent IO interrupts causing jitter in conflict with MPI on the shared Intel Omni-Path network, in our case.
    > 
    > We’ve already tried pursuing support on this through our vendor, DDN, and got nowhere. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem.
    > 
    > The official company line of “we don't see significant CPU consumption by mmsysmon on our test systems” isn’t helping. Do you have a test system with OPA?
    > 
    > ~jonathon
    > 
    > 
    > On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" <gpfsug-discuss-bounces at spectrumscale.org on behalf of MDIETZ at de.ibm.com> wrote:
    > 
    >    Thanks for the feedback.
    > 
    >    Let me clarify what mmsysmon is doing.
    >    Since IBM Spectrum Scale 4.2.1, the mmsysmon process has been used for overall health monitoring and CES failover handling.
    >    Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events.
    > 
    >    This information is needed by other Spectrum Scale components (the mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit, ...), and therefore disabling mmsysmon will impact them.
    > 
    > 
    >> It’s a huge problem. I don’t understand why it hasn’t been given
    >> much credit by dev or support.
    > 
    >    Over the last couple of months, the development team has put a strong focus on this topic.
    > 
    >    In order to monitor the health of the individual components, mmsysmon listens for notifications/callbacks but also has to do some polling.
    >    We are continually working to reduce the polling overhead and to replace polling with notifications where possible.
    > 
    > 
    >    Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead (mmhealth config interval).
    > 
    >    See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm
    >    In addition, a new option has been introduced to clock-align the monitoring threads in order to reduce CPU jitter.
    > 
    > 
    >    Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems.
    >        
    >    It might be a problem specific to your system environment or a wrong configuration; therefore, please get in contact with IBM support to analyze the root cause of the high usage.
    > 
    >    Kind regards
    > 
    >    Mathias Dietz
    > 
    >    IBM Spectrum Scale - Release Lead Architect and RAS Architect
    > 
    > 
    > 
    >    gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM:
    > 
    >> From: Jonathon A Anderson <jonathon.anderson at colorado.edu>
    >> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
    >> Date: 07/18/2017 07:51 PM
    >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited
    >> Sent by: gpfsug-discuss-bounces at spectrumscale.org
    >> 
    >> There’s no official way to cleanly disable it yet, as far as I know;
    >> but you can de facto disable it by deleting
    >> /var/mmfs/mmsysmon/mmsysmonitor.conf.
    >> 
    >> It’s a huge problem. I don’t understand why it hasn’t been given 
    >> much credit by dev or support.
    >> 
    >> ~jonathon
    >> 
    >> 
    >> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on 
    >> behalf of David Johnson" <gpfsug-discuss-bounces at spectrumscale.org 
    >> on behalf of david_johnson at brown.edu> wrote:
    >> 
    >>    We also noticed a fair amount of CPU time accumulated by mmsysmon.py on
    >>    our diskless compute nodes. I read the earlier query, where it was answered:
    >> 
    >>    ces == Cluster Export Services,  mmsysmon.py comes from mmcesmon. It is
    >>    used for managing export services of GPFS. If it is killed,  your
    >>    nfs/smb etc will be out of work.
    >>    Their overhead is small and they are very important. Don't attempt to
    >>    kill them.
    >> 
    >>    Our question is this: we don’t run the latest “protocols”, our
    >>    NFS is CNFS, and our CIFS is clustered CIFS.
    >>    I can understand it might be needed with Ganesha, but on every node?
    >> 
    >>    Why in the world would I be getting this daemon running on all
    >>    client nodes, when I didn’t install the “protocols” version
    >>    of the distribution? We have release 4.2.2 at the moment. How
    >>    can we disable this?
    >> 
    >> 
    >>    Thanks,
    >>     — ddj
    >> 
    >> 
    >> _______________________________________________
    >> gpfsug-discuss mailing list
    >> gpfsug-discuss at spectrumscale.org
    >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss