<html><body><p><font size="2">Hi David,</font><br><br><font size="2">Re: "</font><tt><font size="2">The impact we were seeing was some variation in MPI benchmark results when the nodes were fully loaded.</font></tt><font size="2">"</font><br><br><font size="2">MPI workloads show the most mmhealth impact. Specifically, the more sensitive the workload is to jitter, the higher the potential impact.</font><br><br><font size="2">The mmhealth config interval setting, as per Mathias's link, is a scalar applied to all monitor interval values in the configuration file. As such, it currently slows the server-side monitoring and health reporting as well as mitigating the MPI client impact. So "medium" == 5 is a reasonable value, whereas the "slow" == 10 scalar may be too infrequent for your server-side monitoring and reporting (so your 30-second update becomes 5 minutes). </font><br><br><font size="2">The clock alignment that Mathias mentioned is a new, investigatory, undocumented tool for MPI workloads. It removes nearly all mmhealth MPI jitter while retaining the default monitor intervals. It also naturally generates thundering herds, with all clients reporting to the quorum nodes at the same time. So while you may mitigate the client MPI jitter, you may severely impact server throughput on those intervals, and possibly exceed connection and thread limits as well. </font><br><br><font size="2">Configuring "clients" separately from "servers" without resorting to alignment is another area of investigation.</font><br><br><font size="2">I'm not familiar with your PMR, but as Mathias mentioned, "mmhealth config interval medium" would be a good start. 
In testing that Kums and I have done, the "mmhealth config interval medium" value provides mitigation almost as good as the mentioned clock alignment for MPI, for, say, a psnap-with-barrier type of workload.</font><br><br><font size="2">Regards, Mike Harris</font><br><br><font size="2">IBM Spectrum Scale - Core Team</font><br><br><font size="2" color="#5F5F5F">From:        </font><font size="2">gpfsug-discuss-request@spectrumscale.org</font><br><font size="2" color="#5F5F5F">To:        </font><font size="2">gpfsug-discuss@spectrumscale.org</font><br><font size="2" color="#5F5F5F">Date:        </font><font size="2">07/19/2017 09:28 AM</font><br><font size="2" color="#5F5F5F">Subject:        </font><font size="2">gpfsug-discuss Digest, Vol 66, Issue 30</font><br><font size="2" color="#5F5F5F">Sent by:        </font><font size="2">gpfsug-discuss-bounces@spectrumscale.org</font><br><hr width="100%" size="2" align="left" noshade style="color:#8091A5; "><br><br><br><tt><font size="2">Send gpfsug-discuss mailing list submissions to<br>                 gpfsug-discuss@spectrumscale.org<br><br>To subscribe or unsubscribe via the World Wide Web, visit<br>                 </font></tt><tt><font size="2"><a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a></font></tt><tt><font size="2"><br>or, via email, send a message with subject or body 'help' to<br>                 gpfsug-discuss-request@spectrumscale.org<br><br>You can reach the person managing the list at<br>                 
gpfsug-discuss-owner@spectrumscale.org<br><br>When replying, please edit your Subject line so it is more specific<br>than "Re: Contents of gpfsug-discuss digest..."<br><br><br>Today's Topics:<br><br>   1. Re: mmsysmon.py revisited (Mathias Dietz)<br>   2. Re: mmsysmon.py revisited (David Johnson)<br><br><br>----------------------------------------------------------------------<br><br>Message: 1<br>Date: Wed, 19 Jul 2017 15:05:49 +0200<br>From: "Mathias Dietz" &lt;MDIETZ@de.ibm.com&gt;<br>To: gpfsug main discussion list &lt;gpfsug-discuss@spectrumscale.org&gt;<br>Subject: Re: [gpfsug-discuss] mmsysmon.py revisited<br>Message-ID:<br>                 &lt;OFCA7D9A5E.C7B3505A-ONC1258162.00420361-C1258162.0047F174@notes.na.collabserv.com&gt;<br>                 <br>Content-Type: text/plain; charset="iso-8859-1"<br><br>thanks for the feedback. <br><br>Let me clarify what mmsysmon is doing.<br>Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the <br>overall health monitoring and CES failover handling.<br>Even without CES it is an essential part of the system because it monitors <br>the individual components and provides health state information and error <br>events. <br>This information is needed by other Spectrum Scale components (mmhealth <br>command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) <br>and therefore disabling mmsysmon will impact them. <br><br>> It's a huge problem. I don't understand why it hasn't been given <br>> much credit by dev or support.<br><br>Over the last couple of month, the development team has put a strong focus <br>on this topic. <br>In order to monitor the health of the individual components, mmsysmon <br>listens for notifications/callback but also has to do some polling.<br>We are trying to reduce the polling overhead constantly and replace <br>polling with notifications when possible. <br><br>Several improvements have been added to 4.2.3, including the ability to <br>configure the polling frequency to reduce the overhead. 
(mmhealth config <br>interval) <br>See <br></font></tt><tt><font size="2"><a href="https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm">https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm</a></font></tt><tt><font size="2"><br>In addition a new option has been introduced to clock align the monitoring <br>threads in order to reduce CPU jitter. <br><br>Nevertheless, we don't see significant CPU consumption by mmsysmon on our <br>test systems. <br>It might be a problem specific to your system environment or a wrong <br>configuration therefore please get in contact with IBM support to analyze <br>the root cause of the high usage.<br><br>Kind regards<br><br>Mathias Dietz<br><br>IBM Spectrum Scale - Release Lead Architect and RAS Architect <br><br><br>gpfsug-discuss-bounces@spectrumscale.org wrote on 07/18/2017 07:51:21 PM:<br><br>> From: Jonathon A Anderson &lt;jonathon.anderson@colorado.edu&gt;<br>> To: gpfsug main discussion list &lt;gpfsug-discuss@spectrumscale.org&gt;<br>> Date: 07/18/2017 07:51 PM<br>> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited<br>> Sent by: gpfsug-discuss-bounces@spectrumscale.org<br>> <br>> There's no official way to cleanly disable it so far as I know yet; <br>> but you can defacto disable it by deleting /var/mmfs/mmsysmon/<br>> mmsysmonitor.conf.<br>> <br>> It's a huge problem. I don't understand why it hasn't been given <br>> much credit by dev or support.<br>> <br>> ~jonathon<br>> <br>> <br>> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces@spectrumscale.org on <br>> behalf of David Johnson" &lt;gpfsug-discuss-bounces@spectrumscale.org <br>> on behalf of david_johnson@brown.edu&gt; wrote:<br>> <br>> <br>> <br>> <br>>     We also noticed a fair amount of CPU time accumulated by mmsysmon.py <br>on<br>>     our diskless compute nodes. 
I read the earlier query, where it <br>> was answered:<br>> <br>> <br>> <br>> <br>>     ces == Cluster Export Services,  mmsysmon.py comes from <br>> mmcesmon. It is used for managing export services of GPFS. If it is <br>> killed,  your nfs/smb etc will be out of work.<br>>     Their overhead is small and they are very important. Don't <br>> attempt to kill them.<br>> <br>> <br>> <br>> <br>> <br>> <br>>     Our question is this - we don't run the latest "protocols", our <br>> NFS is CNFS, and our CIFS is clustered CIFS.<br>>     I can understand it might be needed with Ganesha, but on every node? <br><br>> <br>> <br>>     Why in the world would I be getting this daemon running on all <br>> client nodes, when I didn't install the "protocols" version <br>>     of the distribution?   We have release 4.2.2 at the moment.  How<br>> can we disable this?<br>> <br>> <br>>     Thanks,<br>>      -- ddj<br>> <br>> <br>> _______________________________________________<br>> gpfsug-discuss mailing list<br>> gpfsug-discuss at spectrumscale.org<br>> </font></tt><tt><font size="2"><a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a></font></tt><tt><font size="2"><br><br><br>-------------- next part --------------<br>An HTML attachment was scrubbed...<br>URL: &lt;</font></tt><tt><font size="2"><a href="http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/8c0e33e9/attachment-0001.html">http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/8c0e33e9/attachment-0001.html</a></font></tt><tt><font size="2">&gt;<br><br>------------------------------<br><br>Message: 2<br>Date: Wed, 19 Jul 2017 09:28:23 -0400<br>From: David Johnson &lt;david_johnson@brown.edu&gt;<br>To: gpfsug main discussion list &lt;gpfsug-discuss@spectrumscale.org&gt;<br>Subject: Re: [gpfsug-discuss] mmsysmon.py revisited<br>Message-ID: &lt;BA818B33-1758-469C-9CE3-D8F870F6CAF5@brown.edu&gt;<br>Content-Type: text/plain; charset="utf-8"<br><br>I have opened a PMR, 
and the official response reflects what you just posted.<br>In addition, it seems there are some performance issues with Python 2 that will be <br>improved with eventual migration to Python 3.  I was unaware of the mmhealth<br>functions that the mmsysmon daemon provides. The impact we were seeing <br>was some variation in MPI benchmark results when the nodes were fully loaded.<br>I suppose it would be possible to turn off mmsysmon during the benchmarking,<br>but I appreciate the effort at streamlining the monitor service.  Cutting back on<br>fork/exec, better python, less polling, more notifications - all good.<br><br>Thanks for the details,<br><br> -- ddj<br><br>> On Jul 19, 2017, at 9:05 AM, Mathias Dietz &lt;MDIETZ@de.ibm.com&gt; wrote:<br>> <br>> thanks for the feedback. <br>> <br>> Let me clarify what mmsysmon is doing.<br>> Since IBM Spectrum Scale 4.2.1 the mmsysmon process is used for the overall health monitoring and CES failover handling.<br>> Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events. <br>> This information is needed by other Spectrum Scale components (mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit,..) and therefore disabling mmsysmon will impact them. <br>> <br>> > It's a huge problem. I don't understand why it hasn't been given <br>> > much credit by dev or support.<br>> <br>> Over the last couple of month, the development team has put a strong focus on this topic. <br>> In order to monitor the health of the individual components, mmsysmon listens for notifications/callback but also has to do some polling.<br>> We are trying to reduce the polling overhead constantly and replace polling with notifications when possible. <br>> <br>> Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. 
(mmhealth config interval) <br>> See </font></tt><tt><font size="2"><a href="https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm">https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm</a></font></tt><tt><font size="2"> &lt;</font></tt><tt><font size="2"><a href="https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm">https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm</a></font></tt><tt><font size="2">&gt;<br>> In addition a new option has been introduced to clock align the monitoring threads in order to reduce CPU jitter. <br>> <br>> Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. <br>> It might be a problem specific to your system environment or a wrong configuration therefore please get in contact with IBM support to analyze the root cause of the high usage.<br>> <br>> Kind regards<br>> <br>> Mathias Dietz<br>> <br>> IBM Spectrum Scale - Release Lead Architect and RAS Architect <br>> <br>> <br>> gpfsug-discuss-bounces@spectrumscale.org wrote on 07/18/2017 07:51:21 PM:<br>> <br>> > From: Jonathon A Anderson &lt;jonathon.anderson@colorado.edu&gt;<br>> > To: gpfsug main discussion list &lt;gpfsug-discuss@spectrumscale.org&gt;<br>> > Date: 07/18/2017 07:51 PM<br>> > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited<br>> > Sent by: gpfsug-discuss-bounces@spectrumscale.org<br>> > <br>> > There's no official way to cleanly disable it so far as I know yet; <br>> > but you can defacto disable it by deleting /var/mmfs/mmsysmon/<br>> > mmsysmonitor.conf.<br>> > <br>> > It's a huge problem. 
I don't understand why it hasn't been given <br>> > much credit by dev or support.<br>> > <br>> > ~jonathon<br>> > <br>> > <br>> > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces@spectrumscale.org on <br>> > behalf of David Johnson" &lt;gpfsug-discuss-bounces@spectrumscale.org <br>> > on behalf of david_johnson@brown.edu&gt; wrote:<br>> > <br>> >     <br>> >     <br>> >     <br>> >     We also noticed a fair amount of CPU time accumulated by mmsysmon.py on<br>> >     our diskless compute nodes. I read the earlier query, where it <br>> > was answered:<br>> >     <br>> >     <br>> >     <br>> >     <br>> >     ces == Cluster Export Services,  mmsysmon.py comes from <br>> > mmcesmon. It is used for managing export services of GPFS. If it is <br>> > killed,  your nfs/smb etc will be out of work.<br>> >     Their overhead is small and they are very important. Don't <br>> > attempt to kill them.<br>> >     <br>> >     <br>> >     <br>> >     <br>> >     <br>> >     <br>> >     Our question is this - we don't run the latest "protocols", our <br>> > NFS is CNFS, and our CIFS is clustered CIFS.<br>> >     I can understand it might be needed with Ganesha, but on every node? <br>> >     <br>> >     <br>> >     Why in the world would I be getting this daemon running on all <br>> > client nodes, when I didn't install the "protocols" version <br>> >     of the distribution?   We have release 4.2.2 at the moment.  How<br>> > can we disable this?<br>> >     <br>> >     <br>> >     Thanks,<br>> >      -- 
ddj<br>> >     <br>> > <br>> > _______________________________________________<br>> > gpfsug-discuss mailing list<br>> > gpfsug-discuss at spectrumscale.org<br>> > </font></tt><tt><font size="2"><a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a></font></tt><tt><font size="2"> &lt;</font></tt><tt><font size="2"><a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a></font></tt><tt><font size="2">&gt;<br>> <br>> _______________________________________________<br>> gpfsug-discuss mailing list<br>> gpfsug-discuss at spectrumscale.org<br>> </font></tt><tt><font size="2"><a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a></font></tt><tt><font size="2"><br><br>-------------- next part --------------<br>An HTML attachment was scrubbed...<br>URL: &lt;</font></tt><tt><font size="2"><a href="http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/669c525b/attachment.html">http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170719/669c525b/attachment.html</a></font></tt><tt><font size="2">&gt;<br><br>------------------------------<br><br>_______________________________________________<br>gpfsug-discuss mailing list<br>gpfsug-discuss at spectrumscale.org<br></font></tt><tt><font size="2"><a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a></font></tt><tt><font size="2"><br><br><br>End of gpfsug-discuss Digest, Vol 66, Issue 30<br>**********************************************<br><br></font></tt><br><br><BR>

</body></html>