[gpfsug-discuss] Thoughts on GPFS on IB & MTU sizes

Edward Wahl ewahl at osc.edu
Fri Mar 9 14:19:10 GMT 2018


Welcome to the list.   

If Hideaki Kikuchi is still around CCAST, say "O-hisashiburi, desu ne?" ("long time, no see") for me.
Though I recall he may have left.


A couple of questions as I, unfortunately, have a good deal of expel experience.

-Are you set up to use verbs, or only IPoIB? Check with "mmlsconfig verbsRdma".
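
For a quick sanity check (assuming a reasonably current Scale release):

    # Is RDMA over verbs enabled, and on which HCA ports?
    mmlsconfig verbsRdma
    mmlsconfig verbsPorts

    # And is the IB link itself up at the expected rate?
    ibstat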

-Are you using the IB as the administrative IP network?
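
You can see which hostnames GPFS treats as administrative with "mmlscluster";
the "Admin node name" column is the interface that matters for expels:

    mmlscluster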

-As Wei asked, can the nodes sending the expel requests ping the victim over
whatever interface is being used administratively?  Other interfaces do NOT
matter for expels, and nodes that cannot even mount the file systems can still
request expels.  Many things can cause trouble here, from routing and
firewalls to buggy switch software that never updates its ARP tables, and you
end up with nodes trying to expel each other (see the quick check below).
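
A crude loop like this can flush out one-way reachability problems (field
positions in mmlscluster output can shift between releases, so eyeball the
header first):

    # ping every node's *admin* interface from a suspect node
    for n in $(mmlscluster | awk '/^ *[0-9]+ / {print $4}'); do
        ping -c1 -W2 "$n" >/dev/null || echo "NO ANSWER: $n"
    done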

-Are your NSD servers logging the expels in /tmp/mmfs?  You can raise
"mmchconfig expelDataCollectionDailyLimit" (example below) if you need more
captures to narrow down what is happening beyond mmfs.log.latest.  Just be
wary of the disk space if you have "expel storms".

-That tuning page is very out of date and appears to be mostly focused on GPFS
3.5.x tuning.   While there is also a Spectrum Scale wiki, its Linux tuning
page does not appear to be kernel- and network-focused and is dated even older.
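
On the 4K MTU question in the original note: if you do go that route, the
usual moving parts on an mlx4-era stack look roughly like the sketch below.
Treat it as a sketch only -- parameter names and partition syntax vary by
OFED and opensm release, so verify with "modinfo mlx4_core" and your subnet
manager docs first.

    # /etc/modprobe.d/mlx4.conf -- ask the HCA driver to bring ports up at 4K
    options mlx4_core set_4k_mtu=1

    # opensm partitions.conf -- the IPoIB partition must allow 4K as well
    # (mtu=5 is opensm's encoding for 4096)
    Default=0x7fff, ipoib, mtu=5 : ALL=full;

    # IPoIB interface MTU: in datagram (UD) mode the ceiling is 4092,
    # i.e. 4096 minus the 4-byte IPoIB encapsulation header
    ip link set ib0 mtu 4092

That 4092 ceiling is probably also why Mellanox asked about UD mode:
connected mode can run IPoIB MTUs up to 65520 regardless of the link MTU, so
a 4K link MTU only really changes the picture for datagram mode.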


Ed



On Thu, 8 Mar 2018 15:06:03 +0000
"Saula, Oluwasijibomi" <oluwasijibomi.saula at ndsu.edu> wrote:

> Hi Folks,
> 
> 
> As this is my first post to the group, let me start by saying I applaud the
> commentary from the user group as it has been a resource to those of us
> watching from the sidelines.
> 
> 
> That said, we have GPFS layered on IPoIB, and recently we started having
> some issues on our IB FDR fabric, which manifested when GPFS began sending
> persistent expel messages to particular nodes.
> 
> 
> Shortly after, we embarked on a tuning exercise using IBM tuning
> recommendations<https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Welcome%20to%20High%20Performance%20Computing%20%28HPC%29%20Central/page/Linux%20System%20Tuning%20Recommendations>,
> but this page is quite old and we've run into some snags, specifically with
> setting 4k MTUs using the mlx4_core/mlx4_en module options.
> 
> 
> While setting 4k MTUs as the guide recommends is our general inclination, I'd
> like to solicit some advice as to whether 4k MTUs are a good idea, and any
> hitch-free steps to accomplish this. I'm getting some conflicting remarks
> from Mellanox support asking why we'd want to use 4k MTUs with Unreliable
> Datagram mode.
> 
> 
> Also, any pointers to best practices or resources for network configurations
> for heavy I/O clusters would be much appreciated.
> 
> 
> Thanks,
> 
> Siji Saula
> HPC System Administrator
> Center for Computationally Assisted Science & Technology
> NORTH DAKOTA STATE UNIVERSITY
> 
> 
> Research 2 Building<https://www.ndsu.edu/alphaindex/buildings/Building::396> – Room 220B
> Dept 4100, PO Box 6050 / Fargo, ND 58108-6050
> p: 701.231.7749
> www.ccast.ndsu.edu | www.ndsu.edu
> 



-- 

Ed Wahl
Ohio Supercomputer Center
614-292-9302


