[gpfsug-discuss] [External] Re: DSS-G upgrade and ConnectX-4 settings

Ryan Novosielski novosirj at rutgers.edu
Sat Sep 10 21:50:13 BST 2022


> On Sep 10, 2022, at 16:38, Jonathan Buzzard <jonathan.buzzard at strath.ac.uk> wrote:
> 
> On Fri, 2022-09-09 at 15:53 +0000, Simon Thompson2 wrote:
>> 
>> Just to note that I’ve picked this up directly with our development
>> team and support teams internally. It is not intended behaviour to
>> reset the adapter settings.
>> 
>> 
>> @JAB, could you let me know the DSS-G release you were moving from
>> and to get to 4.2a? Maybe there is some specific from and to release
>> where this occurs.
> 
> Lots of experimentation today with repeated reinstalls of 4.2b, now
> that Lenovo's screwup with respect to my ESD entitlements has been
> sorted.
> 
> It has nothing to do with firmware updates. If I reinstall the server
> via xcat the ports get set back to Infiniband, even if the server
> already has 4.2b installed.
> 
> If I reset the ports back to Ethernet, do a forced install of the
> firmware (forced because I am already at the right level, and that's
> what the DSS-G does, see *1) and then reboot the machine, it comes back
> up with the ports still in Ethernet mode.
> 
> For reference, both 4.2a and 4.2b have the same firmware level for the
> ConnectX-4 PCIe FDR 2-Port QSFP VPI adapter (SN30L27795_Ax) installed
> in the machine, specifically 12.28.2006.
> 
> However what I did notice is that the install is putting in Mellanox
> drivers rather than relying on the standard kernel drivers and stack in
> RHEL 8.
> 
> This is noticeable because the Mellanox drivers cause the device names
> to change from the form ens5f1 to ens5f1np1. The result is a right mess
> of the networking setup. Not sure who thought this was a good idea,
> because they clearly didn't check that it wouldn't make a right mess of
> things.
> 
> Anyway I am reasonably confident that it's the installation of the
> Mellanox drivers that is messing everything up. I just can't see in the
> several hundred lines of Perl where it changes things back to default
> yet.
> 
> I am not impressed with the new xcat method of install that ignores all
> the postscripts in the postscript table, so I have to mess about
> manually setting things. Well, I don't: I am going to edit
> dssgserver.stanza and add them there, thank you very much, and you
> should jolly well be documenting these shenanigans IMHO. Frankly the
> engineers at Lenovo have been on the crack pipe again.
> 
> 
> JAB.
> 
> 1. /install/dss-g-4.2b-standard-5.1/opt/lenovo/dss/bin/dsschfw-ofed

It’s useful to hear these experiences before our site does the same upgrade this month. What sort of DSS-G hardware do you have?

As for the change in the network interface names, I believe that is something that happened in a relatively recent version of OFED, somewhere between 5.2 and 5.4 maybe? It’s in the release notes anyhow. That bit me once on a different system, and I rolled back until I could plan for it better.
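
For anyone planning around the rename, here is a minimal sketch of how the old names could be mapped to the new ones ahead of time. It is only an illustration: the function name new_name is made up, and it assumes the pattern seen in this thread (ens5f1 becoming ens5f1np1), i.e. that the appended port number matches the PCI function number. Check the names on your own hardware before relying on it.

```python
import re

# Assumption: old names look like ens<slot>f<function>, as in the thread's
# example, and the newer driver appends "np<port>" with port == function.
_OLD_NAME = re.compile(r"^(ens\d+f(\d+))$")


def new_name(old_name: str) -> str:
    """Return the expected post-upgrade name for an old-style interface name.

    Names that do not match the old pattern (including names that already
    carry an "np<port>" suffix) are returned unchanged.
    """
    match = _OLD_NAME.match(old_name)
    if match is None:
        return old_name
    base, function = match.groups()
    return f"{base}np{function}"


if __name__ == "__main__":
    for name in ("ens5f0", "ens5f1", "ens5f1np1", "eth0"):
        print(f"{name} -> {new_name(name)}")
```

Something like this is handy for checking which entries in an existing network configuration would need renaming before the reinstall, rather than finding out afterwards.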

However, DSS-G installs, and GSS installs before them, have always used OFED rather than the RHEL-supplied drivers. I am as close to 100% confident about that as I am about anything.

