[gpfsug-discuss] Dataless nodes as GPFS clients

Ryan Novosielski novosirj at rutgers.edu
Mon Feb 17 18:57:47 GMT 2020


We do this. We provision only the GPFS key files (/var/mmfs/ssl/stage/genkeyData*) and the appropriate SSH key files, and use the following systemd override to mmsdrserv.service. Where the appropriate place to put that override is will depend somewhat on your version of GPFS, since the systemd setup for GPFS changed in 5.x, but I’ve rigged this up for all of the 4.x and 5.x releases that exist so far, if you need pointers. We use CentOS, FYI, but I don’t think any of this should be different on Debian; our current version of GPFS on the nodes where we do this is 5.0.4-1.
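In case it helps, here is a rough sketch of how those pieces get pushed out with Warewulf 3’s wwsh (the genkeyData1 name, the node range, and the exact flags below are placeholders and can differ a bit between Warewulf releases):

# import the staged key material and the override into the Warewulf datastore
wwsh file import /var/mmfs/ssl/stage/genkeyData1
wwsh file set genkeyData1 --path=/var/mmfs/ssl/stage/genkeyData1 --mode=0600
wwsh file import /root/clusters/amarel/mmsdrserv-override.conf --name=mmsdrserv-override.conf
# attach the file objects to the nodes that should receive them
wwsh provision set 'node[00-99]' --fileadd=genkeyData1,mmsdrserv-override.conf

The override file itself, as stored in Warewulf, looks like this: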

[root at master ~]# wwsh file print mmsdrserv-override.conf
#### mmsdrserv-override.conf ##################################################
mmsdrserv-override.conf: ID               = 1499
mmsdrserv-override.conf: NAME             = mmsdrserv-override.conf
mmsdrserv-override.conf: PATH             = /etc/systemd/system/mmsdrserv.service.d/override.conf
mmsdrserv-override.conf: ORIGIN           = /root/clusters/amarel/mmsdrserv-override.conf
mmsdrserv-override.conf: FORMAT           = data
mmsdrserv-override.conf: CHECKSUM         = ee7c28f0eee075a014f7a1a5add65b1e
mmsdrserv-override.conf: INTERPRETER      = UNDEF
mmsdrserv-override.conf: SIZE             = 210
mmsdrserv-override.conf: MODE             = 0644
mmsdrserv-override.conf: UID              = 0
mmsdrserv-override.conf: GID              = 0

[root at master ~]# wwsh file show mmsdrserv-override.conf
[Unit]
After=sys-subsystem-net-devices-ib0.device

[Service]
ExecStartPre=/usr/lpp/mmfs/bin/mmsdrrestore -p $SERVER -R /usr/bin/scp
ExecStartPre=/usr/lpp/mmfs/bin/mmauth genkey propagate -N %{NODENAME}-ib0

…where $SERVER above has been changed for this e-mail; the actual override file contains the hostname of our GPFS cluster manager, or another appropriate config server. %{NODENAME} is filled in by Warewulf, our provisioning/cluster-management tool, and expands to any given node’s short hostname. I’ve since found that we can also define a Warewulf object so the first ExecStartPre line could use %{CLUSTERMGR} or some other arbitrary variable and make this file more cluster-agnostic, but we just haven’t done that yet.
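If we did, the override would look something like the following sketch (untested; %{CLUSTERMGR} is a hypothetical Warewulf variable that would have to be defined to hold the config server’s hostname):

[Unit]
After=sys-subsystem-net-devices-ib0.device

[Service]
# pull the local GPFS configuration from the config server, then propagate the cluster auth key to this node
ExecStartPre=/usr/lpp/mmfs/bin/mmsdrrestore -p %{CLUSTERMGR} -R /usr/bin/scp
ExecStartPre=/usr/lpp/mmfs/bin/mmauth genkey propagate -N %{NODENAME}-ib0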

Other than that, we build and install the appropriate gpfs.gplbin-<uname -r> RPM. We build it on a node with an identical OS (or you can manually modify the config and have the appropriate kernel source handy) with "cd /usr/lpp/mmfs/src; make Autoconfig; make World; make rpm". On Debian you’d do "make deb" instead. Obviously the rest of GPFS is also installed, and you join the node to the cluster while it’s booted up one of those times. Warewulf starts a node off with a nearly empty /var, so anything we need in there has to be populated at boot. It has required a little tweaking from time to time on OS or GPFS upgrades, but other than that, we’ve been running clusters like this without incident for years.
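Spelled out, the portability-layer build is roughly this (the resulting package name will depend on your kernel and GPFS versions):

cd /usr/lpp/mmfs/src
make Autoconfig    # generate the build configuration for the running kernel
make World         # build the portability layer
make rpm           # package it; on Debian/Ubuntu use "make deb" instead
# then install the resulting gpfs.gplbin-<uname -r> package into the node image alongside the rest of GPFS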

--
____
|| \\UTGERS,  	 |---------------------------*O*---------------------------
||_// the State	 |         Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ	 | Office of Advanced Research Computing - MSB C630, Newark
     `'

> On Feb 17, 2020, at 1:42 PM, Popescu, Razvan <rp2927 at gsb.columbia.edu> wrote:
> 
> Hi,
>  
> Here at CBS we run our compute cluster as dataless nodes, loading the base OS from a root server and using AUFS to overlay a few node config files (just krb5.keytab at this time) plus a tmpfs writable layer on top of everything. The result is that a node restart resets the configuration to whatever is recorded on the root server, which does not include any node-specific runtime files. The (Debian 10) system is based on debian-live, with a few in-house modifications, a major feature being that we NFS-mount the bottom read-only root layer so that we can make live updates (within certain limits).
>  
> I’m trying to add native (GPL) GPFS access to it. (So far, we’ve used NFS to gain access to the GPFS-resident data.)
>  
> I was successful in building an Ubuntu 18.04 LTS based prototype of a similar design. I installed all of the required GPFS (client) packages on the root server and manually built the GPL chroot’ed into the exported system tree. I booted a test node with a persistent top layer to catch the data created by the GPFS node addition, and successfully added the (client) node to the GPFS cluster. It seems to work fine.
>  
> I’ve copied some of the captured node data to the node-specific overlay to try to run without any persistency; the critical piece seems to be /var/mmfs/gen (I copied all of /var/mmfs, in fact). It runs fine without persistency.
>  
> My questions are:
> 	• Am I insane to take the risk of compromising the cluster’s data integrity? (…by resetting the whole content of /var to whatever was generated after the mmaddnode command?!?!)
> 	• Would such a configuration run safely through a proper reboot?  How about a forced power-off and restart?
> 	• Is there a properly identified minimum set of files that must be added to the node-specific overlay to make this work? (For now, I’ve used my “knowledge” and guesswork to decide what to retain and what not, e.g. keep startup links, certificates, and config dumps; drop logs, pids, etc.)
>  
> Thanks!!
>  
> Razvan N. Popescu
> Research Computing Director
> Office: (212) 851-9298
> razvan.popescu at columbia.edu
>  
> Columbia Business School
> At the Very Center of Business
>  
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss


