[gpfsug-discuss] Dataless nodes as GPFS clients

Popescu, Razvan rp2927 at gsb.columbia.edu
Mon Feb 17 18:42:51 GMT 2020


Hi,

Here at CBS we run our compute cluster as dataless nodes, loading the base OS from a root server and using AUFS to overlay a few node-specific config files (just krb5.keytab at this time) plus a tmpfs writable layer on top of everything.   The result is that a node restart resets the configuration to whatever is recorded on the root server, which does not include any node-specific runtime files.   The (Debian 10) system is based on debian-live, with a few in-house modifications; a major feature is that we NFS-mount the bottom read-only root layer so that we can make live updates (within certain limits).
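For context, the resulting layer stack looks roughly like the sketch below. The mount points and export paths are made up for illustration, and how the node-specific branch is actually served is an implementation detail of our debian-live setup:

    # bottom layer: shared read-only root, NFS-mounted from the root server
    mount -t nfs -o ro rootserver:/export/debian10 /run/live/ro
    # node-specific branch with the few per-node files (krb5.keytab, ...)
    mount -t nfs -o ro rootserver:/export/nodes/$(hostname) /run/live/node
    # writable top layer in RAM, discarded on reboot
    mount -t tmpfs tmpfs /run/live/rw
    # AUFS union: tmpfs on top, node branch in the middle, shared root at the bottom
    mount -t aufs -o br=/run/live/rw=rw:/run/live/node=ro:/run/live/ro=ro none /root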

I’m trying to add native (GPL-based) GPFS client access to this setup; so far, we’ve used NFS to reach the GPFS-resident data.

I was able to build an Ubuntu 18.04 LTS based prototype of a similar design.  On the root server I installed all the required GPFS (client) packages and built the GPL manually, chroot’ed into the exported system tree.   I then booted a test node with a persistent top layer to capture the data created by the GPFS node addition, and successfully added the (client) node to the GPFS cluster.    It seems to work fine.
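For anyone curious, the root-server side of that amounted to roughly the following. Package names and the export path are from memory, so treat this as a sketch rather than a recipe:

    # inside the exported Ubuntu 18.04 system tree on the root server
    chroot /srv/nfsroot/ubuntu1804 /bin/bash
    apt install ./gpfs.base_*.deb ./gpfs.gpl_*.deb ./gpfs.gskit_*.deb ./gpfs.msg.en-us_*.deb
    # build the portability layer; needs the headers for the kernel the nodes actually boot
    /usr/lpp/mmfs/bin/mmbuildgpl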

I’ve copied some of the captured node data into the node-specific overlay to try running without any persistency:   the critical piece seems to be /var/mmfs/gen (in fact I copied all of /var/mmfs).   The node runs fine without persistency.
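Concretely, that copy step was something along these lines (paths are illustrative; /srv/overlays/<node> stands for wherever the node-specific branch lives on the root server):

    # pull the state generated on the test node into its overlay branch
    rsync -a testnode:/var/mmfs/ /srv/overlays/testnode/var/mmfs/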

My questions are:

  1.  Am I insane to take the risk of compromising the cluster’s data integrity by resetting the whole content of /var to whatever was generated right after the mmaddnode command?!
  2.  Would such a configuration run safely through a proper reboot?  How about a forced power-off and restart?
  3.  Is there a properly identified minimum set of files that must be added to the node-specific overlay to make this work?   (For now, I’ve used my “knowledge” and guesswork to decide what to retain and what to drop: e.g. keep startup links, certificates, and config dumps; drop logs, pids, etc. A rough sketch of what I currently keep and drop follows below.)
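For concreteness, the guesswork behind question 3 currently amounts to a selection roughly like the one below; none of this is documented anywhere that I know of, which is exactly why I’m asking (paths follow the illustrative overlay layout above):

    # kept in the node-specific overlay branch
    /var/mmfs/gen/          # cluster/node configuration written at mmaddnode time
    /var/mmfs/ssl/          # certificates and keys
    /var/mmfs/etc/          # local settings / callback scripts
    /etc/rc*.d/*gpfs*       # startup links
    # deliberately left to the volatile tmpfs layer
    /var/adm/ras/           # GPFS logs
    /var/mmfs/tmp/          # scratch and dumps
    *.pid files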

Thanks!!

Razvan N. Popescu
Research Computing Director
Office: (212) 851-9298
razvan.popescu at columbia.edu

Columbia Business School
At the Very Center of Business
