[gpfsug-discuss] Introduction

Luke Raimbach luke.raimbach at oerc.ox.ac.uk
Mon Mar 2 19:21:25 GMT 2015

Hi Adam,

We run virtualised GPFS client nodes in a VMware cluster here at Oxford e-Research Centre. We had a requirement where one research group wanted root access to their VMs but also wanted fast direct access to their data on our GPFS cluster.

The technical setup was (relatively) simple. We spun up a small three-node virtual GPFS cluster (with no file system of its own). We then used the multi-cluster feature of GPFS to allow this small virtual cluster to join our main file system cluster, which now gives that group very good IO performance.

However, the problem of spoofing you mention is relevant: we installed the virtual cluster nodes for the research group and put our Active Directory client on them. We also used the root-squash configuration option of the multi-cluster setup to prevent remote-cluster root access into our file system. We also agreed with the research group that they would nominate one Administrator to have root access in their cluster and that they would maintain the AAA framework we put in place. We have to trust the group’s Administrator not to fiddle his UID or to let others escalate their privileges.
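A minimal sketch of why root squash alone does not remove the need for that trust. This is an illustrative model, not GPFS code: the function name and the squash IDs (65534) are assumptions made up for the example. It only shows the general idea that requests arriving as UID 0 are remapped, while any other UID is taken at face value.

```python
# Illustrative model only; not GPFS code. The squash IDs are assumed values.
SQUASH_UID = 65534  # hypothetical unprivileged UID that remote root is mapped to
SQUASH_GID = 65534  # hypothetical unprivileged GID

def effective_ids(uid, gid, squash=(SQUASH_UID, SQUASH_GID)):
    """Return the (uid, gid) the owning cluster would honour for a remote request."""
    if uid == 0:
        # Remote root is squashed to an unprivileged identity.
        return squash
    # Every other UID passes through unchanged, so a remote root user who
    # switches to someone else's UID is indistinguishable from that user.
    return (uid, gid)

print(effective_ids(0, 0))       # (65534, 65534)
print(effective_ids(1234, 100))  # (1234, 100)
```

The second call is the spoofing case: nothing in the squash step can tell whether UID 1234 is the real owner or a remote administrator who changed identity.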

If you were letting untrusted root users spin up Stacks, then you could still run GPFS clients in the OpenStack instance nodes to give them fast access to their data. Here are some musings on a recipe (others please feel free to pull these ideas to pieces):

1.       Start with Cluster A – your main production GPFS file system. It has GPFS device name /gpfs.

2.       Pretend you have lots of money for extra disk to go under your OpenStack cluster (say you buy something like a DDN SFA7700 with a couple of expansion shelves and fill it up with 4TB drives – 180 drives).

3.       Drive this disk array with two, preferably four (or however many you want, really) decent NSD servers. Configure quorum nodes, etc. appropriately. Call this Cluster B.

4.       Carve up the disk array into something like 30 x RAID6 (4 + 2) LUNs and configure them as GPFS NSDs, but don’t create a file system (line up the stripe sizes, choose a nice block size, etc.).

5.       Put the GPFS metadata on some SSD NSDs somewhere. I like putting it on SSDs in the NSD server nodes and replicating it. Other people like putting it in their disk arrays.

6.       As part of someone spinning up a Stack, get some scripts to do the following “magic”:

a.       Connect to Cluster A and find out how big their target data-set is.

b.      Connect to Cluster B and create a new GPFS file system with a reasonable (dependent on the above result) number of NSD disks. Call this new GPFS device something unique other than /gpfs, e.g. /gpfs0001. You could slice bits off your SSDs for the metadata NSDs in each file system you create in this manner (if you haven’t got many SSDs).

c.       As part of a new Stack, provide a few (three, say) GPFS quorum nodes that you’ve configured. Call this Cluster C. Add the rest of the stack instances to Cluster C. No File System.

d.      Pop back over to Cluster A. Export their target data-set from Cluster A using AFM (over GPFS or NFS; pick your favourite. GPFS probably performs better but means you need Cluster A to stay online).

e.      Now return to Cluster B. Import the target data to a local AFM cache on Cluster B’s new file system. Name the AFM file-set whatever you like, but link it into the Cluster B /gpfs0001 namespace at the same level as it is in Cluster A. For example, Cluster A: /gpfs/projects/dataset01 imports to an AFM fileset in Cluster B named userdataset01. Link this under /gpfs0001/projects/dataset01.

f.        Configure multi-cluster support on Cluster B to export GPFS device /gpfs0001 to Cluster C. Encrypt traffic if you want a headache.

g.       Configure multi-cluster support on Cluster C to remote mount Cluster B:/gpfs0001 as local device /gpfs.

7.       You now have fast GPFS access to this user dataset *only*, using GPFS clients inside the OpenStack instance nodes. You have also preserved the file system namespace in Cluster C’s instance nodes. If you only want to run over the data in the stack instances, you could pre-fetch the entire data-set using AFM Control from Cluster A into the Cluster B file-set (if it’s big enough).

8.       Now your users are finished and want to destroy the stack – you need some more script “magic”:

a.       Dismount the file system /gpfs in Cluster C.

b.      Connect to Cluster B and use AFM control to flush all the data back home to Cluster A.

c.       Unlink the file-set in Cluster B and force delete it; then delete the file system to free the NSDs back to the pool available to Cluster B.

d.      Connect back to Cluster A and unexport the original data-set directory structure.

e.      Throw away the VMs in the stack
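A rough sketch of the arithmetic behind steps 4, 6a, 6b and 6e, under stated assumptions. The drive size, RAID layout and example paths come from the recipe above; the function names, the 1.5x headroom factor and the two-NSD minimum are hypothetical choices for illustration, and file system overhead is ignored. Nothing here calls GPFS; it only works out how many NSDs a new file system might need and where the AFM file-set would be linked.

```python
import math
from pathlib import PurePosixPath

DRIVE_TB = 4                      # 4TB drives, as in step 2
DATA_DRIVES = 4                   # RAID6 (4 + 2): four data drives per LUN
LUN_TB = DRIVE_TB * DATA_DRIVES   # 16 TB per NSD, before any file system overhead

def nsds_needed(dataset_tb, headroom=1.5, minimum=2):
    """NSD count for a dataset; headroom and minimum are assumed policy values."""
    return max(minimum, math.ceil(dataset_tb * headroom / LUN_TB))

def cache_link_path(home_path, home_root="/gpfs", cache_root="/gpfs0001"):
    """Map a Cluster A path to the same level under the Cluster B device."""
    relative = PurePosixPath(home_path).relative_to(home_root)
    return str(PurePosixPath(cache_root) / relative)

print(nsds_needed(40))                               # 4
print(cache_link_path("/gpfs/projects/dataset01"))   # /gpfs0001/projects/dataset01
```

Keeping the relative path identical is what lets Cluster C mount /gpfs0001 as /gpfs and still see the data at /gpfs/projects/dataset01.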

Things to worry about:

·         Inode space will be different if users happen to be working on the data in Cluster A and Cluster C and want to know about inodes. GPFS XATTRS are preserved.

·         If you use AFM over NFS because Cluster A and B are far away from each other and laggy, then there’s no locking with your AFM cache running as an Independent Writer. Writes at home (Cluster A) and in cache (Cluster B from Cluster C) will be nondeterministic. Your users will need to know this to avoid disappointment.

·         If you use AFM over GPFS because Cluster A and B are near to each other and have a fast network, then there might still not be locking, but if Cluster A goes offline, it will put the AFM cache into a “dismounted” state.

·         If your users want access to other parts of the Cluster A /gpfs namespace within their stack instances (because you have tools they use or they want to see other stuff), you can export them as read-only to a read-only AFM cache in Cluster B and they will be able to see things in Cluster C provided you link the AFM caches in the right places. Remember they have root access here.

·         AFM updates are sent from cache to home as root so users can potentially overflow their quota at the Cluster A home site (root doesn’t care about quotas at home).

·         Other frightening things might happen that I’ve not thought about.
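On the quota point, one possible safeguard is to compare what is waiting in the cache against the remaining quota at home before flushing. The sketch below is only the comparison itself: the function name is hypothetical, and it assumes the three byte counts have already been obtained from whatever quota and usage reporting is in place. Treating all pending data as new usage is a worst-case estimate, since overwrites of existing files would add less.

```python
def projected_overflow(home_used, pending_in_cache, home_quota):
    """Bytes by which home usage could exceed quota after a flush (0 if it fits)."""
    return max(0, home_used + pending_in_cache - home_quota)

TB = 10**12
# Example: 9 TB already at home, 2 TB waiting in cache, 10 TB quota at home.
print(projected_overflow(9 * TB, 2 * TB, 10 * TB) // TB)  # 1
```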

Hope this helps!


Luke Raimbach
IT Manager
Oxford e-Research Centre
7 Keble Road,

+44(0)1865 610639

From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Adam Huffman
Sent: 02 March 2015 09:40
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Introduction

Hi Vic,

Re-emphasising that I’m still very much learning about GPFS, one of the approaches being discussed is running the GPFS client inside the instances. The concern here is over the case where users have root privileges inside their instances (a pretty common assumption for those used to AWS, for example) and the implications this may have for GPFS. Does it mean there would be a risk of spoofing?


From: Vic Cornell
Reply-To: gpfsug main discussion list
Date: Monday, 2 March 2015 09:32
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Introduction

Hi Adam,

I guess that one of the things that would help push it forward is a definition of what "secure" means to you.



On 2 Mar 2015, at 09:24, Adam Huffman <adam.huffman at crick.ac.uk> wrote:


A couple of weeks ago I joined Bruno Silva’s HPC team at the Francis Crick Institute, with special responsibility for HPC, OpenStack and virtualization. I’m very much a GPFS novice so I’m hoping to be able to draw on the knowledge in this group, while hopefully being able to help others with OpenStack.

As Bruno stated in his message, we’re particularly interested in how to present GPFS to instances securely. I’ve read the discussion from November on this list, which didn’t seem to come to any firm conclusions. Has anyone involved then made substantial progress since?



Adam Huffman
Senior HPC & Virtualization Systems Engineer
The Francis Crick Institute
Gibbs Building
215 Euston Road
London NW1 2BE

E: adam.huffman at crick.ac.uk
W: www.crick.ac.uk

The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no.06885462, with its registered office at 215 Euston Road, London NW1 2BE

gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
