[gpfsug-discuss] Using VMs as quorum / admin nodes in a GPFS infiniband cluster

Sanchez, Paul Paul.Sanchez at deshaw.com
Mon Jun 7 14:36:00 BST 2021


Hi Leo,

We use VMs for Spectrum Scale all the time (including VM-based NAS clusters that span multiple sites), and all of the cloud-based offerings do as well, so it is clearly an established practice. (Note: all of my experience is on Ethernet fabrics, so keep that in mind when I discuss networking.) But you’re right that there are a few pitfalls, such as:


1.       Licensing. The traditional PVU license model discouraged adding machines to clusters and encouraged concentrating server roles in a way that didn’t align with best practices. If you’re on capacity-based licensing then this issue is moot. (We’ve been on that model for ages, so we have years of experience with GPFS and VMs. With PVUs we probably wouldn’t have gone this way.)

2.       Virtualized networking can be flaky. In particular, I’ve found SR-IOV to be unreliable. In the middle of a TCP session you might suddenly see GPFS complain about “Unexpected data in message. Header dump: cccccccc cccc cccc…” from a VM whose virtual network interface has gone awry. The VM then needs a reboot, and the incident can leave corrupted data on disk, requiring you to run an offline mmfsck and/or spelunk through a damaged filesystem and backups to recover. Based on this, I would recommend the following:

a.       Do NOT use SR-IOV. If you’re using KVM then just stick with virtio (vnet and bridge interfaces).

b.       DO enable all of the checksum protection you can get on the cluster (e.g. nsdCksumTraditional=yes). This can act as a backstop against network reliability issues, and in practice on modern machines it doesn’t appear to be as big a performance hit as it once was. (Honestly, I’d recommend this for everyone.)

c.       Think about increasing your replication factor if you’re running filesystems with only one copy of data/metadata.  One of the strengths of GPFS is its support for replication, both as a throughput scaling mechanism and for redundancy, and that redundancy can buy you a lot of forgiveness if things go wrong.
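
A rough sketch of what 2b and 2c could look like in practice, written as a small Python dry-run helper that only prints the commands rather than running them. The filesystem name "gpfs01" and the helper itself are hypothetical; the mm* commands are the standard Scale administration commands, but check the documentation for your release (for example, whether changing nsdCksumTraditional requires a daemon restart) before running anything. Raising replication also assumes the filesystem was created with maximum replication (-M/-R) high enough and that the NSDs sit in at least two failure groups.

```python
import shlex


def hardening_commands(device, replicas=2):
    """Return argv lists for the checksum and replication steps (dry run only)."""
    if replicas not in (1, 2, 3):
        raise ValueError("GPFS supports 1 to 3 replicas")
    n = str(replicas)
    return [
        # Enable checksums between NSD clients and servers, then read it back.
        ["mmchconfig", "nsdCksumTraditional=yes"],
        ["mmlsconfig", "nsdCksumTraditional"],
        # Show current default (-m/-r) and maximum (-M/-R) replication.
        ["mmlsfs", device, "-m", "-r", "-M", "-R"],
        # Raise the default metadata (-m) and data (-r) replica counts.
        ["mmchfs", device, "-m", n, "-r", n],
        # Re-replicate existing files so they match the new defaults.
        ["mmrestripefs", device, "-R"],
    ]


if __name__ == "__main__":
    # "gpfs01" is a hypothetical filesystem name used for illustration.
    for argv in hardening_commands("gpfs01"):
        print(shlex.join(argv))
```

The helper deliberately does not execute anything: mmrestripefs in particular is I/O heavy, so it is worth reviewing the printed commands and scheduling that step yourself.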

3.       Sizing. Do not be too stingy with RAM and CPU allocations for your guest nodes. Scale is excellent at multithreading for things like parallel inode scans and prefetching. Also remember that your quorum nodes will be token managers by default unless you assign the manager roles elsewhere, so they may need enough RAM to support their share of the token-serving workload. A stable cluster is one in which the servers aren’t thrashing for lack of resources.
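
If you would rather keep the quorum VMs small, one option is to assign the manager role elsewhere, as mentioned in point 3. Below is a minimal sketch in Python that prints the mmchnode commands for doing so without executing them. All node names are hypothetical, and whether your bare-metal servers are the right place for the manager role depends on your cluster, so treat this as an illustration rather than a recommendation.

```python
import shlex


def role_commands(quorum_vms, manager_nodes):
    """Return argv lists that move the manager role off the quorum VMs (dry run)."""
    if not quorum_vms or not manager_nodes:
        raise ValueError("need at least one quorum VM and one manager node")
    return [
        # First add the manager designation to the nodes that should carry it.
        ["mmchnode", "--manager", "-N", ",".join(manager_nodes)],
        # Then make the VMs quorum nodes without the manager designation.
        ["mmchnode", "--quorum", "--nomanager", "-N", ",".join(quorum_vms)],
        # Review the resulting designations and the current managers.
        ["mmlscluster"],
        ["mmlsmgr"],
    ]


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    for argv in role_commands(["qvm1", "qvm2", "qvm3"], ["nsd01", "nsd02"]):
        print(shlex.join(argv))
```

The manager designation is added to the new nodes before it is removed from the VMs, so the cluster is never left without designated manager nodes between the two steps.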

Others may have additional experience and best practices to share, which would be great since I don’t see this trend going away any time soon.

Good luck,
Paul

From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Leonardo Sala
Sent: Monday, June 7, 2021 08:47
To: gpfsug-discuss at spectrumscale.org
Subject: [gpfsug-discuss] Using VMs as quorum / admin nodes in a GPFS infiniband cluster



Hello,

We have multiple bare-metal GPFS clusters with an InfiniBand fabric, and I am considering adding some VMs to the mix, both to perform admin tasks (so that the bare-metal servers do not need passwordless ssh keys) and to act as quorum nodes. Has anybody tried this? What could be the drawbacks / issues at the GPFS level?

Thanks a lot for the insights!

cheers

leo

--

Paul Scherrer Institut

Dr. Leonardo Sala

Group Leader High Performance Computing

Deputy Section Head Science IT

Science IT

WHGA/036

Forschungstrasse 111

5232 Villigen PSI

Switzerland



Phone: +41 56 310 3369

leonardo.sala at psi.ch

www.psi.ch