[gpfsug-discuss] cross-cluster mounting different versions of gpfs

Yuri L Volobuev volobuev at us.ibm.com
Wed Mar 16 19:29:17 GMT 2016


There are two related, but distinctly different issues to consider.

1) File system format and backward compatibility.  The format of a given
file system is recorded on disk, and determines the level of code required
to mount such a file system.  GPFS offers backward compatibility for older
file system versions stretching for many releases.  The oldest file system
format we test with in the lab is 2.2 (we don't believe there are file
systems using older versions actually present in the field).  So if you
have a file system formatted using GPFS V3.5 code, you can mount that file
system using GPFS V4.1 or V4.2 without a problem.  Of course, you don't get
to use the new features that depend on file system format changes introduced
after V3.5.  If you're formatting a new file system on a cluster running
newer code, but want that file system to be mountable by older code, you
have to use --version with mmcrfs.
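
As a minimal sketch (the device names, NSD stanza file, and exact version
string below are placeholders; take the real string from mmlsfs output on
the old cluster):

  # On the 3.5 cluster: note the file system format version of an existing
  # or test file system
  mmlsfs oldfs -V

  # On the cluster running newer code: create a file system whose format is
  # pinned to that older level, so older code can still mount it
  mmcrfs newfs -F nsd.stanzas --version 3.5.0.7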

2) RPC format compatibility, aka nodes being able to talk to each other.
As the code evolves, the format of some RPCs sent over the network to other
nodes naturally has to evolve as well.  This of course presents a major
problem for code coexistence (running different versions of GPFS on
different nodes in the same cluster, or nodes from different clusters
mounting the same file system, which effectively means joining a remote
cluster), which directly translates into the possibility of a rolling
migration (upgrading nodes to newer GPFS level one at a time, without
taking all nodes down).  Implementing new features while preserving some
level of RPC compatibility with older releases is Hard, but this is
something GPFS has committed to, long ago.  The commitment is not
open-ended though, there's a very specific statement of support for what's
allowed.  GPFS major (meaning 'v' or 'r' is incremented in a v.r.m.f
version string) release N stream shall have coexistence with the GPFS major
release N - 1 stream.  So coexistence of V4.2 with V4.1 is supported, while
coexistence of V4.2 with older releases is unsupported (it may or may not
work, depending on the specific combination of versions, but one would be
trying it entirely at one's own risk).  The reason for limiting the
extent of RPC compatibility is prosaic: in order to support something, we
have to be able to test this something.  We have the resources to test the
N / N - 1 combination, for every major release N.  If we had to extend this
to N, N - 1, N - 2, N - 3, you can do the math on how many combinations to
test that would create.  That would bust the test budget.

So if you want to cross-mount a file system from a home cluster running
V4.2, you have to run at least V4.1.x on the client nodes, and the file
system has to be formatted at a version no newer than the lowest code level
running on any node that mounts it.
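
For reference, the accessing-cluster side of such a cross-mount looks
roughly like the sketch below (the cluster, node, device, and key file names
are made up, and the mmauth key exchange between the two clusters is assumed
to have been done already; see the mmauth, mmremotecluster, and mmremotefs
man pages for the full procedure):

  # On the accessing (compute) cluster:
  mmremotecluster add home.example.org -n nsd1.example.org,nsd2.example.org \
      -k home.example.org.pub
  mmremotefs add remotefs -f fs1 -C home.example.org -T /gpfs/remotefs
  mmmount remotefs -a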

Hope this clarifies things a bit.

yuri



From:	"Uwe Falke" <UWEFALKE at de.ibm.com>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>,
Date:	03/16/2016 11:52 AM
Subject:	Re: [gpfsug-discuss] cross-cluster mounting different versions
            of gpfs
Sent by:	gpfsug-discuss-bounces at spectrumscale.org



Hi, Damir,

I have not done that, but a rolling upgrade from 3.5.x to 4.1.x (maybe
even to 4.2) is supported.
So, as long as you do not need all 500 nodes of your compute cluster
permanently active, you might upgrade them in batches without a full-blown
downtime.  Nicely orchestrated by some scripts, it could be done quite
smoothly (depending on the percentage of compute nodes that can go down
at once and on the run times / wall clock limits of your jobs, this will
take between a few hours and many days ...).
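
A minimal sketch of one such batch step (the node file, and how the new
GPFS packages actually get installed, are placeholders and depend on your
distribution and on how jobs are drained):

  # Drain the jobs on the nodes listed in batch1.nodes first, then:
  mmshutdown -N batch1.nodes     # stop GPFS on this batch of nodes only
  # ... install the newer GPFS packages on those nodes (distribution
  #     specific, e.g. via your configuration management tooling) ...
  mmstartup -N batch1.nodes      # bring the batch back into the cluster
  mmgetstate -N batch1.nodes     # verify the nodes are active again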



Mit freundlichen Grüßen / Kind regards


Dr. Uwe Falke

IT Specialist
High Performance Computing Services / Integrated Technology Services /
Data Center Services
-------------------------------------------------------------------------------------------------------------------------------------------

IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefalke at de.ibm.com
-------------------------------------------------------------------------------------------------------------------------------------------

IBM Deutschland Business & Technology Services GmbH / Geschäftsführung:
Frank Hammer, Thorsten Moehring
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
HRB 17122




From:   Damir Krstic <damir.krstic at gmail.com>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   03/16/2016 07:08 PM
Subject:        Re: [gpfsug-discuss] cross-cluster mounting different
versions of gpfs
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Sven,

For us, at least, at this point in time, we have to create the new
filesystem with the --version flag.  The reason is we can't take downtime to
upgrade all of our 500+ compute nodes that will cross-cluster mount this new
storage.  We can take downtime in June and get all of the nodes up to GPFS
4.2, but we have users today who need to start using the filesystem.

So at this point in time, we either have the ESS built with version 4.1 and
cross-mount its filesystem (also built with the --version flag, I assume) to
our 3.5 compute cluster, or we proceed with the 4.2 ESS and build the
filesystems with the --version flag; then in June, when we get all of our
clients upgraded, we run the mmchconfig release=LATEST command and then
mmchfs -V to bring the filesystem back up to 4.2 features.
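
For reference, that June finalization step would look roughly like this (the
device name is a placeholder; mmchfs -V full enables all of the new format
features, while -V compat enables only those that remain usable by nodes
still running the older release):

  # After all nodes (ESS and compute clients) are running 4.2:
  mmchconfig release=LATEST    # raise the cluster's committed release level
  mmchfs fs1 -V full           # upgrade the file system format to 4.2 features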

It's unfortunate that we are in this bind with the downtime of the compute
cluster. If we were allowed to upgrade our compute nodes before June, we
could proceed with the 4.2 build without having to worry about filesystem
versions.

Thanks for your reply.

Damir

On Wed, Mar 16, 2016 at 12:18 PM Sven Oehme <oehmes at gmail.com> wrote:
While this is all correct, people should think twice about doing this.
If you create a filesystem with an older version, it might prevent you from
using some features like data-in-inode, encryption, or adding 4K disks to an
existing filesystem, even if you will eventually upgrade to the latest
code.

For some customers it's also a good point in time to migrate to larger
blocksizes than they run right now and migrate the data.  I have seen
customer systems gain factors of performance improvement even on existing
HW by creating new filesystems with a larger blocksize and the latest
filesystem layout (which they couldn't do before because of the space wasted
by small files, something that is now partly solved by data-in-inode).
While this is heavily dependent on workload and environment, it's at least
worth thinking about.
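
A minimal sketch of the kind of mmcrfs invocation being described (device
name, stanza file, and block size are placeholders; with no --version given,
the file system is created at the latest format level the cluster supports):

  # New filesystem with a larger block size at the current (latest) format;
  # 4096-byte inodes let small files live in the inode (data-in-inode)
  mmcrfs newfs -F nsd.stanzas -B 4M -i 4096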

sven



On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan <makaplan at us.ibm.com>
wrote:
The key point is that you must create the file system so that it "looks"
like a 3.5 file system.  See mmcrfs ... --version.  Tip: create or find a
test filesystem back on the 3.5 cluster and look at the version string with
mmlsfs xxx -V.  Then go to the 4.x system and try to create a file system
with the same version string....









_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

