[gpfsug-discuss] Well, this is the pits...

Buterbaugh, Kevin L Kevin.Buterbaugh at Vanderbilt.Edu
Thu May 4 16:43:56 BST 2017


Hi Olaf,

Your explanation mostly makes sense, but...

It failed with 4 nodes … failed with 2 nodes … and I’m not going to try with 1 node.  And this filesystem has only 32 disks, which I imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems.

I thought that QOS (which I’m using) was what would keep an mmrestripefs from overrunning the system.  QOS has worked extremely well for us - it’s one of my favorite additions to GPFS.
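
(For context, the QOS throttling I mean is the maintenance class limit.  We enable it with something along these lines - the filesystem name and the IOPS value below are just placeholders for illustration, not our actual settings:

   mmchqos gpfs0 --enable pool=*,maintenance=300IOPS,other=unlimited

That is supposed to cap maintenance commands like mmrestripefs at roughly that IOPS level while leaving normal user I/O unlimited.)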

Kevin

On May 4, 2017, at 10:34 AM, Olaf Weiser <olaf.weiser at de.ibm.com> wrote:

No - it is just in the code, because we have to avoid running out of mutexes / blocking.

Reducing the number of nodes (-N) down to 4 (2 nodes is even safer) is the easiest way to solve it for now.

I've been told the real root cause will be fixed in one of the next PTFs, within this year.
This warning message itself should appear every time, but unfortunately it was coded so that it depends on the number of disks (NSDs) - that's why I suspect you didn't see it before.
But the fact remains that we have to make sure not to overrun the system with mmrestripefs, so please lower the -N number of nodes to 4, or better yet 2

(even though we know the mmrestripefs will take longer)
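
(Just to illustrate what I mean - the node names below are only placeholders, substitute two of your NSD servers:

   mmrestripefs <new fs> -b -P capacity -N nsd01,nsd02

)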


From:        "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        05/04/2017 05:26 PM
Subject:        [gpfsug-discuss] Well, this is the pits...
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________



Hi All,

Another one of those, “I can open a PMR if I need to” type questions…

We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won’t get into here).  Therefore, I’m doing a lot of mmrestripefs, mmdeldisk, and mmadddisk runs.

Yesterday I did an “mmrestripefs <old fs> -r -N <my 8 NSD servers>” (after suspending a disk, of course).  It worked like it should.

Today I did a “mmrestripefs <new fs> -b -P capacity -N <those same 8 NSD servers>” and got:

mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system.  The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31.  Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting.  By default the file system manager node is counted as a participating node.
mmrestripefs: Command failed. Examine previous error messages to determine cause.

So there must be some difference in how the “-r” and “-b” options calculate the number of PIT worker threads.  I did an “mmfsadm dump all | grep pitWorkerThreadsPerNode” on all 8 NSD servers and the filesystem manager node … they all say the same thing:

   pitWorkerThreadsPerNode 0

Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!?  I’m confused...
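
(For what it’s worth, the setting can also be checked and, if IBM recommends it, lowered per node with something like the following - the value 8 and the node names are purely illustrative, not a recommendation:

   mmlsconfig pitWorkerThreadsPerNode
   mmchconfig pitWorkerThreadsPerNode=8 -N nsd01,nsd02

I assume a changed value doesn’t take effect until GPFS is recycled on those nodes, but I’d want IBM to confirm that.)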

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615) 875-9633


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
