[gpfsug-discuss] mmfsadm test pit

Aaron Knister aaron.s.knister at nasa.gov
Tue Aug 16 22:55:19 BST 2016


Thanks Marc! That's incredibly helpful info. I'll uh, not use the test 
pit command :)

-Aaron

On 8/16/16 5:09 PM, Marc A Kaplan wrote:
> I was surprised to read that Ctrl-C did not really kill restripe.   It's
> supposed to!  If it doesn't that's a bug.
>
> I ran this by my expert within IBM and he wrote to me:
>
> First of all a "PIT job" such as restripe, deldisk, delsnapshot, and
> such should be easy to stop by ^C the management program that started
> them.  The SG manager daemon holds open a socket to the client program
> for the purposes of sending command output, progress updates, error
> messages and the like.  The PIT code checks this socket periodically and
> aborts the PIT process cleanly if the socket is closed.  If this cleanup
> doesn't occur, it is a bug and should be worth reporting.  However,
> there's no exact guarantee on how quickly each thread on the SG mgr will
> notice and then how quickly the helper nodes can be stopped and so
> forth.  The interval between socket checks depends among other things on
> how long it takes to process each file; if there are a few very large
> files, the delay can be significant.  In the limiting case, where most
> of the FS storage is contained in a few files, this mechanism doesn't
> work [elided] well.  So it can be quite involved and slow sometimes to
> wrap up a PIT operation.
>
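To make the socket-check mechanism described above concrete, here is a minimal Python sketch of the general pattern: a long-running job that looks at its client socket between work items and stops once the peer has closed it. This is not GPFS code; `client_gone`, `run_job`, and the per-item `process` callback are hypothetical names used only for illustration. The check can only happen between items, which is why a single very large item delays the abort.

```python
import select
import socket


def client_gone(conn: socket.socket) -> bool:
    """Return True if the peer has closed its end of the connection."""
    readable, _, _ = select.select([conn], [], [], 0)  # non-blocking poll
    if not readable:
        return False  # nothing pending, peer is still connected
    try:
        data = conn.recv(1, socket.MSG_PEEK)  # peek so real data is not consumed
    except (ConnectionResetError, BrokenPipeError):
        return True
    return data == b""  # an empty read means the peer shut down


def run_job(conn, work_items, process):
    """Process items one at a time, checking the client socket between items."""
    done = 0
    for item in work_items:
        if client_gone(conn):
            return ("aborted", done)
        process(item)  # a slow item here postpones the next socket check
        done += 1
    return ("finished", done)


if __name__ == "__main__":
    server, client = socket.socketpair()

    def process(item):
        # Simulate the user pressing ^C while item 1 is being processed.
        if item == 1:
            client.close()

    print(run_job(server, range(5), process))
    server.close()
```

In the demo, the client goes away while the second item is in flight, so the job only notices at the next check and stops before starting the third item.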
> The simplest way to determine if the command has really stopped is with
> the mmdiag --commands issued on the SG manager node.  This shows running
> commands with the command line, start time, socket, flags, etc.  After
> ^Cing the client program, the entry here should linger for a while, then
> go away.  When it exits you'll see an entry in the GPFS log file where
> it fails with err 50.  If the ^C hasn't stopped the command after a
> while, it is worth looking into.
>
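The "linger for a while, then go away" check above can be automated with a small polling loop. The following Python sketch is an assumption-laden illustration: it assumes `mmdiag` lives in `/usr/lpp/mmfs/bin` (the same directory as the `mmfsadm` path shown later in this thread), that it is run on the SG manager node, and that the command name (for example `mmrestripefs`) appears somewhere in the `mmdiag --commands` output. It does not try to parse the output format, which is not shown here. `wait_until_gone` and `command_still_listed` are hypothetical helper names.

```python
import subprocess
import time


def command_still_listed(name, run=None):
    """True if `name` appears anywhere in the output of `mmdiag --commands`."""
    if run is None:
        # Assumed install path; adjust for your system.
        run = lambda: subprocess.run(
            ["/usr/lpp/mmfs/bin/mmdiag", "--commands"],
            capture_output=True, text=True, check=True,
        ).stdout
    return name in run()


def wait_until_gone(name, timeout_s=600.0, interval_s=10.0,
                    run=None, sleep=time.sleep):
    """Poll until `name` is no longer listed.

    Returns True if the entry disappeared before the timeout, False otherwise.
    `run` and `sleep` are injectable so the loop can be exercised without GPFS.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if not command_still_listed(name, run):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A call such as `wait_until_gone("mmrestripefs")` on the SG manager node would return False if the entry is still there after ten minutes, which is the "worth looking into" case.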
> If the command wasn't issued on the SG mgr node and you can't find
> where the client command is running, the socket is still a useful hint.
> While tedious, it should be possible to trace this socket back to the
> node where that command was originally run using netstat or equivalent.
> Poking around inside a GPFS internaldump will also provide clues; there
> should be an outstanding sgmMsgSGClientCmd command listed in the dump
> tscomm section.  Once you find it, just 'kill `pidof mmrestripefs`' or
> similar.
>
> I'd like to warn the OP away from mmfsadm test pit.  These commands are
> of course unsupported and unrecommended for any purpose (even internal
> test and development purposes, as far as I know).  You are definitely
> working without a net there.  When I was improving the integration
> between PIT and snapshot quiesce a few years ago, I looked into this and
> couldn't figure out how to (easily) make these stop and resume commands
> safe to use, so as far as I know they remain unsafe.  The list command,
> however, is probably fairly okay; but it would probably be better to use
> mmfsadm saferdump pit.
>
>
>
>
>
> From:        Aaron Knister <aaron.s.knister at nasa.gov>
> To:        <gpfsug-discuss at spectrumscale.org>
> Date:        08/15/2016 10:49 PM
> Subject:        [gpfsug-discuss] mmfsadm test pit
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------------------------------------------------
>
>
>
> I just discovered this interesting gem poking at mmfsadm:
>
>  test pit fsname list|suspend|status|resume|stop [jobId]
>
> There have been times where I've kicked off a restripe and either
> intentionally or accidentally ctrl-c'd it only to realize that many
> times it's disappeared into the ether and is still running. The only way
> I've known so far to stop it is with an mmchmgr.
>
> A far more painful instance happened when I ran a rebalance on an fs
> w/more than 31 nsds using more than 31 pit workers and hit *that* fun
> APAR which locked up access for a single filesystem to all 3.5k nodes.
> We spent 48 hours round the clock rebooting nodes as jobs drained to
> clear it up. I would have killed in that instance for a way to cancel
> the PIT job (the mmchmgr trick didn't work). It looks like you might
> actually be able to do this with mmfsadm, although how wise this is, I
> do not know (kinda curious about that).
>
> Here's an example. I kicked off a restripe and then ctrl-c'd it on a
> client node. Then ran these commands from the fs manager:
>
> root at loremds19:~ # /usr/lpp/mmfs/bin/mmfsadm test pit tlocal list
> JobId 785979015170 PitJobStatus PIT_JOB_RUNNING progress 0.00
> debug: statusListP D40E2C70
>
> root at loremds19:~ # /usr/lpp/mmfs/bin/mmfsadm test pit tlocal stop 785979015170
> debug: statusListP 0
>
> root at loremds19:~ # /usr/lpp/mmfs/bin/mmfsadm test pit tlocal list
> JobId 785979015170 PitJobStatus PIT_JOB_STOPPING progress 4.01
> debug: statusListP D4013E70
>
> ... some time passes ...
>
> root at loremds19:~ # /usr/lpp/mmfs/bin/mmfsadm test pit tlocal list
> debug: statusListP 0
>
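For anyone who wants to watch job state from a script, the `list` lines in the transcript above are easy to pick apart. The sketch below parses only the exact line shape shown here (`JobId ... PitJobStatus ... progress ...`) and ignores the `debug:` lines. It is an illustration built from this one sample, not a documented format, and `PitJob` and `parse_pit_list` are hypothetical names. Keep in mind the advice in the reply above: `mmfsadm test pit` is unsupported, and `mmfsadm saferdump pit` is suggested as the better way to look, although its output format is not shown in this thread.

```python
import re
from typing import List, NamedTuple


class PitJob(NamedTuple):
    job_id: int
    status: str
    progress: float


# Matches lines like:
#   JobId 785979015170 PitJobStatus PIT_JOB_RUNNING progress 0.00
_JOB_LINE = re.compile(r"^JobId (\d+) PitJobStatus (\S+) progress ([\d.]+)$")


def parse_pit_list(text: str) -> List[PitJob]:
    """Extract job records from captured `list` output; skip everything else."""
    jobs = []
    for line in text.splitlines():
        match = _JOB_LINE.match(line.strip())
        if match:
            jobs.append(
                PitJob(int(match.group(1)), match.group(2), float(match.group(3)))
            )
    return jobs


if __name__ == "__main__":
    sample = (
        "JobId 785979015170 PitJobStatus PIT_JOB_STOPPING progress 4.01\n"
        "debug: statusListP D4013E70\n"
    )
    for job in parse_pit_list(sample):
        print(job.job_id, job.status, job.progress)
```

An empty result corresponds to the last transcript step, where only the `debug:` line is printed and the job is gone.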
> Interesting.
>
> -Aaron
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776


