[gpfsug-discuss] mmfsd recording High CPU usage

Sven Oehme oehmes at gmail.com
Wed Nov 21 15:32:55 GMT 2018


Hi,

The best way to debug something like that is to start with top. Start top,
then press 1 and check whether any of the cores sits at almost 0% idle while
others have plenty of CPU left. If that is the case, you have one very hot
thread. To isolate it further, press 1 again to collapse the per-core view,
then press Shift-H, which breaks each process down into its threads and shows
each thread as an individual line.
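
If you'd rather capture that same per-thread view non-interactively (handy
when the offending job is gone before you can log in), something like this
should do the trick (untested sketch; assumes procps-ng top and a single
mmfsd process on the node):

    # batch-mode top, one iteration, showing the threads (-H) of mmfsd only
    top -H -b -n 1 -p "$(pidof mmfsd)" | head -40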

Now you will either see one or many mmfsd threads consuming CPU. If it's
many, your workload is simply doing a lot of work; what is more concerning is
a single thread running at 90%+. If that's the case, write down the PID top
shows for the hot thread (in thread mode the PID column is the thread id) and
run 'mmfsadm dump threads,kthreads > dum'. You will see many entries in the
file like:

    MMFSADMDumpCmdThread: desc 0x7FC84C002980 handle 0x4C0F02FA parm 0x7FC9700008C0 highStackP 0x7FC783F7E530
      pthread 0x83F80700 kernel thread id 49878 (slot -1) pool 21 ThPoolCommands
      per-thread gbls:
        0:0x0 1:0x0 2:0x0 3:0x3 4:0xFFFFFFFFFFFFFFFF 5:0x0 6:0x0 7:0x7FC98C0067B0
        8:0x0 9:0x0 10:0x0 11:0x0 12:0x0 13:0x400000E 14:0x7FC98C004C10 15:0x0
        16:0x4 17:0x0 18:0x0

Find the entry whose 'kernel thread id' matches the thread id you wrote down
and post that section; that would be the first indication of what the thread
is doing ...
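
For example, to run the dump and pull out just the matching section (49878 is
only the thread id from the sample above, and the -B/-A window is a rough
guess at the size of a section; substitute your own values):

    mmfsadm dump threads,kthreads > dum
    grep -B 2 -A 8 'kernel thread id 49878' dum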

sven






On Tue, Nov 20, 2018 at 11:10 PM Saula, Oluwasijibomi <
oluwasijibomi.saula at ndsu.edu> wrote:

> Hello Scalers,
>
>
> First, let me say Happy Thanksgiving to those of us in the US, and to those
> beyond, well, it's still a happy day seeing we're still above ground! 😊
>
>
> Now, what I have to discuss isn't anything extreme, so don't skip the
> turkey for this, but lately, on a few of our compute GPFS client nodes,
> we've been noticing high CPU usage by the mmfsd process and are wondering
> why. Here's a sample:
>
>
> [~]# top -b -n 1 | grep mmfs
>
>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 231898 root       0 -20 14.508g 4.272g  70168 S  93.8  6.8  69503:41 mmfsd
>   4161 root       0 -20  121876   9412   1492 S   0.0  0.0   0:00.22 runmmfs
>
> Obviously, this behavior was likely triggered by a not-so-convenient user
> job that in most cases has long finished by the time we investigate.
> Nevertheless, does anyone have an idea why this might be happening? Any
> thoughts on preventive steps, even?
>
>
> This is GPFS v4.2.3 on Red Hat 7.4, btw...
>
>
> Thanks,
>
> Siji Saula
> HPC System Administrator
> Center for Computationally Assisted Science & Technology
> *NORTH DAKOTA STATE UNIVERSITY*
>
>
> Research 2 Building <https://www.ndsu.edu/alphaindex/buildings/Building::396> – Room 220B
> Dept 4100, PO Box 6050  / Fargo, ND 58108-6050
> p:701.231.7749
> www.ccast.ndsu.edu | www.ndsu.edu
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>