[gpfsug-discuss] gpfsug-discuss Digest, Vol 82, Issue 31
Saula, Oluwasijibomi
oluwasijibomi.saula at ndsu.edu
Wed Nov 21 20:55:29 GMT 2018
Sven/Jim,
Thanks for sharing your thoughts! Currently, we have maxFilesToCache (mFTC) set as follows:
maxFilesToCache 4000
However, since our workload is very diverse, we would have to cycle through the vast majority of our applications to find the best-fitting mFTC value, as this page (https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaag/wecm/l0wecm00_maxfilestocache.htm) suggests.
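For reference, here is how we check and raise the setting with the standard commands; the node class name below is made up for illustration, and note that a maxFilesToCache change only takes effect after GPFS is restarted on the affected nodes:

```shell
# Show the current value (run mmlsconfig with no arguments to list the whole config)
mmlsconfig maxFilesToCache

# Raise it only on the compute clients ("computeNodes" is a hypothetical node class);
# the new value takes effect after mmfsd is restarted on those nodes
mmchconfig maxFilesToCache=16384 -N computeNodes
```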
In the meantime, I was able to gather some more info on the lone mmfsd thread (pid: 34096) running at high CPU utilization. Right away I can see that its number of nonvoluntary_ctxt_switches is quite high compared to the other threads in the dump; however, I need some help interpreting all of this. I should add that heavy HPC workloads (e.g. VASP, ANSYS) are running on these nodes and may be somewhat related to this issue:
Scheduling info for kernel thread 34096
mmfsd (34096, #threads: 309)
-------------------------------------------------------------------
se.exec_start : 8057632237.613486
se.vruntime : 4914854123.640008
se.sum_exec_runtime : 1042598557.420591
se.nr_migrations : 8337485
nr_switches : 15824325
nr_voluntary_switches : 4110
nr_involuntary_switches : 15820215
se.load.weight : 88761
policy : 0
prio : 100
clock-delta : 24
mm->numa_scan_seq : 88980
numa_migrations, 5216521
numa_faults_memory, 0, 0, 1, 1, 1
numa_faults_memory, 1, 0, 0, 1, 1030
numa_faults_memory, 0, 1, 0, 0, 1
numa_faults_memory, 1, 1, 0, 0, 1
Status for kernel thread 34096
Name: mmfsd
Umask: 0022
State: R (running)
Tgid: 58921
Ngid: 34395
Pid: 34096
PPid: 3941
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 256
Groups:
VmPeak: 15137612 kB
VmSize: 15126340 kB
VmLck: 4194304 kB
VmPin: 8388712 kB
VmHWM: 4424228 kB
VmRSS: 4420420 kB
RssAnon: 4350128 kB
RssFile: 50512 kB
RssShmem: 19780 kB
VmData: 14843812 kB
VmStk: 132 kB
VmExe: 23672 kB
VmLib: 121856 kB
VmPTE: 9652 kB
VmSwap: 0 kB
Threads: 309
SigQ: 5/257225
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000010017a07
SigIgn: 0000000000000000
SigCgt: 0000000180015eef
CapInh: 0000000000000000
CapPrm: 0000001fffffffff
CapEff: 0000001fffffffff
CapBnd: 0000001fffffffff
CapAmb: 0000000000000000
Seccomp: 0
Cpus_allowed: ffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
Cpus_allowed_list: 0-239
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 4110
nonvoluntary_ctxt_switches: 15820215
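To put those two counters in perspective, here is a minimal, non-GPFS-specific sketch that parses them out of /proc-style status text and computes the fraction of context switches that were involuntary; the sample values are the ones reported above for thread 34096:

```python
def ctxt_switch_ratio(status_text: str) -> float:
    """Return the fraction of context switches that were involuntary."""
    counts = {}
    for line in status_text.splitlines():
        if line.startswith(("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")):
            key, value = line.split(":")
            counts[key.strip()] = int(value)
    total = counts["voluntary_ctxt_switches"] + counts["nonvoluntary_ctxt_switches"]
    return counts["nonvoluntary_ctxt_switches"] / total

# The counters reported for mmfsd thread 34096 above:
sample = """voluntary_ctxt_switches: 4110
nonvoluntary_ctxt_switches: 15820215"""
print(f"{ctxt_switch_ratio(sample):.4f}")  # ~0.9997: the thread almost never yields voluntarily
```

A ratio this close to 1.0 means the scheduler is preempting the thread nearly every time, i.e. it keeps exhausting its timeslice while competing with the HPC jobs, rather than blocking on I/O or locks.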
Thanks,
Siji Saula
HPC System Administrator
Center for Computationally Assisted Science & Technology
NORTH DAKOTA STATE UNIVERSITY
Research 2 Building <https://www.ndsu.edu/alphaindex/buildings/Building::396> – Room 220B
Dept 4100, PO Box 6050 / Fargo, ND 58108-6050
p:701.231.7749
www.ccast.ndsu.edu | www.ndsu.edu
________________________________
From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> on behalf of gpfsug-discuss-request at spectrumscale.org <gpfsug-discuss-request at spectrumscale.org>
Sent: Wednesday, November 21, 2018 9:33:10 AM
To: gpfsug-discuss at spectrumscale.org
Subject: gpfsug-discuss Digest, Vol 82, Issue 31
Send gpfsug-discuss mailing list submissions to
gpfsug-discuss at spectrumscale.org
To subscribe or unsubscribe via the World Wide Web, visit
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
or, via email, send a message with subject or body 'help' to
gpfsug-discuss-request at spectrumscale.org
You can reach the person managing the list at
gpfsug-discuss-owner at spectrumscale.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of gpfsug-discuss digest..."
Today's Topics:
1. Re: mmfsd recording High CPU usage (Jim Doherty)
2. Re: mmfsd recording High CPU usage (Sven Oehme)
----------------------------------------------------------------------
Message: 1
Date: Wed, 21 Nov 2018 13:01:54 +0000 (UTC)
From: Jim Doherty <jjdoherty at yahoo.com>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] mmfsd recording High CPU usage
Message-ID: <1913697205.666954.1542805314669 at mail.yahoo.com>
Content-Type: text/plain; charset="utf-8"
At a guess, with no data: if the application is opening more files than can fit in the maxFilesToCache (MFTC) objects, GPFS will expand the MFTC to support the open files, but it will also scan to try to free any unused objects. If you can identify the user job that is causing this, you could monitor a system more closely.
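One rough way to spot such a job, without any GPFS tooling, is to count open file descriptors per process and compare the worst offenders against the mFTC setting. This is a Linux-only sketch (it reads /proc, and needs root to inspect other users' processes):

```python
import os

def open_fd_counts(min_fds: int = 1) -> dict[int, int]:
    """Map pid -> number of open file descriptors, for pids we are allowed to inspect."""
    counts = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            n = len(os.listdir(f"/proc/{entry}/fd"))
        except OSError:  # process exited, or insufficient permission
            continue
        if n >= min_fds:
            counts[int(entry)] = n
    return counts

# Show the five processes holding the most open files
for pid, n in sorted(open_fd_counts().items(), key=lambda kv: -kv[1])[:5]:
    print(pid, n)
```

A process whose count approaches or exceeds maxFilesToCache is a reasonable first suspect for the cache-expansion-and-scan behavior described above.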
Jim
On Wednesday, November 21, 2018, 2:10:45 AM EST, Saula, Oluwasijibomi <oluwasijibomi.saula at ndsu.edu> wrote:
Hello Scalers,
First, let me say Happy Thanksgiving to those of us in the US; to those beyond, well, it's still a happy day seeing we're still above ground!
Now, what I have to discuss isn't anything extreme so don't skip the turkey for this, but lately, on a few of our compute GPFS client nodes, we've been noticing high CPU usage by the mmfsd process and are wondering why. Here's a sample:
[~]# top -b -n 1 | grep mmfs
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
231898 root       0 -20 14.508g 4.272g  70168 S  93.8  6.8  69503:41 mmfsd
  4161 root       0 -20  121876   9412   1492 S   0.0  0.0   0:00.22 runmmfs
Obviously, this behavior was likely triggered by a not-so-convenient user job that in most cases is long finished by the time we investigate. Nevertheless, does anyone have an idea why this might be happening? Any thoughts on preventive steps, even?
This is GPFS v4.2.3 on Red Hat 7.4, btw...
Thanks,
Siji Saula
HPC System Administrator
Center for Computationally Assisted Science & Technology
NORTH DAKOTA STATE UNIVERSITY
Research 2 Building – Room 220B
Dept 4100, PO Box 6050 / Fargo, ND 58108-6050
p:701.231.7749
www.ccast.ndsu.edu | www.ndsu.edu
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
------------------------------
Message: 2
Date: Wed, 21 Nov 2018 07:32:55 -0800
From: Sven Oehme <oehmes at gmail.com>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] mmfsd recording High CPU usage
Message-ID:
<CALssuR2PA5Q73p-i=xnNP97+bvzdqT8RO2p1NZ6duBbhRO8OCw at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
the best way to debug something like that is to start with top. start top,
then press 1 and check whether any of the cores is at almost 0% idle while
others have plenty of CPU left. if that is the case, you have one very hot
thread. to further isolate it, press 1 again to collapse the cores, then
press Shift-H, which will break each process down into its threads and show
each thread as an individual line.
now you will see either one or many mmfsd threads causing CPU consumption.
if it's many, your workload is just doing a lot of work; what is more
concerning is if you have just one thread running at 90%+. if that's the
case, write down the PID of the thread that runs so hot and run 'mmfsadm
dump threads,kthreads > dump'. you will see many entries in the file like:
MMFSADMDumpCmdThread: desc 0x7FC84C002980 handle 0x4C0F02FA parm
0x7FC9700008C0 highStackP 0x7FC783F7E530
pthread 0x83F80700 kernel thread id 49878 (slot -1) pool 21
ThPoolCommands
per-thread gbls:
0:0x0 1:0x0 2:0x0 3:0x3 4:0xFFFFFFFFFFFFFFFF 5:0x0 6:0x0
7:0x7FC98C0067B0
8:0x0 9:0x0 10:0x0 11:0x0 12:0x0 13:0x400000E 14:0x7FC98C004C10
15:0x0
16:0x4 17:0x0 18:0x0
find the pid behind 'thread id' and post that section; that would be the
first indication of what that thread does ...
sven
------------------------------
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
End of gpfsug-discuss Digest, Vol 82, Issue 31
**********************************************
More information about the gpfsug-discuss mailing list