[gpfsug-discuss] GPFS Memory Usage Keeps going up and we don't know why.
Luke Raimbach
luke.raimbach at googlemail.com
Mon Jul 24 23:23:03 BST 2017
Switch off CCR and see what happens.
On Mon, 24 Jul 2017, 15:40 Adam Huffman, <adam.huffman at crick.ac.uk> wrote:
> smem is recommended here
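smem reports proportional set size (PSS), which splits shared pages fairly between processes, so it is less likely to over-count mmfsd's shared segments than the plain RSS column from ps. As a minimal sketch of where the RSS number itself comes from (assuming Linux and Python; `rss_kib` is an illustrative helper, not a GPFS tool):

```python
# Minimal sketch (Linux-only): read a process's resident set size (RSS)
# from /proc/<pid>/status, the same counter ps reports in its RSS column.
# The current process stands in for mmfsd; on a real node you would pass
# the mmfsd PID instead.
def rss_kib(pid="self"):
    """Return VmRSS in KiB for the given PID, or None if not found."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # kernel reports the value in kB
    return None

print(f"RSS of this process: {rss_kib()} KiB")
```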
>
> Cheers,
> Adam
>
> --
>
> Adam Huffman
> Senior HPC and Cloud Systems Engineer
> The Francis Crick Institute
> 1 Midland Road
> London NW1 1AT
>
> T: 020 3796 1175
> E: adam.huffman at crick.ac.uk
> W: www.crick.ac.uk
>
>
>
>
>
> On 24 Jul 2017, at 15:21, Peter Childs <p.childs at qmul.ac.uk> wrote:
>
>
> top
>
> but ps gives the same value.
>
> [root at dn29 ~]# ps auww -q 4444
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 4444 2.7 22.3 10537600 5472580 ? S<Ll Jul12 466:13
> /usr/lpp/mmfs/bin/mmfsd
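For reference, the ps columns above can be cross-checked with a little arithmetic: RSS is reported in KiB, so 5472580 KiB is about 5.2 GiB, and a %MEM of 22.3 implies the node has roughly 23 GiB of RAM:

```python
# Sanity-check the ps output above: convert the RSS column (KiB) to GiB
# and back out the node's total RAM from the %MEM column.
rss_kib = 5472580          # RSS column from ps, in KiB
pct_mem = 22.3             # %MEM column

rss_gib = rss_kib / 1024**2
total_gib = rss_gib / (pct_mem / 100)

print(f"mmfsd RSS: {rss_gib:.1f} GiB")           # ~5.2 GiB
print(f"implied node RAM: {total_gib:.1f} GiB")  # ~23.4 GiB
```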
>
> Thanks for the help
>
> Peter.
>
>
> On Mon, 2017-07-24 at 14:10 +0000, Jim Doherty wrote:
>
> How are you identifying the high memory usage?
>
>
> On Monday, July 24, 2017 9:30 AM, Peter Childs <p.childs at qmul.ac.uk>
> wrote:
>
>
> I've had a look at mmfsadm dump malloc and it agrees with the output
> from mmdiag --memory, so it does not seem to account for the excessive
> memory usage.
>
> The new machines do have idleSocketTimeout set to 0; from what you're saying,
> it could be related to keeping that many connections between nodes open.
>
> Thanks in advance
>
> Peter.
>
>
>
>
> [root at dn29 ~]# mmdiag --memory
>
> === mmdiag: memory ===
> mmfsd heap size: 2039808 bytes
>
>
> Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
> 128 bytes in use
> 17500049370 hard limit on memory usage
> 1048576 bytes committed to regions
> 1 number of regions
> 555 allocations
> 555 frees
> 0 allocation failures
>
>
> Statistics for MemoryPool id 2 ("Shared Segment")
> 42179592 bytes in use
> 17500049370 hard limit on memory usage
> 56623104 bytes committed to regions
> 9 number of regions
> 100027 allocations
> 79624 frees
> 0 allocation failures
>
>
> Statistics for MemoryPool id 3 ("Token Manager")
> 2099520 bytes in use
> 17500049370 hard limit on memory usage
> 16778240 bytes committed to regions
> 1 number of regions
> 4 allocations
> 0 frees
> 0 allocation failures
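Totalling the pool statistics above mechanically (a hypothetical parser written against the exact output format quoted here; the real format may vary between releases) shows the shared segments account for only a few tens of MiB, nowhere near the multi-GiB RSS:

```python
# Hypothetical parser for the `mmdiag --memory` output quoted above: sum
# "bytes in use" and "bytes committed to regions" across the three pools.
import re

sample = """\
     42179720 bytes in use across pools, itemized:
          128 bytes in use
      1048576 bytes committed to regions
     42179592 bytes in use
     56623104 bytes committed to regions
      2099520 bytes in use
     16778240 bytes committed to regions
"""
# Skip the itemized header line; count only the per-pool entries.
in_use = sum(int(m) for m in re.findall(r"(\d+) bytes in use\n", sample))
committed = sum(int(m) for m in re.findall(r"(\d+) bytes committed", sample))

print(f"in use:    {in_use / 1024**2:.1f} MiB")     # ~42.2 MiB
print(f"committed: {committed / 1024**2:.1f} MiB")  # ~71.0 MiB
```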
>
>
> On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote:
>
> There are 3 places where the GPFS mmfsd uses memory: the pagepool plus 2
> shared memory segments. To see the memory utilization of the shared
> memory segments, run the command mmfsadm dump malloc. The statistics
> for memory pool id 2 are where the maxFilesToCache/maxStatCache objects
> live, and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.
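For rough sizing, IBM's tuning guidance puts each maxFilesToCache object at on the order of 3 KiB and each maxStatCache object at a few hundred bytes; a back-of-envelope sketch (the per-object constants and the maxStatCache value of 1000 are illustrative assumptions, not exact figures):

```python
# Rough back-of-envelope for shared-segment sizing. The per-object costs
# are approximate figures from IBM's tuning guidance, not exact values.
KIB = 1024

def shared_segment_estimate(max_files_to_cache, max_stat_cache,
                            bytes_per_file=3 * KIB, bytes_per_stat=480):
    """Approximate memory (bytes) consumed by the MFTC/MSC object caches."""
    return (max_files_to_cache * bytes_per_file
            + max_stat_cache * bytes_per_stat)

# maxFilesToCache=4000 as reported in the thread; maxStatCache assumed 1000.
est = shared_segment_estimate(4000, 1000)
print(f"~{est / 1024**2:.1f} MiB")  # ~12 MiB, far below the growth observed
```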
>
> You might want to upgrade to a later PTF, as there was a PTF that fixed a
> memory leak in tscomm associated with network connection drops.
>
>
> On Monday, July 24, 2017 5:29 AM, Peter Childs <p.childs at qmul.ac.uk>
> wrote:
>
>
> We have two GPFS clusters.
>
> One is fairly old, running 4.2.1-2 without CCR; its nodes run fine,
> using about 1.5G of memory consistently (the GPFS pagepool is set to 1G,
> so that looks about right).
>
> The other one is "newer", running 4.2.1-3 with CCR, and its nodes keep
> increasing their memory usage: they start at about 1.1G and are fine
> for a few days, but after a while they grow to 4.2G, which means that
> when a node needs to run real work, the work can't be done.
>
> I'm losing track of what may be different other than CCR, and I'm trying
> to find some more ideas of where to look.
>
> I've checked all the standard things like pagepool and maxFilesToCache
> (set to the default of 4000); workerThreads is set to 128 on the new
> GPFS cluster (against the default of 48 on the old one).
>
> I'm not sure what else to look at on this one hence why I'm asking the
> community.
>
> Thanks in advance
>
> Peter Childs
> ITS Research Storage
> Queen Mary University of London.
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
> --
>
> Peter Childs
> ITS Research Storage
> Queen Mary, University of London
>
>
>
>
> --
>
> Peter Childs
> ITS Research Storage
> Queen Mary, University of London
>
>
>
> The Francis Crick Institute Limited is a registered charity in England and
> Wales no. 1140062 and a company registered in England and Wales no.
> 06885462, with its registered office at 1 Midland Road London NW1 1AT
>