[gpfsug-discuss] GPFS Memory Usage Keeps going up and we don't know why.
Luke Raimbach
luke.raimbach at googlemail.com
Mon Jul 24 23:23:03 BST 2017
Switch off CCR and see what happens.
On Mon, 24 Jul 2017, 15:40 Adam Huffman, <adam.huffman at crick.ac.uk> wrote:
> smem is recommended here
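smem reports proportional set size (PSS), which splits shared pages fairly between processes, so it is less likely to over-count mmfsd's shared segments than the plain RSS column from ps. As a minimal sketch of where the RSS number itself comes from (assuming Linux and Python; `rss_kib` is an illustrative helper, not a GPFS tool):

```python
# Minimal sketch (Linux-only): read a process's resident set size (RSS)
# from /proc/<pid>/status, the same counter ps reports in its RSS column.
# The current process stands in for mmfsd; on a real node you would pass
# the mmfsd PID instead.
def rss_kib(pid="self"):
    """Return VmRSS in KiB for the given PID, or None if not found."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # kernel reports the value in kB
    return None

print(f"RSS of this process: {rss_kib()} KiB")
```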
>
> Cheers,
> Adam
>
> --
>
> Adam Huffman
> Senior HPC and Cloud Systems Engineer
> The Francis Crick Institute
> 1 Midland Road
> London NW1 1AT
>
> T: 020 3796 1175
> E: adam.huffman at crick.ac.uk
> W: www.crick.ac.uk
>
>
>
>
>
> On 24 Jul 2017, at 15:21, Peter Childs <p.childs at qmul.ac.uk> wrote:
>
>
> top
>
> but ps gives the same value.
>
> [root at dn29 ~]# ps auww -q 4444
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 4444 2.7 22.3 10537600 5472580 ? S<Ll Jul12 466:13
> /usr/lpp/mmfs/bin/mmfsd
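For reference, the ps columns above can be cross-checked with a little arithmetic: RSS is reported in KiB, so 5472580 KiB is about 5.2 GiB, and a %MEM of 22.3 implies the node has roughly 23 GiB of RAM:

```python
# Sanity-check the ps output above: convert the RSS column (KiB) to GiB
# and back out the node's total RAM from the %MEM column.
rss_kib = 5472580          # RSS column from ps, in KiB
pct_mem = 22.3             # %MEM column

rss_gib = rss_kib / 1024**2
total_gib = rss_gib / (pct_mem / 100)

print(f"mmfsd RSS: {rss_gib:.1f} GiB")           # ~5.2 GiB
print(f"implied node RAM: {total_gib:.1f} GiB")  # ~23.4 GiB
```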
>
> Thanks for the help
>
> Peter.
>
>
> On Mon, 2017-07-24 at 14:10 +0000, Jim Doherty wrote:
>
> How are you identifying the high memory usage?
>
>
> On Monday, July 24, 2017 9:30 AM, Peter Childs <p.childs at qmul.ac.uk>
> wrote:
>
>
> I've had a look at mmfsadm dump malloc and it agrees with the output
> from mmdiag --memory, so it does not seem to account for the excessive
> memory usage.
>
> The new machines do have idleSocketTimeout set to 0; from what you're saying,
> it could be related to keeping that many connections between nodes open.
>
> Thanks in advance
>
> Peter.
>
>
>
>
> [root at dn29 ~]# mmdiag --memory
>
> === mmdiag: memory ===
> mmfsd heap size: 2039808 bytes
>
>
> Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
> 128 bytes in use
> 17500049370 hard limit on memory usage
> 1048576 bytes committed to regions
> 1 number of regions
> 555 allocations
> 555 frees
> 0 allocation failures
>
>
> Statistics for MemoryPool id 2 ("Shared Segment")
> 42179592 bytes in use
> 17500049370 hard limit on memory usage
> 56623104 bytes committed to regions
> 9 number of regions
> 100027 allocations
> 79624 frees
> 0 allocation failures
>
>
> Statistics for MemoryPool id 3 ("Token Manager")
> 2099520 bytes in use
> 17500049370 hard limit on memory usage
> 16778240 bytes committed to regions
> 1 number of regions
> 4 allocations
> 0 frees
> 0 allocation failures
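Totalling the pool statistics above mechanically (a hypothetical parser written against the exact output format quoted here; the real format may vary between releases) shows the shared segments account for only a few tens of MiB, nowhere near the multi-GiB RSS:

```python
# Hypothetical parser for the `mmdiag --memory` output quoted above: sum
# "bytes in use" and "bytes committed to regions" across the three pools.
import re

sample = """\
     42179720 bytes in use across pools, itemized:
          128 bytes in use
      1048576 bytes committed to regions
     42179592 bytes in use
     56623104 bytes committed to regions
      2099520 bytes in use
     16778240 bytes committed to regions
"""
# Skip the itemized header line; count only the per-pool entries.
in_use = sum(int(m) for m in re.findall(r"(\d+) bytes in use\n", sample))
committed = sum(int(m) for m in re.findall(r"(\d+) bytes committed", sample))

print(f"in use:    {in_use / 1024**2:.1f} MiB")     # ~42.2 MiB
print(f"committed: {committed / 1024**2:.1f} MiB")  # ~71.0 MiB
```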
>
>
> On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote:
>
> There are 3 places where the GPFS mmfsd uses memory: the pagepool plus 2
> shared memory segments. To see the memory utilization of the shared
> memory segments, run the command mmfsadm dump malloc. The statistics
> for memory pool id 2 are where the maxFilesToCache/maxStatCache objects
> live, and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.
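For rough sizing, IBM's tuning guidance puts each maxFilesToCache object at on the order of 3 KiB and each maxStatCache object at a few hundred bytes; a back-of-envelope sketch (the per-object constants and the maxStatCache value of 1000 are illustrative assumptions, not exact figures):

```python
# Rough back-of-envelope for shared-segment sizing. The per-object costs
# are approximate figures from IBM's tuning guidance, not exact values.
KIB = 1024

def shared_segment_estimate(max_files_to_cache, max_stat_cache,
                            bytes_per_file=3 * KIB, bytes_per_stat=480):
    """Approximate memory (bytes) consumed by the MFTC/MSC object caches."""
    return (max_files_to_cache * bytes_per_file
            + max_stat_cache * bytes_per_stat)

# maxFilesToCache=4000 as reported in the thread; maxStatCache assumed 1000.
est = shared_segment_estimate(4000, 1000)
print(f"~{est / 1024**2:.1f} MiB")  # ~12 MiB, far below the growth observed
```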
>
> You might want to upgrade to a later PTF, as there was a PTF that fixed a
> memory leak in tscomm associated with network connection drops.
>
>
> On Monday, July 24, 2017 5:29 AM, Peter Childs <p.childs at qmul.ac.uk>
> wrote:
>
>
> We have two GPFS clusters.
>
> One is fairly old, running 4.2.1-2 without CCR; its nodes run fine,
> using about 1.5G of memory consistently (the GPFS pagepool is set to 1G,
> so that looks about right).
>
> The other one is "newer", running 4.2.1-3 with CCR, and its nodes keep
> increasing their memory usage: they start at about 1.1G and are fine
> for a few days, but after a while they grow to 4.2G, which means that
> when a node needs to run real work, the work can't be done.
>
> I'm losing track of what may be different other than CCR, and I'm trying
> to find some more ideas of where to look.
>
> I've checked all the standard things like pagepool and maxFilesToCache
> (set to the default of 4000); workerThreads is set to 128 on the new
> GPFS cluster (against the default of 48 on the old one).
>
> I'm not sure what else to look at on this one hence why I'm asking the
> community.
>
> Thanks in advance
>
> Peter Childs
> ITS Research Storage
> Queen Mary University of London.
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
> --
>
> Peter Childs
> ITS Research Storage
> Queen Mary, University of London
>
>
>
>
> --
>
> Peter Childs
> ITS Research Storage
> Queen Mary, University of London
>
>
>
> The Francis Crick Institute Limited is a registered charity in England and
> Wales no. 1140062 and a company registered in England and Wales no.
> 06885462, with its registered office at 1 Midland Road London NW1 1AT
>