[gpfsug-discuss] Tuning Spectrum Scale AFM for stability?

Andi Christiansen andi at christiansen.xxx
Tue Apr 28 07:34:37 BST 2020


Hi All,

Can anyone share some thoughts on how to tune AFM for stability? At the moment we get decent performance between our sites (5-8 Gbit/s with 34 ms latency), but the cache fileset locks up roughly once a week, which was daily before we applied the settings below. Is there any further AFM tuning I haven't found?


Cache Site only:
TCP Settings:
sunrpc.tcp_slot_table_entries = 128 
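
For reference, a sketch of how we persist that setting across reboots (paths are the usual sysctl.d convention; note that the sunrpc sysctl only exists once the module is loaded, so some distributions require setting it via modprobe options instead — check your distribution's docs):

```shell
# Persist the NFS client slot-table setting (value from this thread).
cat <<'EOF' > /etc/sysctl.d/90-afm-nfs.conf
sunrpc.tcp_slot_table_entries = 128
EOF
sysctl --system
```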


Home and Cache:
AFM / GPFS Settings:
maxBufferDescs=163840
afmHardMemThreshold=25G
afmMaxWriteMergeLen=30G
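
A sketch of how settings like these are applied cluster-wide with mmchconfig (run on both home and cache; some AFM parameters may only take effect after a gateway daemon restart, so verify against the docs):

```shell
# Apply the AFM/GPFS tunables from above; -i makes the change
# immediate and persistent where the parameter supports it.
mmchconfig maxBufferDescs=163840 -i
mmchconfig afmHardMemThreshold=25G -i
mmchconfig afmMaxWriteMergeLen=30G -i
```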


Cache fileset:
Attributes for fileset AFMFILESET:
================================
Status Linked
Path /mnt/fs02/AFMFILESET
Id 1
Root inode 524291
Parent Id 0
Created Tue Apr 14 15:57:43 2020
Comment
Inode space 1
Maximum number of inodes 10000384
Allocated inodes 10000384
Permission change flag chmodAndSetacl
afm-associated Yes
Target nfs://DK_VPN/mnt/fs01/AFMFILESET
Mode single-writer
File Lookup Refresh Interval 30 (default)
File Open Refresh Interval 30 (default)
Dir Lookup Refresh Interval 60 (default)
Dir Open Refresh Interval 60 (default)
Async Delay 15 (default)
Last pSnapId 0
Display Home Snapshots no
Number of Read Threads per Gateway 64
Parallel Read Chunk Size 128
Parallel Read Threshold 1024
Number of Gateway Flush Threads 48
Prefetch Threshold 0 (default)
Eviction Enabled yes (default)
Parallel Write Threshold 1024
Parallel Write Chunk Size 128
Number of Write Threads per Gateway 16
IO Flags 0 (default)
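
The non-default per-fileset values above (flush threads, parallel read/write thresholds and chunk sizes) would typically be set with mmchfileset; a sketch, with parameter names assumed from the Spectrum Scale AFM documentation:

```shell
# Per-fileset AFM tuning on the cache cluster (fs02 / AFMFILESET as above).
mmchfileset fs02 AFMFILESET -p afmNumFlushThreads=48
mmchfileset fs02 AFMFILESET -p afmParallelReadThreshold=1024
mmchfileset fs02 AFMFILESET -p afmParallelReadChunkSize=128
mmchfileset fs02 AFMFILESET -p afmParallelWriteThreshold=1024
mmchfileset fs02 AFMFILESET -p afmParallelWriteChunkSize=128
```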


mmfsadm dump afm:
AFM Gateway:
RpcQLen: 0 maxPoolSize: 4294967295 QOF: 0 MaxOF: 131072
readThLimit 128 minIOBuf 1048576 maxIOBuf 1073741824 msgMaxWriteSize 2147483648
readBypassThresh 67108864
QLen: 0 QMem: 0 SoftQMem: 10737418240 HardQMem 26843545600
Ping thread: Started
Fileset: AFMFILESET 1 (fs02)
mode: single-writer queue: Normal MDS: <c0n1> QMem 0 CTL 577
home: DK_VPN homeServer: 10.110.5.11 proto: nfs port: 2049 lastCmd: 16
handler: Mounted Dirty refCount: 1
queueTransfer: state: Idle senderVerified: 0 receiverVerified: 1 terminate: 0 psnapWait: 0
remoteAttrs: AsyncLookups 0 tsfindinode: success 0 failed 0 totalTime 0.0 avgTime 0,000000 maxTime 0.0
queue: delay 15 QLen 0+0 flushThds 0 maxFlushThds 48 numExec 8772518 qfs 0 iwo 0 err 78
handlerCreateTime : 2020-04-27_11:14:57.415+0200 numCreateSnaps : 0 InflightAsyncLookups 0
lastReplayTime : 2020-04-28_07:22:32.415+0200 lastSyncTime : 2020-04-27_15:09:57.415+0200
i/o: readBuf: 33554432 writeBuf: 2097152 sparseReadThresh: 134217728 pReadThreads 64
i/o: pReadChunkSize 33554432 pReadThresh: 1073741824 pWriteThresh: 1073741824
i/o: prefetchThresh 0 (Prefetch)
Mnt status: 0:0 1:0 2:0 3:0
Export Map: 10.110.5.10/<c0n0> 10.110.5.11/<c0n1> 10.110.5.12/<c0n2> 10.110.5.13/<c0n9>
Priority Queue: Empty (state: Active)
Normal Queue: Empty (state: Active)


Cluster Config Cache:
maxFilesToCache 131072
maxStatCache 524288
afmDIO 2
afmIOFlags 4096
maxReceiverThreads 32
afmNumReadThreads 64
afmNumWriteThreads 8
afmHardMemThreshold 26843545600
maxBufferDescs 163840
afmMaxWriteMergeLen 32212254720
workerThreads 1024
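
As a sanity check, the raw byte counts in the config cache match the human-readable values we set, interpreted as binary (GiB) units; a quick illustration:

```python
# The dumped byte counts correspond to the configured values
# interpreted as GiB (1 GiB = 1024**3 bytes).
GIB = 1024 ** 3

afm_hard_mem_threshold = 25 * GIB   # configured as afmHardMemThreshold=25G
afm_max_write_merge_len = 30 * GIB  # configured as afmMaxWriteMergeLen=30G

print(afm_hard_mem_threshold)   # 26843545600, as in the dump
print(afm_max_write_merge_len)  # 32212254720, as in the dump
```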


The entries in the GPFS log state "AFM: Home is taking longer to respond...", but it is only AFM and the cache AFM fileset that enter a locked state. We have the same NFS exports from home mounted on the same gateway nodes to check when a file is transferred, and they are all fine while the AFM lockup is happening. A simple GPFS restart of the AFM master node is enough to make AFM continue for another week.
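
For completeness, the state can be inspected (and sometimes recovered) without a full daemon restart; a sketch assuming the standard mmafmctl options:

```shell
# Inspect AFM fileset/queue state on the gateway before restarting anything.
mmafmctl fs02 getstate -j AFMFILESET
mmfsadm dump afm                           # detailed gateway queue dump, as above
# If operations are stuck in a requeued state, this can retry them:
mmafmctl fs02 resumeRequeued -j AFMFILESET
```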


The home target is exported through CES NFS from 4 CES nodes, and a mapping is created at the cache site to use the parallel-writes feature.
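
A sketch of how such an export map is defined on the cache cluster with mmafmconfig (the gateway node names here are illustrative placeholders; the IPs are the home servers from the dump above):

```shell
# Map each home NFS server IP to a cache gateway node for parallel data transfer.
mmafmconfig add DK_VPN \
  --export-map 10.110.5.10/gw-node1,10.110.5.11/gw-node2,10.110.5.12/gw-node3,10.110.5.13/gw-node4
# The fileset target then references the map name instead of a single server:
mmcrfileset fs02 AFMFILESET --inode-space new \
  -p afmMode=single-writer,afmTarget=nfs://DK_VPN/mnt/fs01/AFMFILESET
```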


If anyone has ideas or knowledge on how to tune this further for more stability, I would be happy if you could share your thoughts! :-)


Many Thanks in Advance!
Andi Christiansen
