<!doctype html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
<div class="default-style">
Hi All,
</div>
<div class="default-style">
<br>
</div>
<div class="default-style">
Can anyone share some thoughts on how to tune AFM for stability? At the moment we have OK performance between our sites (5-8 Gbit/s with 34 ms latency), but the cache fileset locks up roughly once a week. Before we applied the settings below, it locked up daily. Is there any way to tune AFM further that I haven't found?
</div>
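<div class="default-style">
For context, here is a rough sketch of how much data has to be in flight to keep our link busy (the bandwidth-delay product). It assumes the 34 ms figure is the round-trip time, which I have not verified, and the function name is just for illustration.
</div>
<pre>
```python
# Bandwidth-delay product: bytes that must be in flight to fill the link.
# Assumption: 34 ms is the round-trip time between cache and home.

def bdp_bytes(bits_per_second, rtt_seconds):
    """Return the bandwidth-delay product in bytes."""
    return bits_per_second * rtt_seconds / 8

RTT_SECONDS = 0.034

for gbit in (5, 8):
    mib = bdp_bytes(gbit * 1e9, RTT_SECONDS) / 1024**2
    print(f"{gbit} Gbit/s at 34 ms: about {mib:.1f} MiB in flight")
```
</pre>
<div class="default-style">
If the 34 ms is one-way latency instead, the figures double.
</div>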
<div class="default-style">
<br>
</div>
<div class="default-style">
<br>
</div>
<div class="default-style">
<strong>Cache Site only:</strong>
</div>
<div class="default-style">
<div>
TCP Settings:
</div>
<div>
sunrpc.tcp_slot_table_entries = 128
</div>
<div>
<br>
</div>
<div>
<br>
</div>
<div>
<strong>Home and Cache:</strong>
</div>
<div>
AFM / GPFS Settings:
</div>
<div>
maxBufferDescs=163840
</div>
<div>
afmHardMemThreshold=25G
</div>
<div>
afmMaxWriteMergeLen=30G
</div>
<div>
<br>
</div>
<div>
<br>
</div>
<div>
<strong>Cache fileset:</strong>
</div>
<div>
Attributes for fileset AFMFILESET:
<br>================================
<br>Status Linked
<br>Path /mnt/fs02/AFMFILESET
<br>Id 1
<br>Root inode 524291
<br>Parent Id 0
<br>Created Tue Apr 14 15:57:43 2020
<br>Comment
<br>Inode space 1
<br>Maximum number of inodes 10000384
<br>Allocated inodes 10000384
<br>Permission change flag chmodAndSetacl
<br>afm-associated Yes
<br>Target nfs://DK_VPN/mnt/fs01/AFMFILESET
<br>Mode single-writer
<br>File Lookup Refresh Interval 30 (default)
<br>File Open Refresh Interval 30 (default)
<br>Dir Lookup Refresh Interval 60 (default)
<br>Dir Open Refresh Interval 60 (default)
<br>Async Delay 15 (default)
<br>Last pSnapId 0
<br>Display Home Snapshots no
<br>Number of Read Threads per Gateway 64
<br>Parallel Read Chunk Size 128
<br>Parallel Read Threshold 1024
<br>Number of Gateway Flush Threads 48
<br>Prefetch Threshold 0 (default)
<br>Eviction Enabled yes (default)
<br>Parallel Write Threshold 1024
<br>Parallel Write Chunk Size 128
<br>Number of Write Threads per Gateway 16
<br>IO Flags 0 (default)
</div>
<div>
<br>
</div>
<div>
<br>
</div>
<div>
<strong>mmfsadm dump afm:</strong>
</div>
<div>
AFM Gateway:
<br>RpcQLen: 0 maxPoolSize: 4294967295 QOF: 0 MaxOF: 131072
<br>readThLimit 128 minIOBuf 1048576 maxIOBuf 1073741824 msgMaxWriteSize 2147483648
<br>readBypassThresh 67108864
<br>QLen: 0 QMem: 0 SoftQMem: 10737418240 HardQMem 26843545600
<br>Ping thread: Started
<br>Fileset: AFMFILESET 1 (fs02)
<br>mode: single-writer queue: Normal MDS: &lt;c0n1&gt; QMem 0 CTL 577
<br>home: DK_VPN homeServer: 10.110.5.11 proto: nfs port: 2049 lastCmd: 16
<br>handler: Mounted Dirty refCount: 1
<br>queueTransfer: state: Idle senderVerified: 0 receiverVerified: 1 terminate: 0 psnapWait: 0
<br>remoteAttrs: AsyncLookups 0 tsfindinode: success 0 failed 0 totalTime 0.0 avgTime 0,000000 maxTime 0.0
<br>queue: delay 15 QLen 0+0 flushThds 0 maxFlushThds 48 numExec 8772518 qfs 0 iwo 0 err 78
<br>handlerCreateTime : 2020-04-27_11:14:57.415+0200 numCreateSnaps : 0 InflightAsyncLookups 0
<br>lastReplayTime : 2020-04-28_07:22:32.415+0200 lastSyncTime : 2020-04-27_15:09:57.415+0200
<br>i/o: readBuf: 33554432 writeBuf: 2097152 sparseReadThresh: 134217728 pReadThreads 64
<br>i/o: pReadChunkSize 33554432 pReadThresh: 1073741824 pWriteThresh: 1073741824
<br>i/o: prefetchThresh 0 (Prefetch)
<br>Mnt status: 0:0 1:0 2:0 3:0
<br>Export Map: 10.110.5.10/&lt;c0n0&gt; 10.110.5.11/&lt;c0n1&gt; 10.110.5.12/&lt;c0n2&gt; 10.110.5.13/&lt;c0n9&gt;
<br>Priority Queue: Empty (state: Active)
<br>Normal Queue: Empty (state: Active)
</div>
<div>
<br>
</div>
<div>
<br>
</div>
<div>
<strong>Cluster Config Cache:</strong>
</div>
<div>
maxFilesToCache 131072
<br>maxStatCache 524288
</div>
<div>
afmDIO 2
<br>afmIOFlags 4096
<br>maxReceiverThreads 32
<br>afmNumReadThreads 64
<br>afmNumWriteThreads 8
<br>afmHardMemThreshold 26843545600
<br>maxBufferDescs 163840
<br>afmMaxWriteMergeLen 32212254720
<br>workerThreads 1024
</div>
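<div>
As a sanity check that the values we set were applied as intended, here is a small sketch that expands the suffixed settings to bytes and compares them with the numbers reported in the cluster config. It assumes the G suffix is binary (1024**3), and to_bytes is a hypothetical helper, not a GPFS tool.
</div>
<pre>
```python
# Hypothetical helper: expand a size such as "25G" into bytes,
# assuming binary suffixes (K = 1024, M = 1024**2, G = 1024**3).
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def to_bytes(value):
    """Convert '25G' style strings (or plain integers as text) to bytes."""
    value = value.strip().upper()
    if value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)

# Values as we set them, and values as shown in the cluster config.
configured = {"afmHardMemThreshold": "25G", "afmMaxWriteMergeLen": "30G"}
reported = {"afmHardMemThreshold": 26843545600,
            "afmMaxWriteMergeLen": 32212254720}

for name, text in configured.items():
    status = "matches" if to_bytes(text) == reported[name] else "DIFFERS"
    print(name, text, to_bytes(text), status)
```
</pre>
<div>
Both settings line up with the reported byte values under that assumption, and the 25G threshold also equals the HardQMem figure in the mmfsadm dump.
</div>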
<div>
<br>
</div>
<div>
<br>
</div>
<div>
The GPFS log contains entries stating "AFM: Home is taking longer to respond...", but only AFM and the cache AFM fileset enter a locked state. We have the same NFS exports from home mounted on the same gateway nodes so that we can check when a file has been transferred, and those mounts all remain OK while the AFM lock-up is happening. Simply restarting GPFS on the AFM master node is enough to make AFM restart and continue for another week.
</div>
<div>
<br>
</div>
<div>
<br>
</div>
<div>
The home target is exported through CES NFS from 4 CES nodes and a map is created at the Cache site to utilize the ParallelWrites feature.
</div>
<div>
<br>
</div>
<div>
<br>
</div>
<div>
If anyone has ideas or knowledge on how to tune this further for more stability, I would be happy if you could share your thoughts! :-)
<br>
<br>
<br>Many Thanks in Advance!
</div>
<div>
Andi Christiansen
</div>
<div>
<br>
</div>
</div>
</body>
</html>