[gpfsug-discuss] gpfsug-discuss Digest, Vol 155, Issue 2
Jonathan Buzzard
jonathan.buzzard at strath.ac.uk
Wed Jun 25 09:38:19 BST 2025
On 24/06/2025 19:00, Truong Vu wrote:
>
> There is an undocumented option for this purpose. You can issue
> mmdelnode -f on the bad node. This cleans up leftover
> configuration and stops/starts services if needed.
Thanks, one to remember then. Though to be honest having used GPFS now
for 18+ years that is the first time I have needed it.
>>>> Specifically dual EPYC 9555
> If tsgskkm is hung, you may have hit a known GSKit issue. Can you
> manually apply the workaround and see if it works?
>
> Insert the following lines to file
> /usr/lpp/mmfs/lib/gsk8/C/icc/icclib/ICCSIG.txt
>
> ICC_SHIFT=3
> ICC_TRNG=TRNG_ALT4
>
> Insert the following lines to file
> /usr/lpp/mmfs/lib/gsk8/N/icc/icclib/ICCSIG.txt
>
> ICC_TRNG=TRNG_ALT4
>
That did the trick.
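
For anyone wanting to script the workaround across several nodes, here is
a minimal Python sketch. It assumes the settings can simply be appended to
the end of each ICCSIG.txt (the advice above only says "insert"), and the
helper names ensure_lines and apply_workaround are made up for this
example; they are not part of GPFS or GSKit. Lines already present are
skipped, so it should be safe to run twice. Back the files up first.

```python
from pathlib import Path

# Paths and settings exactly as given in the workaround above.
WORKAROUND = {
    "/usr/lpp/mmfs/lib/gsk8/C/icc/icclib/ICCSIG.txt": ["ICC_SHIFT=3", "ICC_TRNG=TRNG_ALT4"],
    "/usr/lpp/mmfs/lib/gsk8/N/icc/icclib/ICCSIG.txt": ["ICC_TRNG=TRNG_ALT4"],
}


def ensure_lines(path, wanted):
    """Append any of `wanted` not already in the file; return the lines added.

    Hypothetical helper. Assumes appending at the end of the file is an
    acceptable way to "insert" the settings.
    """
    p = Path(path)
    text = p.read_text()
    present = {line.strip() for line in text.splitlines()}
    missing = [w for w in wanted if w not in present]
    if missing:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if (not text or text.endswith("\n")) else "\n"
        with p.open("a") as fh:
            fh.write(prefix + "\n".join(missing) + "\n")
    return missing


def apply_workaround():
    """Apply the settings to whichever of the two files exist on this node."""
    for path, lines in WORKAROUND.items():
        if Path(path).exists():
            print(path, "added:", ensure_lines(path, lines) or "nothing")
        else:
            print(path, "not found, skipped")
```

Calling apply_workaround() as root on the affected node would then make
the change; whether GPFS needs restarting afterwards is not covered here.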
A quick Google shows this being an issue back in 2020 (on this list)
with GPFS 4.2 on AMD EPYC, and there is also this APAR from 2023:
https://www.ibm.com/support/pages/apar/IJ43790
The suggested fix there is a little different too.
However, I already have some AMD EPYC 7513 based servers on the system
running 5.1.9-6 (to be upgraded real soon now to 5.2.2-1) which,
according to lscpu, are CPU family 25. I have no recollection of doing
anything special on those and I don't see the fix in the files.
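
A quick way to see what a given node actually has in those files is
sketched below in Python. It assumes the settings appear as plain
KEY=VALUE lines beginning with ICC_, as in the workaround above; the real
files may contain other content that this ignores. The function names are
made up for the example.

```python
from pathlib import Path


def icc_settings(text):
    """Return ICC_* key/value pairs found in the text of an ICCSIG.txt file."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("ICC_") and "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def report(paths):
    """Print the ICC_* settings found in each file, read-only."""
    for path in paths:
        p = Path(path)
        if not p.exists():
            print(f"{path}: not found")
            continue
        print(f"{path}: {icc_settings(p.read_text()) or 'no ICC_* settings'}")
```

Running report() with the two ICCSIG.txt paths on a working family 25
node and on the new node would show whether the files differ.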
> Can you post lscpu output?
>
See below. My educated guess is that this is CPU family 26 and whatever
fix IBM introduced for CPU family 25 doesn't work on Zen 5 CPUs.
JAB.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Vendor ID: AuthenticAMD
BIOS Vendor ID: AMD
Model name: AMD EPYC 9555 64-Core Processor
BIOS Model name: AMD EPYC 9555 64-Core Processor
CPU family: 26
Model: 2
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
Stepping: 1
Frequency boost: enabled
CPU(s) scaling MHz: 72%
CPU max MHz: 4409.3750
CPU min MHz: 1500.0000
BogoMIPS: 6390.74
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep
mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx
mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl
nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor
ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx
f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core
perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd
mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase
tsc_adjust bmi1 avx2 smep bmi2 invpcid cqm rdt_a avx512f avx512dq
rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw
avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc
cqm_mbm_total cqm_mbm_local avx_vnni avx512_bf16 clzero irperf
xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock
nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter
pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi
avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni
avx512_bitalg avx512_vpopcntdq la57 rdpid bus_lock_detect movdiri
movdir64b overflow_recov succor smca avx512_vp2intersect flush_l1d
debug_swap amd_lbr_pmc_freeze
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 6 MiB (128 instances)
L1i: 4 MiB (128 instances)
L2: 128 MiB (128 instances)
L3: 512 MiB (16 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-63,128-191
NUMA node1 CPU(s): 64-127,192-255
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsx async abort: Not affected
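
To pick out nodes by CPU family before they join the cluster, the
"CPU family" field from lscpu output like the above can be pulled out
with a few lines of Python. This is only a sketch: the function names are
made up, and that family 26 is what triggers the hang is a guess, not
something IBM has confirmed here.

```python
import os
import subprocess


def cpu_family(lscpu_text):
    """Return the integer after 'CPU family:' in lscpu output, or None."""
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "CPU family":
            return int(value.strip())
    return None


def local_cpu_family():
    """Run lscpu on this host and return its CPU family."""
    # LC_ALL=C keeps the field names in English.
    out = subprocess.run(
        ["lscpu"],
        capture_output=True,
        text=True,
        check=True,
        env={**os.environ, "LC_ALL": "C"},
    ).stdout
    return cpu_family(out)
```

On the node above cpu_family() would see the "CPU family: 26" line, while
the EPYC 7513 nodes report family 25.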
--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG