[gpfsug-discuss] gpfsug-discuss Digest, Vol 155, Issue 2

Jonathan Buzzard jonathan.buzzard at strath.ac.uk
Wed Jun 25 09:38:19 BST 2025


On 24/06/2025 19:00, Truong Vu wrote:
> 
> There is an undocumented option for this purpose. You can issue
> mmdelnode -f on the bad node. This cleans up leftover
> configuration and stops/starts services if needed.

Thanks, one to remember then. Though to be honest having used GPFS now
for 18+ years that is the first time I have needed it.

>>>> Specifically dual EPYC 9555
> If tsgskkm is hung, you may be hitting a known GSKit issue. Can you
> manually apply the workaround and see if it works?
> 
> Insert the following lines into the file
> /usr/lpp/mmfs/lib/gsk8/C/icc/icclib/ICCSIG.txt
>
> ICC_SHIFT=3
> ICC_TRNG=TRNG_ALT4
>
> Insert the following line into the file
> /usr/lpp/mmfs/lib/gsk8/N/icc/icclib/ICCSIG.txt
>
> ICC_TRNG=TRNG_ALT4
>

That did the trick.
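
For anyone wanting to script this, a minimal Python sketch of the
workaround quoted above might look like the following. The two paths and
the ICC_* settings are taken from the quoted message; everything else is
an assumption. In particular, the names ensure_lines and apply_workaround
are made up for illustration, and appending at the end of the file is a
guess, since the instructions only say to insert the lines. Lines that are
already present are skipped, so it can be re-run safely.

```python
from pathlib import Path

# Paths and settings copied from the quoted workaround.
WORKAROUND = {
    "/usr/lpp/mmfs/lib/gsk8/C/icc/icclib/ICCSIG.txt": ["ICC_SHIFT=3", "ICC_TRNG=TRNG_ALT4"],
    "/usr/lpp/mmfs/lib/gsk8/N/icc/icclib/ICCSIG.txt": ["ICC_TRNG=TRNG_ALT4"],
}


def ensure_lines(path, wanted):
    """Append each entry of `wanted` that is not already a line in `path`.

    Returns the list of lines that were actually added.
    Assumption: appending at the end of the file is acceptable.
    """
    p = Path(path)
    text = p.read_text()
    present = {line.strip() for line in text.splitlines()}
    missing = [w for w in wanted if w not in present]
    if missing:
        # Leave existing content untouched; start on a fresh line if needed.
        sep = "" if text.endswith("\n") or not text else "\n"
        p.write_text(text + sep + "\n".join(missing) + "\n")
    return missing


def apply_workaround():
    """Apply the settings to both ICCSIG.txt files (hypothetical helper, run as root)."""
    for path, wanted in WORKAROUND.items():
        added = ensure_lines(path, wanted)
        print(f"{path}: added {added if added else 'nothing'}")
```

Take a copy of both ICCSIG.txt files before calling apply_workaround(),
so the change is easy to back out.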

A quick Google shows this was already an issue back in 2020 (on this list)
with GPFS 4.2 on AMD EPYC. There is also this APAR from 2023:

https://www.ibm.com/support/pages/apar/IJ43790

The suggested fix there is a little different too.

However, I already have some AMD EPYC 7513 based servers on the system
running 5.1.9-6 (to be upgraded real soon now to 5.2.2-1), which
according to lscpu are CPU family 25. I have no recollection of doing
anything special on them, and I don't see the fix in the files.

> Can you post lscpu output?
> 

See below. My educated guess is that this is CPU family 26, and whatever
fix IBM introduced for CPU family 25 doesn't work on Zen 5 CPUs.
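
To see which nodes fall into which camp, the CPU family can be pulled out
of the lscpu output. A minimal Python sketch, assuming lscpu prints a
"CPU family:" line as in the output further down; the function names are
made up for illustration. Family 25 covers Zen 3 and Zen 4, and family 26
is Zen 5.

```python
import re
import subprocess


def cpu_family(lscpu_text):
    """Return the integer on the 'CPU family:' line of lscpu output, or None.

    Anchoring at the start of the line skips any 'BIOS CPU family:' line
    that some lscpu versions also print.
    """
    match = re.search(r"^\s*CPU family:\s*(\d+)\s*$", lscpu_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def local_cpu_family():
    """Run lscpu on this node and return its CPU family."""
    out = subprocess.run(
        ["lscpu"], capture_output=True, text=True, check=True
    ).stdout
    return cpu_family(out)
```

Run across the cluster, this would separate the family 25 nodes, which work
without the workaround here, from the family 26 ones that needed it.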


JAB.


Architecture:             x86_64
   CPU op-mode(s):         32-bit, 64-bit
   Address sizes:          52 bits physical, 57 bits virtual
   Byte Order:             Little Endian
CPU(s):                   256
   On-line CPU(s) list:    0-255
Vendor ID:                AuthenticAMD
   BIOS Vendor ID:         AMD
   Model name:             AMD EPYC 9555 64-Core Processor
     BIOS Model name:      AMD EPYC 9555 64-Core Processor
     CPU family:           26
     Model:                2
     Thread(s) per core:   2
     Core(s) per socket:   64
     Socket(s):            2
     Stepping:             1
     Frequency boost:      enabled
     CPU(s) scaling MHz:   72%
     CPU max MHz:          4409.3750
     CPU min MHz:          1500.0000
     BogoMIPS:             6390.74
     Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
                           pge mca cmov pat pse36 clflush mmx fxsr sse sse2
                           ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm
                           constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc
                           cpuid extd_apicid aperfmperf rapl pni pclmulqdq
                           monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic
                           movbe popcnt aes xsave avx f16c rdrand lahf_lm
                           cmp_legacy svm extapic cr8_legacy abm sse4a
                           misalignsse 3dnowprefetch osvw ibs skinit wdt tce
                           topoext perfctr_core perfctr_nb bpext perfctr_llc
                           mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba
                           perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall
                           fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid
                           cqm rdt_a avx512f avx512dq rdseed adx smap
                           avx512ifma clflushopt clwb avx512cd sha_ni
                           avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
                           cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
                           avx_vnni avx512_bf16 clzero irperf xsaveerptr
                           rdpru wbnoinvd amd_ppin cppc arat npt lbrv
                           svm_lock nrip_save tsc_scale vmcb_clean
                           flushbyasid decodeassists pausefilter pfthreshold
                           avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi
                           avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes
                           vpclmulqdq avx512_vnni avx512_bitalg
                           avx512_vpopcntdq la57 rdpid bus_lock_detect
                           movdiri movdir64b overflow_recov succor smca
                           avx512_vp2intersect flush_l1d debug_swap
                           amd_lbr_pmc_freeze
Virtualization features:
   Virtualization:         AMD-V
Caches (sum of all):
   L1d:                    6 MiB (128 instances)
   L1i:                    4 MiB (128 instances)
   L2:                     128 MiB (128 instances)
   L3:                     512 MiB (16 instances)
NUMA:
   NUMA node(s):           2
   NUMA node0 CPU(s):      0-63,128-191
   NUMA node1 CPU(s):      64-127,192-255
Vulnerabilities:
   Gather data sampling:   Not affected
   Itlb multihit:          Not affected
   L1tf:                   Not affected
   Mds:                    Not affected
   Meltdown:               Not affected
   Mmio stale data:        Not affected
   Reg file data sampling: Not affected
   Retbleed:               Not affected
   Spec rstack overflow:   Not affected
   Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
   Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
   Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
   Srbds:                  Not affected
   Tsx async abort:        Not affected


--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG



