[gpfsug-discuss] gpfsug-discuss Digest, Vol 155, Issue 4
Truong Vu
truongv at us.ibm.com
Wed Jun 25 17:12:13 BST 2025
Zen5 (CPU family 26) is not included in the list that GPFS checks to decide whether to apply the workaround.
If you are running 5.2.2.1 on CPU family 25, model 1 or 17, it should automatically check and apply the workaround.
Thanks for providing the lscpu output. We will add family 26 to the list.
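Until family 26 is on that list, a quick way to tell whether a node falls outside the automatic check is to compare its family/model against the pairs mentioned above. This is only a minimal sketch: the set AUTO_CHECKED is an assumption built from the two models named in this message, not the actual list inside GPFS, and the function names are made up for illustration.

```python
import re

# Assumption: only the (family, model) pairs named in this message.
# The real list that GPFS consults may contain more entries.
AUTO_CHECKED = {(25, 1), (25, 17)}


def parse_family_model(lscpu_text):
    """Return (family, model) as integers parsed from lscpu output text."""
    family = re.search(r"^\s*CPU family:\s*(\d+)\s*$", lscpu_text, re.MULTILINE)
    # "Model:" with the colon directly after it, so "Model name:" is skipped.
    model = re.search(r"^\s*Model:\s*(\d+)\s*$", lscpu_text, re.MULTILINE)
    if not family or not model:
        raise ValueError("could not find CPU family/model in lscpu output")
    return int(family.group(1)), int(model.group(1))


def covered_by_auto_check(lscpu_text):
    """True if this CPU is one of the pairs assumed to be auto-checked."""
    return parse_family_model(lscpu_text) in AUTO_CHECKED
```

Feed it the text printed by lscpu on the node. A False result only means the CPU is not among the pairs assumed here; it does not by itself prove the workaround is needed.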
Thanks,
Tru.
On 6/25/25, 7:01 AM, "gpfsug-discuss on behalf of gpfsug-discuss-request at gpfsug.org" <gpfsug-discuss-bounces at gpfsug.org on behalf of gpfsug-discuss-request at gpfsug.org> wrote:
Send gpfsug-discuss mailing list submissions to
gpfsug-discuss at gpfsug.org
To subscribe or unsubscribe via the World Wide Web, visit
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
or, via email, send a message with subject or body 'help' to
gpfsug-discuss-request at gpfsug.org
You can reach the person managing the list at
gpfsug-discuss-owner at gpfsug.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of gpfsug-discuss digest..."
Today's Topics:
1. Re: gpfsug-discuss Digest, Vol 155, Issue 2 (Jonathan Buzzard)
----------------------------------------------------------------------
Message: 1
Date: Wed, 25 Jun 2025 09:38:19 +0100
From: Jonathan Buzzard <jonathan.buzzard at strath.ac.uk>
To: gpfsug-discuss at gpfsug.org
Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 155, Issue 2
Message-ID: <575f9664-76af-43fd-8205-6ed2ea5ab750 at strath.ac.uk>
Content-Type: text/plain; charset=UTF-8; format=flowed
On 24/06/2025 19:00, Truong Vu wrote:
>
> There is an undocumented option for this purpose. You can issue
> mmdelnode -f on the bad node. This cleans up leftover
> configuration and stops/starts services if needed.
Thanks, one to remember then. Though to be honest having used GPFS now
for 18+ years that is the first time I have needed it.
>>>> Specifically dual EPYC 9555
> If tsgskkm is hung, you may hit a known gskit issue. Can you
> manually apply the workaround and see if it works?
>
> Insert the following lines to file
> /usr/lpp/mmfs/lib/gsk8/C/icc/icclib/ICCSIG.txt
>
> ICC_SHIFT=3
> ICC_TRNG=TRNG_ALT4
>
> Insert the following lines to file
> /usr/lpp/mmfs/lib/gsk8/N/icc/icclib/ICCSIG.txt
>
> ICC_TRNG=TRNG_ALT4
>
That did the trick.
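For anyone wanting to script the workaround quoted above across several nodes, here is a minimal sketch. The paths and settings are the ones from the quoted message. Everything else is an assumption: the message says to insert the lines but not where in the file, so this appends them at the end, and the .bak backup and the dry_run flag are additions for safety, not part of the original advice.

```python
from pathlib import Path

# Paths and settings exactly as quoted in the message above.
WORKAROUND = {
    "/usr/lpp/mmfs/lib/gsk8/C/icc/icclib/ICCSIG.txt": ["ICC_SHIFT=3", "ICC_TRNG=TRNG_ALT4"],
    "/usr/lpp/mmfs/lib/gsk8/N/icc/icclib/ICCSIG.txt": ["ICC_TRNG=TRNG_ALT4"],
}


def missing_settings(text, wanted):
    """Return the wanted settings that are not already a line in text."""
    present = {line.strip() for line in text.splitlines()}
    return [s for s in wanted if s not in present]


def apply_workaround(path, wanted, dry_run=True):
    """Append any missing settings to path; return the list that was missing.

    Assumption: appending at the end of the file is acceptable.
    With dry_run=True nothing is written.
    """
    p = Path(path)
    text = p.read_text()
    missing = missing_settings(text, wanted)
    if missing and not dry_run:
        # Keep a copy of the original next to it before changing anything.
        p.with_name(p.name + ".bak").write_text(text)
        sep = "" if text.endswith("\n") or not text else "\n"
        p.write_text(text + sep + "\n".join(missing) + "\n")
    return missing


# Usage (as root, on the affected node), shown as a comment so that
# importing or pasting this file changes nothing by itself:
#   for path, wanted in WORKAROUND.items():
#       print(path, apply_workaround(path, wanted, dry_run=True))
```

Running it with dry_run=True first also answers the question of whether the settings are already present in the files on a given node.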
A quick Google shows this being an issue back in 2020 (on this list)
with GPFS 4.2 on AMD EPYC. There is also this APAR from 2023:
https://www.ibm.com/support/pages/apar/IJ43790
The suggested fix there is a little different too.
However I already have some AMD EPYC 7513 based servers on the system
running 5.1.9-6 (to be upgraded real soon now to 5.2.2-1) which are
according to lscpu CPU family 25. I have no recollection of doing
anything special and I don't notice the fix in the files.
> Can you post lscpu output?
>
See below. My educated guess is that this is CPU family 26 and whatever
fix IBM introduced for CPU family 25 doesn't work on Zen 5 CPUs.
JAB.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Vendor ID: AuthenticAMD
BIOS Vendor ID: AMD
Model name: AMD EPYC 9555 64-Core Processor
BIOS Model name: AMD EPYC 9555 64-Core Processor
CPU family: 26
Model: 2
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
Stepping: 1
Frequency boost: enabled
CPU(s) scaling MHz: 72%
CPU max MHz: 4409.3750
CPU min MHz: 1500.0000
BogoMIPS: 6390.74
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep
mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx
mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good
amd_lbr_v2 nopl nonstop_tsc cpuid
extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid
sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand
lahf_lm cmp_legacy svm extapic cr8_legacy
abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext
perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb
cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2
ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep
bmi2 invpcid cqm rdt_a avx512f avx512dq rdseed adx
smap avx512ifma clflushopt clwb avx512cd
sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc
cqm_occup_llc cqm_mbm_total cqm_mbm_local avx_vnni avx512_bf16
clzero irperf xsaveerptr rdpru wbnoinvd
amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean
flushbyasid decodeassists pausefilter pfthreshold avic
v_vmsave_vmload vgif x2avic v_spec_ctrl
vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq
avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid bus_lock_detect
movdiri movdir64b
overflow_recov succor smca avx512_vp2intersect flush_l1d debug_swap
amd_lbr_pmc_freeze
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 6 MiB (128 instances)
L1i: 4 MiB (128 instances)
L2: 128 MiB (128 instances)
L3: 512 MiB (16 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-63,128-191
NUMA node1 CPU(s): 64-127,192-255
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsx async abort: Not affected
--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG