<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:10.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Yikes! Those must be some mighty large-memory compute nodes! That value is in bytes, so it is about 324 GB. That is an OK setting for a large-memory ESS/DSS server, but NOT for the compute nodes at my site.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Even on our 1TB+ memory machines we do not tune it that high.
<o:p></o:p></span></p>
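<p class="MsoNormal"><span style="font-size:11.0pt">For reference, a minimal Python sketch of that conversion, using the value from the original question. The function name is only for illustration.<o:p></o:p></span></p>
<pre>
```python
def describe_bytes(n):
    """Return a byte count as (decimal GB, binary GiB), rounded to one decimal."""
    return round(n / 10**9, 1), round(n / 2**30, 1)


gb, gib = describe_bytes(323908133683)
print(f"{gb} GB, {gib} GiB")
```
</pre>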
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">You can set pagepool per nodeclass, for example for all of your compute nodes, but pagepool is one of those settings (like most of the RDMA settings) where you will have to restart the clients for the change to take effect.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">You should look into creating a “nodeclass” for each of your “node types” if you have not already, so you can avoid OOM issues caused by the pagepool alone and tune other settings per node type (RDMA/network settings, etc.).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I would address this here, on the GPFS side, rather than on the Slurm side. Then you can give Slurm (total memory minus the pagepool) as the memory available for user jobs. Leave some spare memory for the system itself, or you will see more memory problems when users get close to OOM, even within their cgroup.
<o:p></o:p></span></p>
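<p class="MsoNormal"><span style="font-size:11.0pt">A rough sketch of that arithmetic in Python. The node size, pagepool and system reserve below are made-up numbers, not a recommendation, and the function name is hypothetical. The result is in MiB, which I believe is the unit Slurm expects for RealMemory in slurm.conf.<o:p></o:p></span></p>
<pre>
```python
def memory_for_jobs_mib(total_mib, pagepool_mib, system_reserve_mib):
    """Memory left for user jobs: total minus pagepool minus a system reserve."""
    left = total_mib - pagepool_mib - system_reserve_mib
    # Clamp at zero so an oversized pagepool shows up as "nothing left"
    # instead of a negative number.
    return max(left, 0)


# Hypothetical 192 GiB compute node, 4 GiB pagepool, 8 GiB kept for the OS.
print(memory_for_jobs_mib(192 * 1024, 4 * 1024, 8 * 1024))
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt">With a pagepool of roughly 300 GiB on the same hypothetical node, the function returns 0, which is the situation the jobs are running into.<o:p></o:p></span></p>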
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Example from a cross-mounted, compute-side cluster. The default there is 1 GB:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[root@nostorage-manager1 ~]# mmlsconfig pagepool<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">pagepool 1024M<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">pagepool 4G [k8,pitzer]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">pagepool 64G [ascend]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">pagepool 16G [ib-spire-login,owenslogin,pitzerlogin]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">pagepool 48G [dm]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">pagepool 4G [cardinal]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">pagepool 64G [cardinal_quadport]<o:p></o:p></span></p>
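<p class="MsoNormal"><span style="font-size:11.0pt">If you want to audit output like this with a script, here is a minimal Python sketch. It assumes the layout shown above (attribute, value, optional bracketed nodeclass list) and only handles single-value attributes such as pagepool. The function name is hypothetical.<o:p></o:p></span></p>
<pre>
```python
def parse_mmlsconfig(text):
    """Map each nodeclass to its value; the cluster-wide default goes under 'default'."""
    settings = {}
    for line in text.strip().splitlines():
        parts = line.split()
        # parts[0] is the attribute name, parts[1] the value,
        # parts[2] (optional) is the "[class,class]" list.
        if len(parts) == 3:
            for nodeclass in parts[2].strip("[]").split(","):
                settings[nodeclass] = parts[1]
        else:
            settings["default"] = parts[1]
    return settings


sample = """\
pagepool 1024M
pagepool 4G [k8,pitzer]
pagepool 64G [ascend]
"""
print(parse_mmlsconfig(sample))
```
</pre>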
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Example from the ESS/DSS server side. Later ESS versions set things by mmvdisk group rather than by server type.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"># mmlsconfig pagepool<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">pagepool 32G<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">pagepool 358G [gss_ppc64]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">pagepool 16384M [ibmems11-hs,ems]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">pagepool 324383477760 [ess3200_mmvdisk_ibmessio13_hs_ibmessio14_hs,ess3200_mmvdisk_ibmessio15_hs_ibmessio16_hs,ess3200_mmvdisk_ibmessio17_hs_ibmessio18_hs]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">pagepool 64G [sp]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">pagepool 384399572992 [ibmgssio1_hsibmgssio2_hs,ibmgssio3_hsibmgssio4_hs,ibmgssio5_hsibmgssio6_hs]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">pagepool 573475966156 [ess5k_mmvdisk_ibmessio11_hs_ibmessio12_hs]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">pagepool 96G [ces]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Example of nodeclasses used to apply other settings, such as which InfiniBand port(s) to use.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"># mmlsconfig verbsports<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">verbsPorts mlx5_0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">verbsPorts mlx5_0 mlx5_2 [pitzer_dualport]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">verbsPorts mlx4_1/1 mlx4_1/2 [dm]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">verbsPorts mlx5_0 mlx5_2 [k8_dualport]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">verbsPorts mlx5_0 mlx5_1 mlx5_2 mlx5_3 [cardinal_quadport]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Ed Wahl<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Ohio Supercomputer Center<o:p></o:p></span></p>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt">From:</span></b><span style="font-size:11.0pt"> gpfsug-discuss &lt;gpfsug-discuss-bounces@gpfsug.org&gt;
<b>On Behalf Of </b>Iban Cabrillo<br>
<b>Sent:</b> Friday, March 8, 2024 9:40 AM<br>
<b>To:</b> gpfsug-discuss &lt;gpfsug-discuss@spectrumscale.org&gt;<br>
<b>Subject:</b> [gpfsug-discuss] pagepool<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black">Good afternoon,<o:p></o:p></span></p>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black"> We are new to the DSS system configurations. Reviewing the configuration I have seen that the default pagepool is set to this value:<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><strong><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black"> pagepool 323908133683</span></strong><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black">This is set not only on the DSS servers but also on the rest of the HPC nodes, and I don't know whether it is an excessive value. We are noticing that some jobs are dying with "Memory
cgroup out of memory: Killed process XXX", and my question is whether this pagepool is reserving too much memory for the mmfs process, to the detriment of running jobs.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black">Any advice is welcomed,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black">Regards, I<o:p></o:p></span></p>
</div>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black">--
<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black"><br>
================================================================<br>
Ibán Cabrillo Bartolomé<br>
Instituto de Física de Cantabria (IFCA-CSIC)<br>
Santander, Spain<br>
Tel: +34942200969/+34669930421<br>
Responsible for advanced computing service (RSC)<br>
=========================================================================================<br>
=========================================================================================<br>
All our suppliers must know and accept IFCA policy available at:<br>
<br>
<a href="https://confluence.ifca.es/display/IC/Information+Security+Policy+for+External+Suppliers">https://confluence.ifca.es/display/IC/Information+Security+Policy+for+External+Suppliers</a><br>
==========================================================================================<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
</div>
</body>
</html>