<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Aptos;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Aptos",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
font-size:11.0pt;
font-family:"Aptos",sans-serif;}
span.EmailStyle23
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:118841024;
mso-list-template-ids:1639771872;}
@list l0:level1
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:1.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:2.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:2.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:3.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:3.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:4.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:4.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1
{mso-list-id:362635915;
mso-list-template-ids:-1712563738;}
@list l1:level1
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level2
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:1.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:2.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level5
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:2.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:3.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:3.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level8
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:4.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:4.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-family:"Calibri",sans-serif">IIRC, the filesystem descriptor disk is only in the system pool… so as long as the system pool
<b>only has 3 FGs</b> and they correspond to your 3 sites, then the filesystem survivability characteristics are straightforward.
<o:p></o:p></span></p>
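<p class="MsoNormal">A minimal Python sketch of that survivability argument. It uses a simplified model that is an assumption here, not GPFS's documented algorithm: each system-pool failure group holds one file system descriptor replica, and the file system needs a strict majority of those replicas to be reachable. The site labels and the function name are hypothetical.</p>
<pre>
```python
from collections import Counter


def survives_any_single_site_loss(replica_sites: list[str]) -> bool:
    """Simplified model (assumption): one descriptor replica per failure group,
    listed by site; a strict majority must stay reachable after losing a site."""
    total = len(replica_sites)
    for lost in Counter(replica_sites).values():
        remaining = total - lost
        # Quorum needs remaining > total / 2; the integer form avoids floats.
        if 2 * remaining <= total:
            return False
    return True


# One system-pool FG per site: FG 1 at site A, FG 2 at site B, FG 3 = tiebreaker.
three_fg = ["siteA", "siteB", "tiebreaker"]
print(survives_any_single_site_loss(three_fg))  # True: any loss leaves 2 of 3
```
</pre>
<p class="MsoNormal">In this model, a layout that put two of three replicas at the same site would fail the check, because losing that site leaves only 1 of 3.</p>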
<p class="MsoNormal"><span style="font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Calibri",sans-serif">I think that you technically
<i>could</i> use two <i>different</i> FGs for the second pool and GPFS would still work as expected, but that just seems confusing for humans. We started off with one multisite stretch cluster like the one you describe, and 10 years later we have around 100 stretch
clusters. Choosing a standard mapping between FG numbers and your sites can be a good way to reduce cognitive load on your team.<o:p></o:p></span></p>
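<p class="MsoNormal">One way to apply such a convention, sketched in Python: a fixed site-to-FG table reused for every pool, plus a check that no FG number is shared by disks at two different sites. The FG numbers, site labels and disk names are all hypothetical, and the sketch only validates an inventory list; it does not call any GPFS command.</p>
<pre>
```python
from collections import defaultdict

# Hypothetical convention: each site keeps the same FG number in every pool.
SITE_FG = {"siteA": 1, "siteB": 2, "tiebreaker": 3}


def fg_for(site: str, pool: str) -> int:
    """Look up the FG for a disk. The pool argument is deliberately ignored,
    so a site's FG number never changes between pools."""
    return SITE_FG[site]


def fgs_spanning_sites(disks: list[tuple[str, str, int]]) -> dict[int, set[str]]:
    """Given (disk, site, fg) tuples, return any FG numbers used at more than one site."""
    sites_by_fg: dict[int, set[str]] = defaultdict(set)
    for _disk, site, fg in disks:
        sites_by_fg[fg].add(site)
    return {fg: sites for fg, sites in sites_by_fg.items() if len(sites) > 1}


# Hypothetical inventory: disk name, site, FG taken from the convention above.
inventory = [
    ("dssA1-vd1", "siteA", fg_for("siteA", "pool1")),
    ("dssA3-vd1", "siteA", fg_for("siteA", "pool2")),
    ("dssB1-vd1", "siteB", fg_for("siteB", "pool1")),
    ("tb-disk", "tiebreaker", fg_for("tiebreaker", "system")),
]
print(fgs_spanning_sites(inventory))  # {}: every FG number belongs to one site
```
</pre>
<p class="MsoNormal">An empty result means the inventory follows the convention; any FG listed in the result has disks at more than one site.</p>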
<p class="MsoNormal"><span style="font-family:"Calibri",sans-serif"><br>
-Paul<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-family:"Calibri",sans-serif">From:</span></b><span style="font-family:"Calibri",sans-serif"> gpfsug-discuss &lt;gpfsug-discuss-bounces@gpfsug.org&gt;
<b>On Behalf Of </b>Luke Sudbery<br>
<b>Sent:</b> Tuesday, March 18, 2025 16:18<br>
<b>To:</b> gpfsug main discussion list &lt;gpfsug-discuss@gpfsug.org&gt;<br>
<b>Subject:</b> Re: [gpfsug-discuss] Replicated cluster - failure groups<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<div>
<p class="MsoNormal"><span lang="EN-GB">But are there any pros/cons of 5 failure groups (2 per site + tiebreaker) vs 3?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB">If we had 1 failure group per site, would we need to bring up all NSDs at that site (4x DSS – 8 actual servers) to guarantee bringing up the NSD that holds the descriptor replica?<o:p></o:p></span></p>
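<p class="MsoNormal">The 5-vs-3 question can be explored with a small Python sketch. It assumes the same simplified model (one descriptor replica per failure group, strict majority required) and works at failure-group granularity only, so it cannot say which disk inside a failure group holds a replica. The failure-group labels are hypothetical.</p>
<pre>
```python
from itertools import combinations


def quorum_kept(replica_fgs: list[str], down: set[str]) -> bool:
    """Simplified model (assumption): one descriptor replica per FG and a
    strict majority of replicas must be reachable."""
    up = sum(1 for fg in replica_fgs if fg not in down)
    return 2 * up > len(replica_fgs)


def smallest_breaking_outages(replica_fgs: list[str]) -> list[tuple[str, ...]]:
    """Return every smallest set of FG outages that breaks descriptor quorum."""
    for size in range(1, len(replica_fgs) + 1):
        hits = [c for c in combinations(replica_fgs, size)
                if not quorum_kept(replica_fgs, set(c))]
        if hits:
            return hits
    return []


three = ["A", "B", "TB"]                                   # 1 FG per site
five = ["A-pool1", "A-pool2", "B-pool1", "B-pool2", "TB"]  # 1 FG per pool per site
print(len(smallest_breaking_outages(three)[0]))  # 2
print(len(smallest_breaking_outages(five)[0]))   # 3
```
</pre>
<p class="MsoNormal">In this model both layouts survive the loss of a whole site, and in both cases one further failure-group outage (at the surviving site or the tiebreaker) then breaks descriptor quorum.</p>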
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB">Cheers,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB">Luke<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<div>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:9.0pt;color:#1F497D">-- <o:p>
</o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:9.0pt;color:#1F497D">Luke Sudbery<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:9.0pt;color:#1F497D">Principal Engineer (HPC and Storage).<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:9.0pt;color:#1F497D">Architecture, Infrastructure and Systems<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:9.0pt;color:#1F497D">Advanced Research Computing, IT Services<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:9.0pt;color:#1F497D">Room 132, Computer Centre G5, Elms Road<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:9.0pt;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-GB" style="font-size:9.0pt;color:#1F497D">Please note I don’t work on Monday.<o:p></o:p></span></b></p>
</div>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-family:"Calibri",sans-serif">From:</span></b><span style="font-family:"Calibri",sans-serif"> gpfsug-discuss <<a href="mailto:gpfsug-discuss-bounces@gpfsug.org">gpfsug-discuss-bounces@gpfsug.org</a>>
<b>On Behalf Of </b>scale<br>
<b>Sent:</b> 18 March 2025 17:21<br>
<b>To:</b> gpfsug main discussion list <<a href="mailto:gpfsug-discuss@gpfsug.org">gpfsug-discuss@gpfsug.org</a>><br>
<b>Subject:</b> Re: [gpfsug-discuss] Replicated cluster - failure groups<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<div>
<p class="MsoNormal">There is no advantage to having more failure groups than the maximum number of replicas supported by a file system, plus 1 for the tiebreaker disks. In a multi-site setup, you will want 1 failure group per site to ensure that 1 replica
is placed at each site, because GPFS places replicas round-robin among the failure groups.<o:p></o:p></p>
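<p class="MsoNormal">A toy Python model of why one failure group per site matters for data replicas. As a simplifying assumption it uses a pure rotation in which replica k of block b goes to failure group (b + k) mod n; the real GPFS allocator is more involved, and the FG names are hypothetical. Because GPFS sees failure groups rather than sites, two FGs at the same site within one pool can receive both copies of a block.</p>
<pre>
```python
def place_replicas(block: int, fgs: list[str], replicas: int) -> list[str]:
    """Toy round-robin (assumption, not the real allocator): replica k of a
    block goes to failure group (block + k) mod n, so copies use distinct FGs."""
    n = len(fgs)
    if replicas > n:
        raise ValueError("each replica needs its own failure group")
    return [fgs[(block + k) % n] for k in range(replicas)]


def site_of(fg: str) -> str:
    # Hypothetical naming scheme: the site label comes before the first "-".
    return fg.split("-")[0]


def single_site_blocks(fgs: list[str], replicas: int, nblocks: int) -> list[int]:
    """Blocks whose copies all land at one site under the toy rotation."""
    return [b for b in range(nblocks)
            if len({site_of(fg) for fg in place_replicas(b, fgs, replicas)}) == 1]


one_fg_per_site = ["A-all", "B-all"]
two_fgs_per_site = ["A-dss1", "A-dss2", "B-dss1", "B-dss2"]
print(single_site_blocks(one_fg_per_site, 2, 4))   # []
print(single_site_blocks(two_fgs_per_site, 2, 4))  # [0, 2]
```
</pre>
<p class="MsoNormal">In the second option from the original question, each pool still has only one failure group per site, which matches the first case; one FG per DSS would give each pool two FGs per site, the second case.</p>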
<p class="MsoNormal"><o:p> </o:p></p>
<div id="mail-editor-reference-message-container">
<div>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">gpfsug-discuss <<a href="mailto:gpfsug-discuss-bounces@gpfsug.org">gpfsug-discuss-bounces@gpfsug.org</a>> on behalf of Luke Sudbery <<a href="mailto:l.r.sudbery@bham.ac.uk">l.r.sudbery@bham.ac.uk</a>><br>
<b>Date: </b>Tuesday, March 18, 2025 at 9:28</span><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black"> </span><span style="font-size:12.0pt;color:black">AM<br>
<b>To: </b><a href="mailto:gpfsug-discuss@gpfsug.org">gpfsug-discuss@gpfsug.org</a> <<a href="mailto:gpfsug-discuss@gpfsug.org">gpfsug-discuss@gpfsug.org</a>><br>
<b>Subject: </b>[EXTERNAL] [gpfsug-discuss] Replicated cluster - failure groups<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal">We are planning a replicated cluster. Due to a combination of purchasing cycles, floor loading and VAT-exemption status for half the equipment/data, this will be built over time using a total of 8 Lenovo DSS building blocks: 2 main pools
in 2 data centres, with 2 DSSG per pool at each site, plus a quorum/manager node with a local tiebreaker disk in a 3rd physical location.<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">My main question is about failure groups. Until now, with 2 DSS and 1 tiebreaker, we would have had 1 failure group per DSS plus 1 for the tiebreaker disk, giving us a total of 3. But if we did that now, we would have 9 failure groups in 1
file system, which is more than the maximum number of file system descriptor replicas and, as I understand it, not desirable.<o:p></o:p></p>
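<p class="MsoNormal">The concern about 9 failure groups can be made concrete with a counting sketch in Python. It assumes that at most 5 descriptor replicas exist, each in a distinct failure group, and that the 5 could be any of the 9 (GPFS knows failure groups, not sites); both are simplifying assumptions. A placement counts as fragile if one data site holds 3 or more replicas, since losing that site leaves at most 2 of 5, short of a majority. The FG labels are hypothetical.</p>
<pre>
```python
from itertools import combinations

# Assumptions: at most 5 descriptor replicas, each in a distinct FG, and any
# 5 of the 9 FGs could be chosen (GPFS sees failure groups, not sites).
MAX_DESC_REPLICAS = 5

# Hypothetical labels: one FG per DSS (4 per data site) plus the tiebreaker.
fgs = [f"A{i}" for i in range(1, 5)] + [f"B{i}" for i in range(1, 5)] + ["TB"]


def site(fg: str) -> str:
    return "TB" if fg == "TB" else fg[0]


def is_fragile(placement: tuple[str, ...]) -> bool:
    """True if one data site holds 3+ replicas: losing it leaves at most 2 of 5."""
    return any(sum(1 for fg in placement if site(fg) == s) >= 3 for s in ("A", "B"))


placements = list(combinations(fgs, MAX_DESC_REPLICAS))
fragile = [p for p in placements if is_fragile(p)]
print(len(fgs), len(placements), len(fragile))  # 9 126 90
```
</pre>
<p class="MsoNormal">Under these assumptions, 90 of the 126 possible placements would be fragile, whereas with one failure group per site the 3 replicas are always spread one per site.</p>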
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">So we could have either:<o:p></o:p></p>
<ul style="margin-top:0in" type="disc">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l1 level1 lfo3">1 FG per physical site, with all 4 DSS at that site assigned to it, and a 3rd FG for the tiebreaker<o:p></o:p></li><li class="MsoListParagraph" style="margin-left:0in;mso-list:l1 level1 lfo3">1 FG per pool per site, with 2 DSS in each FG. This makes sense because both DSSG in each pair will always need to be up for all the data in the pool to be accessible.
<o:p></o:p></li></ul>
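<p class="MsoNormal">The reasoning behind the second option can be checked with a short Python sketch of worst-case read availability for one 2-way replicated pool. It assumes each block has one copy in each failure group, striped across every DSS in that FG, so any DSS outage in an FG may affect some block; under that assumption all data is guaranteed readable only while at least one of the pool's FGs has every DSS up. The DSS and FG names are hypothetical.</p>
<pre>
```python
def pool_fully_readable(fg_members: dict[str, list[str]], down: set[str]) -> bool:
    """Worst-case read check for a 2-way replicated pool.

    Assumption: every block has one copy in each FG, striped over all DSS in
    that FG, so any DSS outage may hit some block. All data is then guaranteed
    readable only while at least one FG has every one of its DSS up."""
    return any(all(dss not in down for dss in members)
               for members in fg_members.values())


# Hypothetical pool: 1 FG per site, 2 DSS in each FG (the second option above).
pool1 = {"A-pool1": ["dssA1", "dssA2"], "B-pool1": ["dssB1", "dssB2"]}
print(pool_fully_readable(pool1, {"dssA1"}))           # True: B-pool1 intact
print(pool_fully_readable(pool1, {"dssA1", "dssA2"}))  # True: site A down, B intact
print(pool_fully_readable(pool1, {"dssA1", "dssB2"}))  # False: no FG fully up
```
</pre>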
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">The second option would give us 5 failure groups, but what would be the advantages and disadvantages of more failure groups?<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Many thanks,<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Luke<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>