<span style=" font-size:10pt;font-family:sans-serif">(Aside from QOS,

I second the notion to review your "failure groups" if you are

using and depending on data replication.)<br></span><br><span style=" font-size:10pt;font-family:sans-serif">For QOS, some

suggestions:</span><br><br><span style=" font-size:10pt;font-family:sans-serif">You might want

to define a set of nodes that will do restripes using `mmcrnodeclass restripers

-N ...`</span><br><br><span style=" font-size:10pt;font-family:sans-serif">You can initially
just enable QoS with `mmchqos FS --enable`, without setting any limits yet, and then monitor the performance of your
restripe command</span><br><span style=" font-size:10pt;font-family:sans-serif"> `mmrestripefs
FS -b -N restripers` (-b rebalances; -N restricts the work to the nodes in the restripers node class)</span><br><span style=" font-size:10pt;font-family:sans-serif">with `mmlsqos
FS --seconds 60` (see the mmlsqos documentation for other options).</span><br><br><span style=" font-size:10pt;font-family:sans-serif">Suppose you see

an average rate of several thousand IOPS and you decide that this is interfering
with other work.</span><br><br><span style=" font-size:10pt;font-family:sans-serif">Then, for example,

you could "slow down" or "pace" mmrestripefs
to use at most 999 IOPS in the system pool and 1999 IOPS in the data pool

with:</span><br><br><span style=" font-size:10pt;font-family:sans-serif">`mmchqos FS --enable
-N restripers pool=system,maintenance=999iops pool=data,maintenance=1999iops`</span><br><br><span style=" font-size:10pt;font-family:sans-serif">Then monitor the effect
with `mmlsqos` again.</span><br><br><span style=" font-size:10pt;font-family:sans-serif">Tip: For a more

graphical view of QOS and disk performance, try samples/charts/qosplotfine.pl.

You will need to have gnuplot working...</span><br><br><span style=" font-size:10pt;font-family:sans-serif">If you are "into"

performance tools, you might want to look at the --fine-stats options of
mmchqos and mmlsqos and feed that data into your favorite performance viewer, plotter, or analyzer
tool.</span><br><span style=" font-size:10pt;font-family:sans-serif">(Technical:</span><br><span style=" font-size:10pt;font-family:sans-serif"> The output of mmlsqos
--fine-stats is meant to be read and digested by scripts, not so much
for human "eyeballing".</span><br><span style=" font-size:10pt;font-family:sans-serif"> The --fine-stats

argument of mmchqos is a number of seconds, while the --fine-stats argument
of mmlsqos is one or two index values.</span><br><span style=" font-size:10pt;font-family:sans-serif">The mmlsqos documentation
explains this, and the qosplotfine.pl script is an example of how to use

it.</span><br><span style=" font-size:10pt;font-family:sans-serif">)</span><br><br><br><br><br><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif">From:

       </span><span style=" font-size:9pt;font-family:sans-serif">"Luis

Bolinches" &lt;luis.bolinches@fi.ibm.com&gt;</span><br><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif">To:

       </span><span style=" font-size:9pt;font-family:sans-serif">"gpfsug

main discussion list" &lt;gpfsug-discuss@spectrumscale.org&gt;</span><br><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif">Date:

       </span><span style=" font-size:9pt;font-family:sans-serif">08/21/2018

12:56 AM</span><br><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif">Subject:

       </span><span style=" font-size:9pt;font-family:sans-serif">Re:

[gpfsug-discuss] Rebalancing with mmrestripefs -P</span><br><span style=" font-size:9pt;color:#5f5f5f;font-family:sans-serif">Sent

by:        </span><span style=" font-size:9pt;font-family:sans-serif">gpfsug-discuss-bounces@spectrumscale.org</span><br><hr noshade><br><br><br><span style=" font-size:10pt">Hi<br><br>You can enable QoS first, with the limits left at the inf (unlimited) value, to see the activity
and the current usage values, and then set the limits later on. Those limits are
modifiable online, so if you have periods of lower activity (which does not seem to be your case), they can be raised for replication during those periods and lowered again
at peak times.<br><br>Kind regards<br>Luis Bolinches<br>Consultant IT Specialist<br>Mobile Phone: +358503112585</span><span style=" font-size:10pt;color:blue"><u><br></u></span><a href="https://www.youracclaim.com/user/luis-bolinches"><span style=" font-size:10pt;color:blue"><u>https://www.youracclaim.com/user/luis-bolinches</u></span></a><span style=" font-size:10pt"><br><br>"If you always give you will always have" -- Anonymous<br><br>> On 21 Aug 2018, at 1.21, david_johnson@brown.edu wrote:<br>> <br>> Yes, the arrays are in different buildings. We want to spread the activity

over more servers if possible but recognize the extra load that rebalancing

would entail. The system is busy all the time. <br>> <br>> I have considered using QOS when we run policy migrations but haven’t

yet because I don’t know what value to allow for throttling IOPS. We need

to do weekly migrations off the 15k rpm pool onto the 7.2k rpm pool, and previously

I’ve just let it run at native speed. I’d like to know what other folks

have used for QOS settings. <br>> <br>> I think we may leave things alone for now regarding the original question,

rebalancing this pool. <br>> <br>> -- ddj<br>> Dave Johnson<br>> <br>>> On Aug 20, 2018, at 6:08 PM, valdis.kletnieks@vt.edu wrote:<br>>> <br>>> On Mon, 20 Aug 2018 14:02:05 -0400, "Frederick Stock"

said:<br>>> <br>>>> Note you have two more NSDs in failure group 33
than you do in<br>>>> failure group 23. You may want to change one of those

NSDs in failure<br>>>> group 33 to be in failure group 23 so you have equal storage

space in both<br>>>> failure groups.<br>>> <br>>> Keep in mind that the failure groups should be built up based

on single points of failure.<br>>> In other words, a failure group should consist of disks that will

all stay up or all go down on<br>>> the same failure (controller, network, whatever).<br>>> <br>>> Looking at the fact that you have 6 disks named 'dNN_george_33'

and 8 named 'dNN_cit_33',<br>>> it sounds very likely that they are in two different storage arrays,

and you should make your<br>>> failure groups so they don't span a storage array. In other words,

taking a 'cit' disk<br>>> and moving it into a 'george' failure group will Do The Wrong

Thing, because if you do<br>>> data replication, one copy can go onto a 'george' disk, and the

other onto a 'cit' disk<br>>> that's in the same array as the 'george' disk. If 'george' fails,

you lose access to both<br>>> replicas.<br>>> _______________________________________________<br>>> gpfsug-discuss mailing list<br>>> gpfsug-discuss at spectrumscale.org<br>>> </span><a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss"><span style=" font-size:10pt;color:blue"><u>http://gpfsug.org/mailman/listinfo/gpfsug-discuss</u></span></a><br><br><BR>
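<span style=" font-size:10pt;font-family:sans-serif">QoS limits can be changed online. That means the pacing command suggested at the top of this thread could be combined with Luis's idea of raising the limits during quiet periods and lowering them again at peak times. Below is a minimal sketch of such a script, which you might run hourly from cron. It is an illustration under stated assumptions, not a tested procedure. The peak window (08:00-19:59) and the off-peak limits (4000 and 8000 IOPS) are made-up values. FS, the restripers node class, and the system and data pools are the example names used earlier in the thread. By default the script only prints the mmchqos command it would run.</span><br>
<pre>
```shell
#!/bin/sh
# Hypothetical sketch: choose QoS "maintenance" IOPS limits by time of day
# and apply them with mmchqos, so restripes are paced hard at peak times
# and allowed more I/O off-peak.
# Assumptions (not from the thread, adjust for your site):
#   - peak hours are 08:00-19:59 local time
#   - off-peak limits of 4000/8000 IOPS are made-up example values
# FS, the "restripers" node class and the "system"/"data" pools are the
# example names used in the thread.
# With DRY_RUN=1 (the default) the command is only printed; DRY_RUN=0 runs it.
set -eu

FS="${FS:-FS}"
NODECLASS="${NODECLASS:-restripers}"
DRY_RUN="${DRY_RUN:-1}"

# Print the pool=...,maintenance=... arguments for an hour ("0"-"23" or "00"-"23").
qos_limits_for_hour() {
  # Drop one leading zero so "08"/"09" are not parsed as (invalid) octal.
  qos_hour=$(( ${1#0} + 0 ))
  if [ "$qos_hour" -ge 8 ] && [ "$qos_hour" -lt 20 ]; then
    # Peak: pace maintenance work such as mmrestripefs.
    echo "pool=system,maintenance=999iops pool=data,maintenance=1999iops"
  else
    # Off-peak: give maintenance work more room.
    echo "pool=system,maintenance=4000iops pool=data,maintenance=8000iops"
  fi
}

# Print the full mmchqos command line for the given hour.
qos_command_for_hour() {
  echo "mmchqos $FS --enable -N $NODECLASS $(qos_limits_for_hour "$1")"
}

# Print (DRY_RUN=1) or run (DRY_RUN=0) the mmchqos command for the given hour.
apply_qos_for_hour() {
  qos_cmd=$(qos_command_for_hour "$1")
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $qos_cmd"
  else
    # Unquoted on purpose: the command line is split into words at spaces.
    $qos_cmd
  fi
}

apply_qos_for_hour "$(date +%H)"
```
</pre>
<span style=" font-size:10pt;font-family:sans-serif">For example, run it once as is to check the generated command. Then schedule it with DRY_RUN=0 and watch the effect with `mmlsqos FS --seconds 60`.</span><br>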