[gpfsug-discuss] system.log pool on client nodes for HAWC

Simon Thompson S.J.Thompson at bham.ac.uk
Fri Aug 31 19:25:34 BST 2018


I'm going to add a note of caution about HAWC as well...

Firstly, this was based on when it was first released, so things might have changed...

HAWC replication uses the same failure-group policy for placing replicas, so you need to assign different failure groups to different client nodes. But do this carefully, thinking about your actual failure domains. For example, we initially gave each node in the cluster its own failure group. That might seem like a good idea, until you shut the rack down (or even just a few select nodes might do it) and you lose your whole storage cluster by accident. (Or maybe you have HPC nodes with no UPS protection: if they hold HAWC data and there is no protected replica elsewhere, you lose the file system.)
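
To make that concrete, here is a minimal sketch of what rack-level
failure groups for the client HAWC devices could look like in an NSD
stanza file (node names, device paths, failure group numbers and the
file system name are all made up; the remaining stanza attributes are
whatever the HAWC documentation calls for):

  # hawc.stanza - one local flash device per client,
  # failure group per rack rather than per node
  %nsd: nsd=hawc_r1n1 device=/dev/nvme0n1 servers=rack1-node1 failureGroup=1 pool=system.log
  %nsd: nsd=hawc_r1n2 device=/dev/nvme0n1 servers=rack1-node2 failureGroup=1 pool=system.log
  %nsd: nsd=hawc_r2n1 device=/dev/nvme0n1 servers=rack2-node1 failureGroup=2 pool=system.log

  mmcrnsd -F hawc.stanza
  mmadddisk gpfs0 -F hawc.stanza

With two log replicas and failure groups that follow the racks,
powering off one rack still leaves a protected replica in the other
rack, which is exactly what our original one-failure-group-per-node
layout did not guarantee.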

Maybe this is obvious to everyone, but it bit us in various ways in our early testing. So if you plan to implement it, do test how your storage reacts when a client node fails.

Simon
________________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of vtarasov at us.ibm.com [vtarasov at us.ibm.com]
Sent: 31 August 2018 18:49
To: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] system.log pool on client nodes for HAWC

That is correct. The blocks of each recovery log are striped across the devices in the system.log pool (if it is defined). As a result, even when all clients have a local device in the system.log pool, many writes to the recovery log will go to remote devices. For a client that lacks a local device in the system.log pool, log writes will always be remote.

Notice that typically in such a setup you would enable log replication for HA. Otherwise, if a single client fails (and its recovery log is lost), the whole cluster fails, as there is no log with which to recover the file system to a consistent state. Therefore, at least one remote write is essential.
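
For anyone following along, log replication is a file system
attribute. If I remember the CLI correctly, something along these
lines sets it once the system.log pool exists (file system name is
made up; double-check the mmchfs man page for the exact option):

  mmchfs gpfs0 --log-replicas 2    # keep two copies of each recovery log
  mmlsfs gpfs0                     # verify the file system attributes

Without replication, losing the devices that hold a client's recovery
log means the file system cannot be brought back to a consistent
state, which is the failure mode described above.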

HTH,
--
Vasily Tarasov,
Research Staff Member,
Storage Systems Research,
IBM Research - Almaden


----- Original message -----
From: Kenneth Waegeman <kenneth.waegeman at ugent.be>
Sent by: gpfsug-discuss-bounces at spectrumscale.org
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Cc:
Subject: [gpfsug-discuss] system.log pool on client nodes for HAWC
Date: Tue, Aug 28, 2018 5:31 AM

Hi all,

I was looking into HAWC, using the 'distributed fast storage in client
nodes' method
(https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_hawc_using.htm).

This is achieved by putting a local device on the clients in the
system.log pool. According to another article
(https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_syslogpool.htm),
this pool would then be used for ALL file system recovery logs.
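
For context, as I read those pages the setup is roughly: add the
client-local devices to the system.log pool and then activate HAWC by
setting a non-zero write cache threshold. A rough sketch (file system
and stanza file names are made up, flags as I read them from the
docs):

  mmadddisk gpfs0 -F client_hawc.stanza      # local devices with pool=system.log
  mmchfs gpfs0 --write-cache-threshold 64K   # non-zero threshold activates HAWC
  mmlsfs gpfs0                               # confirm the new settings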

Does this mean that if you have a (small) subset of clients with fast
local devices added in the system.log pool, all other clients will use
these too instead of the central system pool?

Thank you!

Kenneth

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss