[gpfsug-discuss] RKM resilience questions: testing and best practice

Alec anacreo at gmail.com
Thu Aug 17 16:52:08 BST 2023


Yesterday I proposed treating the replicated key servers as two different
sets of servers: have Scale address two of the RKM servers under one
rkmid/tenant/devicegrp/client name, and the second set of servers under a
second rkmid/tenant/devicegrp/client name.

In other words, define the same cluster of key management servers in two
separate stanzas of RKM.conf, an upper half and a lower half (sketch
below).

If we do that and the key management team takes one set offline,
everything should still work, but Scale would think one set of keys is
offline and scream.
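
Roughly what I have in mind in RKM.conf, as a sketch only; nothing I've
validated, the host names, keystore paths, labels, and tenant name are
made up, and I'm assuming the usual kmipServerUri/kmipServerUri2
backup-server syntax:

  RKM_UPPER {
    type = ISKLM
    kmipServerUri = tls://keysrv1.example.com:5696
    kmipServerUri2 = tls://keysrv2.example.com:5696
    keyStore = /var/mmfs/etc/RKMcerts/upper.p12
    passphrase = changeme
    clientCertLabel = scale_upper
    tenantName = GPFS_TENANT
  }

  RKM_LOWER {
    type = ISKLM
    kmipServerUri = tls://keysrv3.example.com:5696
    kmipServerUri2 = tls://keysrv4.example.com:5696
    keyStore = /var/mmfs/etc/RKMcerts/lower.p12
    passphrase = changeme
    clientCertLabel = scale_lower
    tenantName = GPFS_TENANT
  }

All four servers hold the same MEKs; Scale just sees two independent RKM
pools, so losing one whole set costs us redundancy, not access.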

I think we need an IBM ticket to help vet all that out.

Alec

On Thu, Aug 17, 2023, 8:11 AM Jan-Frode Myklebust <janfrode at tanso.net>
wrote:

>
> Your second KMIP server doesn't need to have an active replication
> relationship with the first one; it just needs to contain the same MEK. So
> you could do a one-time replication or copy between them, and they would
> not have to see each other anymore.
>
> I don't think having them host different keys will work, as you won't be
> able to fetch the second key from the one server your client is connected
> to, and you would then be unable to encrypt with that key.
>
> From what I've seen of KMIP setups with Scale, it's a stupidly trivial
> service. It's just a server that will tell you the key when asked, plus some
> access control to make sure no one else gets it. Also, MEKs never change…
> unless you actively change them in the file system policy, and then you
> could just post the new key to all/both of your independent key servers when
> you make the change.
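>
> (If you ever do rotate a MEK, my understanding is that it amounts to a
> rewrap of the FEKs via mmapplypolicy. A sketch only, with made-up key
> names KEY-old/KEY-new and RKM ID RKM_1, and I may be misremembering the
> exact rule syntax:
>
>    RULE 'rewrap' CHANGE ENCRYPTION KEYS FROM 'KEY-old:RKM_1' TO 'KEY-new:RKM_1'
>
> run with something like 'mmapplypolicy gpfs1 -P rewrap.pol -I yes' after
> posting KEY-new to all the key servers.)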
>
>
>  -jf
>
> On Wed, 16 Aug 2023 at 23:25, Alec <anacreo at gmail.com> wrote:
>
>> Ed,
>>   Thanks for the response; I wasn't aware of those two commands. I will
>> see if that unlocks a solution. I kind of need the test to work in a
>> production environment, so I can't just be adding spare nodes onto the
>> cluster and fiddling with file systems.
>>
>> Unfortunately the logs don't indicate when a node has returned to
>> health, only that it's in trouble; and since we patch often, we see these
>> messages regularly.
>>
>>
>> For the second question, we would add a second MEK to each file so that
>> two independent keys from two different RKM pools would each be able to
>> unlock any file. This would give us two wholly independent paths to
>> encrypt and decrypt a file (see the policy sketch below).
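>>
>> Something like this in the encryption policy, if I understand the rules
>> right: a sketch only, the key names and RKM IDs are hypothetical, and I'm
>> relying on my reading that when two SET ENCRYPTION rules match, the FEK
>> is wrapped once per rule, so either MEK alone can unwrap it:
>>
>>    RULE 'encA' ENCRYPTION 'E_A' IS
>>         ALGO 'DEFAULTNISTSP800131A'
>>         KEYS('KEY-a:RKM_PROD_A')
>>    RULE 'encB' ENCRYPTION 'E_B' IS
>>         ALGO 'DEFAULTNISTSP800131A'
>>         KEYS('KEY-b:RKM_PROD_B')
>>    RULE 'wrapA' SET ENCRYPTION 'E_A' WHERE NAME LIKE '%'
>>    RULE 'wrapB' SET ENCRYPTION 'E_B' WHERE NAME LIKE '%'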
>>
>> So I'm looking for a best-practice example from IBM that endorses this,
>> so we don't have a dependency on a single RKM environment.
>>
>> Alec
>>
>>
>>
>> On Wed, Aug 16, 2023, 2:02 PM Wahl, Edward <ewahl at osc.edu> wrote:
>>
>>> > How can we verify that a key server is up and running when there are
>>> multiple key servers in an RKM pool serving a single key?
>>>
>>>
>>>
>>> Pretty simple.
>>>
>>> -Grab a compute node/client (and mark it offline if needed) and unmount all
>>> encrypted file systems.
>>>
>>> -Hack the RKM.conf to point to JUST the server you want to test (and
>>> maybe a backup)
>>>
>>> -Clear all keys:   ‘/usr/lpp/mmfs/bin/tsctl encKeyCachePurge all’
>>>
>>> -Reload the RKM.conf:  ‘/usr/lpp/mmfs/bin/tsloadikm run’   (this is a
>>> great command if you need to load new certificates too)
>>>
>>> -Attempt to mount the encrypted FS, and then cat a few files.
>>>
>>>
>>>
>>> If you've not set up a 2nd server in your test, you will see quarantine
>>> messages in the logs for a bad KMIP server.    If it works, you can clear
>>> keys again and see how many keys were retrieved.
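>>>
>>> Pulled together as a one-shot test it looks roughly like this (a sketch;
>>> ‘gpfs1’ and the file path are placeholders, and I'm assuming the usual
>>> /var/mmfs/etc/RKM.conf location):
>>>
>>>    mmumount gpfs1                                # unmount the encrypted FS
>>>    vi /var/mmfs/etc/RKM.conf                     # point at ONLY the server under test
>>>    /usr/lpp/mmfs/bin/tsctl encKeyCachePurge all  # drop all cached MEKs
>>>    /usr/lpp/mmfs/bin/tsloadikm run               # reload RKM.conf (and any new certs)
>>>    mmmount gpfs1                                 # remount
>>>    cat /gpfs1/some/encrypted/file                # force a key fetch; watch the logs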
>>>
>>>
>>>
>>> >Is there any official IBM documentation or diagram that recommends
>>> having two keys from independent RKM environments for high availability
>>> as a best practice that I could refer to?
>>>
>>>
>>>
>>> I am not an IBM-er…  but I'm also not 100% sure what you are asking
>>> here.   Two unrelated SKLM setups? How would you sync the keys?   How
>>> would this be better than multiple replicated servers?
>>>
>>>
>>>
>>> Ed Wahl
>>>
>>> Ohio Supercomputer Center
>>>
>>>
>>>
>>> *From:* gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> *On Behalf
>>> Of *Alec
>>> *Sent:* Wednesday, August 16, 2023 3:33 PM
>>> *To:* gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
>>> *Subject:* [gpfsug-discuss] RKM resilience questions: testing and best
>>> practice
>>>
>>>
>>>
>>> Hello, we are using a remote key server with GPFS, and I have two questions:
>>>
>>>
>>>
>>> First question:
>>>
>>> How can we verify that a key server is up and running when there are
>>> multiple key servers in an RKM pool serving a single key?
>>>
>>>
>>>
>>> The scenario: after maintenance, or periodically, we want to verify that
>>> all members of the pool are in service.
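>>>
>>> (The crude check we can do from the client side is plain TLS
>>> reachability of each pool member, e.g. with hypothetical host names:
>>>
>>>    for s in keysrv1 keysrv2 keysrv3 keysrv4; do
>>>      if echo | openssl s_client -connect ${s}:5696 >/dev/null 2>&1; then
>>>        echo "$s: TLS up"
>>>      else
>>>        echo "$s: DOWN"
>>>      fi
>>>    done
>>>
>>> but that only proves the KMIP port answers TLS, not that the server
>>> will actually hand out the key, hence this question.)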
>>>
>>>
>>>
>>> Second question:
>>>
>>> Is there any official IBM documentation or diagram that recommends
>>> having two keys from independent RKM environments for high availability
>>> as a best practice that I could refer to?
>>>
>>>
>>>
>>> Alec
>>>
>>>
>>>
>>>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
>