[gpfsug-discuss] Introduction/Question

Chase, Peter peter.chase at metoffice.gov.uk
Mon Nov 6 09:20:11 GMT 2017


Hello to all!

I'm pleased to have joined the GPFS UG mailing list. I'm experimenting with GPFS on zLinux running under z/VM on a z13 mainframe. I work for the UK Met Office in the GPCS team (general purpose compute service/mainframe team) and I'm based in Exeter, Devon.

I've joined with a specific question to ask. In short: how can I automate sending files to a cloud object store as they arrive in GPFS, while keeping a copy of each file in GPFS?

The longer spiel is this: We have an HPC that throws out a lot of NetCDF files via FTP for use in forecasts. We're currently undergoing a change in working practice, so that data processing is beginning to be done in the cloud. At the same time we're also attempting to de-duplicate the data being sent from the HPC by creating one space to receive it, and then having consumers use it or send it on as necessary from there. The data volume is on the order of terabytes per day, and the timeliness of its arrival at systems is fairly important (forecasts cease to be forecasts if they're too late).

We're using zLinux because the mainframe already receives much of the data from the HPC, has access to a SAN with SSD storage, has the right network connections, and generally seems like the least amount of work to put something in place.

Getting a supported clustered filesystem on zLinux is tricky, but GPFS fits the bill and having hardware, storage, OS and filesystem from one provider (IBM) should hopefully save some headaches.

We're using Amazon as our cloud provider, and have 2x10Gb direct links to their London data centre with a ping of about 15ms, so fairly low latency. The developers using the data want it in S3 so they can access it from serverless environments and won't need to have EC2 instances loitering to look after the data.

We were initially interested in using mmcloudgateway/cloud data sharing to send the data, but it's not available for s390x (only x86_64). So I'm now looking at setting up an external storage pool for talking to S3, and then having some kind of ILM soft-quota trigger to send the data once enough of it has arrived. But I'm still exploring options — options such as asking this user group of experienced folks what they think is best!
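For what it's worth, the ILM idea above might be sketched roughly as the policy fragment below. This is a minimal, untested sketch: the list name, script path, and time window are all placeholders, and the hypothetical s3-copy.sh helper would be expected to upload each listed path with something like the AWS CLI. One point in its favour: an EXTERNAL LIST rule only hands matching paths to the script, so the GPFS copy stays in place, whereas a MIGRATE to an external pool would normally move the data out.

```
/* Hypothetical sketch only -- names and paths are placeholders.
   The external list is handed to a user-supplied upload script. */
RULE EXTERNAL LIST 'tos3' EXEC '/usr/local/bin/s3-copy.sh'

/* Select recently modified NetCDF files; mmapplypolicy passes the
   matching paths to the script, which copies them to S3 and leaves
   the GPFS copies untouched. */
RULE 'new-netcdf' LIST 'tos3'
  WHERE NAME LIKE '%.nc'
    AND (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '15' MINUTES
```

Something like this could then be driven periodically via mmapplypolicy (e.g. from cron), or wired to a callback along the lines of the soft-quota trigger mentioned above — though I'd welcome corrections from anyone who has actually done this.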

So, any help or advice would be greatly appreciated!

Regards,

Peter Chase
GPCS Team
Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
Email: peter.chase at metoffice.gov.uk  Website: www.metoffice.gov.uk


