[gpfsug-discuss] Introduction/Question

Daniel Kidger daniel.kidger at uk.ibm.com
Mon Nov 6 09:37:15 GMT 2017


Peter,
Welcome to the mailing list!

Can I summarise by saying that you are looking for a way for GPFS to recognise that a file has just arrived in the filesystem (via FTP) and trigger an action, in this case a push to Amazon S3?
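For what it's worth, the classic non-TCT route here is an external storage pool driven by mmapplypolicy. A minimal sketch of the policy side (untested; the pool name, script path and the 10-minute window are all invented):

   /* Define an external pool whose EXEC script performs the S3 push */
   RULE EXTERNAL POOL 's3pool' EXEC '/usr/local/bin/s3-migrate'

   /* Select recently arrived files and hand them to that script */
   RULE 'pushNew' MIGRATE FROM POOL 'system' TO POOL 's3pool'
        WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '10' MINUTES

Run that periodically with mmapplypolicy (from cron or a callback). Since the EXEC script owns the data movement, it can copy rather than move, which is what would keep a replica inside GPFS.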

I think you also have a second question about coping with the restrictions on GPFS on zLinux,
i.e. CES is not supported and hence TCT isn't either.

Looking at the docs, there appear to be many restrictions on TCT for multi-cluster, AFM, heterogeneous setups, DMAPI tape tiers, etc.
So my question to add is: what success have people had using TCT in anything more than the simplest use case of a single, small, isolated x86 cluster?
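Peter: on the mmapplypolicy side, the EXEC script named in the policy is invoked with an operation keyword (TEST, MIGRATE, RECALL, and so on) and the name of a file listing the selected files. A rough sketch of such a script, untested, with the bucket name invented and assuming each list record ends in " -- /full/path" (do check the ILM chapter for the exact record layout):

   #!/usr/bin/env python
   # s3-migrate: hypothetical GPFS external-pool EXEC script (sketch only).
   # mmapplypolicy invokes it as: s3-migrate <operation> <filelist> [opts]
   import sys
   import boto3

   BUCKET = 'example-ingest-bucket'  # invented name

   def main():
       op = sys.argv[1]
       if op == 'TEST':     # GPFS probes the pool before using it
           return 0
       if op != 'MIGRATE':  # ignore RECALL/PURGE/LIST in this sketch
           return 0
       s3 = boto3.client('s3')
       with open(sys.argv[2]) as filelist:
           for record in filelist:
               # Take the path as everything after the ' -- ' separator.
               path = record.rstrip('\n').split(' -- ', 1)[1]
               # Copy rather than move, so the GPFS copy stays in place.
               s3.upload_file(path, BUCKET, path.lstrip('/'))
       return 0

   if __name__ == '__main__':
       sys.exit(main())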

Daniel

Dr Daniel Kidger 
IBM Technical Sales Specialist
Software Defined Solution Sales

+44 (0)7818 522 266
daniel.kidger at uk.ibm.com

> On 6 Nov 2017, at 09:20, Chase, Peter <peter.chase at metoffice.gov.uk> wrote:
> 
> Hello to all!
>  
> I'm pleased to have joined the GPFS UG mailing list. I'm experimenting with GPFS on zLinux running in z/VM on a z13 mainframe. I work for the UK Met Office in the GPCS (general purpose compute service/mainframe) team, and I'm based in Exeter, Devon.
>  
> I’ve joined with a specific question to ask, in short: how can I automate sending files to a cloud object store as they arrive in GPFS and keep a copy of the file in GPFS?
>  
> The longer spiel is this: we have an HPC that throws out a lot of NetCDF files via FTP for use in forecasts. We're currently undergoing a change in working practice, so that data processing is beginning to be done in the cloud. At the same time, we're attempting to de-duplicate the data being sent from the HPC by creating one space to receive it, and then having consumers use it or send it on as necessary from there. The data runs to terabytes a day, and the timeliness of its arrival is fairly important (forecasts cease to be forecasts if they're too late).
>  
> We're using zLinux because the mainframe already receives much of the data from the HPC, has access to a SAN with SSD storage, has the right network connections, and generally seems like the least amount of work to put something in place.
>  
> Getting a supported clustered filesystem on zLinux is tricky, but GPFS fits the bill, and having hardware, storage, OS and filesystem from one provider (IBM) should hopefully save some headaches.
>  
> We're using Amazon as our cloud provider, and have 2x10Gb direct links to their London data centre with a ping of about 15ms, so fairly low latency. The developers using the data want it in S3 so they can access it from serverless environments and won't need EC2 instances loitering to look after the data.
>  
> We were initially interested in using mmcloudgateway/Cloud Data Sharing to send the data, but it's not available for s390x (only x86_64), so I'm now looking at setting up an external storage pool for talking to S3 and then having some kind of ILM soft-quota trigger to send the data once enough of it has arrived. I'm still exploring options, though, such as asking this user group of experienced folks what they think is best!
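> Roughly the trigger I'm picturing is below (untested; softQuotaExceeded is a documented mmaddcallback event, but the script path is my invention):
> 
>    # When the ingest fileset's soft quota trips, kick off a policy run
>    mmaddcallback s3push --command /usr/local/bin/start-s3-push.sh \
>        --event softQuotaExceeded --parms "%eventName %fsName"
> 
> where start-s3-push.sh would call mmapplypolicy to drive an external pool pointed at S3.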
>  
> So, any help or advice would be greatly appreciated!
>  
> Regards,
>  
> Peter Chase
> GPCS Team
> Met Office, FitzRoy Road, Exeter, Devon, EX1 3PB, United Kingdom
> Email: peter.chase at metoffice.gov.uk Website: www.metoffice.gov.uk
>  
