[gpfsug-discuss] GPFS, LTFS/EE and data-in-inode?

Jonathan Buzzard jonathan at buzzard.me.uk
Tue Jul 25 12:22:49 BST 2017


On Tue, 2017-07-25 at 09:30 +0000, John Hearns wrote:
> I agree with Jonathan.
>
> In my experience, if you look at why there are many small files being
> stored by researchers, these are either the results of data acquisition
> - high speed cameras, microscopes, or in my experience a wind tunnel.
> Or the images are a sequence of images produced by a simulation which
> are later post-processed into a movie or Ensight/Paraview format. When
> questioned, the researchers will always say "but I would like to keep
> this data available just in case". In reality those files are never
> looked at again. And as has been said if you have a tape based
> archiving system you could end up with thousands of small files being
> spread all over your tapes.  So it is legitimate to make zips / tars of
> directories like that.
> 

Note that rules on data retention may require them to keep the data for
10 years, so the request is not unreasonable. Letting them spew thousands
of files into an "archive", however, is not sensible.

I was thinking of ways of getting the users to do it, and I guess
leaving them with zero available file number quota in the new system
would force them to zip up their data so they could add new stuff ;-)
Archives in my view should have no quota on the space, only quotas on
the number of files.

Of course that might not be very popular.

On reflection I think I would use a policy to restrict to files ending
with .zip/.ZIP only. It's an archive and this format is effectively open
source, widely understood and cross platform, and with the ZIP64 version
will now stand the test of time too.
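
As a rough sketch of what the users would be asked to do, here is one
way to pack a directory into a single .zip with ZIP64 enabled, using
only Python's standard zipfile module. The function names are made up
for illustration, and the name check merely mimics the .zip/.ZIP rule
in user space; it is not a GPFS policy.

```python
import os
import zipfile


def is_allowed_archive_name(path):
    """Mimic the proposed rule: only names ending in .zip or .ZIP pass."""
    return path.endswith((".zip", ".ZIP"))


def zip_directory(src_dir, zip_path):
    """Pack every file under src_dir into one ZIP archive.

    Returns the number of files stored.
    """
    if not is_allowed_archive_name(zip_path):
        raise ValueError("archive name must end in .zip or .ZIP")
    target = os.path.abspath(zip_path)
    count = 0
    # allowZip64=True permits archives beyond the classic 4 GiB / 65535-entry limits.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED, allowZip64=True) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                if os.path.abspath(full) == target:
                    continue  # do not add the archive to itself
                # Store paths relative to src_dir so the archive is relocatable.
                zf.write(full, os.path.relpath(full, src_dir))
                count += 1
    return count
```

Something like zip_directory("run_0042", "run_0042.zip") would then
leave one file to archive instead of thousands (the directory name is
just an example).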

Given it's an archive I would have a script that ran around setting all
the files to immutable 7 days after creation too. Or maybe change the
ownership and set a readonly ACL to the original user. Need to stop them
changing stuff after the event if you are going to use it as part of
your anti-research-fraud measures.
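
The "script that ran around" could look roughly like the sketch below.
It makes two loud assumptions: modification time stands in for creation
time (POSIX does not reliably expose the latter), and stripping the
write bits with chmod stands in for a real immutable flag, which would
have to be set through the filesystem's own mechanism. All names here
are hypothetical.

```python
import os
import stat
import time

SEVEN_DAYS = 7 * 24 * 60 * 60


def files_older_than(root, age_seconds=SEVEN_DAYS, now=None):
    """Yield regular files under root whose mtime is at least age_seconds old."""
    now = time.time() if now is None else now
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Assumption: mtime is used as a proxy for creation time.
            if stat.S_ISREG(st.st_mode) and now - st.st_mtime >= age_seconds:
                yield path


def make_read_only(path):
    """Strip all write bits; only a stand-in for a true immutable flag."""
    mode = stat.S_IMODE(os.lstat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def lock_old_files(root, age_seconds=SEVEN_DAYS, now=None):
    """Make every sufficiently old file read-only; return the paths processed."""
    locked = []
    for path in files_older_than(root, age_seconds, now):
        make_read_only(path)
        locked.append(path)
    return locked
```

Run from cron once a day, this would catch each file within a day of it
crossing the 7-day mark. Note that the owner can still chmod the file
back, which is why a true immutable flag or a change of ownership is
the better end state.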

JAB.

-- 
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.



