[gpfsug-discuss] Online data migration tool

Thu Nov 30 22:02:35 GMT 2017

On 30/11/17 18:01, Skylar Thompson wrote:

[SNIP]

> To be fair, a lot of our biomedical/informatics folks have no choice in the
> matter because the vendors are imposing a filesystem-as-a-database paradigm
> on them. Each of our Illumina sequencers, for instance, generates a few
> million files per run, many of which are images containing raw data from
> the sequencers that are used to justify refunds for defective reagents.
> Sure, we could turn them off, but then we're eating $$$ we could be getting
> back from the vendor.
> 

Been there too. What worked was having a find script that ran through 
their files, found directories that had not been accessed for a week and 
zipped them all up, before nuking the original files.
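
Something along these lines would do it. This is a rough Python sketch of the 
idea, not the script that was actually run: the function names, the one-week 
constant and the "one zip per top-level directory" layout are all assumptions 
for illustration. It also assumes the filesystem records access times at all 
(a noatime mount will defeat it).

```python
import os
import shutil
import time
import zipfile
from pathlib import Path

WEEK_SECONDS = 7 * 24 * 3600  # assumed threshold: "not accessed for a week"


def newest_atime(directory):
    """Most recent access time of any file under directory (0.0 if no files)."""
    newest = 0.0
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            st = os.stat(os.path.join(dirpath, name), follow_symlinks=False)
            newest = max(newest, st.st_atime)
    return newest


def archive_directory(directory):
    """Zip directory alongside itself, verify the zip, then remove the original."""
    archive = directory.with_name(directory.name + ".zip")
    if archive.exists():
        raise FileExistsError(f"refusing to overwrite {archive}")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(directory):
            for name in filenames:
                full = Path(dirpath) / name
                # Store paths relative to the parent so the zip unpacks
                # back into a directory with the original name.
                zf.write(str(full), str(full.relative_to(directory.parent)))
    # Only delete the originals once the archive reads back cleanly.
    with zipfile.ZipFile(archive) as zf:
        bad = zf.testzip()
        if bad is not None:
            raise RuntimeError(f"corrupt member {bad} in {archive}")
    shutil.rmtree(directory)
    return archive


def archive_stale_dirs(root, max_age=WEEK_SECONDS, now=None):
    """Archive each immediate subdirectory of root with no recent file access."""
    now = time.time() if now is None else now
    archived = []
    for entry in sorted(Path(root).iterdir()):
        if entry.is_symlink() or not entry.is_dir():
            continue
        newest = newest_atime(entry)
        # newest == 0.0 means the directory holds no files; leave it alone.
        if newest and now - newest > max_age:
            archived.append(archive_directory(entry))
    return archived
```

Run from cron as, say, archive_stale_dirs("/path/to/runs"), where the path is 
a placeholder. Note that empty subdirectories are not stored in the zip, so 
anything that depends on them existing would need extra handling.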

The other thing I would suggest is if they want to buy sequencers from 
vendors who are brain dead, then that's fine but they are going to have 
to pay extra for the storage because they are costing way more than the 
average to store their files. Far too much buying of kit goes on without 
any thought of the consequences of how to deal with the data it generates.

Then there were the proteomics bunch who basically just needed a good 
thrashing with a very large clue stick, because the zillions of files 
were the result of their own Perl scripts.

JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG