[gpfsug-discuss] Online data migration tool

Skylar Thompson skylar2 at u.washington.edu
Thu Nov 30 18:01:48 GMT 2017


On Thu, Nov 30, 2017 at 04:13:30PM +0000, Jonathan Buzzard wrote:
> On Wed, 2017-11-29 at 12:08 -0700, Nikhil Khandelwal wrote:
> 
> [SNIP]
> 
> > Since file systems created at 4.X.X and earlier used a block size
> > that kept this allocation in mind, there should be very little impact
> > on existing file systems.
> 
> That is quite a presumption. I would say that file systems created at
> 4.X.X and earlier potentially used a block size that was the best
> *compromise*, and the new options would work a lot better.
> 
> So for example supporting a larger block size for users who have sane
> workflows while still not wasting a ton of space for the biomedical
> folks who abuse the file system as a database.
> 
> Though I have come to the conclusion that the way to stop them using the
> file system as a database (no, don't do ls in that directory, there are
> 200,000 files and it takes minutes to come back) is to put your BOFH hat
> on, quota them on maximum file numbers, and suggest to them that they use
> a database, even if it is just sticking it all in SQLite :-D

To be fair, a lot of our biomedical/informatics folks have no choice in the
matter because the vendors are imposing a filesystem-as-a-database paradigm
on them. Each of our Illumina sequencers, for instance, generates a few
million files per run, many of which are images containing raw data from
the sequencers that are used to justify refunds for defective reagents.
Sure, we could turn those images off, but then we're eating $$$ we could be getting
back from the vendor.

At least SSD prices have come down far enough that we can put our metadata on
fast disks now, even if we can't take advantage of the more efficient small
file allocation yet.

-- 
-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine


