[gpfsug-discuss] fast search for archivable data sets

Alex Chekholko alex at calicolabs.com
Sat Apr 4 00:50:50 BST 2020


Hi Jim,

The common non-GPFS-specific approach is a tool that dumps all of your
filesystem metadata into an SQL database; you can then point a webapp at
the database for nice graphs and reports, or write your own queries.
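
For your case, the kind of query you'd write might look like the following
(a sketch only; the "files" table and its columns here are hypothetical,
and each tool has its own schema):

    SELECT parent_dir,
           SUM(size) / POW(1024, 4) AS tib,
           MAX(atime)               AS newest_atime
    FROM   files                       -- hypothetical table/columns
    GROUP  BY parent_dir
    HAVING SUM(size) > 5 * POW(1024, 4)           -- over 5 TiB in the dir
       AND MAX(atime) < NOW() - INTERVAL 2 YEAR;  -- nothing touched in 2 yrs

Note that this rolls up to the immediate parent directory only; summing
whole subtrees takes a recursive query or a post-processing pass.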

The free-software example is Robinhood (use the POSIX scanner, not the
Lustre-specific one); one proprietary example is Starfish.
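
If you go the Robinhood route, the basic workflow is roughly the following
(a sketch; exact flags vary by Robinhood version, so check the man pages
for your build):

    # one-shot full scan of a POSIX filesystem into the MySQL backend
    robinhood -f /etc/robinhood.d/myfs.conf --scan --once

    # canned reports out of the database, e.g. biggest entries
    rbh-report -f /etc/robinhood.d/myfs.conf --top-size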

In both cases you need a pretty beefy machine for the DB, and the initial
scan of your filesystem may take a long time, depending on your metadata
performance. And without a filesystem-specific hook like a transaction
log, you'll need to rescan the entire filesystem every time you want to
update the DB.
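
That said, since you're on Scale, the policy engine is your fast scanner:
mmapplypolicy does a parallel inode scan, so a LIST rule plus a little
post-processing gets most of the way there. A minimal sketch from memory
(double-check the rule syntax and the list-file format against your
release's docs):

    # cold.pol: list every file not accessed in ~2 years, showing its size
    EXTERNAL LIST 'cold' EXEC ''
    RULE 'cold_files' LIST 'cold'
         SHOW(VARCHAR(FILE_SIZE))
         WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 730

    # run deferred so it only writes the list, then roll it up by directory
    mmapplypolicy gpfs0 -P cold.pol -I defer -f /tmp/cold  # gpfs0 = your fs

    # /tmp/cold.list.cold lines look like: inode gen snapid size -- /path
    awk -F' -- ' '{ split($1, m, " "); bytes = m[4]
                    sub(/\/[^\/]*$/, "", $2)    # strip filename, keep dir
                    sum[$2] += bytes }
          END     { for (d in sum)
                        if (sum[d] > 5 * 2^40)  # over 5 TiB of cold data
                            print sum[d], d }' /tmp/cold.list.cold \
        | sort -rn | head

Again, this rolls up to the immediate parent directory; subtree totals
need another pass. mmapplypolicy also takes -N to spread the scan across
several nodes, which helps a lot at your inode counts.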

Regards,
Alex

On Fri, Apr 3, 2020 at 3:25 PM Jim Kavitsky <jkavitsky at 23andme.com> wrote:

> Hello everyone,
> I'm managing a low-multi-petabyte Scale filesystem with hundreds of
> millions of inodes, and I'm looking for the best way to locate archivable
> directories. For example, these might be directories whose contents total
> more than 5 or 10 TB, and whose contents have atimes older than two years.
>
> Has anyone found a great way to do this with a policy engine run? If not,
> is there another good way that anyone would recommend? Thanks in advance,
>
> Jim Kavitsky