[gpfsug-discuss] Metadata with GNR code

Jan-Frode Myklebust janfrode at tanso.net
Fri Sep 21 11:13:51 BST 2018


That reminds me of a point Sven made when I was trying to optimize mdtest
results with metadata on FlashSystem... He sent me the following:

-- started at 11/15/2015 15:20:39 --
mdtest-1.9.3 was launched with 138 total task(s) on 23 node(s)
Command line used: /ghome/oehmes/mpi/bin/mdtest-pcmpi9131-existingdir -d /ibm/fs2-4m-02/shared/mdtest-ec -i 1 -n 70000 -F -i 1 -w 0 -Z -u
Path: /ibm/fs2-4m-02/shared
FS: 32.0 TiB   Used FS: 6.7%   Inodes: 145.4 Mi   Used Inodes: 22.0%
138 tasks, 9660000 files
SUMMARY: (of 1 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :     650440.486     650440.486     650440.486          0.000
   File stat         :   23599134.618   23599134.618   23599134.618          0.000
   File read         :    2171391.097    2171391.097    2171391.097          0.000
   File removal      :    1007566.981    1007566.981    1007566.981          0.000
   Tree creation     :          3.072          3.072          3.072          0.000
   Tree removal      :          1.471          1.471          1.471          0.000
-- finished at 11/15/2015 15:21:10 --

This was from a GL6 with only spinning disks, and he pointed out that mdtest
doesn't really require flash/SSD. The keys to good results are:

a) a large GPFS log (mmchfs -L 128m)

b) a high maxfilestocache (you need to be able to cache all entries, so for
10 million files across 20 nodes you need at least 750k per node)

c) a fast network; that's key for handling the token requests and metadata
operations that have to go over the network.
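
To make the arithmetic in (b) concrete, here is a minimal Python sketch (mine,
not Sven's). It assumes files are spread evenly across the nodes. The 1.5x
headroom factor is an assumption, picked only because it turns the even 500k
share for 10 million files on 20 nodes into the 750k quoted above. The
function names are hypothetical.

```python
import math


def files_per_node(total_files: int, nodes: int) -> int:
    """Even share of files that each node must be able to cache."""
    return math.ceil(total_files / nodes)


def cache_target(total_files: int, nodes: int, headroom: float = 1.5) -> int:
    """Per-node cache entries including headroom.

    The default headroom of 1.5 is an assumed factor, not a documented rule.
    """
    return math.ceil(files_per_node(total_files, nodes) * headroom)


# Rule-of-thumb case from the text: 10 million files across 20 nodes.
print(files_per_node(10_000_000, 20))
print(cache_target(10_000_000, 20))

# The run shown above: 138 tasks x 70000 files each, on 23 nodes.
run_files = 138 * 70_000
print(run_files, files_per_node(run_files, 23), cache_target(run_files, 23))
```

For the run shown above (138 tasks x 70000 files on 23 nodes) the even share
works out to 420k entries per node, so a per-node maxfilestocache in the range
Sven suggests would comfortably cover it.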



  -jf

On Fri, Sep 21, 2018 at 10:22 AM Olaf Weiser <olaf.weiser at de.ibm.com> wrote:

> See an mdtest run on a file system with the default block size (4 MB).
> Metadata is on SSD.
> Data is on HDD, which is not really relevant for this mdtest ;-)
>
>
> -- started at 09/07/2018 06:54:54 --
>
> mdtest-1.9.3 was launched with 40 total task(s) on 20 node(s)
> Command line used: mdtest -n 25000 -i 3 -u -d /homebrewed/gh24_4m_4m/mdtest
> Path: /homebrewed/gh24_4m_4m
> FS: 10.0 TiB   Used FS: 0.0%   Inodes: 12.0 Mi   Used Inodes: 2.3%
>
> 40 tasks, 1000000 files/directories
>
> SUMMARY: (of 3 iterations)
>    Operation                      Max            Min           Mean        Std Dev
>    ---------                      ---            ---           ----        -------
>    Directory creation:     449160.409     430869.822     437002.187       8597.272
>    Directory stat    :    6664420.560    5785712.544    6324276.731     385192.527
>    Directory removal :     398360.058     351503.369     371630.648      19690.580
>    File creation     :     288985.217     270550.129     279096.800       7585.659
>    File stat         :    6720685.117    6641301.499    6674123.407      33833.182
>    File read         :    3055661.372    2871044.881    2945513.966      79479.638
>    File removal      :     215187.602     146639.435     179898.441      28021.467
>    Tree creation     :         10.215          3.165          6.603          2.881
>    Tree removal      :          5.484          0.880          2.418          2.168
>
> -- finished at 09/07/2018 06:55:42 --
>
>
>
>
> Kind regards
>
>
> Olaf Weiser
>
> EMEA Storage Competence Center Mainz, German / IBM Systems, Storage
> Platform,
>
>
> From:        "Andrew Beattie" <abeattie at au1.ibm.com>
> To:        gpfsug-discuss at spectrumscale.org
> Date:        09/21/2018 02:34 AM
> Subject:        Re: [gpfsug-discuss] Metadata with GNR code
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------
>
>
>
> Simon,
>
> My recommendation is still very much to use SSD for metadata and NL-SAS
> for data, and the GH14 / GH24 building blocks certainly make this much
> easier.
>
> Unless your filesystem is massive (Summit-sized), you will typically still
> benefit from the random I/O performance of SSD (even RI SSD) compared with
> NL-SAS.
>
> It still makes more sense to me to use 2-copy or 3-copy for metadata,
> even in ESS / GNR style environments. The read performance for metadata
> using 3-copy is still significantly better than any other scenario.
>
> As with anything, there are exceptions to the rule, but my experience with
> ESS and ESS with SSD so far is that the standard thinking on managing
> metadata and small-file I/O still holds, even with the improvements
> around sub-blocks in Scale V5.
>
> mdtest is still the typical benchmark for this comparison, and it shows
> some very clear differences, even on SSD, between a large filesystem
> block size with more sub-blocks and a smaller block size with 1/32
> sub-blocks.
>
> This only gets worse if you change the storage media from SSD to NL-SAS.
>
> Andrew Beattie
> Software Defined Storage - IT Specialist
> Phone: 614-2133-7927
> E-mail: abeattie at au1.ibm.com
>
>
> ----- Original message -----
> From: Simon Thompson <S.J.Thompson at bham.ac.uk>
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
> Cc:
> Subject: [gpfsug-discuss] Metadata with GNR code
> Date: Fri, Sep 21, 2018 3:29 AM
>
> Just wondering if anyone has any strong views/recommendations with
> metadata when using GNR code?
>
>
>
> I know in “san” based GPFS, there is a recommendation to have data and
> metadata split with the metadata on SSD.
>
>
>
> I’ve also heard that with GNR there isn’t much difference in splitting
> data and metadata.
>
>
>
> We’re looking at two systems and want to replicate metadata, but not data
> (mostly), between them, so I’m not really sure how we’d do this without
> having a separate system pool (and then NSDs in different failure groups)…
>
>
>
> If we used 8+2P vdisks for metadata only, would we still see no difference
> in performance compared to mixed? (I guess the 8+2P is still spread over a
> DA, so we’d get half the drives in the GNR system active…)
>
>
>
> Or should we stick SSD based storage in as well for the metadata pool?
> (Which brings an interesting question about RAID code related to the recent
> discussions on mirroring vs RAID5…)
>
>
>
> Thoughts welcome!
>
>
>
> Simon
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss