[gpfsug-discuss] Write performances and filesystem size

Daniel Kidger daniel.kidger at uk.ibm.com
Wed Nov 15 23:48:18 GMT 2017


My 2c ...
Be careful here about mixing up three different possible effects seen in filesystems:

1. Performance degradation as the filesystem approaches 100% full, often due to the difficulty of finding the remaining unallocated blocks.
GPFS doesn’t noticeably suffer from this effect compared to its competitors.

2. Performance degradation over time as files become fragmented and so cause extra movement of the actuator arm of an HDD (hence defrag on Windows and the idea of short-stroking drives).

3. Performance degradation as blocks are written further from the fastest part of a hard disk drive. SSDs do not show this effect. 


Benchmarks on newly formatted, empty filesystems are often artificially high compared to performance after, say, 12 months, whether or not the filesystem is near 90%+ capacity utilisation. The -j scatter option allows for a more realistic performance measurement when designing for the long-term usage of the filesystem. But this is due to the distributed location of the blocks, not how full the filesystem is.
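
For illustration only (the device name "gpfs0", the stanza file and the node count below are placeholders, not values from this thread), forcing the scatter allocation map at creation time and verifying it afterwards might look like:

   # mmcrfs gpfs0 -F nsd_stanzas.txt -j scatter -n 128
   # mmlsfs gpfs0 -j

Remember the allocation map type cannot be changed once the storage pool exists, so this is a decision to make at mmcrfs time.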



Daniel

Dr Daniel Kidger 
IBM Technical Sales Specialist
Software Defined Solution Sales

+ 44-(0)7818 522 266 
daniel.kidger at uk.ibm.com

> On 15 Nov 2017, at 11:26, Olaf Weiser <olaf.weiser at de.ibm.com> wrote:
> 
> To add a comment ... very simply, it depends on how you allocate the physical block storage. If you simply use less physical resource when reducing the capacity (in the same ratio), you get what you see.
> 
> So you need to tell us how you allocate your block storage. (Are you using RAID controllers? Where are your LUNs coming from? Are fewer RAID groups involved when the capacity is reduced?)
> 
> GPFS can be configured to give you pretty much what the hardware can deliver. If you reduce resources you'll get less; if you enhance your hardware you get more, almost regardless of the total capacity in #blocks.
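> 
> As a quick sketch (the filesystem name "gpfs0" is only a placeholder), the backing NSDs/LUNs and their failure groups and pools can be listed with:
> 
>    # mmlsnsd -X
>    # mmlsdisk gpfs0 -L
> 
> That usually makes it obvious whether the smaller filesystem simply sits on fewer LUNs / RAID groups.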
> 
> 
> 
> 
> 
> 
> From:        "Kumaran Rajaram" <kums at us.ibm.com>
> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date:        11/15/2017 11:56 AM
> Subject:        Re: [gpfsug-discuss] Write performances and filesystem size
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> 
> 
> 
> Hi,
> 
> >>Am I missing something? Is this an expected behaviour and someone has an explanation for this?
> 
> Based on your scenario, write degradation as the file system is populated is possible if you formatted the file system with "-j cluster".
> 
> For consistent file-system performance, we recommend the mmcrfs "-j scatter" layoutMap. Also, we need to ensure the mmcrfs "-n" option is set properly.
> 
> [snip from mmcrfs]
> # mmlsfs <fs> | egrep 'Block allocation| Estimated number'
> -j                 scatter                  Block allocation type
> -n                 128                       Estimated number of nodes that will mount file system
> [/snip]
> 
> 
> [snip from man mmcrfs]
> layoutMap={scatter| cluster}
>                  Specifies the block allocation map type. When
>                  allocating blocks for a given file, GPFS first
>                  uses a round‐robin algorithm to spread the data
>                  across all disks in the storage pool. After a
>                  disk is selected, the location of the data
>                  block on the disk is determined by the block
>                  allocation map type. If cluster is
>                  specified, GPFS attempts to allocate blocks in
>                  clusters. Blocks that belong to a particular
>                  file are kept adjacent to each other within
>                  each cluster. If scatter is specified,
>                  the location of the block is chosen randomly.
> 
>                 The cluster allocation method may provide
>                  better disk performance for some disk
>                  subsystems in relatively small installations.
>                  The benefits of clustered block allocation
>                  diminish when the number of nodes in the
>                  cluster or the number of disks in a file system
>                  increases, or when the file system’s free space
>                  becomes fragmented. The cluster
>                  allocation method is the default for GPFS
>                  clusters with eight or fewer nodes and for file
>                  systems with eight or fewer disks.
> 
>                 The scatter allocation method provides
>                  more consistent file system performance by
>                  averaging out performance variations due to
>                  block location (for many disk subsystems, the
>                  location of the data relative to the disk edge
>                  has a substantial effect on performance). This
>                  allocation method is appropriate in most cases
>                  and is the default for GPFS clusters with more
>                  than eight nodes or file systems with more than
>                  eight disks.
> 
>                  The block allocation map type cannot be changed
>                  after the storage pool has been created.
> 
> 
> -n NumNodes
>         The estimated number of nodes that will mount the file
>         system in the local cluster and all remote clusters.
>         This is used as a best guess for the initial size of
>         some file system data structures. The default is 32.
>         This value can be changed after the file system has been
>         created but it does not change the existing data
>         structures. Only the newly created data structure is
>         affected by the new value. For example, new storage
>         pool.
> 
>         When you create a GPFS file system, you might want to
>         overestimate the number of nodes that will mount the
>         file system. GPFS uses this information for creating
>         data structures that are essential for achieving maximum
>         parallelism in file system operations (For more
>         information, see GPFS architecture in IBM Spectrum
>         Scale: Concepts, Planning, and Installation Guide ). If
>         you are sure there will never be more than 64 nodes,
>         allow the default value to be applied. If you are
>         planning to add nodes to your system, you should specify
>         a number larger than the default.
> 
> [/snip from man mmcrfs]
> 
> Regards,
> -Kums
> 
> 
> 
> 
> 
> From:        Ivano Talamo <Ivano.Talamo at psi.ch>
> To:        <gpfsug-discuss at spectrumscale.org>
> Date:        11/15/2017 11:25 AM
> Subject:        [gpfsug-discuss] Write performances and filesystem size
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> 
> 
> 
> Hello everybody,
> 
> Together with my colleagues I am currently running some tests on a new
> DSS G220 system, and we see some unexpected behaviour.
> 
> What we see is that write performance (we have not tested reads yet)
> decreases as the filesystem size decreases.
> 
> I will not go into the details of the tests, but here are some numbers:
> 
> - with a filesystem using the full 1.2 PB space we get 14 GB/s as the 
> sum of the disk activity on the two IO servers;
> - with a filesystem using half of the space we get 10 GB/s;
> - with a filesystem using 1/4 of the space we get 5 GB/s.
> 
> We also saw that performance is not affected by the vdisk layout, i.e.
> taking the full space with one big vdisk or two half-size vdisks per RG
> gives the same performance.
> 
> To our understanding the IO should be spread evenly across all the
> pdisks in the declustered array, and looking at iostat all disks seem to
> be accessed. So there must be some other element that affects
> performance.
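> 
> For reference, the declustered array / vdisk / pdisk layout can be
> inspected with something like the following (the recovery group name
> here is just a placeholder):
> 
>    # mmlsrecoverygroup
>    # mmlsrecoverygroup DSS01TOP -L
>    # mmlspdisk all --not-ok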
> 
> Am I missing something? Is this expected behaviour, and does someone
> have an explanation for it?
> 
> Thank you,
> Ivano
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
> 
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


