[gpfsug-discuss] Write performances and filesystem size

Wed Nov 15 16:56:36 GMT 2017

Hi,

>>Am I missing something? Is this expected behaviour, and does someone have an
explanation for it?

Based on your scenario, write degradation as the file system is populated
is possible if the file system was formatted with "-j cluster".

For consistent file-system performance, we recommend the mmcrfs "-j scatter"
layoutMap. Also, make sure that the mmcrfs "-n" value is set properly.

[snip from mmlsfs]
# mmlsfs <fs> | egrep 'Block allocation| Estimated number'
 -j                 scatter                  Block allocation type
 -n                 128                      Estimated number of nodes that will mount file system
[/snip]
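
If you want to run this check on several file systems, the two settings can be pulled out of the mmlsfs output with a few lines of Python. This is only a minimal sketch: the function names parse_mmlsfs and check_layout are made up for illustration, the expected node count of 64 is an arbitrary example value, and the sample text is just the two mmlsfs lines shown above.

```python
import re

def parse_mmlsfs(text):
    """Collect '-flag value' pairs from mmlsfs-style output lines."""
    settings = {}
    for line in text.splitlines():
        # A settings line looks like: " -j   scatter   Block allocation type"
        match = re.match(r"\s*(-\S+)\s+(\S+)", line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings

def check_layout(settings, expected_nodes):
    """Return warnings for the two settings discussed in this thread."""
    warnings = []
    if settings.get("-j") != "scatter":
        warnings.append("block allocation type is not scatter")
    nodes = settings.get("-n", "")
    if not nodes.isdigit() or int(nodes) < expected_nodes:
        warnings.append("-n is missing or below the expected node count")
    return warnings

# Sample input copied from the mmlsfs snippet above.
sample = """\
 -j                 scatter                  Block allocation type
 -n                 128                      Estimated number of nodes that will mount file system
"""

settings = parse_mmlsfs(sample)
print(settings)
print(check_layout(settings, expected_nodes=64))  # 64 is an illustrative value
```

In practice the text would come from the output of "mmlsfs <fs>" rather than from a hard-coded string.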

[snip from man mmcrfs]
 layoutMap={scatter | cluster}
                  Specifies the block allocation map type. When
                  allocating blocks for a given file, GPFS first
                  uses a round‐robin algorithm to spread the data
                  across all disks in the storage pool. After a
                  disk is selected, the location of the data
                  block on the disk is determined by the block
                  allocation map type. If cluster is
                  specified, GPFS attempts to allocate blocks in
                  clusters. Blocks that belong to a particular
                  file are kept adjacent to each other within
                  each cluster. If scatter is specified,
                  the location of the block is chosen randomly.

                  The cluster allocation method may provide
                  better disk performance for some disk
                  subsystems in relatively small installations.
                  The benefits of clustered block allocation
                  diminish when the number of nodes in the
                  cluster or the number of disks in a file system
                  increases, or when the file system’s free space
                  becomes fragmented. The cluster
                  allocation method is the default for GPFS
                  clusters with eight or fewer nodes and for file
                  systems with eight or fewer disks.

                  The scatter allocation method provides
                  more consistent file system performance by
                  averaging out performance variations due to
                  block location (for many disk subsystems, the
                  location of the data relative to the disk edge
                  has a substantial effect on performance). This
                  allocation method is appropriate in most cases
                  and is the default for GPFS clusters with more
                  than eight nodes or file systems with more than
                  eight disks.

                  The block allocation map type cannot be changed
                  after the storage pool has been created.

-n NumNodes
         The estimated number of nodes that will mount the file
         system in the local cluster and all remote clusters.
         This is used as a best guess for the initial size of
         some file system data structures. The default is 32.
         This value can be changed after the file system has been
         created but it does not change the existing data
         structures. Only the newly created data structure is
         affected by the new value. For example, new storage
         pool.

         When you create a GPFS file system, you might want to
         overestimate the number of nodes that will mount the
         file system. GPFS uses this information for creating
         data structures that are essential for achieving maximum
         parallelism in file system operations (For more
         information, see GPFS architecture in IBM Spectrum
         Scale: Concepts, Planning, and Installation Guide ). If
         you are sure there will never be more than 64 nodes,
         allow the default value to be applied. If you are
         planning to add nodes to your system, you should specify
         a number larger than the default.

[/snip from man mmcrfs]

Regards,
-Kums

From:   Ivano Talamo <Ivano.Talamo at psi.ch>
To:     <gpfsug-discuss at spectrumscale.org>
Date:   11/15/2017 11:25 AM
Subject:        [gpfsug-discuss] Write performances and filesystem size
Sent by:        gpfsug-discuss-bounces at spectrumscale.org

Hello everybody,

together with my colleagues I am currently running some tests on a new
DSS G220 system, and we see some unexpected behaviour.

What we see is that write performance (we have not tested reads yet)
decreases as the filesystem size decreases.

I will not go into the details of the tests, but here are some numbers:

- with a filesystem using the full 1.2 PB space we get 14 GB/s as the 
sum of the disk activity on the two IO servers;
- with a filesystem using half of the space we get 10 GB/s;
- with a filesystem using 1/4 of the space we get 5 GB/s.
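
To put these numbers in proportion, the short calculation below expresses each result relative to the full-size filesystem. It is only an illustrative sketch: the variable names are made up, and the size fractions and GB/s values are the three measurements listed above.

```python
# (fraction of the 1.2 PB space used by the filesystem, aggregate write GB/s)
measurements = [(1.0, 14.0), (0.5, 10.0), (0.25, 5.0)]

full_size_gbps = measurements[0][1]

# Throughput of each configuration relative to the full-size filesystem.
relative = [(fraction, gbps / full_size_gbps) for fraction, gbps in measurements]

for (fraction, gbps), (_, ratio) in zip(measurements, relative):
    print(f"{fraction:4.2f} of the space: {gbps:4.1f} GB/s "
          f"({ratio:.0%} of the full-size result)")
```

The drop is therefore not simply proportional to the filesystem size.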

We also saw that performance is not affected by the vdisk layout, i.e.
taking the full space with one big vdisk or with 2 half-size vdisks per RG
gives the same performance.

To our understanding the IO should be spread evenly across all the
pdisks in the declustered array, and looking at iostat all disks seem to
be accessed. So there must be some other element that affects
performance.

Am I missing something? Is this expected behaviour, and does someone have an
explanation for it?

Thank you,
Ivano