Hi Ivano,

so from this output the performance degradation is not explainable. In my
current environments, having multiple file systems (so multiple vdisks on one
BB, i.e. building block), it works fine.

As said, just open a PMR; I wouldn't consider this the "expected behavior".

The only thing is that the MD (metadata) vdisks are a bit small. So maybe redo
your tests: for a simple comparison between the 1/1, 1/2 and 1/4 capacity
cases, test with 2 vdisks only, declared as dataAndMetadata.
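Just to make that concrete, a minimal sketch of what such a retest setup could
look like (the vdisk names, sizes, stanza file path and the second recovery
group name below are placeholders only, so adjust them to your two RGs; the
point is simply one dataAndMetadata vdisk per RG and no separate small
metadata vdisks). A stanza file, say /tmp/dm_test.stanza, containing:

%vdisk: vdiskName=rg1_dm01 rg=sf-g-01 da=DA1 blocksize=16m size=270t raidCode=8+2p diskUsage=dataAndMetadata failureGroup=1 pool=system
%vdisk: vdiskName=rg2_dm01 rg=<your second RG> da=DA1 blocksize=16m size=270t raidCode=8+2p diskUsage=dataAndMetadata failureGroup=2 pool=system

and then:

  mmcrvdisk -F /tmp/dm_test.stanza     # create the two vdisks
  mmcrnsd -F /tmp/dm_test.stanza       # register them as NSDs
  mmcrfs fs_test -F /tmp/dm_test.stanza -B 16M -j scatter -n 128

Repeat with size= set to the 1/1, 1/2 and 1/4 values and compare the results.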
cheers



From: Ivano Talamo <Ivano.Talamo@psi.ch>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: 11/16/2017 08:52 AM
Subject: Re: [gpfsug-discuss] Write performances and filesystem size
Sent by: gpfsug-discuss-bounces@spectrumscale.org

------------------------------------------------------------------------

Hi,

as additional information I paste the recovery group information for the
full-size and half-size cases.

In both cases:
- data is on sf_g_01_vdisk01
- metadata is on sf_g_01_vdisk02
- sf_g_01_vdisk07 is not used in the filesystem.
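(Both listings below are the detailed recovery group view, of the kind
produced by, for example, "mmlsrecoverygroup sf-g-01 -L".)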
This is with the full-space filesystem:

                       declustered                        current          allowable
 recovery group        arrays       vdisks    pdisks      format version   format version
 -----------------     -----------  ------    ------      --------------   --------------
 sf-g-01                         3       6        86      4.2.2.0          4.2.2.0


 declustered   needs                             replace                 scrub     background activity
 array         service  vdisks  pdisks  spares   threshold  free space   duration  task   progress  priority
 -----------   -------  ------  ------  ------   ---------  ----------   --------  -----  --------  --------
 NVR           no            1       2  0,0              1    3632 MiB   14 days   scrub  95%       low
 DA1           no            4      83  2,44             1      57 TiB   14 days   scrub  0%        low
 SSD           no            1       1  0,0              1     372 GiB   14 days   scrub  79%       low


                                           declustered                           checksum
 vdisk                 RAID code           array        vdisk size  block size   granularity  state  remarks
 --------------------  ------------------  -----------  ----------  ----------   -----------  -----  ------------
 sf_g_01_logTip        2WayReplication     NVR              48 MiB       2 MiB   4096         ok     logTip
 sf_g_01_logTipBackup  Unreplicated        SSD              48 MiB       2 MiB   4096         ok     logTipBackup
 sf_g_01_logHome       4WayReplication     DA1             144 GiB       2 MiB   4096         ok     log
 sf_g_01_vdisk02       3WayReplication     DA1             103 GiB       1 MiB   32 KiB       ok
 sf_g_01_vdisk07       3WayReplication     DA1             103 GiB       1 MiB   32 KiB       ok
 sf_g_01_vdisk01       8+2p                DA1             540 TiB      16 MiB   32 KiB       ok


 config data           declustered array   spare space  remarks
 ------------------    ------------------  -----------  ----------------------------------
 rebuild space         DA1                 53 pdisk     increasing VCD spares is suggested


 config data           disk group fault tolerance         remarks
 ------------------    ---------------------------------  ------------------------
 rg descriptor         1 enclosure + 1 drawer + 2 pdisk   limited by rebuild space
 system index          1 enclosure + 1 drawer + 2 pdisk   limited by rebuild space


 vdisk                 disk group fault tolerance         remarks
 ------------------    ---------------------------------  ------------------------
 sf_g_01_logTip        1 pdisk
 sf_g_01_logTipBackup  0 pdisk
 sf_g_01_logHome       1 enclosure + 1 drawer + 1 pdisk   limited by rebuild space
 sf_g_01_vdisk02       1 enclosure + 1 drawer             limited by rebuild space
 sf_g_01_vdisk07       1 enclosure + 1 drawer             limited by rebuild space
 sf_g_01_vdisk01       2 pdisk


This is with the half-space filesystem:
                       declustered                        current          allowable
 recovery group        arrays       vdisks    pdisks      format version   format version
 -----------------     -----------  ------    ------      --------------   --------------
 sf-g-01                         3       6        86      4.2.2.0          4.2.2.0


 declustered   needs                             replace                 scrub     background activity
 array         service  vdisks  pdisks  spares   threshold  free space   duration  task   progress  priority
 -----------   -------  ------  ------  ------   ---------  ----------   --------  -----  --------  --------
 NVR           no            1       2  0,0              1    3632 MiB   14 days   scrub  4%        low
 DA1           no            4      83  2,44             1     395 TiB   14 days   scrub  0%        low
 SSD           no            1       1  0,0              1     372 GiB   14 days   scrub  79%       low


                                           declustered                           checksum
 vdisk                 RAID code           array        vdisk size  block size   granularity  state  remarks
 --------------------  ------------------  -----------  ----------  ----------   -----------  -----  ------------
 sf_g_01_logTip        2WayReplication     NVR              48 MiB       2 MiB   4096         ok     logTip
 sf_g_01_logTipBackup  Unreplicated        SSD              48 MiB       2 MiB   4096         ok     logTipBackup
 sf_g_01_logHome       4WayReplication     DA1             144 GiB       2 MiB   4096         ok     log
 sf_g_01_vdisk02       3WayReplication     DA1             103 GiB       1 MiB   32 KiB       ok
 sf_g_01_vdisk07       3WayReplication     DA1             103 GiB       1 MiB   32 KiB       ok
 sf_g_01_vdisk01       8+2p                DA1             270 TiB      16 MiB   32 KiB       ok


 config data           declustered array   spare space  remarks
 ------------------    ------------------  -----------  ----------------------------------
 rebuild space         DA1                 68 pdisk     increasing VCD spares is suggested


 config data           disk group fault tolerance         remarks
 ------------------    ---------------------------------  ------------------------
 rg descriptor         1 node + 3 pdisk                   limited by rebuild space
 system index          1 node + 3 pdisk                   limited by rebuild space


 vdisk                 disk group fault tolerance         remarks
 ------------------    ---------------------------------  ------------------------
 sf_g_01_logTip        1 pdisk
 sf_g_01_logTipBackup  0 pdisk
 sf_g_01_logHome       1 node + 2 pdisk                   limited by rebuild space
 sf_g_01_vdisk02       1 node + 1 pdisk                   limited by rebuild space
 sf_g_01_vdisk07       1 node + 1 pdisk                   limited by rebuild space
 sf_g_01_vdisk01       2 pdisk


Thanks,
Ivano



On 16/11/17 13:03, Olaf Weiser wrote:
> Thx, that makes it a bit clearer. As your vdisk is big enough to span
> over all pdisks, each of your tests (1/1, 1/2 or 1/4 of capacity)
> should bring the same performance.
>
> You mentioned something about the vdisk layout:
> so in your test, for the full-capacity case, you use just one vdisk per
> RG, so 2 in total for "data", right?
>
> What about MD? Did you create a separate vdisk for MD, and what size
> then?
>
> Sent from IBM Verse
>
> Ivano Talamo --- Re: [gpfsug-discuss] Write performances and filesystem
> size ---
>
> From: "Ivano Talamo" <Ivano.Talamo@psi.ch>
> To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org>
> Date: Thu 16.11.2017 03:49
> Subject: Re: [gpfsug-discuss] Write performances and filesystem size
>
> ------------------------------------------------------------------------
>
> Hello Olaf,
>
> yes, I confirm that it is the Lenovo version of the ESS GL2, so 2
> enclosures / 4 drawers / 166 disks in total.
>
> Each recovery group has one declustered array with all disks inside, so
> vdisks use all the physical ones, even in the case of a vdisk that is
> 1/4 of the total size.
>
> Regarding the layout allocation we used scatter.
>
> The tests were done on the just-created filesystem, so no close-to-full
> effect. And we ran gpfsperf write seq.
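> (A run of that kind would be launched roughly as below; the target path,
> transfer size, record size and thread count are illustrative values only,
> not the exact parameters of these tests:
>
>   /usr/lpp/mmfs/samples/perf/gpfsperf write seq /gpfs/test/bigfile -n 200g -r 16m -th 16
>
> with -r chosen to match the 16 MiB block size of the data vdisk.)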
> Thanks,
> Ivano
>
>
> On 16/11/17 04:42, Olaf Weiser wrote:
>> Sure... as long as we assume that really all physical disks are used.
>> The "1/2 or 1/4" that was mentioned might turn out to mean that one or
>> two complete enclosures are eliminated...? That's why I was asking for
>> more details.
>>
>> I don't see this degradation in my environments. As long as the vdisks
>> are big enough to span over all pdisks (which should be the case for
>> capacity in the range of TB), the performance stays the same.
>>
>> Sent from IBM Verse
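>>
>> (To double-check that on a running system, the usual GNR listing commands
>> can be used, assuming they are available on the DSS build:
>>
>>   mmlsvdisk
>>   mmlspdisk all
>>
>> The first shows which declustered array each vdisk lives in; the second
>> lists every pdisk and its state.)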
>>
>> Jan-Frode Myklebust --- Re: [gpfsug-discuss] Write performances and
>> filesystem size ---
>>
>> From: "Jan-Frode Myklebust" <janfrode@tanso.net>
>> To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org>
>> Date: Wed 15.11.2017 21:35
>> Subject: Re: [gpfsug-discuss] Write performances and filesystem size
>>
>> ------------------------------------------------------------------------
>>
>> Olaf, this looks like a Lenovo «ESS GLxS» version. It should be using the
>> same number of spindles for any size filesystem, so I would also expect
>> them to perform the same.
>>
>>
>>
>> -jf
>>
>>
>> On Wed, 15 Nov 2017 at 11:26, Olaf Weiser <olaf.weiser@de.ibm.com> wrote:
>>
>> To add a comment... very simply: it depends on how you allocate the
>> physical block storage. If you are simply using less physical
>> resources when reducing the capacity (in the same ratio), you get what
>> you see.
>>
>> So you need to tell us how you allocate your block storage. (Are you
>> using RAID controllers, where are your LUNs coming from, are there
>> fewer RAID groups involved when reducing the capacity?...)
>>
>> GPFS can be configured to give you pretty much as much as the hardware
>> can deliver. If you reduce resources, you'll get less; if you enhance
>> your hardware, you get more, almost regardless of the total capacity
>> in #blocks.
>>
>>
>>
>>
>> From: "Kumaran Rajaram" <kums@us.ibm.com>
>> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>> Date: 11/15/2017 11:56 AM
>> Subject: Re: [gpfsug-discuss] Write performances and filesystem size
>> Sent by: gpfsug-discuss-bounces@spectrumscale.org
>>
>> ------------------------------------------------------------------------
>>
>>
>> Hi,
>>
>> >> Am I missing something? Is this an expected behaviour and someone
>> >> has an explanation for this?
>>
>> Based on your scenario, write degradation as the file system is
>> populated is possible if you had formatted the file system with
>> "-j cluster".
>>
>> For consistent file-system performance, we recommend mmcrfs
>> "-j scatter" layoutMap. Also, we need to ensure the mmcrfs "-n" is set
>> properly.
>>
>> [snip from mmcrfs]
>> # mmlsfs <fs> | egrep 'Block allocation| Estimated number'
>>  -j                 scatter                  Block allocation type
>>  -n                 128                      Estimated number of nodes that will mount file system
>> [/snip]
>>
>>
>> [snip from man mmcrfs]
>> layoutMap={scatter|cluster}
>>      Specifies the block allocation map type. When allocating blocks
>>      for a given file, GPFS first uses a round-robin algorithm to
>>      spread the data across all disks in the storage pool. After a
>>      disk is selected, the location of the data block on the disk is
>>      determined by the block allocation map type. If cluster is
>>      specified, GPFS attempts to allocate blocks in clusters. Blocks
>>      that belong to a particular file are kept adjacent to each other
>>      within each cluster. If scatter is specified, the location of
>>      the block is chosen randomly.
>>
>>      The cluster allocation method may provide better disk performance
>>      for some disk subsystems in relatively small installations. The
>>      benefits of clustered block allocation diminish when the number
>>      of nodes in the cluster or the number of disks in a file system
>>      increases, or when the file system's free space becomes
>>      fragmented. The cluster allocation method is the default for GPFS
>>      clusters with eight or fewer nodes and for file systems with
>>      eight or fewer disks.
>>
>>      The scatter allocation method provides more consistent file
>>      system performance by averaging out performance variations due to
>>      block location (for many disk subsystems, the location of the
>>      data relative to the disk edge has a substantial effect on
>>      performance). This allocation method is appropriate in most cases
>>      and is the default for GPFS clusters with more than eight nodes
>>      or file systems with more than eight disks.
>>
>>      The block allocation map type cannot be changed after the storage
>>      pool has been created.
>>
>> -n NumNodes
>>      The estimated number of nodes that will mount the file system in
>>      the local cluster and all remote clusters. This is used as a best
>>      guess for the initial size of some file system data structures.
>>      The default is 32. This value can be changed after the file
>>      system has been created but it does not change the existing data
>>      structures. Only the newly created data structure is affected by
>>      the new value. For example, new storage pool.
>>
>>      When you create a GPFS file system, you might want to
>>      overestimate the number of nodes that will mount the file system.
>>      GPFS uses this information for creating data structures that are
>>      essential for achieving maximum parallelism in file system
>>      operations (for more information, see GPFS architecture in IBM
>>      Spectrum Scale: Concepts, Planning, and Installation Guide). If
>>      you are sure there will never be more than 64 nodes, allow the
>>      default value to be applied. If you are planning to add nodes to
>>      your system, you should specify a number larger than the default.
>>
>> [/snip from man mmcrfs]
>>
>> Regards,
>> -Kums
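>>
>> (As a purely illustrative example, the device name, stanza file and the
>> value of -n below are placeholders, not a recommendation for this
>> specific system; both settings are given at file system creation time:
>>
>>   mmcrfs fs1 -F vdisks.stanza -j scatter -n 128 -B 16M
>>
>> Note that -j cannot be changed afterwards, while a later change of -n
>> only influences data structures created after the change.)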
>>
>>
>> From: Ivano Talamo <Ivano.Talamo@psi.ch>
>> To: <gpfsug-discuss@spectrumscale.org>
>> Date: 11/15/2017 11:25 AM
>> Subject: [gpfsug-discuss] Write performances and filesystem size
>> Sent by: gpfsug-discuss-bounces@spectrumscale.org
>>
>> ------------------------------------------------------------------------
>>
>>
>> Hello everybody,
>>
>> together with my colleagues we are currently running some tests on a new
>> DSS G220 system and we see some unexpected behaviour.
>>
>> What we see is that write performance (we did not test reads yet)
>> decreases as the filesystem size decreases.
>>
>> I will not go into the details of the tests, but here are some numbers:
>>
>> - with a filesystem using the full 1.2 PB space we get 14 GB/s as the
>>   sum of the disk activity on the two IO servers;
>> - with a filesystem using half of the space we get 10 GB/s;
>> - with a filesystem using 1/4 of the space we get 5 GB/s.
>>
>> We also saw that performance is not affected by the vdisk layout, i.e.
>> taking the full space with one big vdisk or 2 half-size vdisks per RG
>> gives the same performance.
>>
>> To our understanding the IO should be spread evenly across all the
>> pdisks in the declustered array, and looking at iostat all disks seem
>> to be accessed. So there must be some other element that affects
>> performance.
>>
>> Am I missing something? Is this an expected behaviour and someone
>> has an explanation for this?
>>
>> Thank you,
>> Ivano

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss