Hi Ivano,

so from this output the performance degradation is not explainable. In my
current environments, having multiple file systems (so multiple vdisks on one
BB, i.e. building block), it works fine.

As said, just open a PMR; I wouldn't consider this the "expected behavior".

The only thing is that the MD (metadata) vdisks are a bit small. So maybe redo
your tests: for a simple comparison between the 1/1, 1/2 and 1/4 capacity
cases, test with 2 vdisks only, declared as dataAndMetadata.
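Just to make that concrete, a minimal sketch of what such a retest setup could
look like (the vdisk names, sizes, stanza file path and the second recovery
group name below are placeholders only, so adjust them to your two RGs; the
point is simply one dataAndMetadata vdisk per RG and no separate small
metadata vdisks). A stanza file, say /tmp/dm_test.stanza, containing:

%vdisk: vdiskName=rg1_dm01 rg=sf-g-01 da=DA1 blocksize=16m size=270t raidCode=8+2p diskUsage=dataAndMetadata failureGroup=1 pool=system
%vdisk: vdiskName=rg2_dm01 rg=<your second RG> da=DA1 blocksize=16m size=270t raidCode=8+2p diskUsage=dataAndMetadata failureGroup=2 pool=system

and then:

  mmcrvdisk -F /tmp/dm_test.stanza     # create the two vdisks
  mmcrnsd -F /tmp/dm_test.stanza       # register them as NSDs
  mmcrfs fs_test -F /tmp/dm_test.stanza -B 16M -j scatter -n 128

Repeat with size= set to the 1/1, 1/2 and 1/4 values and compare the results.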
cheers



From: Ivano Talamo <Ivano.Talamo@psi.ch>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: 11/16/2017 08:52 AM
Subject: Re: [gpfsug-discuss] Write performances and filesystem size
Sent by: gpfsug-discuss-bounces@spectrumscale.org

------------------------------------------------------------------------

Hi,

as additional information I paste the recovery group information for the
full-size and half-size cases.

In both cases:
- data is on sf_g_01_vdisk01
- metadata is on sf_g_01_vdisk02
- sf_g_01_vdisk07 is not used in the filesystem.
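(Both listings below are the detailed recovery group view, of the kind
produced by, for example, "mmlsrecoverygroup sf-g-01 -L".)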
This is with the full-space filesystem:

                       declustered                        current          allowable
 recovery group        arrays       vdisks    pdisks      format version   format version
 -----------------     -----------  ------    ------      --------------   --------------
 sf-g-01                         3       6        86      4.2.2.0          4.2.2.0


 declustered   needs                             replace                 scrub     background activity
 array         service  vdisks  pdisks  spares   threshold  free space   duration  task   progress  priority
 -----------   -------  ------  ------  ------   ---------  ----------   --------  -----  --------  --------
 NVR           no            1       2  0,0              1    3632 MiB   14 days   scrub  95%       low
 DA1           no            4      83  2,44             1      57 TiB   14 days   scrub  0%        low
 SSD           no            1       1  0,0              1     372 GiB   14 days   scrub  79%       low


                                           declustered                           checksum
 vdisk                 RAID code           array        vdisk size  block size   granularity  state  remarks
 --------------------  ------------------  -----------  ----------  ----------   -----------  -----  ------------
 sf_g_01_logTip        2WayReplication     NVR              48 MiB       2 MiB   4096         ok     logTip
 sf_g_01_logTipBackup  Unreplicated        SSD              48 MiB       2 MiB   4096         ok     logTipBackup
 sf_g_01_logHome       4WayReplication     DA1             144 GiB       2 MiB   4096         ok     log
 sf_g_01_vdisk02       3WayReplication     DA1             103 GiB       1 MiB   32 KiB       ok
 sf_g_01_vdisk07       3WayReplication     DA1             103 GiB       1 MiB   32 KiB       ok
 sf_g_01_vdisk01       8+2p                DA1             540 TiB      16 MiB   32 KiB       ok


 config data           declustered array   spare space  remarks
 ------------------    ------------------  -----------  ----------------------------------
 rebuild space         DA1                 53 pdisk     increasing VCD spares is suggested


 config data           disk group fault tolerance         remarks
 ------------------    ---------------------------------  ------------------------
 rg descriptor         1 enclosure + 1 drawer + 2 pdisk   limited by rebuild space
 system index          1 enclosure + 1 drawer + 2 pdisk   limited by rebuild space


 vdisk                 disk group fault tolerance         remarks
 ------------------    ---------------------------------  ------------------------
 sf_g_01_logTip        1 pdisk
 sf_g_01_logTipBackup  0 pdisk
 sf_g_01_logHome       1 enclosure + 1 drawer + 1 pdisk   limited by rebuild space
 sf_g_01_vdisk02       1 enclosure + 1 drawer             limited by rebuild space
 sf_g_01_vdisk07       1 enclosure + 1 drawer             limited by rebuild space
 sf_g_01_vdisk01       2 pdisk


This is with the half-space filesystem:
                       declustered                        current          allowable
 recovery group        arrays       vdisks    pdisks      format version   format version
 -----------------     -----------  ------    ------      --------------   --------------
 sf-g-01                         3       6        86      4.2.2.0          4.2.2.0


 declustered   needs                             replace                 scrub     background activity
 array         service  vdisks  pdisks  spares   threshold  free space   duration  task   progress  priority
 -----------   -------  ------  ------  ------   ---------  ----------   --------  -----  --------  --------
 NVR           no            1       2  0,0              1    3632 MiB   14 days   scrub  4%        low
 DA1           no            4      83  2,44             1     395 TiB   14 days   scrub  0%        low
 SSD           no            1       1  0,0              1     372 GiB   14 days   scrub  79%       low


                                           declustered                           checksum
 vdisk                 RAID code           array        vdisk size  block size   granularity  state  remarks
 --------------------  ------------------  -----------  ----------  ----------   -----------  -----  ------------
 sf_g_01_logTip        2WayReplication     NVR              48 MiB       2 MiB   4096         ok     logTip
 sf_g_01_logTipBackup  Unreplicated        SSD              48 MiB       2 MiB   4096         ok     logTipBackup
 sf_g_01_logHome       4WayReplication     DA1             144 GiB       2 MiB   4096         ok     log
 sf_g_01_vdisk02       3WayReplication     DA1             103 GiB       1 MiB   32 KiB       ok
 sf_g_01_vdisk07       3WayReplication     DA1             103 GiB       1 MiB   32 KiB       ok
 sf_g_01_vdisk01       8+2p                DA1             270 TiB      16 MiB   32 KiB       ok


 config data           declustered array   spare space  remarks
 ------------------    ------------------  -----------  ----------------------------------
 rebuild space         DA1                 68 pdisk     increasing VCD spares is suggested


 config data           disk group fault tolerance         remarks
 ------------------    ---------------------------------  ------------------------
 rg descriptor         1 node + 3 pdisk                   limited by rebuild space
 system index          1 node + 3 pdisk                   limited by rebuild space


 vdisk                 disk group fault tolerance         remarks
 ------------------    ---------------------------------  ------------------------
 sf_g_01_logTip        1 pdisk
 sf_g_01_logTipBackup  0 pdisk
 sf_g_01_logHome       1 node + 2 pdisk                   limited by rebuild space
 sf_g_01_vdisk02       1 node + 1 pdisk                   limited by rebuild space
 sf_g_01_vdisk07       1 node + 1 pdisk                   limited by rebuild space
 sf_g_01_vdisk01       2 pdisk


Thanks,
Ivano



On 16/11/17 13:03, Olaf Weiser wrote:
> Thx, that makes it a bit clearer. As your vdisk is big enough to span
> over all pdisks, each of your tests (1/1, 1/2 or 1/4 of capacity)
> should bring the same performance.
>
> You mentioned something about the vdisk layout:
> so in your test, for the full-capacity case, you use just one vdisk per
> RG, so 2 in total for "data", right?
>
> What about MD? Did you create a separate vdisk for MD, and what size
> then?
>
> Sent from IBM Verse
>
> Ivano Talamo --- Re: [gpfsug-discuss] Write performances and filesystem
> size ---
>
> From: "Ivano Talamo" <Ivano.Talamo@psi.ch>
> To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org>
> Date: Thu 16.11.2017 03:49
> Subject: Re: [gpfsug-discuss] Write performances and filesystem size
>
> ------------------------------------------------------------------------
>
> Hello Olaf,
>
> yes, I confirm that it is the Lenovo version of the ESS GL2, so 2
> enclosures / 4 drawers / 166 disks in total.
>
> Each recovery group has one declustered array with all disks inside, so
> vdisks use all the physical ones, even in the case of a vdisk that is
> 1/4 of the total size.
>
> Regarding the layout allocation we used scatter.
>
> The tests were done on the just-created filesystem, so no close-to-full
> effect. And we ran gpfsperf write seq.
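> (A run of that kind would be launched roughly as below; the target path,
> transfer size, record size and thread count are illustrative values only,
> not the exact parameters of these tests:
>
>   /usr/lpp/mmfs/samples/perf/gpfsperf write seq /gpfs/test/bigfile -n 200g -r 16m -th 16
>
> with -r chosen to match the 16 MiB block size of the data vdisk.)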
> Thanks,
> Ivano
>
>
> On 16/11/17 04:42, Olaf Weiser wrote:
>> Sure... as long as we assume that really all physical disks are used.
>> The "1/2 or 1/4" that was mentioned might turn out to mean that one or
>> two complete enclosures are eliminated...? That's why I was asking for
>> more details.
>>
>> I don't see this degradation in my environments. As long as the vdisks
>> are big enough to span over all pdisks (which should be the case for
>> capacity in the range of TB), the performance stays the same.
>>
>> Sent from IBM Verse
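>>
>> (To double-check that on a running system, the usual GNR listing commands
>> can be used, assuming they are available on the DSS build:
>>
>>   mmlsvdisk
>>   mmlspdisk all
>>
>> The first shows which declustered array each vdisk lives in; the second
>> lists every pdisk and its state.)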
>>
>> Jan-Frode Myklebust --- Re: [gpfsug-discuss] Write performances and
>> filesystem size ---
>>
>> From: "Jan-Frode Myklebust" <janfrode@tanso.net>
>> To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org>
>> Date: Wed 15.11.2017 21:35
>> Subject: Re: [gpfsug-discuss] Write performances and filesystem size
>>
>> ------------------------------------------------------------------------
>>
>> Olaf, this looks like a Lenovo «ESS GLxS» version. It should be using the
>> same number of spindles for any size filesystem, so I would also expect
>> them to perform the same.
>>
>>
>>
>> -jf
>>
>>
>> On Wed, 15 Nov 2017 at 11:26, Olaf Weiser <olaf.weiser@de.ibm.com> wrote:
>>
>> To add a comment... very simply: it depends on how you allocate the
>> physical block storage. If you are simply using less physical
>> resources when reducing the capacity (in the same ratio), you get what
>> you see.
>>
>> So you need to tell us how you allocate your block storage. (Are you
>> using RAID controllers, where are your LUNs coming from, are there
>> fewer RAID groups involved when reducing the capacity?...)
>>
>> GPFS can be configured to give you pretty much as much as the hardware
>> can deliver. If you reduce resources, you'll get less; if you enhance
>> your hardware, you get more, almost regardless of the total capacity
>> in #blocks.
>>
>>
>>
>>
>> From: "Kumaran Rajaram" <kums@us.ibm.com>
>> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>> Date: 11/15/2017 11:56 AM
>> Subject: Re: [gpfsug-discuss] Write performances and filesystem size
>> Sent by: gpfsug-discuss-bounces@spectrumscale.org
>>
>> ------------------------------------------------------------------------
>>
>>
>> Hi,
>>
>> >> Am I missing something? Is this an expected behaviour and someone
>> >> has an explanation for this?
>>
>> Based on your scenario, write degradation as the file system is
>> populated is possible if you had formatted the file system with
>> "-j cluster".
>>
>> For consistent file-system performance, we recommend mmcrfs
>> "-j scatter" layoutMap. Also, we need to ensure the mmcrfs "-n" is set
>> properly.
>>
>> [snip from mmcrfs]
>> # mmlsfs <fs> | egrep 'Block allocation| Estimated number'
>>  -j                 scatter                  Block allocation type
>>  -n                 128                      Estimated number of nodes that will mount file system
>> [/snip]
>>
>>
>> [snip from man mmcrfs]
>> layoutMap={scatter|cluster}
>>      Specifies the block allocation map type. When allocating blocks
>>      for a given file, GPFS first uses a round-robin algorithm to
>>      spread the data across all disks in the storage pool. After a
>>      disk is selected, the location of the data block on the disk is
>>      determined by the block allocation map type. If cluster is
>>      specified, GPFS attempts to allocate blocks in clusters. Blocks
>>      that belong to a particular file are kept adjacent to each other
>>      within each cluster. If scatter is specified, the location of
>>      the block is chosen randomly.
>>
>>      The cluster allocation method may provide better disk performance
>>      for some disk subsystems in relatively small installations. The
>>      benefits of clustered block allocation diminish when the number
>>      of nodes in the cluster or the number of disks in a file system
>>      increases, or when the file system's free space becomes
>>      fragmented. The cluster allocation method is the default for GPFS
>>      clusters with eight or fewer nodes and for file systems with
>>      eight or fewer disks.
>>
>>      The scatter allocation method provides more consistent file
>>      system performance by averaging out performance variations due to
>>      block location (for many disk subsystems, the location of the
>>      data relative to the disk edge has a substantial effect on
>>      performance). This allocation method is appropriate in most cases
>>      and is the default for GPFS clusters with more than eight nodes
>>      or file systems with more than eight disks.
>>
>>      The block allocation map type cannot be changed after the storage
>>      pool has been created.
>>
>> -n NumNodes
>>      The estimated number of nodes that will mount the file system in
>>      the local cluster and all remote clusters. This is used as a best
>>      guess for the initial size of some file system data structures.
>>      The default is 32. This value can be changed after the file
>>      system has been created but it does not change the existing data
>>      structures. Only the newly created data structure is affected by
>>      the new value. For example, new storage pool.
>>
>>      When you create a GPFS file system, you might want to
>>      overestimate the number of nodes that will mount the file system.
>>      GPFS uses this information for creating data structures that are
>>      essential for achieving maximum parallelism in file system
>>      operations (for more information, see GPFS architecture in IBM
>>      Spectrum Scale: Concepts, Planning, and Installation Guide). If
>>      you are sure there will never be more than 64 nodes, allow the
>>      default value to be applied. If you are planning to add nodes to
>>      your system, you should specify a number larger than the default.
>>
>> [/snip from man mmcrfs]
>>
>> Regards,
>> -Kums
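>>
>> (As a purely illustrative example, the device name, stanza file and the
>> value of -n below are placeholders, not a recommendation for this
>> specific system; both settings are given at file system creation time:
>>
>>   mmcrfs fs1 -F vdisks.stanza -j scatter -n 128 -B 16M
>>
>> Note that -j cannot be changed afterwards, while a later change of -n
>> only influences data structures created after the change.)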
>>
>>
>> From: Ivano Talamo <Ivano.Talamo@psi.ch>
>> To: <gpfsug-discuss@spectrumscale.org>
>> Date: 11/15/2017 11:25 AM
>> Subject: [gpfsug-discuss] Write performances and filesystem size
>> Sent by: gpfsug-discuss-bounces@spectrumscale.org
>>
>> ------------------------------------------------------------------------
>>
>>
>> Hello everybody,
>>
>> together with my colleagues we are currently running some tests on a new
>> DSS G220 system and we see some unexpected behaviour.
>>
>> What we see is that write performance (we did not test reads yet)
>> decreases as the filesystem size decreases.
>>
>> I will not go into the details of the tests, but here are some numbers:
>>
>> - with a filesystem using the full 1.2 PB space we get 14 GB/s as the
>>   sum of the disk activity on the two IO servers;
>> - with a filesystem using half of the space we get 10 GB/s;
>> - with a filesystem using 1/4 of the space we get 5 GB/s.
>>
>> We also saw that performance is not affected by the vdisk layout, i.e.
>> taking the full space with one big vdisk or 2 half-size vdisks per RG
>> gives the same performance.
>>
>> To our understanding the IO should be spread evenly across all the
>> pdisks in the declustered array, and looking at iostat all disks seem
>> to be accessed. So there must be some other element that affects
>> performance.
>>
>> Am I missing something? Is this an expected behaviour and someone
>> has an explanation for this?
>>
>> Thank you,
>> Ivano

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss