[gpfsug-discuss] pool block allocation algorithm
Aaron Knister
aaron.s.knister at nasa.gov
Sat Jan 13 17:26:51 GMT 2018
Thanks, Peter. That definitely makes sense and I was actually wondering
if performance was a factor. Do you know where to look to see what GPFS'
perception of "performance" is for a given NSD?
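(For reference, one place that exposes a per-disk performance view is
"mmdiag --iohist" on an NSD server; it dumps recent I/O history with
per-I/O service times. A rough per-disk summary is sketched below; the
awk field positions are an assumption and may need adjusting to match
the column layout of your release:)

  # Summarize recent I/O service times per disk from mmdiag --iohist.
  # Assumed columns: $4 = disk name, $8 = service time in ms; verify
  # against the header of your own output before trusting the numbers.
  mmdiag --iohist | awk '
      $1 ~ /^[0-9]/ { ms[$4] += $8; n[$4]++ }
      END { for (d in n) printf "%-16s %6d I/Os, avg %7.2f ms\n", d, n[d], ms[d]/n[d] }
  ' | sort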
-Aaron
On 1/13/18 12:00 PM, Peter Serocka wrote:
> Within reasonable capacity limits, one would also expect it
> to direct incoming data to the disks that are most "available"
> from a current performance perspective: serving the fewest
> IOPS, with the lowest latency and the shortest queue.
>
> Your new NSDs, filled only with recent data, might quickly have
> become the busiest units before reaching capacity balance,
> simply because recent data tends to be more active than older data.
>
> Makes sense?
>
> — Peter
>
>> On 2018 Jan 13 Sat, at 17:18, Aaron Knister <aaron.s.knister at nasa.gov> wrote:
>>
>> Thanks Everyone! I whipped up a script to dump the block layout of a
>> file and then join that with mmdf information. As part of my exploration
>> I wrote one 2GB file to each of this particular filesystem's 4 data
>> pools last night (using "touch $file; mmchattr $file -P $pool; dd
>> of=$file") and have attached a dump of the layout/nsd information for
>> each file/pool. The fields for the output are:
>>
>> diskId, numBlocksOnDisk, diskName, diskSize, failureGroup, freeBlocks,
>> freePct, freeKbFragments, freeKbFragmentsPct
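>>
>> (For completeness, the per-pool test writes looked roughly like the
>> loop below; the dd options, path, and pool names are reconstructed
>> rather than the exact commands I ran:)
>>
>>   # Write one 2GB test file into each data pool; pool names and the
>>   # target path are placeholders, and the dd options are assumed.
>>   for pool in pool1 pool2 pool3 pool4; do
>>       file="/gpfs/fs1/blocktest.$pool"
>>       touch "$file"
>>       mmchattr -P "$pool" "$file"    # assign the empty file to the pool
>>       dd if=/dev/zero of="$file" bs=1M count=2048
>>   done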
>>
>>
>> Here's the highlight from pool1:
>>
>> 36 264 d13_06_006 23437934592 1213 4548935680 (19%) 83304320 (0%)
>> 59 74 d10_41_025 23437934592 1011 6993759232 (30%) 58642816 (0%)
>>
>> For this file (and anecdotally what I've seen looking at NSD I/O data
>> for other files written to this pool), the pattern of more blocks being
>> allocated to the NSDs that are ~20% free vs. the NSDs that are 30% free
>> seems fairly consistent.
>>
>>
>> Looking at a snippet of pool2:
>> 101 238 d15_15_011 23437934592 1415 2008394752 (9%) 181699328 (1%)
>> 102 235 d15_16_012 23437934592 1415 2009153536 (9%) 182165312 (1%)
>> 116 248 d11_42_026 23437934592 1011 4146111488 (18%) 134941504 (1%)
>> 117 249 d13_42_026 23437934592 1213 4147710976 (18%) 135203776 (1%)
>>
>> There are slightly more blocks allocated in general on the NSDs with
>> twice the free space, but the difference doesn't seem significant
>> relative to the delta in free space. The pattern from pool1
>> certainly doesn't hold here.
>>
>> Pool4 isn't very interesting because all of the NSDs are well balanced
>> in terms of free space (all 16% free).
>>
>> Pool3, however, *is* particularly interesting. Here's a snippet:
>>
>> 106 222 d15_24_016 23437934592 1415 2957561856 (13%) 37436768 (0%)
>> 107 222 d15_25_017 23437934592 1415 2957537280 (13%) 37353984 (0%)
>> 108 222 d15_26_018 23437934592 1415 2957539328 (13%) 37335872 (0%)
>> 125 222 d11_44_028 23437934592 1011 13297235968 (57%) 20505568 (0%)
>> 126 222 d12_44_028 23437934592 1213 13296712704 (57%) 20632768 (0%)
>> 127 222 d12_45_029 23437934592 1213 13297404928 (57%) 20557408 (0%)
>>
>> GPFS consistently allocated the same number of blocks to disks with ~4x
>> the free space as it did to the other disks in the pool.
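>>
>> (To put a number on that, the free fraction and block counts can be
>> pulled straight from the dump fields; "pool3.dump" here is a made-up
>> file name for the attached output:)
>>
>>   # Columns: diskId numBlocksOnDisk diskName diskSize failureGroup
>>   #          freeBlocks freePct freeKbFragments freeKbFragmentsPct
>>   awk '{ printf "%-12s %4d blocks allocated, %3.0f%% free\n",
>>          $3, $2, 100 * $6 / $4 }' pool3.dump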
>>
>> Suffice it to say-- I'm *very* confused :)
>>
>> -Aaron
>>
>> On 1/13/18 8:18 AM, Daniel Kidger wrote:
>>> Aaron,
>>>
>>> Also, are your new NSDs the same size as your existing ones?
>>> If not, the NSDs that are a higher percentage full might still have
>>> more free blocks than the other NSDs.
>>> Daniel
>>>
>>> *Dr Daniel Kidger*
>>> IBM Technical Sales Specialist
>>> Software Defined Solution Sales
>>>
>>> +44-(0)7818 522 266
>>> daniel.kidger at uk.ibm.com
>>>
>>> ----- Original message -----
>>> From: Jan-Frode Myklebust <janfrode at tanso.net>
>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>>> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>>> Cc:
>>> Subject: Re: [gpfsug-discuss] pool block allocation algorithm
>>> Date: Sat, Jan 13, 2018 9:25 AM
>>>
>>> Don't have documentation/whitepaper, but as I recall, it will first
>>> allocate round-robin over failureGroup, then round-robin over
>>> nsdServers, and then round-robin over volumes. So if these new NSDs
>>> are defined with a different failureGroup from the old disks, that
>>> might explain it.
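>>>
>>> A toy sketch (my notation, not GPFS code) of that ordering,
>>> simplified to two levels, striping across failure groups first
>>> and then across the disks within each group:
>>>
>>>   # Hypothetical failure groups, each holding a few member NSDs.
>>>   fgs=("d10_1 d10_2" "d13_1 d13_2" "d15_1 d15_2")
>>>   nfg=${#fgs[@]}
>>>   for blk in $(seq 0 11); do
>>>       members=(${fgs[blk % nfg]})                    # round-robin over failure groups
>>>       disk=${members[(blk / nfg) % ${#members[@]}]}  # then over disks in the group
>>>       echo "block $blk -> $disk"
>>>   done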
>>>
>>>
>>> -jf
>>> On Sat, 13 Jan 2018 at 00:15, Aaron Knister
>>> <aaron.s.knister at nasa.gov> wrote:
>>>
>>> Apologies if this has been covered elsewhere (I couldn't find it if it
>>> has). I'm curious how GPFS decides where to allocate new blocks.
>>>
>>> We've got a filesystem that we added some NSDs to a while back and it
>>> hurt there for a little while because it appeared as though GPFS was
>>> choosing to allocate new blocks much more frequently on the ~100% free
>>> LUNs than on the existing LUNs (I can't recall how free they were at
>>> the time). Looking at it now, though, it seems GPFS is doing the
>>> opposite. There's now a ~10% difference between the LUNs added and the
>>> existing LUNs (20% free vs 30% free) and GPFS is choosing to allocate
>>> new writes at a ratio of about 3:1 on the disks with *fewer* free
>>> blocks than on the disks with more free blocks. That's completely
>>> inconsistent with what we saw when we initially added the disks, which
>>> makes me wonder how GPFS is choosing to allocate new blocks (other
>>> than the obvious bits about failure group and replication factor).
>>> Could someone explain (or point me at a whitepaper) what factors GPFS
>>> uses when allocating blocks, particularly as it pertains to choosing
>>> one NSD over another within the same failure group?
>>>
>>> Thanks!
>>>
>>> -Aaron
>>>
>>> --
>>> Aaron Knister
>>> NASA Center for Climate Simulation (Code 606.2)
>>> Goddard Space Flight Center
>>> (301) 286-2776
>>>
>>
>> --
>> Aaron Knister
>> NASA Center for Climate Simulation (Code 606.2)
>> Goddard Space Flight Center
>> (301) 286-2776
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776