[gpfsug-discuss] pool block allocation algorithm
Aaron Knister
aaron.s.knister at nasa.gov
Sat Jan 13 16:56:52 GMT 2018
Sorry, I didn't explicitly say it, but the output I sent should answer
both Daniel's and Jan-Frode's questions.
In short, though: the new NSDs were added to existing failure groups, and
they should all be the same size (except in one or two cases where we
re-formatted the LUN and the size changed slightly).
-Aaron
On 1/13/18 11:18 AM, Aaron Knister wrote:
> Thanks, everyone! I whipped up a script to dump the block layout of a
> file and then join that with mmdf information (a rough sketch of that
> kind of join follows the field list below). As part of my exploration I
> wrote one 2GB file to each of this particular filesystem's four data
> pools last night (using "touch $file; mmchattr $file -P $pool; dd
> of=$file") and have attached a dump of the layout/NSD information for
> each file/pool. The fields for the output are:
>
> diskId, numBlocksOnDisk, diskName, diskSize, failureGroup, freeBlocks,
> freePct, freeKbFragments, freeKbFragmentsPct
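>
> In case it helps, below is a rough Python sketch of that kind of join
> (not the exact script I used). It assumes you already have a dump of
> "diskName numBlocksOnDisk" pairs for the file, produced by whatever tool
> you use to get the block layout, plus a captured copy of plain
> "mmdf <device>" output; the file names and the column/regex layout are
> illustrative assumptions only.
>
> #!/usr/bin/env python3
> # Sketch only: join per-file block counts with mmdf free-space information.
> # blocks.txt: lines of "diskName numBlocksOnDisk" (from your block-layout dump).
> # mmdf.txt:   captured output of "mmdf <device>".
> import re
> import sys
>
> # mmdf data rows look roughly like:
> #   d13_06_006  23437934592  1213  No  Yes  4548935680 (19%)  83304320 ( 0%)
> MMDF_ROW = re.compile(
>     r"^(\S+)\s+(\d+)\s+(\d+)\s+\S+\s+\S+\s+(\d+)\s+\(\s*(\d+)%\)\s+(\d+)\s+\(\s*(\d+)%\)")
>
> def parse_mmdf(path):
>     """Map diskName -> (sizeKB, failureGroup, freeKB, freePct, fragKB, fragPct)."""
>     disks = {}
>     with open(path) as fh:
>         for line in fh:
>             m = MMDF_ROW.match(line.strip())
>             if m:
>                 name = m.group(1)
>                 disks[name] = tuple(int(x) for x in m.groups()[1:])
>     return disks
>
> def main(blocks_path, mmdf_path):
>     disks = parse_mmdf(mmdf_path)
>     for line in open(blocks_path):
>         name, nblocks = line.split()
>         size, fg, free, freepct, frag, fragpct = disks.get(name, (0,) * 6)
>         print(nblocks, name, size, fg, free, "(%d%%)" % freepct,
>               frag, "(%d%%)" % fragpct)
>
> if __name__ == "__main__":
>     main(sys.argv[1], sys.argv[2])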
>
>
> Here's the highlight from pool1:
>
> 36  264 d13_06_006 23437934592 1213  4548935680 (19%)  83304320 (0%)
> 59   74 d10_41_025 23437934592 1011  6993759232 (30%)  58642816 (0%)
>
> For this file (and anecdotally what I've seen looking at NSD I/O data
> for other files written to this pool), the pattern of more blocks being
> allocated to the NSDs that are ~20% free than to the NSDs that are ~30%
> free seems to be fairly consistent.
>
>
> Looking at a snippet of pool2:
> 101 238 d15_15_011 23437934592 1415  2008394752 (9%)   181699328 (1%)
> 102 235 d15_16_012 23437934592 1415  2009153536 (9%)   182165312 (1%)
> 116 248 d11_42_026 23437934592 1011  4146111488 (18%)  134941504 (1%)
> 117 249 d13_42_026 23437934592 1213  4147710976 (18%)  135203776 (1%)
>
> There are slightly more blocks allocated in general on the NSDs with
> twice the amount of free space, but the difference doesn't seem
> significant relative to the delta in free space. The pattern from pool1
> certainly doesn't hold true here.
>
> Pool4 isn't very interesting because all of the NSDs are well balanced
> in terms of free space (all 16% free).
>
> Pool3, however, *is* particularly interesting. Here's a snippet:
>
> 106 222 d15_24_016 23437934592 1415   2957561856 (13%)  37436768 (0%)
> 107 222 d15_25_017 23437934592 1415   2957537280 (13%)  37353984 (0%)
> 108 222 d15_26_018 23437934592 1415   2957539328 (13%)  37335872 (0%)
> 125 222 d11_44_028 23437934592 1011  13297235968 (57%)  20505568 (0%)
> 126 222 d12_44_028 23437934592 1213  13296712704 (57%)  20632768 (0%)
> 127 222 d12_45_029 23437934592 1213  13297404928 (57%)  20557408 (0%)
>
> GPFS consistently allocated the same number of blocks to the disks with
> ~4x the free space as it did to the other disks in the pool.
>
> Suffice it to say-- I'm *very* confused :)
>
> -Aaron
>
> On 1/13/18 8:18 AM, Daniel Kidger wrote:
>> Aaron,
>>
>> Also, are your new NSDs the same size as your existing ones?
>> i.e. could the NSDs that are at a higher percentage full still have more
>> free blocks than the other NSDs?
>> Daniel
>>
>>
>>
>>
>> *Dr Daniel Kidger*
>> IBM Technical Sales Specialist
>> Software Defined Solution Sales
>>
>> +44-(0)7818 522 266
>> daniel.kidger at uk.ibm.com
>>
>>
>>
>>
>>
>> ----- Original message -----
>> From: Jan-Frode Myklebust <janfrode at tanso.net>
>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>> Cc:
>> Subject: Re: [gpfsug-discuss] pool block allocation algorithm
>> Date: Sat, Jan 13, 2018 9:25 AM
>>
>> I don't have documentation or a whitepaper, but as I recall it will
>> first allocate round-robin over failureGroup, then round-robin over
>> nsdServers, and then round-robin over volumes. So if these new NSDs
>> are defined as a different failureGroup from the old disks, that might
>> explain it.
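>>
>> To picture that ordering, here is a toy Python sketch (not actual GPFS
>> code; the failure groups, servers, and volume names below are invented)
>> that stripes successive allocations round-robin over failure groups,
>> then over NSD servers within a group, then over volumes behind a server:
>>
>> from itertools import cycle
>>
>> # Invented layout: failureGroup -> nsdServer -> NSD volumes it serves.
>> layout = {
>>     1011: {"nsd-a": ["d10_41_025", "d11_42_026"], "nsd-b": ["d11_44_028"]},
>>     1213: {"nsd-c": ["d13_06_006", "d13_42_026"], "nsd-d": ["d12_44_028"]},
>>     1415: {"nsd-e": ["d15_15_011", "d15_24_016"]},
>> }
>>
>> def round_robin_allocator(layout):
>>     """Yield NSD names: rotate failure groups first, then servers, then volumes."""
>>     groups = sorted(layout)
>>     server_cycle = {g: cycle(sorted(layout[g])) for g in groups}
>>     volume_cycle = {(g, s): cycle(vols)
>>                     for g in groups for s, vols in layout[g].items()}
>>     for g in cycle(groups):               # outermost rotation: failure group
>>         s = next(server_cycle[g])         # then the servers in that group
>>         yield next(volume_cycle[(g, s)])  # then the volumes behind that server
>>
>> allocator = round_robin_allocator(layout)
>> for _ in range(9):                        # first nine "block allocations"
>>     print(next(allocator))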
>>
>>
>> -jf
>> On Sat, 13 Jan 2018 at 00:15, Aaron Knister
>> <aaron.s.knister at nasa.gov <mailto:aaron.s.knister at nasa.gov>> wrote:
>>
>> Apologies if this has been covered elsewhere (I couldn't find it if it
>> has). I'm curious how GPFS decides where to allocate new blocks.
>>
>> We've got a filesystem that we added some NSDs to a while back and it
>> hurt there for a little while because it appeared as though GPFS was
>> choosing to allocate new blocks much more frequently on the ~100% free
>> LUNs than the existing LUNs (I can't recall how free they were at the
>> time). Looking at it now, though, it seems GPFS is doing the opposite.
>> There's now a ~10% difference between the LUNs added and the existing
>> LUNs (20% free vs 30% free) and GPFS is choosing to allocate new writes
>> at a ratio of about 3:1 on the disks with *fewer* free blocks than on
>> the disks with more free blocks. That's completely inconsistent with
>> what we saw when we initially added the disks, which makes me wonder
>> how GPFS is choosing to allocate new blocks (other than the obvious
>> bits about failure group and replication factor). Could someone explain
>> (or point me at a whitepaper) what factors GPFS uses when allocating
>> blocks, particularly as it pertains to choosing one NSD over another
>> within the same failure group?
>>
>> Thanks!
>>
>> -Aaron
>>
>> --
>> Aaron Knister
>> NASA Center for Climate Simulation (Code 606.2)
>> Goddard Space Flight Center
>> (301) 286-2776
>>
>>
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776