[gpfsug-discuss] pool block allocation algorithm

Aaron Knister aaron.s.knister at nasa.gov
Sat Jan 13 16:56:52 GMT 2018


Sorry, I didn't explicitly say it, but the output I sent should answer
both Daniel's and Jan-Frode's questions.

In short, though, the new NSDs were added to existing failure groups, and
they should all be the same size (except in one or two cases where we
re-formatted the LUN and the size changed slightly).

-Aaron

On 1/13/18 11:18 AM, Aaron Knister wrote:
> Thanks Everyone! I whipped up a script to dump the block layout of a
> file and then join that with mmdf information. As part of my exploration
> I wrote one 2GB file to each of this particular filesystem's 4 data
> pools last night (using "touch $file; mmchattr $file -P $pool; dd
> of=$file") and have attached a dump of the layout/nsd information for
> each file/pool. The fields for the output are:
> 
> diskId, numBlocksOnDisk, diskName, diskSize, failureGroup, freeBlocks,
> freePct, freeKbFragments, freeKbFragmentsPct
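> 
> In case it's useful, here's roughly how the per-pool test files were
> created (the filesystem path is a placeholder, and the dd arguments are
> just what I'd use to get ~2GB of data blocks):
> 
>     # write one ~2GB file into each data pool
>     for pool in pool1 pool2 pool3 pool4; do
>         f=/gpfs/fs1/alloctest.$pool        # placeholder path
>         touch $f
>         mmchattr -P $pool $f               # assign the empty file to the target pool
>         dd if=/dev/zero of=$f bs=1M count=2048
>     done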
> 
> 
> Here's the highlight from pool1:
> 
>  36  264  d13_06_006    23437934592  1213    4548935680  (19%)   83304320   (0%)
>  59   74  d10_41_025    23437934592  1011    6993759232  (30%)   58642816   (0%)
> 
> For this file (and, anecdotally, for other files written to this pool
> judging by NSD I/O data) the pattern is fairly consistent: more blocks
> are being allocated to the NSDs that are ~20% free than to the NSDs
> that are ~30% free.
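> 
> If anyone wants to eyeball this themselves, here's a quick-and-dirty
> way to summarize one of the attached dumps (the filename is just an
> example; the field positions are as listed above):
> 
>     # print blocks allocated per NSD next to its free-space percentage,
>     # grouped by failure group
>     awk '{ printf "%-12s fg=%-5s blocks=%-4s free=%s\n", $3, $5, $2, $7 }' \
>         pool1.layout | sort -k2,2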
> 
> 
> Looking at a snippet of pool2:
> 101  238  d15_15_011    23437934592  1415    2008394752   (9%)  181699328   (1%)
> 102  235  d15_16_012    23437934592  1415    2009153536   (9%)  182165312   (1%)
> 116  248  d11_42_026    23437934592  1011    4146111488  (18%)  134941504   (1%)
> 117  249  d13_42_026    23437934592  1213    4147710976  (18%)  135203776   (1%)
> 
> There are slightly more blocks allocated, in general, on the NSDs with
> twice the amount of free space, but the difference doesn't seem
> significant relative to the delta in free space. The pattern from pool1
> certainly doesn't hold true here.
> 
> Pool4 isn't very interesting because all of the NSDs are well balanced
> in terms of free space (all 16% free).
> 
> Pool3, however, *is* particularly interesting. Here's a snippet:
> 
> 106  222  d15_24_016    23437934592  1415    2957561856  (13%)   37436768   (0%)
> 107  222  d15_25_017    23437934592  1415    2957537280  (13%)   37353984   (0%)
> 108  222  d15_26_018    23437934592  1415    2957539328  (13%)   37335872   (0%)
> 125  222  d11_44_028    23437934592  1011   13297235968  (57%)   20505568   (0%)
> 126  222  d12_44_028    23437934592  1213   13296712704  (57%)   20632768   (0%)
> 127  222  d12_45_029    23437934592  1213   13297404928  (57%)   20557408   (0%)
> 
> GPFS consistently allocated the same number of blocks to the disks with
> ~4x the free space as it did to the other disks in the pool.
> 
> Suffice it to say-- I'm *very* confused :)
> 
> -Aaron
> 
> On 1/13/18 8:18 AM, Daniel Kidger wrote:
>> Aaron,
>>  
>> Also, are your new NSDs the same size as your existing ones?
>> i.e. could the NSDs that are a higher percentage full still have more
>> free blocks than the other NSDs?
>> Daniel
>>
>>  
>> IBM Storage Professional Badge
>> <https://www.youracclaim.com/user/danel-kidger>
>>  
>> *Dr Daniel Kidger*
>> IBM Technical Sales Specialist
>> Software Defined Solution Sales
>>
>> +44-(0)7818 522 266
>> daniel.kidger at uk.ibm.com
>>
>>
>>     ----- Original message -----
>>     From: Jan-Frode Myklebust <janfrode at tanso.net>
>>     Sent by: gpfsug-discuss-bounces at spectrumscale.org
>>     To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>>     Cc:
>>     Subject: Re: [gpfsug-discuss] pool block allocation algorithm
>>     Date: Sat, Jan 13, 2018 9:25 AM
>>      
>>     I don't have documentation/whitepaper, but as I recall, it will
>>     first allocate round-robin over failureGroup, then round-robin over
>>     nsdServers, and then round-robin over volumes. So if these new NSDs
>>     are defined in a different failureGroup from the old disks, that
>>     might explain it...
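>>
>>     A rough sketch of that ordering, just to illustrate the nesting (it
>>     skips the nsdServers level, and uses a few example failure groups
>>     and NSD names):
>>
>>         # outer loop: round-robin over failure groups;
>>         # inner choice: round-robin over the NSDs in each group per pass
>>         groups="1011 1213 1415"
>>         round=0
>>         while [ $round -lt 2 ]; do
>>             for fg in $groups; do
>>                 case $fg in
>>                     1011) nsds="d10_41_025 d11_42_026" ;;
>>                     1213) nsds="d13_06_006 d13_42_026" ;;
>>                     1415) nsds="d15_15_011 d15_16_012" ;;
>>                 esac
>>                 set -- $nsds
>>                 shift $(( round % $# ))
>>                 echo "round $round: failure group $fg -> $1"
>>             done
>>             round=$(( round + 1 ))
>>         done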
>>
>>
>>     -jf
>>     On Sat, 13 Jan 2018 at 00:15, Aaron Knister
>>     <aaron.s.knister at nasa.gov> wrote:
>>
>>         Apologies if this has been covered elsewhere (I couldn't find it
>>         if it
>>         has). I'm curious how GPFS decides where to allocate new blocks.
>>
>>         We've got a filesystem that we added some NSDs to a while back
>>         and it
>>         hurt there for a little while because it appeared as though GPFS was
>>         choosing to allocate new blocks much more frequently on the
>>         ~100% free
>>         LUNs than the existing LUNs (I can't recall how free they were
>>         at the
>>         time). Looking at it now, though, it seems GPFS is doing the
>>         opposite.
>>         There's now a ~10% difference between the LUNs added and the
>>         existing
>>         LUNs (20% free vs 30% free) and GPFS is choosing to allocate new
>>         writes
>>         at a ratio of about 3:1 on the disks with *fewer* free blocks
>>         than on
>>         the disks with more free blocks. That's completely inconsistent with
>>         what we saw when we initially added the disks, which makes me
>>         wonder how
>>         GPFS is choosing to allocate new blocks (other than the obvious bits
>>         about failure group, and replication factor). Could someone
>>         explain (or
>>         point me at a whitepaper) what factors GPFS uses when allocating
>>         blocks,
>>         particularly as it pertains to choosing one NSD over another
>>         within the
>>         same failure group?
>>
>>         Thanks!
>>
>>         -Aaron
>>
>>         --
>>         Aaron Knister
>>         NASA Center for Climate Simulation (Code 606.2)
>>         Goddard Space Flight Center
>>         (301) 286-2776
>>         _______________________________________________
>>         gpfsug-discuss mailing list
>>         gpfsug-discuss at spectrumscale.org
>>         http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>>     _______________________________________________
>>     gpfsug-discuss mailing list
>>     gpfsug-discuss at spectrumscale.org
>>     http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>>  
>> Unless stated otherwise above:
>> IBM United Kingdom Limited - Registered in England and Wales with number
>> 741598.
>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>>
>>
>>
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
> 
> 
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776


