[gpfsug-discuss] pool block allocation algorithm

Aaron Knister aaron.s.knister at nasa.gov
Sat Jan 13 17:26:51 GMT 2018


Thanks, Peter. That definitely makes sense, and I was actually wondering
if performance was a factor. Do you know where to look to see what GPFS'
perception of "performance" is for a given NSD?
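
(The closest thing I can think to look at is the recent I/O history on
the NSD servers, plus the OS-side view of the backing LUNs, e.g.:

    # per-I/O history on an NSD server; each entry shows the disk and the
    # service time in ms for recent I/Os
    mmdiag --iohist

    # utilization / queue depth for the underlying devices
    iostat -x 5

but whether the allocator actually feeds any of that back into block
placement is exactly what I'm unsure about.)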

-Aaron

On 1/13/18 12:00 PM, Peter Serocka wrote:
> Within reasonable capacity limits I would also expect it
> to direct incoming data to disks that are currently most “available”
> from a performance perspective: doing the fewest IOPS,
> having the lowest latency and the shortest queue length.
> 
> Your new NSDs, filled only with recent data, might quickly have
> become the busiest units before reaching capacity balance,
> simply because recent data tends to be more active than older stuff.
> 
> Makes sense?
> 
> — Peter
> 
>> On 2018 Jan 13 Sat, at 17:18, Aaron Knister <aaron.s.knister at nasa.gov> wrote:
>> 
>> Thanks Everyone! I whipped up a script to dump the block layout of a
>> file and then join that with mmdf information. As part of my exploration
>> I wrote one 2GB file to each of this particular filesystem's 4 data
>> pools last night (using "touch $file; mmchattr $file -P $pool; dd
>> of=$file") and have attached a dump of the layout/nsd information for
>> each file/pool. The fields for the output are:
>> 
>> diskId, numBlocksOnDisk, diskName, diskSize, failureGroup, freeBlocks,
>> freePct, freeKbFragments, freeKbFragmentsPct
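>> 
>> In case it's useful to anyone, the mmdf-join half of that is roughly the
>> sketch below; the part that dumps the file's block layout into
>> "diskName blockCount" pairs is assumed to already exist and isn't shown:
>> 
>>     #!/bin/bash
>>     # join_layout.sh <fs> <pool> <layout_file>
>>     #   layout_file: two columns, "diskName numBlocksOnDisk", produced by
>>     #   whatever you use to dump the file's block layout (not shown here)
>>     fs=$1; pool=$2; layout=$3
>> 
>>     # mmdf lists, per disk: name, size, failure group, free blocks (%)
>>     # and free fragments (%).  -P restricts it to one pool (drop it and
>>     # filter yourself if your level doesn't support it).
>>     mmdf "$fs" -P "$pool" | awk '
>>         NR==FNR { blocks[$1] = $2; next }        # 1st input: layout file
>>         ($1 in blocks) { print blocks[$1], $0 }  # 2nd input: matching mmdf rows
>>     ' "$layout" -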
>> 
>> 
>> Here's the highlight from pool1:
>> 
>> 36  264  d13_06_006    23437934592  1213    4548935680  (19%)    83304320   (0%)
>> 59   74  d10_41_025    23437934592  1011    6993759232  (30%)    58642816   (0%)
>> 
>> For this file (and, anecdotally, for other files written to this pool,
>> judging by NSD I/O data) the pattern of more blocks being allocated to
>> the NSDs that are ~20% free vs. the NSDs that are ~30% free seems to be
>> fairly consistent.
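>> (In the two rows above that works out to 264 vs. 74 blocks, roughly
>> 3.6:1, with more blocks going to the fuller NSD.)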
>> 
>> 
>> Looking at a snippet of pool2:
>> 101  238  d15_15_011    23437934592  1415    2008394752   (9%)   181699328   (1%)
>> 102  235  d15_16_012    23437934592  1415    2009153536   (9%)   182165312   (1%)
>> 116  248  d11_42_026    23437934592  1011    4146111488  (18%)   134941504   (1%)
>> 117  249  d13_42_026    23437934592  1213    4147710976  (18%)   135203776   (1%)
>> 
>> There are slightly more blocks allocated in general on the NSDs with
>> twice the amount of free space, but the difference doesn't seem
>> significant relative to the delta in free space. The pattern from pool1
>> certainly doesn't hold true here.
>> 
>> Pool4 isn't very interesting because all of the NSDs are well balanced
>> in terms of free space (all 16% free).
>> 
>> Pool3, however, *is* particularly interesting. Here's a snippet:
>> 
>> 106  222  d15_24_016    23437934592  1415    2957561856  (13%)    37436768   (0%)
>> 107  222  d15_25_017    23437934592  1415    2957537280  (13%)    37353984   (0%)
>> 108  222  d15_26_018    23437934592  1415    2957539328  (13%)    37335872   (0%)
>> 125  222  d11_44_028    23437934592  1011   13297235968  (57%)    20505568   (0%)
>> 126  222  d12_44_028    23437934592  1213   13296712704  (57%)    20632768   (0%)
>> 127  222  d12_45_029    23437934592  1213   13297404928  (57%)    20557408   (0%)
>> 
>> GPFS consistently allocated the same number of blocks to disks with ~4x
>> the free space as it did to the other disks in the pool.
>> 
>> Suffice it to say-- I'm *very* confused :)
>> 
>> -Aaron
>> 
>> On 1/13/18 8:18 AM, Daniel Kidger wrote:
>>> Aaron,
>>>  
>>> Also, are your new NSDs the same size as your existing ones?
>>> i.e. could the NSDs that are at a higher percentage full nevertheless
>>> have more free blocks than the other NSDs?
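>>> (For example, purely hypothetically: a 40 TB NSD that is 85% full still
>>> has ~6 TB free, more than a 10 TB NSD that is only 50% full with ~5 TB
>>> free -- so ranking by percent full and ranking by free blocks can
>>> disagree.)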
>>> Daniel
>>> 
>>>  
>>> Dr Daniel Kidger
>>> IBM Technical Sales Specialist
>>> Software Defined Solution Sales
>>> +44-(0)7818 522 266
>>> daniel.kidger at uk.ibm.com
>>> 
>>>    ----- Original message -----
>>>    From: Jan-Frode Myklebust <janfrode at tanso.net>
>>>    Sent by: gpfsug-discuss-bounces at spectrumscale.org
>>>    To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>>>    Cc:
>>>    Subject: Re: [gpfsug-discuss] pool block allocation algorithm
>>>    Date: Sat, Jan 13, 2018 9:25 AM
>>>     
>>>    I don't have documentation/whitepaper to point at, but as I recall,
>>>    it will first allocate round-robin over failureGroups, then
>>>    round-robin over nsdServers, and then round-robin over volumes. So if
>>>    these new NSDs are defined in a different failureGroup from the old
>>>    disks, that might explain it...
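>>> 
>>>    One way to picture that ordering (just a toy sketch, not the actual
>>>    GPFS allocator, and the group/server/volume names are made up):
>>> 
>>>        # failure group cycles fastest, then NSD server, then volume
>>>        for vol in v1 v2; do
>>>          for server in nsdA nsdB; do
>>>            for fg in fg1 fg2 fg3; do
>>>              echo "next block -> failureGroup=$fg server=$server volume=$vol"
>>>            done
>>>          done
>>>        done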
>>> 
>>> 
>>>    -jf
>>>    On Sat 13 Jan 2018 at 00:15, Aaron Knister
>>>    <aaron.s.knister at nasa.gov> wrote:
>>> 
>>>        Apologies if this has been covered elsewhere (I couldn't find it
>>>        if it has). I'm curious how GPFS decides where to allocate new
>>>        blocks.
>>> 
>>>        We've got a filesystem that we added some NSDs to a while back,
>>>        and it hurt there for a little while because it appeared as
>>>        though GPFS was choosing to allocate new blocks much more
>>>        frequently on the ~100% free LUNs than on the existing LUNs (I
>>>        can't recall how free they were at the time). Looking at it now,
>>>        though, it seems GPFS is doing the opposite. There's now a ~10%
>>>        difference between the LUNs added and the existing LUNs (20% free
>>>        vs 30% free) and GPFS is choosing to allocate new writes at a
>>>        ratio of about 3:1 on the disks with *fewer* free blocks than on
>>>        the disks with more free blocks. That's completely inconsistent
>>>        with what we saw when we initially added the disks, which makes
>>>        me wonder how GPFS is choosing to allocate new blocks (other than
>>>        the obvious bits about failure group and replication factor).
>>>        Could someone explain (or point me at a whitepaper) what factors
>>>        GPFS uses when allocating blocks, particularly as it pertains to
>>>        choosing one NSD over another within the same failure group?
>>> 
>>>        Thanks!
>>> 
>>>        -Aaron
>>> 
>>>        --
>>>        Aaron Knister
>>>        NASA Center for Climate Simulation (Code 606.2)
>>>        Goddard Space Flight Center
>>>        (301) 286-2776
>>>        _______________________________________________
>>>        gpfsug-discuss mailing list
>>>        gpfsug-discuss at spectrumscale.org
>>>        http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>> 
>>>    _______________________________________________
>>>    gpfsug-discuss mailing list
>>>    gpfsug-discuss at spectrumscale.org
>>>    http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>> 
>>>  
>>> Unless stated otherwise above:
>>> IBM United Kingdom Limited - Registered in England and Wales with number
>>> 741598.
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> gpfsug-discuss mailing list
>>> gpfsug-discuss at spectrumscale.org
>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>> 
>> 
>> -- 
>> Aaron Knister
>> NASA Center for Climate Simulation (Code 606.2)
>> Goddard Space Flight Center
>> (301) 286-2776
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776


