[gpfsug-discuss] mmrestripefs "No space left on device"

John Hanks griznog at gmail.com
Thu Nov 2 18:14:44 GMT 2017


tsfindinode tracked the file to user.quota, which somehow escaped my
previous attempt to "mv *.quota /elsewhere/". I've moved it now, verified it
is actually gone, and will retry once the current restripe on the sata0 pool
wraps up.
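
(For the archives: mapping the flagged inode number back to a path looks
roughly like the sketch below. The exact tsfindinode arguments are from
memory, so check its usage output first, and /srv/gsfs0 is a placeholder
mount point.)

  tsfindinode -i 53506 /srv/gsfs0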

jbh

On Thu, Nov 2, 2017 at 10:57 AM, Frederick Stock <stockf at us.ibm.com> wrote:

> Did you run the tsfindinode command to see where that file is located?
> Also, what does mmdf show for your other pools, notably the sas0 storage
> pool?
>
> Fred
> __________________________________________________
> Fred Stock | IBM Pittsburgh Lab | 720-430-8821
> stockf at us.ibm.com
>
>
>
> From:        John Hanks <griznog at gmail.com>
> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date:        11/02/2017 01:17 PM
> Subject:        Re: [gpfsug-discuss] mmrestripefs "No space left on
> device"
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------
>
>
>
> We do have different amounts of free space in the system pool, which is
> where the changes were applied:
>
> [root at scg4-hn01 ~]# mmdf gsfs0 -P system
> disk                disk size  failure holds    holds              free KB             free KB
> name                    in KB    group metadata data        in full blocks        in fragments
> --------------- ------------- -------- -------- ----- -------------------- -------------------
> Disks in storage pool: system (Maximum disk size allowed is 3.6 TB)
> VD000               377487360      100 Yes      No        143109120 ( 38%)      35708688 ( 9%)
> DMD_NSD_804         377487360      100 Yes      No         79526144 ( 21%)       2924584 ( 1%)
> VD002               377487360      100 Yes      No        143067136 ( 38%)      35713888 ( 9%)
> DMD_NSD_802         377487360      100 Yes      No         79570432 ( 21%)       2926672 ( 1%)
> VD004               377487360      100 Yes      No        143107584 ( 38%)      35727776 ( 9%)
> DMD_NSD_805         377487360      200 Yes      No         79555584 ( 21%)       2940040 ( 1%)
> VD001               377487360      200 Yes      No        142964992 ( 38%)      35805384 ( 9%)
> DMD_NSD_803         377487360      200 Yes      No         79580160 ( 21%)       2919560 ( 1%)
> VD003               377487360      200 Yes      No        143132672 ( 38%)      35764200 ( 9%)
> DMD_NSD_801         377487360      200 Yes      No         79550208 ( 21%)       2915232 ( 1%)
>                 -------------                         -------------------- -------------------
> (pool total)       3774873600                            1113164032 ( 29%)     193346024 ( 5%)
>
>
> and mmlsdisk shows that there is a problem with replication:
>
> ...
> Number of quorum disks: 5
> Read quorum value:      3
> Write quorum value:     3
> Attention: Due to an earlier configuration change the file system
> is no longer properly replicated.
>
>
> I thought 'mmrestripefs -r' was supposed to fix this; do I have to repair
> the replication first before I can restripe?
>
> jbh
>
>
> On Thu, Nov 2, 2017 at 9:45 AM, Frederick Stock <stockf at us.ibm.com> wrote:
> Assuming you are replicating data and metadata, have you confirmed that all
> failure groups have the same free space? That is, could it be that one of
> your failure groups has less space than the others? You can verify this
> with the output of mmdf; look at the NSD sizes and space available.
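>
> (As a rough sketch of that check, free space can be totaled per failure
> group from the mmdf output; the awk field numbers assume the standard
> mmdf column layout and may need adjusting:)
>
> mmdf gsfs0 -P system | \
>   awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { free[$3] += $6 }
>        END { for (fg in free) print "FG", fg, free[fg], "KB free in full blocks" }'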
>
> Fred
> __________________________________________________
> Fred Stock | IBM Pittsburgh Lab | 720-430-8821
> stockf at us.ibm.com
>
>
>
> From:        John Hanks <griznog at gmail.com>
> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date:        11/02/2017 12:20 PM
> Subject:        Re: [gpfsug-discuss] mmrestripefs "No space left on
> device"
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------
>
>
>
> Addendum to last message:
>
> We haven't upgraded recently as far as I know (I just inherited this
> cluster a couple of months ago), but I am planning an outage soon to
> upgrade from 4.2.0-4 to 4.2.3-5.
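>
> (A few quick checks for the level question, sketched with the gsfs0 device
> name used elsewhere in this thread:)
>
> mmlsconfig minReleaseLevel   # cluster-wide minimum release level
> mmlsfs gsfs0 -V              # on-disk file system format version
> mmdiag --version             # daemon level on the local node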
>
> My growing collection of output files generally contains something like:
>
> This inode list was generated in the Parallel Inode Traverse on Thu Nov  2 08:34:22 2017
> INODE_NUMBER DUMMY_INFO SNAPSHOT_ID ISGLOBAL_SNAPSHOT INDEPENDENT_FSETID MEMO(INODE_FLAGS FILE_TYPE [ERROR])
>  53506        0:0        0           1                 0                 illreplicated REGULAR_FILE RESERVED Error: 28 No space left on device
>
> With that inode varying slightly.
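>
> (If a flagged inode turns out to be an ordinary user file, its replication
> state can be checked once the path is known; the path below is a
> placeholder recovered via tsfindinode:)
>
> mmlsattr -L /srv/gsfs0/path/to/file   # shows replication factors and flags such as illreplicated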
>
> jbh
>
> On Thu, Nov 2, 2017 at 8:55 AM, Scott Fadden <sfadden at us.ibm.com> wrote:
> Sorry, I just reread this as I hit send and saw it was mmrestripefs; in my
> case it was mmdeldisk.
>
> Did you try running the command on just one pool, or using -b instead?
>
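> (Restricting the repair to a single pool is the -P option, sketched here
> with the pool and file system names from this thread:)
>
> mmrestripefs gsfs0 -r -P sas0
> mmrestripefs gsfs0 -r -P sata0
>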
> What is the file it is complaining about in
> "/var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779711"?
>
> Looks like it could be related to the maxfeaturelevel of the cluster. Have
> you recently upgraded? Is everything up to the same level?
>
> Scott Fadden
> Spectrum Scale - Technical Marketing
> Phone: (503) 880-5833
> sfadden at us.ibm.com
> http://www.ibm.com/systems/storage/spectrum/scale
>
>
> ----- Original message -----
> From: Scott Fadden/Portland/IBM
> To: gpfsug-discuss at spectrumscale.org
> Cc: gpfsug-discuss at spectrumscale.org
> Subject: Re: [gpfsug-discuss] mmrestripefs "No space left on device"
> Date: Thu, Nov 2, 2017 8:44 AM
>
> I opened a defect on this the other day; in my case it was an incorrect
> error message. What it meant to say was, "The pool is not empty." Are you
> trying to remove the last disk in a pool? If so, did you empty the pool
> with a MIGRATE policy first?
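>
> (Draining a pool ahead of disk removal is typically a one-rule policy plus
> mmapplypolicy; a minimal sketch, assuming the sata0 and sas0 pool names
> from this thread and a scratch file called migrate.pol:)
>
> cat > migrate.pol <<'EOF'
> RULE 'drain' MIGRATE FROM POOL 'sata0' TO POOL 'sas0'
> EOF
> mmapplypolicy gsfs0 -P migrate.pol -I yes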
>
>
> Scott Fadden
> Spectrum Scale - Technical Marketing
> Phone: (503) 880-5833
> sfadden at us.ibm.com
> http://www.ibm.com/systems/storage/spectrum/scale
>
>
> ----- Original message -----
> From: John Hanks <griznog at gmail.com>
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Cc:
> Subject: Re: [gpfsug-discuss] mmrestripefs "No space left on device"
> Date: Thu, Nov 2, 2017 8:34 AM
>
> We have no snapshots ( they were the first to go when we initially hit the
> full metadata NSDs).
>
> I've increased quotas so that no filesets have hit a space quota.
>
> Verified that there are no inode quotas anywhere.
>
> mmdf shows the least amount of free space on any nsd to be 9% free.
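>
> (For reference, those checks were roughly the following, assuming the
> gsfs0 device name:)
>
> mmlssnapshot gsfs0     # confirm no snapshots remain
> mmrepquota -j gsfs0    # per-fileset block and inode quota usage
> mmdf gsfs0             # free space per NSD and per pool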
>
> Still getting this error:
>
> [root at scg-gs0 ~]# mmrestripefs gsfs0 -r -N scg-gs0,scg-gs1,scg-gs2,scg-gs3
> Scanning file system metadata, phase 1 ...
> Scan completed successfully.
> Scanning file system metadata, phase 2 ...
> Scanning file system metadata for sas0 storage pool
> Scanning file system metadata for sata0 storage pool
> Scan completed successfully.
> Scanning file system metadata, phase 3 ...
> Scan completed successfully.
> Scanning file system metadata, phase 4 ...
> Scan completed successfully.
> Scanning user file metadata ...
> Error processing user file metadata.
> No space left on device
> Check file '/var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779711' on
> scg-gs0 for inodes with broken disk addresses or failures.
> mmrestripefs: Command failed. Examine previous error messages to determine
> cause.
>
> I should note too that this fails almost immediately, far too quickly to
> fill up any location it could be trying to write to.
>
> jbh
>
> On Thu, Nov 2, 2017 at 7:57 AM, David Johnson <david_johnson at brown.edu> wrote:
> One thing that may be relevant: if you have snapshots, then depending on
> your release level, inodes in the snapshot may be considered immutable and
> will not be migrated. Once the snapshots have been deleted, the inodes are
> freed up and you won't see the (somewhat misleading) message about no
> space.
>
>  — ddj
> Dave Johnson
> Brown University
>
> On Nov 2, 2017, at 10:43 AM, John Hanks <griznog at gmail.com> wrote:
> Thanks all for the suggestions.
>
> Having our metadata NSDs fill up was what prompted this exercise, but
> space was previously freed up on those by switching them from metadata+data
> to metadataOnly and using a policy to migrate files out of that pool. So
> these now have about 30% free space (more if you include fragmented space).
> The restripe attempt is just to make a final move of any remaining data off
> those devices. All the NSDs now have free space on them.
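>
> (For anyone searching the archives later, that metadata+data to
> metadataOnly switch is roughly the sketch below; the stanza uses an NSD
> name from the mmdf output above, and the format should be verified against
> the mmchdisk man page for your release:)
>
> # contents of disk.stanza:
> %nsd: nsd=VD000 usage=metadataOnly
>
> mmchdisk gsfs0 change -F disk.stanza
> # then drain remaining data with a MIGRATE policy as sketched earlier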
>
> df -i shows inode usage at about 84%, so plenty of free inodes for the
> filesystem as a whole.
>
> We did have old .quota files lying around, but removing them didn't have
> any impact.
>
> mmlsfileset fs -L -i is taking a while to complete; I'll let it simmer
> while getting to work.
>
> mmrepquota does show about a half-dozen filesets that have hit their quota
> for space (we don't set quotas on inodes). Once I'm settled in this morning
> I'll try giving them a little extra space and see what happens.
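>
> (Bumping a fileset's block quota is a one-liner; a sketch with a made-up
> fileset name and limits:)
>
> mmsetquota gsfs0:somefileset --block 20T:22T   # soft:hard limits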
>
> jbh
>
>
> On Thu, Nov 2, 2017 at 4:19 AM, Oesterlin, Robert <Robert.Oesterlin at nuance.com> wrote:
> One thing that I've run into before is that older file systems had the
> "*.quota" files in the file system root. If you upgraded the file system
> to a newer version (so these files aren't used), there was a bug at one
> time where they didn't get properly migrated during a restripe. The
> solution was to just remove them.
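>
> (Checking for and clearing those leftovers is as simple as the following,
> assuming a placeholder mount point of /srv/gsfs0:)
>
> ls -l /srv/gsfs0/*.quota
> mv /srv/gsfs0/*.quota /root/quota-backup/   # keep a copy rather than deleting outright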
>
>
>
>
>
> Bob Oesterlin
>
> Sr Principal Storage Engineer, Nuance
>
>
>
> From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of John Hanks <griznog at gmail.com>
> Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date: Wednesday, November 1, 2017 at 5:55 PM
> To: gpfsug <gpfsug-discuss at spectrumscale.org>
> Subject: [EXTERNAL] [gpfsug-discuss] mmrestripefs "No space left on device"
>
>
>
> Hi all,
>
>
>
> I'm trying to do a restripe after setting some NSDs to metadataOnly and I
> keep running into this error:
>
>
>
> Scanning user file metadata ...
>
>    0.01 % complete on Wed Nov  1 15:36:01 2017  (     40960 inodes with total     531689 MB data processed)
>
> Error processing user file metadata.
>
> Check file '/var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779708' on
> scg-gs0 for inodes with broken disk addresses or failures.
>
> mmrestripefs: Command failed. Examine previous error messages to determine
> cause.
>
>
>
> The file it points to says:
>
>
>
> This inode list was generated in the Parallel Inode Traverse on Wed Nov  1 15:36:06 2017
>
> INODE_NUMBER DUMMY_INFO SNAPSHOT_ID ISGLOBAL_SNAPSHOT INDEPENDENT_FSETID MEMO(INODE_FLAGS FILE_TYPE [ERROR])
>
>  53504        0:0        0           1                 0                 illreplicated REGULAR_FILE RESERVED Error: 28 No space left on device
>
>
>
>
>
> /var on the node I am running this on has > 128 GB free, all the NSDs have
> plenty of free space, the filesystem being restriped has plenty of free
> space, and if I watch the node while running this, no filesystem on it even
> starts to get full. Could someone tell me where mmrestripefs is attempting
> to write and/or how to point it at a different location?
>
>
>
> Thanks,
>
>
>
> jbh
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>

