[gpfsug-discuss] Migration to separate metadata and data disks

Yuri L Volobuev volobuev at us.ibm.com
Tue Sep 6 20:06:32 BST 2016


The correct way to accomplish what you're looking for (in particular,
changing the fs-wide level of replication) is mmrestripefs -R.  This
command also takes care of moving data off disks now marked metadataOnly.

The restripe job hits an error trying to move blocks of the inode file,
i.e. before it gets to actual user data blocks.  Note that at this point
the metadata replication factor is still 2.  This suggests one of two
possibilities: (1) there isn't enough actual free space on the remaining
metadataOnly disks, (2) there isn't enough space in some failure groups to
allocate two replicas.
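
Possibility (2) can be checked by counting the distinct failure groups among
the disks that are still allowed to hold metadata. Below is a minimal sketch,
assuming the disk list has first been reduced by hand to three columns (disk
name, usage, failure group). That input format, the function name and the disk
names are hypothetical; this is not the actual mmlsdisk output layout.

```shell
# Count distinct failure groups among metadata-capable disks.
# Input (stdin): one disk per line as "<name> <usage> <failGroup>".
# This three-column format is an assumption made for illustration;
# it is NOT the real mmlsdisk output layout.
count_metadata_failure_groups() {
    awk '$2 == "metadataOnly" || $2 == "dataAndMetadata" { seen[$3] = 1 }
         END { n = 0; for (g in seen) n++; print n }'
}

# Layout from this thread after the SATA disks were changed to dataOnly
# (disk names are made up): SATA in failure group 1, SSDs in group 3.
printf '%s\n' \
    'sata1 dataOnly 1' \
    'sata2 dataOnly 1' \
    'ssd1 metadataOnly 3' \
    'ssd2 metadataOnly 3' | count_metadata_failure_groups
```

For the layout described in this thread the count is 1, so inodes that still
carry a metadata replication factor of 2 have no second failure group to place
a replica in.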

All of this assumes you're operating within a single storage pool.  If
multiple storage pools are in play, there are other possibilities.

'mmdf' output would make it possible to give more specific advice.  With the
information at hand, I can only suggest trying to accomplish the task in
two phases: (a) deallocate the extra metadata replicas, by doing mmchfs -m 1 +
mmrestripefs -R, (b) move metadata off the SATA disks.  I do want to point out
that metadata replication is a highly recommended insurance policy to have
for your file system.  As with other kinds of insurance, you may or may not
need it, but if you do end up needing it, you'll be very glad you have it.
The costs, in terms of extra metadata space and performance overhead, are
very reasonable.
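
Phase (a) could be sketched as the dry-run wrapper below. It uses only the
commands named in this thread (mmdf, mmchfs -m 1, mmrestripefs -R) with the
file system name fs1 from the original post. The function name and the RUN
switch are hypothetical, and phase (b) is left out because its exact commands
are not spelled out here.

```shell
# Print (default) or execute phase (a): capture mmdf output, lower the
# default metadata replication to 1, then drop the extra replicas.
# RUN=1 executes the commands; anything else only prints them.
# The function name and the RUN variable are illustrative assumptions.
phase_a() {
    fs="${1:-fs1}"
    run() {
        if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
    }
    run mmdf "$fs" || return 1            # record free space per disk first
    run mmchfs "$fs" -m 1 || return 1     # default metadata replicas 2 -> 1
    run mmrestripefs "$fs" -R             # apply the new replication setting
}

phase_a fs1
```

Running it without RUN=1 only lists the three commands in order, which makes
it easy to review the plan before touching the file system.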

yuri




From:	Miroslav Bauer <bauer at cesnet.cz>
To:	gpfsug-discuss at spectrumscale.org,
Date:	09/01/2016 07:29 AM
Subject:	Re: [gpfsug-discuss] Migration to separate metadata and data
            disks
Sent by:	gpfsug-discuss-bounces at spectrumscale.org



Yes, failure group id is exactly what I meant :). Unfortunately, mmrestripefs
with -R behaves the same as with -r. I also believed that mmrestripefs -R is
the correct tool for fixing the replication settings on inodes (according to
the manpages), but I will try the possible solutions you and Marc suggested
and let you know how it went.

Thank you,
--
Miroslav Bauer

On 09/01/2016 04:02 PM, Aaron Knister wrote:
> Oh! I think you've already provided the info I was looking for :) I
> thought that failGroup=3 meant there were 3 failure groups within the
> SSDs. I suspect that's not at all what you meant and that actually is
> the failure group of all of those disks. That I think explains what's
> going on-- there's only one failure group's worth of metadata-capable
> disks available and as such GPFS can't place the 2nd replica for
> existing files.
>
> Here's what I would suggest:
>
> - Create at least 2 failure groups within the SSDs
> - Put the default metadata replication factor back to 2
> - Run mmrestripefs -R to shuffle files around and restore the metadata
> replication factor of 2 to any files created while it was set to 1
>
> If you're not interested in replication for metadata then perhaps all
> you need to do is the mmrestripefs -R. I think that should
> un-replicate the file from the SATA disks leaving the copy on the SSDs.
>
> Hope that helps.
>
> -Aaron
>
> On 9/1/16 9:39 AM, Aaron Knister wrote:
>> By the way, I suspect the no space on device errors are because GPFS
>> believes for some reason that it is unable to maintain the metadata
>> replication factor of 2 that's likely set on all previously created
>> inodes.
>>
>> On 9/1/16 9:36 AM, Aaron Knister wrote:
>>> I must admit, I'm curious as to the reason you're dropping the
>>> replication factor from 2 down to 1. There are some serious advantages
>>> we've seen to having multiple metadata replicas, as far as error
>>> recovery is concerned.
>>>
>>> Could you paste an output of mmlsdisk for the filesystem?
>>>
>>> -Aaron
>>>
>>> On 9/1/16 9:30 AM, Miroslav Bauer wrote:
>>>> Hello,
>>>>
>>>> I have a GPFS 3.5 filesystem (fs1) and I'm trying to migrate the
>>>> filesystem metadata from state:
>>>> -m = 2 (default metadata replicas)
>>>> - SATA disks (dataAndMetadata, failGroup=1)
>>>> - SSDs (metadataOnly, failGroup=3)
>>>> to the desired state:
>>>> -m = 1
>>>> - SATA disks (dataOnly, failGroup=1)
>>>> - SSDs (metadataOnly, failGroup=3)
>>>>
>>>> I have done the following steps in the following order:
>>>> 1) change SATA disks to dataOnly (stanza file modifies the 'usage'
>>>> attribute only):
>>>> # mmchdisk fs1 change -F dataOnly_disks.stanza
>>>> Attention: Disk parameters were changed.
>>>>   Use the mmrestripefs command with the -r option to relocate data and
>>>> metadata.
>>>> Verifying file system configuration information ...
>>>> mmchdisk: Propagating the cluster configuration data to all
>>>>   affected nodes.  This is an asynchronous process.
>>>>
>>>> 2) change default metadata replicas number 2->1
>>>> # mmchfs fs1 -m 1
>>>>
>>>> 3) run mmrestripefs as suggested by output of 1)
>>>> # mmrestripefs fs1 -r
>>>> Scanning file system metadata, phase 1 ...
>>>> Error processing inodes.
>>>> No space left on device
>>>> mmrestripefs: Command failed.  Examine previous error messages to
>>>> determine cause.
>>>>
>>>> It is, however, still possible to create new files on the filesystem.
>>>> When I return one of the SATA disks as a dataAndMetadata disk, the
>>>> mmrestripefs command stops complaining about No space left on device.
>>>> Both df and mmdf say that there is enough space both for data (SATA)
>>>> and metadata (SSDs).
>>>> Does anyone have an idea why it is complaining?
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> Miroslav Bauer
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> gpfsug-discuss mailing list
>>>> gpfsug-discuss at spectrumscale.org
>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>>>
>>>
>>
>


