<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Hello Yuri,</p>
Here is the actual mmdf output of the filesystem in question:<br>
<pre>
disk                disk size  failure holds    holds                 free                free
name                             group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 40 TB)
dcsh_10C                   5T        1 Yes      Yes          1.661T ( 33%)        68.48G ( 1%)
dcsh_10D               6.828T        1 Yes      Yes          2.809T ( 41%)        83.82G ( 1%)
dcsh_11C                   5T        1 Yes      Yes          1.659T ( 33%)        69.01G ( 1%)
dcsh_11D               6.828T        1 Yes      Yes           2.81T ( 41%)        83.33G ( 1%)
dcsh_12C                   5T        1 Yes      Yes          1.659T ( 33%)        69.48G ( 1%)
dcsh_12D               6.828T        1 Yes      Yes          2.807T ( 41%)        83.14G ( 1%)
dcsh_13C                   5T        1 Yes      Yes          1.659T ( 33%)        69.35G ( 1%)
dcsh_13D               6.828T        1 Yes      Yes           2.81T ( 41%)        82.97G ( 1%)
dcsh_14C                   5T        1 Yes      Yes           1.66T ( 33%)        69.06G ( 1%)
dcsh_14D               6.828T        1 Yes      Yes          2.811T ( 41%)        83.61G ( 1%)
dcsh_15C                   5T        1 Yes      Yes          1.658T ( 33%)        69.38G ( 1%)
dcsh_15D               6.828T        1 Yes      Yes          2.814T ( 41%)        83.69G ( 1%)
dcsd_15D               6.828T        1 Yes      Yes          2.811T ( 41%)        83.98G ( 1%)
dcsd_15C                   5T        1 Yes      Yes           1.66T ( 33%)        68.66G ( 1%)
dcsd_14D               6.828T        1 Yes      Yes           2.81T ( 41%)        84.18G ( 1%)
dcsd_14C                   5T        1 Yes      Yes          1.659T ( 33%)        69.43G ( 1%)
dcsd_13D               6.828T        1 Yes      Yes           2.81T ( 41%)        83.27G ( 1%)
dcsd_13C                   5T        1 Yes      Yes           1.66T ( 33%)         69.1G ( 1%)
dcsd_12D               6.828T        1 Yes      Yes           2.81T ( 41%)        83.61G ( 1%)
dcsd_12C                   5T        1 Yes      Yes           1.66T ( 33%)        69.42G ( 1%)
dcsd_11D               6.828T        1 Yes      Yes          2.811T ( 41%)        83.59G ( 1%)
dcsh_10B                   5T        1 Yes      Yes          1.633T ( 33%)        76.97G ( 2%)
dcsh_11A                   5T        1 Yes      Yes          1.632T ( 33%)        77.29G ( 2%)
dcsh_11B                   5T        1 Yes      Yes          1.633T ( 33%)        76.73G ( 1%)
dcsh_12A                   5T        1 Yes      Yes          1.634T ( 33%)        76.49G ( 1%)
dcsd_11C                   5T        1 Yes      Yes           1.66T ( 33%)        69.25G ( 1%)
dcsd_10D               6.828T        1 Yes      Yes          2.811T ( 41%)        83.39G ( 1%)
dcsh_10A                   5T        1 Yes      Yes          1.633T ( 33%)        77.06G ( 2%)
dcsd_10C                   5T        1 Yes      Yes           1.66T ( 33%)        69.83G ( 1%)
dcsd_15B                   5T        1 Yes      Yes          1.635T ( 33%)        76.52G ( 1%)
dcsd_15A                   5T        1 Yes      Yes          1.634T ( 33%)        76.24G ( 1%)
dcsd_14B                   5T        1 Yes      Yes          1.634T ( 33%)        76.31G ( 1%)
dcsd_14A                   5T        1 Yes      Yes          1.634T ( 33%)        76.23G ( 1%)
dcsd_13B                   5T        1 Yes      Yes          1.634T ( 33%)        76.13G ( 1%)
dcsd_13A                   5T        1 Yes      Yes          1.634T ( 33%)        76.22G ( 1%)
dcsd_12B                   5T        1 Yes      Yes          1.635T ( 33%)        77.49G ( 2%)
dcsd_12A                   5T        1 Yes      Yes          1.633T ( 33%)        77.13G ( 2%)
dcsd_11B                   5T        1 Yes      Yes          1.633T ( 33%)        76.86G ( 2%)
dcsd_11A                   5T        1 Yes      Yes          1.632T ( 33%)        76.22G ( 1%)
dcsd_10B                   5T        1 Yes      Yes          1.633T ( 33%)        76.79G ( 1%)
dcsd_10A                   5T        1 Yes      Yes          1.633T ( 33%)        77.21G ( 2%)
dcsh_15B                   5T        1 Yes      Yes          1.635T ( 33%)        76.04G ( 1%)
dcsh_15A                   5T        1 Yes      Yes          1.634T ( 33%)        76.84G ( 2%)
dcsh_14B                   5T        1 Yes      Yes          1.635T ( 33%)        76.75G ( 1%)
dcsh_14A                   5T        1 Yes      Yes          1.633T ( 33%)        76.05G ( 1%)
dcsh_13B                   5T        1 Yes      Yes          1.634T ( 33%)        76.35G ( 1%)
dcsh_13A                   5T        1 Yes      Yes          1.634T ( 33%)        76.68G ( 1%)
dcsh_12B                   5T        1 Yes      Yes          1.635T ( 33%)        76.74G ( 1%)
ssd_5_5                   80G        3 Yes      No           22.31G ( 28%)        7.155G ( 9%)
ssd_4_4                   80G        3 Yes      No           22.21G ( 28%)        7.196G ( 9%)
ssd_3_3                   80G        3 Yes      No            22.2G ( 28%)        7.239G ( 9%)
ssd_2_2                   80G        3 Yes      No           22.24G ( 28%)        7.146G ( 9%)
ssd_1_1                   80G        3 Yes      No           22.29G ( 28%)        7.134G ( 9%)
                -------------                         -------------------- -------------------
(pool total)           262.3T                                92.96T ( 35%)        3.621T ( 1%)

Disks in storage pool: maid4 (Maximum disk size allowed is 466 TB)
...&lt;dataOnly disks&gt;...
                -------------                         -------------------- -------------------
(pool total)             291T                                126.5T ( 43%)        562.6G ( 0%)

Disks in storage pool: maid5 (Maximum disk size allowed is 466 TB)
...&lt;dataOnly disks&gt;...
                -------------                         -------------------- -------------------
(pool total)           436.6T                                120.8T ( 28%)        25.23G ( 0%)

Disks in storage pool: maid6 (Maximum disk size allowed is 466 TB)
...&lt;dataOnly disks&gt;...
                -------------                         -------------------- -------------------
(pool total)           582.1T                                358.7T ( 62%)        9.458G ( 0%)

                =============                         ==================== ===================
(data)                 1.535P                                698.9T ( 44%)         4.17T ( 0%)
(metadata)             262.3T                                92.96T ( 35%)        3.621T ( 1%)
                =============                         ==================== ===================
(total)                1.535P                                  699T ( 44%)        4.205T ( 0%)

Inode Information
-----------------
Number of used inodes:       79607225
Number of free inodes:       82340423
Number of allocated inodes:  161947648
Maximum number of inodes:    1342177280
</pre>
<br>
I have a smaller test filesystem with the same setup (and plenty of free
space), and the actual sequence of commands that worked for me there was:<br>
<pre>
mmchfs fs1 -m1
mmrestripefs fs1 -R
mmrestripefs fs1 -b
mmchdisk fs1 change -F ~/nsd_metadata_test   (dataAndMetadata -> dataOnly)
mmrestripefs fs1 -r
</pre>
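(For completeness: the stanza file given to mmchdisk only needs one %nsd
stanza per disk whose usage is being changed. The actual contents of my
~/nsd_metadata_test are not shown here; a minimal sketch with hypothetical
NSD names would look something like this:)<br>
<pre>
# hypothetical stanza file for: mmchdisk fs1 change -F ~/nsd_metadata_test
# one %nsd stanza per SATA NSD being switched from dataAndMetadata to dataOnly
%nsd: nsd=sata_nsd_01 usage=dataOnly
%nsd: nsd=sata_nsd_02 usage=dataOnly
</pre>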
<br>
Could you please elaborate on the performance overhead of having metadata<br>
on SSD+SATA? Are read operations automatically directed to the<br>
faster disks by GPFS?<br>
Does each write operation have to wait for the write to complete on the SATA<br>
disks?<br>
<br>
Thank you,<br>
<pre class="moz-signature" cols="72">--
Miroslav Bauer</pre>
<div class="moz-cite-prefix">On 09/06/2016 09:06 PM, Yuri L Volobuev
wrote:<br>
</div>
<blockquote
cite="mid:OFAA143E38.7F7DEBEA-ON88258026.0067D66E-88258026.0068F7E7@notes.na.collabserv.com"
type="cite">
<p>The correct way to accomplish what you're looking for (in
particular, changing the fs-wide level of replication) is
mmrestripefs -R. This command also takes care of moving data off
disks now marked metadataOnly. <br>
<br>
The restripe job hits an error trying to move blocks of the
inode file, i.e. before it gets to actual user data blocks. Note
that at this point the metadata replication factor is still 2.
This suggests one of two possibilities: (1) there isn't enough
actual free space on the remaining metadataOnly disks, (2) there
isn't enough space in some failure groups to allocate two
replicas.<br>
<br>
All of this assumes you're operating within a single storage
pool. If multiple storage pools are in play, there are other
possibilities.<br>
<br>
'mmdf' output would be helpful in providing more specific advice.
With the information at hand, I can only suggest trying to
accomplish the task in two phases: (a) deallocate extra
metadata replicas by doing mmchfs -m 1 + mmrestripefs -R, (b)
move metadata off SATA disks. I do want to point out that
metadata replication is a highly recommended insurance policy to
have for your file system. As with other kinds of insurance, you
may or may not need it, but if you do end up needing it, you'll
be very glad you have it. The costs, in terms of extra metadata
space and performance overhead, are very reasonable.<br>
<br>
yuri<br>
<br>
<br>
<font color="#424282">Miroslav Bauer ---09/01/2016 07:29:06
AM---Yes, failure group id is exactly what I meant :).
Unfortunately, mmrestripefs with -R</font><br>
<br>
<font color="#5F5F5F" size="2">From: </font><font size="2">Miroslav
Bauer <a class="moz-txt-link-rfc2396E" href="mailto:bauer@cesnet.cz"><bauer@cesnet.cz></a></font><br>
<font color="#5F5F5F" size="2">To: </font><font size="2"><a class="moz-txt-link-abbreviated" href="mailto:gpfsug-discuss@spectrumscale.org">gpfsug-discuss@spectrumscale.org</a>,
</font><br>
<font color="#5F5F5F" size="2">Date: </font><font size="2">09/01/2016
07:29 AM</font><br>
<font color="#5F5F5F" size="2">Subject: </font><font size="2">Re:
[gpfsug-discuss] Migration to separate metadata and data disks</font><br>
<font color="#5F5F5F" size="2">Sent by: </font><font size="2"><a class="moz-txt-link-abbreviated" href="mailto:gpfsug-discuss-bounces@spectrumscale.org">gpfsug-discuss-bounces@spectrumscale.org</a></font><br>
</p>
<hr style="color:#8091A5; " align="left" size="2"
noshade="noshade" width="100%"><br>
<br>
<br>
<tt>Yes, failure group id is exactly what I meant :).
Unfortunately, <br>
mmrestripefs with -R<br>
behaves the same as with -r. I also believed that mmrestripefs
-R is the <br>
correct tool for<br>
fixing the replication settings on inodes (according to
manpages), but I <br>
will try possible<br>
solutions you and Marc suggested and let you know how it went.<br>
<br>
Thank you,<br>
--<br>
Miroslav Bauer<br>
<br>
On 09/01/2016 04:02 PM, Aaron Knister wrote:<br>
> Oh! I think you've already provided the info I was looking
for :) I <br>
> thought that failGroup=3 meant there were 3 failure groups
within the <br>
> SSDs. I suspect that's not at all what you meant and that
actually is <br>
> the failure group of all of those disks. That I think
explains what's <br>
> going on-- there's only one failure group's worth of
metadata-capable <br>
> disks available and as such GPFS can't place the 2nd
replica for <br>
> existing files.<br>
><br>
> Here's what I would suggest:<br>
><br>
> - Create at least 2 failure groups within the SSDs<br>
> - Put the default metadata replication factor back to 2<br>
> - Run a restripefs -R to shuffle files around and restore
the metadata <br>
> replication factor of 2 to any files created while it was
set to 1<br>
><br>
> If you're not interested in replication for metadata then
perhaps all <br>
> you need to do is the mmrestripefs -R. I think that should
<br>
> un-replicate the file from the SATA disks leaving the copy
on the SSDs.<br>
><br>
> Hope that helps.<br>
><br>
> -Aaron<br>
><br>
> On 9/1/16 9:39 AM, Aaron Knister wrote:<br>
>> By the way, I suspect the no space on device errors are
because GPFS<br>
>> believes for some reason that it is unable to maintain
the metadata<br>
>> replication factor of 2 that's likely set on all
previously created <br>
>> inodes.<br>
>><br>
>> On 9/1/16 9:36 AM, Aaron Knister wrote:<br>
>>> I must admit, I'm curious as to the reason you're
dropping the<br>
>>> replication factor from 2 down to 1. There are some
serious advantages<br>
>>> we've seen to having multiple metadata replicas, as
far as error<br>
>>> recovery is concerned.<br>
>>><br>
>>> Could you paste an output of mmlsdisk for the
filesystem?<br>
>>><br>
>>> -Aaron<br>
>>><br>
>>> On 9/1/16 9:30 AM, Miroslav Bauer wrote:<br>
>>>> Hello,<br>
>>>><br>
>>>> I have a GPFS 3.5 filesystem (fs1) and I'm
trying to migrate the<br>
>>>> filesystem metadata from state:<br>
>>>> -m = 2 (default metadata replicas)<br>
>>>> - SATA disks (dataAndMetadata, failGroup=1)<br>
>>>> - SSDs (metadataOnly, failGroup=3)<br>
>>>> to the desired state:<br>
>>>> -m = 1<br>
>>>> - SATA disks (dataOnly, failGroup=1)<br>
>>>> - SSDs (metadataOnly, failGroup=3)<br>
>>>><br>
>>>> I have done the following steps in the
following order:<br>
>>>> 1) change SATA disks to dataOnly (stanza file
modifies the 'usage'<br>
>>>> attribute only):<br>
>>>> # mmchdisk fs1 change -F dataOnly_disks.stanza<br>
>>>> Attention: Disk parameters were changed.<br>
>>>> Use the mmrestripefs command with the -r
option to relocate data and<br>
>>>> metadata.<br>
>>>> Verifying file system configuration information
...<br>
>>>> mmchdisk: Propagating the cluster configuration
data to all<br>
>>>> affected nodes. This is an asynchronous
process.<br>
>>>><br>
>>>> 2) change default metadata replicas number
2->1<br>
>>>> # mmchfs fs1 -m 1<br>
>>>><br>
>>>> 3) run mmrestripefs as suggested by output of
1)<br>
>>>> # mmrestripefs fs1 -r<br>
>>>> Scanning file system metadata, phase 1 ...<br>
>>>> Error processing inodes.<br>
>>>> No space left on device<br>
>>>> mmrestripefs: Command failed. Examine previous
error messages to<br>
>>>> determine cause.<br>
>>>><br>
>>>> It is, however, still possible to create new
files on the filesystem.<br>
>>>> When I return one of the SATA disks as a
dataAndMetadata disk, the<br>
>>>> mmrestripefs<br>
>>>> command stops complaining about No space left
on device. Both df and<br>
>>>> mmdf<br>
>>>> say that there is enough space both for data
(SATA) and metadata <br>
>>>> (SSDs).<br>
>>>> Does anyone have an idea why it is complaining?<br>
>>>><br>
>>>> Thanks,<br>
>>>><br>
>>>> -- <br>
>>>> Miroslav Bauer<br>
>>>><br>
>>>><br>
>>>><br>
>>>><br>
>>>> _______________________________________________<br>
>>>> gpfsug-discuss mailing list<br>
>>>> gpfsug-discuss at spectrumscale.org<br>
>>>> </tt><tt><a moz-do-not-send="true"
href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a></tt><tt><br>
>>>><br>
>>><br>
>><br>
><br>
<br>
<br>
[attachment "smime.p7s" deleted by Yuri L Volobuev/Austin/IBM]
_______________________________________________<br>
gpfsug-discuss mailing list<br>
gpfsug-discuss at spectrumscale.org<br>
</tt><tt><a moz-do-not-send="true"
href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a></tt><tt><br>
</tt><br>
<br>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
<a class="moz-txt-link-freetext" href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a>
</pre>
</blockquote>
<br>
</body>
</html>