<div dir="ltr">We are running GPFS 4.2.3 on a DDN GridScaler, and users are getting the <font face="monospace">No space left on device</font> message when trying to write to a file. In <font face="monospace">/var/adm/ras/mmfs.log</font>, the only recent errors are these:<div><br></div><div><font face="monospace">2024-06-06_15:51:22.311-0400: mmcommon getContactNodes cluster failed. Return code -1.<br>2024-06-06_15:51:22.311-0400: The previous error was detected on node x.x.x.x (headnode).<br>2024-06-06_15:53:25.088-0400: mmcommon getContactNodes cluster failed. Return code -1.<br>2024-06-06_15:53:25.088-0400: The previous error was detected on node x.x.x.x (headnode).</font><br></div><div><br></div><div>According to <a href="https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=messages-6027-615">https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=messages-6027-615</a>:<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Check the preceding messages, and consult the earlier chapters of this document. 
A frequent cause for such errors is lack of space in <font face="monospace">/var</font>.</blockquote><div><br></div><div>We have plenty of space left.</div><div><br></div><div><font face="monospace"> /usr/lpp/mmfs/bin/mmlsdisk cluster<br>disk         driver   sector     failure holds    holds                            storage<br>name         type       size       group metadata data  status        availability pool<br>------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------<br>S01_MDT200_1 nsd        4096         200 Yes      No    ready         up           system       <br>S01_MDT201_1 nsd        4096         201 Yes      No    ready         up           system       <br>S01_DAT0001_1 nsd        4096         100 No       Yes   ready         up           data1        <br>S01_DAT0002_1 nsd        4096         101 No       Yes   ready         up           data1        <br>S01_DAT0003_1 nsd        4096         100 No       Yes   ready         up           data1        <br>S01_DAT0004_1 nsd        4096         101 No       Yes   ready         up           data1        <br>S01_DAT0005_1 nsd        4096         100 No       Yes   ready         up           data1        <br>S01_DAT0006_1 nsd        4096         101 No       Yes   ready         up           data1        <br>S01_DAT0007_1 nsd        4096         100 No       Yes   ready         up           data1  </font><br></div><div><br></div><div><font face="monospace"> /usr/lpp/mmfs/bin/mmdf headnode <br>disk                disk size  failure holds    holds              free KB             free KB<br>name                    in KB    group metadata data        in full blocks        in fragments<br>--------------- ------------- -------- -------- ----- -------------------- -------------------<br>Disks in storage pool: system (Maximum disk size allowed is 14 TB)<br>S01_MDT200_1       1862270976      200 Yes      No        969134848 ( 52%)       2948720 ( 0%) <br>S01_MDT201_1       
1862270976      201 Yes      No        969126144 ( 52%)       2957424 ( 0%) <br>                -------------                         -------------------- -------------------<br>(pool total)       3724541952                            1938260992 ( 52%)       5906144 ( 0%)<br><br>Disks in storage pool: data1 (Maximum disk size allowed is 578 TB)<br>S01_DAT0007_1     77510737920      100 No       Yes     21080752128 ( 27%)     897723392 ( 1%) <br>S01_DAT0005_1     77510737920      100 No       Yes     14507212800 ( 19%)     949412160 ( 1%) <br>S01_DAT0001_1     77510737920      100 No       Yes     14503620608 ( 19%)     951327680 ( 1%) <br>S01_DAT0003_1     77510737920      100 No       Yes     14509205504 ( 19%)     949340544 ( 1%) <br>S01_DAT0002_1     77510737920      101 No       Yes     14504585216 ( 19%)     948377536 ( 1%) <br>S01_DAT0004_1     77510737920      101 No       Yes     14503647232 ( 19%)     952892480 ( 1%) <br>S01_DAT0006_1     77510737920      101 No       Yes     14504486912 ( 19%)     949072512 ( 1%) <br>                -------------                         -------------------- -------------------<br>(pool total)     542575165440                          108113510400 ( 20%)    6598146304 ( 1%)<br><br>                =============                         ==================== ===================<br>(data)           542575165440                          108113510400 ( 20%)    6598146304 ( 1%)<br>(metadata)         3724541952                            1938260992 ( 52%)       5906144 ( 0%)<br>                =============                         ==================== ===================<br>(total)          546299707392                          110051771392 ( 22%)    6604052448 ( 1%)<br><br>Inode Information<br>-----------------<br>Total number of used inodes in all Inode spaces:          154807668<br>Total number of free inodes in all Inode spaces:           12964492<br>Total number of allocated inodes in all Inode spaces:     167772160<br>Total 
of Maximum number of inodes in all Inode spaces:    276971520</font><br></div><div><font face="monospace"><br></font></div><div><font face="arial, sans-serif">On the head node:</font></div><div><font face="monospace"><br></font></div><div><font face="monospace">df -h<br>Filesystem                Size  Used Avail Use% Mounted on<br>/dev/sda4                 430G  216G  215G  51% /<br>devtmpfs                   47G     0   47G   0% /dev<br>tmpfs                      47G     0   47G   0% /dev/shm<br>tmpfs                      47G  4.1G   43G   9% /run<br>tmpfs                      47G     0   47G   0% /sys/fs/cgroup<br>/dev/sda1                 504M  114M  365M  24% /boot<br>/dev/sda2                 100M  9.9M   90M  10% /boot/efi<br>x.x.x.:/nfs-share  430G  326G  105G  76% /nfs-share<br>cluster                      506T  405T  101T  81% /cluster<br>tmpfs                     9.3G     0  9.3G   0% /run/user/443748<br>tmpfs                     9.3G     0  9.3G   0% /run/user/547288<br>tmpfs                     9.3G     0  9.3G   0% /run/user/551336<br>tmpfs                     9.3G     0  9.3G   0% /run/user/547289<br></font></div><div><font face="monospace"><br></font></div><div><font face="arial, sans-serif">The login nodes have plenty of space in </font><font face="monospace">/var:</font></div><div><font face="monospace">/dev/sda3        50G  8.7G   42G  18% /var<br></font></div><div><font face="monospace"><br></font></div><div><font face="arial, sans-serif">What else should we check? The GPFS-mounted file system is only 81% full, which should leave enough room to write without these errors. Are there any services you would recommend restarting?</font></div><div><br></div></div>