[gpfsug-discuss] No space left on device, but plenty of quota space for inodes and blocks
Rob Kudyba
rk3199 at columbia.edu
Thu Jun 6 21:45:30 BST 2024
We are running GPFS 4.2.3 on a DDN GridScaler and users are getting the "No space
left on device" message when trying to write to a file. In /var/adm/ras/mmfs.log
the only recent errors are these:
2024-06-06_15:51:22.311-0400: mmcommon getContactNodes cluster failed.
Return code -1.
2024-06-06_15:51:22.311-0400: The previous error was detected on node
x.x.x.x (headnode).
2024-06-06_15:53:25.088-0400: mmcommon getContactNodes cluster failed.
Return code -1.
2024-06-06_15:53:25.088-0400: The previous error was detected on node
x.x.x.x (headnode).
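In case it matters for these getContactNodes errors, we were going to double-check
the cluster configuration servers and the daemon state on all nodes with something
like the following (nothing fancy, just the standard commands):

  # show the primary/secondary cluster configuration servers this node should contact
  /usr/lpp/mmfs/bin/mmlscluster

  # confirm the mmfsd daemon state on every node
  /usr/lpp/mmfs/bin/mmgetstate -a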
According to
https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=messages-6027-615:

> Check the preceding messages, and consult the earlier chapters of this
> document. A frequent cause for such errors is lack of space in /var.

We have plenty of space left.
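Since the docs point at /var, one check we can still run is /var usage on every node
in one shot (assuming mmdsh and passwordless ssh work across the cluster as usual;
a node file would do the same job):

  # check /var on all cluster nodes at once
  /usr/lpp/mmfs/bin/mmdsh -N all df -h /var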
/usr/lpp/mmfs/bin/mmlsdisk cluster
disk          driver   sector     failure holds    holds                            storage
name          type       size       group metadata data  status        availability pool
------------- -------- ------ ----------- -------- ----- ------------- ------------ ------------
S01_MDT200_1  nsd        4096         200 Yes      No    ready         up           system
S01_MDT201_1  nsd        4096         201 Yes      No    ready         up           system
S01_DAT0001_1 nsd        4096         100 No       Yes   ready         up           data1
S01_DAT0002_1 nsd        4096         101 No       Yes   ready         up           data1
S01_DAT0003_1 nsd        4096         100 No       Yes   ready         up           data1
S01_DAT0004_1 nsd        4096         101 No       Yes   ready         up           data1
S01_DAT0005_1 nsd        4096         100 No       Yes   ready         up           data1
S01_DAT0006_1 nsd        4096         101 No       Yes   ready         up           data1
S01_DAT0007_1 nsd        4096         100 No       Yes   ready         up           data1
/usr/lpp/mmfs/bin/mmdf headnode
disk                disk size  failure holds    holds              free KB             free KB
name                    in KB    group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 14 TB)
S01_MDT200_1       1862270976      200 Yes      No         969134848 ( 52%)      2948720 ( 0%)
S01_MDT201_1       1862270976      201 Yes      No         969126144 ( 52%)      2957424 ( 0%)
                -------------                         -------------------- -------------------
(pool total)       3724541952                            1938260992 ( 52%)      5906144 ( 0%)

Disks in storage pool: data1 (Maximum disk size allowed is 578 TB)
S01_DAT0007_1     77510737920      100 No       Yes      21080752128 ( 27%)    897723392 ( 1%)
S01_DAT0005_1     77510737920      100 No       Yes      14507212800 ( 19%)    949412160 ( 1%)
S01_DAT0001_1     77510737920      100 No       Yes      14503620608 ( 19%)    951327680 ( 1%)
S01_DAT0003_1     77510737920      100 No       Yes      14509205504 ( 19%)    949340544 ( 1%)
S01_DAT0002_1     77510737920      101 No       Yes      14504585216 ( 19%)    948377536 ( 1%)
S01_DAT0004_1     77510737920      101 No       Yes      14503647232 ( 19%)    952892480 ( 1%)
S01_DAT0006_1     77510737920      101 No       Yes      14504486912 ( 19%)    949072512 ( 1%)
                -------------                         -------------------- -------------------
(pool total)     542575165440                          108113510400 ( 20%)   6598146304 ( 1%)

                =============                         ==================== ===================
(data)           542575165440                          108113510400 ( 20%)   6598146304 ( 1%)
(metadata)         3724541952                            1938260992 ( 52%)      5906144 ( 0%)
                =============                         ==================== ===================
(total)          546299707392                          110051771392 ( 22%)   6604052448 ( 1%)

Inode Information
-----------------
Total number of used inodes in all Inode spaces:              154807668
Total number of free inodes in all Inode spaces:               12964492
Total number of allocated inodes in all Inode spaces:         167772160
Total of Maximum number of inodes in all Inode spaces:        276971520
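One thing we are wondering about is whether a single independent fileset has exhausted
its own inode space, since the file-system-wide numbers above look fine. We believe the
per-fileset allocation can be checked along these lines (device name "cluster" as above;
the fileset name in the last command is just a placeholder):

  # per-fileset inode limits and allocated inodes (independent inode spaces)
  /usr/lpp/mmfs/bin/mmlsfileset cluster -L

  # actual inode usage per fileset (slower, scans usage)
  /usr/lpp/mmfs/bin/mmlsfileset cluster -i

  # if a fileset turned out to be full, its limit could be raised, e.g.:
  # /usr/lpp/mmfs/bin/mmchfileset cluster <filesetName> --inode-limit <newMax>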
On the head node:
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda4 430G 216G 215G 51% /
devtmpfs 47G 0 47G 0% /dev
tmpfs 47G 0 47G 0% /dev/shm
tmpfs 47G 4.1G 43G 9% /run
tmpfs 47G 0 47G 0% /sys/fs/cgroup
/dev/sda1 504M 114M 365M 24% /boot
/dev/sda2 100M 9.9M 90M 10% /boot/efi
x.x.x.:/nfs-share 430G 326G 105G 76% /nfs-share
cluster 506T 405T 101T 81% /cluster
tmpfs 9.3G 0 9.3G 0% /run/user/443748
tmpfs 9.3G 0 9.3G 0% /run/user/547288
tmpfs 9.3G 0 9.3G 0% /run/user/551336
tmpfs 9.3G 0 9.3G 0% /run/user/547289
The login nodes have plenty of space in /var:
/dev/sda3 50G 8.7G 42G 18% /var
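For reference, "plenty of quota space" in the subject is based on checks along these
lines (the exact invocations may differ; the user name is just an example):

  # block and inode quota for a given user on the file system
  /usr/lpp/mmfs/bin/mmlsquota -u someuser cluster

  # full quota report for all users, groups and filesets
  /usr/lpp/mmfs/bin/mmrepquota -a cluster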
What else should we check? The GPFS file system is only at 81% capacity, so there
should be plenty of room left to write without these errors. Are there any
recommended service(s) we could restart?
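Before restarting anything we can also gather daemon-side diagnostics if that would
help narrow it down, e.g.:

  # long-running RPC waiters that might explain stuck writes
  /usr/lpp/mmfs/bin/mmdiag --waiters

  # network connection state between nodes
  /usr/lpp/mmfs/bin/mmdiag --network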