[gpfsug-discuss] No space left on device, but plenty of quota space for inodes and blocks
Rob Kudyba
rk3199 at columbia.edu
Thu Jun 6 21:45:30 BST 2024
We are running GPFS 4.2.3 on a DDN GridScaler and users are getting the "No space
left on device" message when trying to write to a file. In /var/adm/ras/mmfs.log
the only recent errors are these:
2024-06-06_15:51:22.311-0400: mmcommon getContactNodes cluster failed.
Return code -1.
2024-06-06_15:51:22.311-0400: The previous error was detected on node
x.x.x.x (headnode).
2024-06-06_15:53:25.088-0400: mmcommon getContactNodes cluster failed.
Return code -1.
2024-06-06_15:53:25.088-0400: The previous error was detected on node
x.x.x.x (headnode).
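In case it matters for these getContactNodes errors, we were going to double-check
the cluster configuration servers and the daemon state on all nodes with something
like the following (nothing fancy, just the standard commands):

  # show the primary/secondary cluster configuration servers this node should contact
  /usr/lpp/mmfs/bin/mmlscluster

  # confirm the mmfsd daemon state on every node
  /usr/lpp/mmfs/bin/mmgetstate -a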
According to
https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=messages-6027-615:

> Check the preceding messages, and consult the earlier chapters of this
> document. A frequent cause for such errors is lack of space in /var.

We have plenty of space left.
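Since the docs point at /var, one check we can still run is /var usage on every node
in one shot (assuming mmdsh and passwordless ssh work across the cluster as usual;
a node file would do the same job):

  # check /var on all cluster nodes at once
  /usr/lpp/mmfs/bin/mmdsh -N all df -h /var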
/usr/lpp/mmfs/bin/mmlsdisk cluster
disk          driver   sector     failure holds    holds                            storage
name          type       size       group metadata data  status        availability pool
------------- -------- ------ ----------- -------- ----- ------------- ------------ ------------
S01_MDT200_1  nsd        4096         200 Yes      No    ready         up           system
S01_MDT201_1  nsd        4096         201 Yes      No    ready         up           system
S01_DAT0001_1 nsd        4096         100 No       Yes   ready         up           data1
S01_DAT0002_1 nsd        4096         101 No       Yes   ready         up           data1
S01_DAT0003_1 nsd        4096         100 No       Yes   ready         up           data1
S01_DAT0004_1 nsd        4096         101 No       Yes   ready         up           data1
S01_DAT0005_1 nsd        4096         100 No       Yes   ready         up           data1
S01_DAT0006_1 nsd        4096         101 No       Yes   ready         up           data1
S01_DAT0007_1 nsd        4096         100 No       Yes   ready         up           data1
/usr/lpp/mmfs/bin/mmdf headnode
disk                disk size  failure holds    holds              free KB             free KB
name                    in KB    group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 14 TB)
S01_MDT200_1       1862270976      200 Yes      No         969134848 ( 52%)      2948720 ( 0%)
S01_MDT201_1       1862270976      201 Yes      No         969126144 ( 52%)      2957424 ( 0%)
                -------------                         -------------------- -------------------
(pool total)       3724541952                            1938260992 ( 52%)      5906144 ( 0%)

Disks in storage pool: data1 (Maximum disk size allowed is 578 TB)
S01_DAT0007_1     77510737920      100 No       Yes      21080752128 ( 27%)    897723392 ( 1%)
S01_DAT0005_1     77510737920      100 No       Yes      14507212800 ( 19%)    949412160 ( 1%)
S01_DAT0001_1     77510737920      100 No       Yes      14503620608 ( 19%)    951327680 ( 1%)
S01_DAT0003_1     77510737920      100 No       Yes      14509205504 ( 19%)    949340544 ( 1%)
S01_DAT0002_1     77510737920      101 No       Yes      14504585216 ( 19%)    948377536 ( 1%)
S01_DAT0004_1     77510737920      101 No       Yes      14503647232 ( 19%)    952892480 ( 1%)
S01_DAT0006_1     77510737920      101 No       Yes      14504486912 ( 19%)    949072512 ( 1%)
                -------------                         -------------------- -------------------
(pool total)     542575165440                          108113510400 ( 20%)   6598146304 ( 1%)

                =============                         ==================== ===================
(data)           542575165440                          108113510400 ( 20%)   6598146304 ( 1%)
(metadata)         3724541952                            1938260992 ( 52%)      5906144 ( 0%)
                =============                         ==================== ===================
(total)          546299707392                          110051771392 ( 22%)   6604052448 ( 1%)

Inode Information
-----------------
Total number of used inodes in all Inode spaces:              154807668
Total number of free inodes in all Inode spaces:               12964492
Total number of allocated inodes in all Inode spaces:         167772160
Total of Maximum number of inodes in all Inode spaces:        276971520
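One thing we are wondering about is whether a single independent fileset has exhausted
its own inode space, since the file-system-wide numbers above look fine. We believe the
per-fileset allocation can be checked along these lines (device name "cluster" as above;
the fileset name in the last command is just a placeholder):

  # per-fileset inode limits and allocated inodes (independent inode spaces)
  /usr/lpp/mmfs/bin/mmlsfileset cluster -L

  # actual inode usage per fileset (slower, scans usage)
  /usr/lpp/mmfs/bin/mmlsfileset cluster -i

  # if a fileset turned out to be full, its limit could be raised, e.g.:
  # /usr/lpp/mmfs/bin/mmchfileset cluster <filesetName> --inode-limit <newMax>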
On the head node:
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda4 430G 216G 215G 51% /
devtmpfs 47G 0 47G 0% /dev
tmpfs 47G 0 47G 0% /dev/shm
tmpfs 47G 4.1G 43G 9% /run
tmpfs 47G 0 47G 0% /sys/fs/cgroup
/dev/sda1 504M 114M 365M 24% /boot
/dev/sda2 100M 9.9M 90M 10% /boot/efi
x.x.x.:/nfs-share 430G 326G 105G 76% /nfs-share
cluster 506T 405T 101T 81% /cluster
tmpfs 9.3G 0 9.3G 0% /run/user/443748
tmpfs 9.3G 0 9.3G 0% /run/user/547288
tmpfs 9.3G 0 9.3G 0% /run/user/551336
tmpfs 9.3G 0 9.3G 0% /run/user/547289
The login nodes have plenty of space in /var:
/dev/sda3 50G 8.7G 42G 18% /var
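For reference, "plenty of quota space" in the subject is based on checks along these
lines (the exact invocations may differ; the user name is just an example):

  # block and inode quota for a given user on the file system
  /usr/lpp/mmfs/bin/mmlsquota -u someuser cluster

  # full quota report for all users, groups and filesets
  /usr/lpp/mmfs/bin/mmrepquota -a cluster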
What else should we check? The GPFS file system is only at 81% capacity, so there
should be plenty of room left to write without these errors. Are there any
recommended service(s) we could restart?
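Before restarting anything we can also gather daemon-side diagnostics if that would
help narrow it down, e.g.:

  # long-running RPC waiters that might explain stuck writes
  /usr/lpp/mmfs/bin/mmdiag --waiters

  # network connection state between nodes
  /usr/lpp/mmfs/bin/mmdiag --network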