[gpfsug-discuss] mmhealth - where is the info hiding?
Buterbaugh, Kevin L
Kevin.Buterbaugh at Vanderbilt.Edu
Thu Jul 19 23:23:06 BST 2018
Hi Valdis,
Is this what you’re looking for (from an IBMer in response to another question a few weeks back)?
Assuming a 4.2.3 code level, this can be done by deleting and recreating the rule with the changed settings:
# mmhealth thresholds list
### Threshold Rules ###
rule_name             metric                error  warn    direction  filterBy  groupBy                                            sensitivity
----------------------------------------------------------------------------------------------------------------------------------------------
InodeCapUtil_Rule     Fileset_inode         90.0   80.0    high                 gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name      300
MetaDataCapUtil_Rule  MetaDataPool_capUtil  90.0   80.0    high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
DataCapUtil_Rule      DataPool_capUtil      90.0   80.0    high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
MemFree_Rule          mem_memfree           50000  100000  low                  node                                               300
# mmhealth thresholds delete MetaDataCapUtil_Rule
The rule(s) was(were) deleted successfully
# mmhealth thresholds add MetaDataPool_capUtil --errorlevel 95.0 --warnlevel 85.0 --direction high --sensitivity 300 --name MetaDataCapUtil_Rule --groupby gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name
# mmhealth thresholds list
### Threshold Rules ###
rule_name             metric                error  warn    direction  filterBy  groupBy                                            sensitivity
----------------------------------------------------------------------------------------------------------------------------------------------
InodeCapUtil_Rule     Fileset_inode         90.0   80.0    high                 gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name      300
MemFree_Rule          mem_memfree           50000  100000  low                  node                                               300
DataCapUtil_Rule      DataPool_capUtil      90.0   80.0    high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
MetaDataCapUtil_Rule  MetaDataPool_capUtil  95.0   85.0    high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
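The same delete-and-re-add pattern should work for the other rules as well; for example, bumping the data pool rule to match would presumably look like this (untested here - just the same flags with the DataPool_capUtil metric and rule name substituted in):

# mmhealth thresholds delete DataCapUtil_Rule
# mmhealth thresholds add DataPool_capUtil --errorlevel 95.0 --warnlevel 85.0 --direction high --sensitivity 300 --name DataCapUtil_Rule --groupby gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name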
Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
On Jul 19, 2018, at 4:25 PM, valdis.kletnieks at vt.edu wrote:
So I'm trying to tidy up things like 'mmhealth' etc. I've got most of it fixed, but I'm stuck on one thing...
Note: I already did a 'mmhealth node eventlog --clear -N all' yesterday, which cleaned out a bunch of other long-past events that were "stuck" as failed / degraded even though they were corrected days or weeks ago - keep this in mind as you read on...
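(Spelling that cleanup step out on its own line for anyone searching the archives - my assumption being that, like other mm* commands, it has to be run as root from a node with GPFS administrative authority:)

# mmhealth node eventlog --clear -N all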
# mmhealth cluster show
Component    Total  Failed  Degraded  Healthy  Other
-------------------------------------------------------------------------------------
NODE            10       0         0       10      0
GPFS            10       0         0       10      0
NETWORK         10       0         0       10      0
FILESYSTEM       1       0         1        0      0
DISK           102       0         0      102      0
CES              4       0         0        4      0
GUI              1       0         0        1      0
PERFMON         10       0         0       10      0
THRESHOLD       10       0         0       10      0
Great. One hit for 'degraded' filesystem.
# mmhealth node show --unhealthy -N all
(skipping all the nodes that show healthy)
Node name: arnsd3-vtc.nis.internal
Node status: HEALTHY
Status Change: 21 hours ago
Component    Status   Status Change  Reasons
-----------------------------------------------------------------------------------
FILESYSTEM   FAILED   24 days ago    pool-data_high_error(archive/system)
(...)
Node name: arproto2-isb.nis.internal
Node status: HEALTHY
Status Change: 21 hours ago
Component    Status    Status Change  Reasons
----------------------------------------------------------------------------------
FILESYSTEM   DEGRADED  6 days ago     pool-data_high_warn(archive/system)
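(Side note: if I have the 4.2.3 syntax right - an assumption on my part - the definition and suggested user action behind those reason codes can be looked up by event name:)

# mmhealth event show pool-data_high_warn
# mmhealth event show pool-data_high_error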
mmdf tells me:
nsd_isb_01 13103005696 1 No Yes 1747905536 ( 13%) 111667200 ( 1%)
nsd_isb_02 13103005696 1 No Yes 1748245504 ( 13%) 111724384 ( 1%)
(94 more LUNs all within 0.2% of these for usage - data is striped out pretty well)
There are also 6 SSD LUNs for metadata:
nsd_isb_flash_01 2956984320 1 Yes No 2116091904 ( 72%) 26996992 ( 1%)
(again, evenly striped)
So who is remembering that status, and how do I clear it?
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss