[gpfsug-discuss] gpfsgui in a core dump/restart loop

Losen, Stephen C (scl) scl at virginia.edu
Tue Nov 30 12:47:46 GMT 2021


Hi folks,
Our gpfsgui service keeps crashing and restarting. About every three minutes we get a new set of files like these in /var/crash/scalemgmt:

-rw------- 1 scalemgmt scalemgmt 1067843584 Nov 30 06:54 core.20211130.065414.59174.0001.dmp
-rw-r--r-- 1 scalemgmt scalemgmt    2636747 Nov 30 06:54 javacore.20211130.065414.59174.0002.txt
-rw-r--r-- 1 scalemgmt scalemgmt    1903304 Nov 30 06:54 Snap.20211130.065414.59174.0003.trc
-rw-r--r-- 1 scalemgmt scalemgmt        202 Nov 30 06:54 jitdump.20211130.065414.59174.0004.dmp

The core.*.dmp files are cores from the java command.
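
To check the "about every three minutes" estimate, the dump file names can be turned into crash timestamps. This is a minimal sketch that assumes every dump follows the naming seen in the listing above (kind, date, time, pid, sequence number, extension). The function names crash_times and intervals_seconds are made up for illustration.

```python
import re
from datetime import datetime

# Name pattern inferred from the listing above (an assumption):
#   <kind>.<YYYYMMDD>.<HHMMSS>.<pid>.<seq>.<ext>
DUMP_RE = re.compile(
    r"^(?P<kind>core|javacore|Snap|jitdump)\."
    r"(?P<date>\d{8})\.(?P<time>\d{6})\.(?P<pid>\d+)\.(?P<seq>\d+)\.\w+$"
)


def crash_times(filenames):
    """Return sorted (timestamp, pid) pairs, one per distinct dump event."""
    events = set()
    for name in filenames:
        m = DUMP_RE.match(name)
        if m:
            ts = datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M%S")
            # The files written by one crash share timestamp and pid,
            # so the set collapses them into a single event.
            events.add((ts, int(m["pid"])))
    return sorted(events)


def intervals_seconds(events):
    """Seconds elapsed between consecutive crash events."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]
```

Calling crash_times(os.listdir("/var/crash/scalemgmt")) and passing the result to intervals_seconds would give the spacing between crashes. Files whose names do not match the pattern are ignored.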

The errors shown below keep repeating in /var/adm/ras/mmsysmonitor.log.

Any suggestions? Thanks for any help.


2021-11-30_07:25:09.944-0500: [W] ET_gui          Event=gui_down identifier= arg0=started arg1=stopped
2021-11-30_07:25:09.961-0500: [I] ET_gui          state_change for service: gui to FAILED at 2021.11.30 07.25.09.961572
2021-11-30_07:25:09.963-0500: [I] ClientThread-4  received command: 'thresholds  refresh  collectors  4021694'
2021-11-30_07:25:09.964-0500: [I] ClientThread-4  reload collectors                                 
2021-11-30_07:25:09.964-0500: [I] ClientThread-4  read_collectors                                   
2021-11-30_07:25:10.059-0500: [W] ClientThread-4  QueryHandler: query response has no data results  
2021-11-30_07:25:10.059-0500: [W] ClientThread-4  QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:10.060-0500: [W] ClientThread-4  QueryHandler: query response has no data results  
2021-11-30_07:25:10.060-0500: [W] ClientThread-4  QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:10.061-0500: [I] ClientThread-4  _activate_rules_scheduler completed               
2021-11-30_07:25:10.147-0500: [I] ET_gui          Event=component_state_change identifier= arg0=GUI arg1=FAILED
2021-11-30_07:25:10.148-0500: [I] ET_gui          StateChange: change_to=FAILED nodestate=DEGRADED CESState=UNKNOWN
2021-11-30_07:25:10.148-0500: [I] ET_gui          Service gui state changed. isInRunningState=True, wasInRunningState=True. New state=4
2021-11-30_07:25:10.148-0500: [I] ET_gui          Monitor: LocalState:FAILED Events:607 Entities:0 RT:  0.83
2021-11-30_07:25:11.975-0500: [W] ET_perfmon      got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpq4ac8o', '-c 4021693']
2021-11-30_07:25:11.975-0500: [E] ET_perfmon      fput failed: Version mismatch on conditional put (err 805)
 - CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread      ---------------------------------                 
2021-11-30_07:25:04.553-0500: [D] ET_perfmon      File collectors has no newer version than 4021693  - CCRProxy.getFile:119
2021-11-30_07:25:11.975-0500: [W] ET_perfmon      Conditional put for file collectors with version 4021693 failed
2021-11-30_07:25:11.975-0500: [W] ET_perfmon      New version received, start new collectors update cycle
2021-11-30_07:25:11.976-0500: [I] ET_perfmon      read_collectors                                   
2021-11-30_07:25:12.077-0500: [I] ET_perfmon      write_collectors                                  
2021-11-30_07:25:13.333-0500: [I] ClientThread-20 received command: 'thresholds  refresh  collectors  4021695'
2021-11-30_07:25:13.334-0500: [I] ClientThread-20 reload collectors                                 
2021-11-30_07:25:13.335-0500: [I] ClientThread-20 read_collectors                                   
2021-11-30_07:25:13.453-0500: [W] ClientThread-20 QueryHandler: query response has no data results  
2021-11-30_07:25:13.454-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryHandler: query response has no data results  
2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:13.464-0500: [I] ClientThread-20 _activate_rules_scheduler completed               
2021-11-30_07:25:15.528-0500: [W] ET_perfmon      got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpKTN69I', '-c 4021694']
2021-11-30_07:25:15.528-0500: [E] ET_perfmon      fput failed: Version mismatch on conditional put (err 805)
 - CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread      ---------------------------------                 
2021-11-30_07:25:12.076-0500: [D] ET_perfmon      File collectors has no newer version than 4021694  - CCRProxy.getFile:119
2021-11-30_07:25:15.529-0500: [W] ET_perfmon      Conditional put for file collectors with version 4021694 failed
2021-11-30_07:25:15.529-0500: [W] ET_perfmon      New version received, start new collectors update cycle
2021-11-30_07:25:15.529-0500: [I] ET_perfmon      read_collectors                                   
2021-11-30_07:25:15.626-0500: [I] ET_perfmon      write_collectors                                  
2021-11-30_07:25:16.594-0500: [I] ClientThread-3  received command: 'thresholds  refresh  collectors  4021696'
2021-11-30_07:25:16.595-0500: [I] ClientThread-3  reload collectors                                 
2021-11-30_07:25:16.595-0500: [I] ClientThread-3  read_collectors                                   
2021-11-30_07:25:19.780-0500: [W] ET_perfmon      got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp3joeUB', '-c 4021695']
2021-11-30_07:25:19.780-0500: [E] ET_perfmon      fput failed: Version mismatch on conditional put (err 805)
 - CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread      ---------------------------------                 
2021-11-30_07:25:15.625-0500: [D] ET_perfmon      File collectors has no newer version than 4021695  - CCRProxy.getFile:119
2021-11-30_07:25:16.781-0500: [D] ClientThread-3  File zmrules.json has no newer version than 1      - CCRProxy.getFile:119
2021-11-30_07:25:19.780-0500: [W] ET_perfmon      Conditional put for file collectors with version 4021695 failed
2021-11-30_07:25:19.781-0500: [W] ET_perfmon      New version received, start new collectors update cycle
2021-11-30_07:25:19.781-0500: [I] ET_perfmon      read_collectors                                   
2021-11-30_07:25:19.881-0500: [I] ET_perfmon      write_collectors                                  
2021-11-30_07:25:21.238-0500: [I] ClientThread-7  received command: 'thresholds  refresh  collectors  4021697'
2021-11-30_07:25:21.239-0500: [I] ClientThread-7  reload collectors                                 
2021-11-30_07:25:21.239-0500: [I] ClientThread-7  read_collectors                                   
2021-11-30_07:25:21.324-0500: [W] NMES            monitor event arrived while still busy for perfmon
2021-11-30_07:25:21.481-0500: [I] ET_threshold    Event=thresh_monitor_del_active identifier=active_thresh_monitor arg0=active_thresh_monitor
2021-11-30_07:25:21.482-0500: [I] ET_threshold    Monitor: LocalState:HEALTHY Events:1 Entities:1 RT:  0.16
2021-11-30_07:25:24.211-0500: [W] ET_perfmon      got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp8HAusb', '-c 4021696']
2021-11-30_07:25:24.211-0500: [E] ET_perfmon      fput failed: Version mismatch on conditional put (err 805)
 - CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread      ---------------------------------                 
2021-11-30_07:25:19.881-0500: [D] ET_perfmon      File collectors has no newer version than 4021696  - CCRProxy.getFile:119
2021-11-30_07:25:21.411-0500: [D] ClientThread-7  File zmrules.json has no newer version than 1      - CCRProxy.getFile:119
2021-11-30_07:25:24.211-0500: [W] ET_perfmon      Conditional put for file collectors with version 4021696 failed
2021-11-30_07:25:24.212-0500: [W] ET_perfmon      New version received, start new collectors update cycle
2021-11-30_07:25:24.212-0500: [I] ET_perfmon      read_collectors                                   
2021-11-30_07:25:24.314-0500: [I] ET_perfmon      write_collectors                                  
2021-11-30_07:25:24.543-0500: [I] ET_gui          ServiceMonitor => out=Type=notify

And then gpfsgui apparently crashes and systemd automatically restarts it.
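
The repeating cycle in the log excerpt above can be tallied with a short script. This is a rough sketch that assumes every record has the layout seen in the excerpt (timestamp, level in brackets, source, message). The name summarize is hypothetical, and the regular expressions are based only on the lines quoted here.

```python
import re
from collections import Counter

# Layout assumed from the excerpt above:
#   2021-11-30_07:25:11.975-0500: [W] ET_perfmon      message text
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}): "
    r"\[(?P<level>[A-Z])\] (?P<source>\S+)\s+(?P<msg>.*)$"
)
PUT_FAILED_RE = re.compile(
    r"Conditional put for file (\S+) with version (\d+) failed"
)
REFRESH_RE = re.compile(r"thresholds\s+refresh\s+collectors\s+(\d+)")


def summarize(lines):
    """Count records per level and collect failed puts and refresh commands."""
    levels = Counter()
    failed_puts = []   # (timestamp, file name, version used in the put)
    refreshes = []     # (timestamp, version named in the refresh command)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            # Skip continuation lines such as " - CCRProxy._run_ccr_command:256".
            continue
        levels[m["level"]] += 1
        put = PUT_FAILED_RE.search(m["msg"])
        if put:
            failed_puts.append((m["ts"], put.group(1), int(put.group(2))))
        refresh = REFRESH_RE.search(m["msg"])
        if refresh:
            refreshes.append((m["ts"], int(refresh.group(1))))
    return levels, failed_puts, refreshes
```

For the whole file, summarize(open("/var/adm/ras/mmsysmonitor.log")) would work the same way. In the excerpt above, the failed puts use versions 4021693 through 4021696, while the refresh commands logged just before them name versions 4021694 through 4021697.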


Steve Losen
Research Computing
University of Virginia
scl at virginia.edu   434-924-0640


