[gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes

Wahl, Edward ewahl at osc.edu
Thu Jan 30 15:52:27 GMT 2020


Interesting. We just deployed an ESS here, and it appears we are running into a very similar problem with the GUI refresh. It takes my ppc64le nodes about 45 seconds to run rinv when they are idle.
I opened a support case on this yesterday evening. We're on ESS 5.3.4 as well. I will wait to see what support says.

Ed Wahl
Ohio Supercomputer Center


-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Ulrich Sibiller
Sent: Thursday, January 30, 2020 9:44 AM
To: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes

On 1/29/20 2:05 PM, Billich Heinrich Rainer (ID SD) wrote:
> Hello,
> 
> Can I change the times at which the GUI runs HW_INVENTORY and related tasks?
> 
> We frequently get messages like
> 
>     gui_refresh_task_failed     GUI           WARNING     12 hours ago      
> The following GUI refresh task(s) failed: HW_INVENTORY
> 
> The tasks fail due to timeouts. Running the task manually succeeds most
> of the time. We run two GUI nodes per cluster, and I noticed that both
> servers seem to run HW_INVENTORY at exactly the same time, which may
> lead to locking or congestion issues; indeed, the logs show messages
> like
> 
> EFSSA0194I Waiting for concurrent operation to complete.
> 
> The GUI calls ‘rinv’ on the xCAT servers. rinv for a single
> little-endian server takes a long time (about 2-3 minutes), while it finishes in about 15 s for a big-endian server.
> 
> Hence the long runtime of rinv on little-endian systems may be an
> issue, too.
> 
> We run 5.0.4-1 efix9 on the GUI and ESS 5.3.4.1 on the GNR systems
> (5.0.3.2 efix4). We run a mix of ppc64 and ppc64le systems, with a separate xCAT/EMS server for each type. The GUI nodes are ppc64le.
> 
> We have seen this issue with several GPFS versions on the GUI and with at least two ESS/xCAT versions.
> 
> Just to be sure, I purged the PostgreSQL tables.
> 
> I did try
> 
> /usr/lpp/mmfs/gui/cli/lstasklog HW_INVENTORY
> 
> /usr/lpp/mmfs/gui/cli/runtask HW_INVENTORY --debug
> 
> I also tried to read the logs in /var/log/cnlog/mgtsrv/, but they are difficult to interpret.


I have seen the same on ppc64le. From time to time it recovers, but then it starts again. The timeouts themselves are not the problem; the hardware is. I opened a call with IBM, and they suggested upgrading to ESS 5.3.5 because of the new firmware, which I am currently doing. I can dig out more details if you want.

Uli
--
Science + Computing AG
Vorstandsvorsitzender/Chairman of the board of management:
Dr. Martin Matzke
Vorstand/Board of Management:
Matthias Schempp, Sabine Hohenstein
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
Philippe Miltin
Aufsichtsrat/Supervisory Board:
Martin Wibbe, Ursula Morgenstern
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196

