[gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes

Jan-Frode Myklebust janfrode at tanso.net
Mon Feb 3 19:41:31 GMT 2020


I think both 5.3.4.2 and 5.3.5 include FW860.70, but the readme doesn’t
show this correctly.
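
One way to check whether a firmware update actually changes the rinv
runtimes quoted below would be to time the command before and after the
upgrade. What follows is only a rough Python sketch: the helper names
time_command and summarize are made up for illustration, and the command
to time is passed in as an argument list rather than assumed.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=3, timeout_s=300.0):
    """Run cmd `runs` times; return wall-clock seconds per run.

    A run that exits non-zero or exceeds timeout_s is recorded as None.
    A missing executable raises FileNotFoundError (not caught here).
    """
    durations = []
    for _ in range(runs):
        start = time.monotonic()
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            durations.append(None)
            continue
        elapsed = time.monotonic() - start
        durations.append(elapsed if result.returncode == 0 else None)
    return durations


def summarize(durations):
    """Condense a list of durations (None = failed run) into a small dict."""
    ok = [d for d in durations if d is not None]
    return {
        "runs": len(durations),
        "failed": len(durations) - len(ok),
        "median_s": statistics.median(ok) if ok else None,
        "max_s": max(ok) if ok else None,
    }
```

For example, summarize(time_command(["rinv", "<node>", "all"])) on the
xCAT/EMS server, where <node> is a placeholder and the exact rinv
arguments for a given setup are an assumption to check against the xCAT
documentation. Failed or timed-out runs are counted separately so they
do not distort the median.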


  -jf

On Mon, 3 Feb 2020 at 11:02, Billich Heinrich Rainer (ID SD) <
heinrich.billich at id.ethz.ch> wrote:

> Thank you. I wonder if there is any ESS version which deploys FW860.70 for
> ppc64le. The Readme for 5.3.5 lists FW860.60 again, same as 5.3.4?
>
>
>
> Cheers,
>
>
>
> Heiner
>
> *From: *<gpfsug-discuss-bounces at spectrumscale.org> on behalf of Jan-Frode
> Myklebust <janfrode at tanso.net>
> *Reply to: *gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> *Date: *Thursday, 30 January 2020 at 18:00
> *To: *gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> *Subject: *Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY
> with two active GUI nodes
>
>
>
>
>
> I *think* this was a known bug in the Power firmware included with 5.3.4,
> and that it was fixed in FW860.70. Something hanging/crashing in IPMI.
>
>
>
>
>
>
>
>   -jf
>
>
>
> On Thu, 30 Jan 2020 at 17:10, Wahl, Edward <ewahl at osc.edu> wrote:
>
> Interesting. We just deployed an ESS here and appear to be running into a
> very similar problem with the GUI refresh. It takes my ppc64le nodes about
> 45 seconds to run rinv when they are idle.
> I had just opened a support case on this last evening. We're on ESS
> 5.3.4 as well. I will wait to see what support says.
>
> Ed Wahl
> Ohio Supercomputer Center
>
>
> -----Original Message-----
> From: gpfsug-discuss-bounces at spectrumscale.org <
> gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Ulrich Sibiller
> Sent: Thursday, January 30, 2020 9:44 AM
> To: gpfsug-discuss at spectrumscale.org
> Subject: Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY
> with two active GUI nodes
>
> On 1/29/20 2:05 PM, Billich Heinrich Rainer (ID SD) wrote:
> > Hello,
> >
> > Can I change the times at which the GUI runs HW_INVENTORY and related
> > tasks?
> >
> > We frequently get messages like
> >
> >     gui_refresh_task_failed     GUI           WARNING     12 hours ago
> > The following GUI refresh task(s) failed: HW_INVENTORY
> >
> > The tasks fail due to timeouts. Running the task manually usually
> > succeeds. We run two GUI nodes per cluster, and I noticed that both
> > servers seem to run HW_INVENTORY at exactly the same time, which may
> > lead to locking or congestion issues. The logs do show messages
> > like
> >
> > EFSSA0194I Waiting for concurrent operation to complete.
> >
> > The GUI calls ‘rinv’ on the xCAT servers. rinv for a single
> > little-endian server takes a long time, about 2-3 minutes, while it
> > finishes in about 15s for a big-endian server.
> >
> > Hence the long runtime of rinv on little-endian systems may be an
> > issue, too.
> >
> > We run 5.0.4-1 efix9 on the GUI and ESS 5.3.4.1 on the GNR systems
> > (5.0.3.2 efix4). We run a mix of ppc64 and ppc64le systems, with a
> > separate xCAT/EMS server for each type. The GUI nodes are ppc64le.
> >
> > We have seen this issue with several GPFS versions on the GUI and with
> > at least two ESS/xCAT versions.
> >
> > Just to be sure, I purged the PostgreSQL tables.
> >
> > I did try
> >
> > /usr/lpp/mmfs/gui/cli/lstasklog HW_INVENTORY
> >
> > /usr/lpp/mmfs/gui/cli/runtask HW_INVENTORY --debug
> >
> > I also tried to read the logs in /var/log/cnlog/mgtsrv/, but they are
> > difficult to interpret.
>
>
> I have seen the same on ppc64le. From time to time it recovers, but then it
> starts again. The timeout settings are okay; the problem is the hardware. I
> have opened a call at IBM and they suggested upgrading to ESS 5.3.5 because
> of the new firmware, which I am currently doing. I can dig out more details
> if you want.
>
> Uli
> --
> Science + Computing AG
> Vorstandsvorsitzender/Chairman of the board of management:
> Dr. Martin Matzke
> Vorstand/Board of Management:
> Matthias Schempp, Sabine Hohenstein
> Vorsitzender des Aufsichtsrats/
> Chairman of the Supervisory Board:
> Philippe Miltin
> Aufsichtsrat/Supervisory Board:
> Martin Wibbe, Ursula Morgenstern
> Sitz/Registered Office: Tuebingen
> Registergericht/Registration Court: Stuttgart
> Registernummer/Commercial Register No.: HRB 382196
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss