[gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes

Billich Heinrich Rainer (ID SD) heinrich.billich at id.ethz.ch
Wed Jan 29 13:05:30 GMT 2020


Hello,

Can I change the times at which the GUI runs HW_INVENTORY and related tasks?

We frequently get messages like

   gui_refresh_task_failed     GUI           WARNING     12 hours ago      The following GUI refresh task(s) failed: HW_INVENTORY

The tasks fail due to timeouts. Running the task manually succeeds most of the time. We run two GUI nodes per cluster, and I noticed that both servers seem to run HW_INVENTORY at exactly the same time, which may lead to locking or congestion issues. In fact, the logs show messages like

EFSSA0194I Waiting for concurrent operation to complete.

The GUI calls 'rinv' on the xCAT servers. rinv for a single little-endian server takes a long time, about 2-3 minutes, while it finishes in about 15 s for a big-endian server.

Hence the long runtime of rinv on little-endian systems may be an issue, too.
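
To put numbers on the rinv runtimes per node, something like the following could be run on each xCAT server. This is only a minimal sketch, assuming Python 3 is available there; the helper names time_command and compare_nodes are made up for illustration, and the exact rinv arguments (for example "rinv <node> all") are an assumption, not necessarily what the GUI itself calls.

```python
import subprocess
import time


def time_command(argv, timeout=600):
    """Run argv and return (elapsed_seconds, return_code).

    return_code is None if the command timed out or could not be started.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        code = result.returncode
    except (subprocess.TimeoutExpired, OSError):
        code = None
    return time.monotonic() - start, code


def compare_nodes(nodes, command_for, timeout=600):
    """Time command_for(node) for every node, one after the other.

    command_for is a callable that builds the argument list for a node,
    e.g. lambda n: ["rinv", n, "all"] (assumed invocation).
    """
    return {node: time_command(command_for(node), timeout) for node in nodes}
```

For example, compare_nodes(["nodeA", "nodeB"], lambda n: ["rinv", n, "all"]) would return the elapsed seconds and return code per node; the node names here are placeholders. Running the commands sequentially keeps the measurement free of the concurrency effects described above.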

We run 5.0.4-1 efix9 on the GUI nodes and ESS 5.3.4.1 on the GNR systems (5.0.3.2 efix4). We run a mix of ppc64 and ppc64le systems, with a separate xCAT/EMS server for each type. The GUI nodes are ppc64le.

We have seen this issue with several GPFS versions on the GUI and with at least two ESS/xCAT versions.

Just to be sure, I purged the PostgreSQL tables.

I also tried

/usr/lpp/mmfs/gui/cli/lstasklog HW_INVENTORY
/usr/lpp/mmfs/gui/cli/runtask HW_INVENTORY --debug

I also tried to read the logs in /var/log/cnlog/mgtsrv/, but they are difficult to interpret.
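
One way to make those logs more approachable is to count only the lines of interest per file. The following is a rough sketch, assuming the files under /var/log/cnlog/mgtsrv/ are plain text (compressed rotations would need extra handling); the function name count_markers is hypothetical, and the two markers are simply the message ID and task name quoted above.

```python
from collections import Counter
from pathlib import Path


def count_markers(log_dir, markers=("EFSSA0194I", "HW_INVENTORY")):
    """Count lines containing each marker, keyed by (file name, marker)."""
    counts = Counter()
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            # errors="replace" keeps the scan going on odd bytes in a log
            with path.open(errors="replace") as handle:
                for line in handle:
                    for marker in markers:
                        if marker in line:
                            counts[(path.name, marker)] += 1
        except OSError:
            # Unreadable file (permissions, vanished during rotation): skip it
            continue
    return counts
```

Calling count_markers("/var/log/cnlog/mgtsrv/") on both GUI nodes and comparing the results would show which files mention the concurrent-operation message and how often.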

Thank you,

Heiner


--
=======================
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.billich at id.ethz.ch
========================






