[gpfsug-discuss] Remote cluster GPFS communication on an IP different from the one used for Daemon or Admin node name

Patel, Tarak (SSC/SPC) tarak.patel at canada.ca
Wed Jun 7 16:42:45 BST 2017


Hi all,

We've been experiencing issues with a remote cluster node expelling our CES nodes, which causes the remote filesystems to unmount. The issue is related to GPFS communicating over an Ethernet IP rather than the IP defined on IB, which is the one used for the Daemon node name and Admin node name. In other words, the remote cluster is aware of IPs that are not defined in the GPFS configuration as Admin/Daemon node names. The CES nodes are configured with IB as well as Ethernet (for interactive client and NFS access). We've double-checked /etc/hosts and DNS, and all looks to be in order: the CES IPoIB IP is present in /etc/hosts on the remote cluster. I'm unsure where the cluster manager of the remote cluster is getting the Ethernet IP from, since there is no mention of it in the GPFS configuration. The CES nodes were added later, so they are not listed as Contact Nodes in the 'mmremotecluster show' output.
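
For anyone wanting to repeat the name-resolution check on a remote-cluster node, here is a minimal sketch of what "double-checked /etc/hosts and DNS" amounts to. It is only an illustration: the subnet and the node names are made-up placeholders, not our real values, and the helper names are hypothetical.

```python
import ipaddress
import socket

# Hypothetical IPoIB subnet; replace with the real one for the fabric.
IPOIB_SUBNET = ipaddress.ip_network("10.10.0.0/16")


def resolve_ipv4(name):
    """Return the IPv4 addresses the local resolver gives for name.

    gethostbyname_ex goes through the system resolver, so the result
    reflects /etc/hosts and DNS as configured on this node.
    """
    _, _, addresses = socket.gethostbyname_ex(name)
    return addresses


def names_resolving_off_subnet(names, subnet=IPOIB_SUBNET, resolver=resolve_ipv4):
    """Map each name to any resolved addresses that fall outside subnet."""
    suspicious = {}
    for name in names:
        outside = [
            addr for addr in resolver(name)
            if ipaddress.ip_address(addr) not in subnet
        ]
        if outside:
            suspicious[name] = outside
    return suspicious
```

Calling names_resolving_off_subnet(["ces1-ib", "ces2-ib"]) (placeholder names) on a node of the remote cluster should return an empty dict if every daemon node name resolves only to an IPoIB address. In our case this kind of check comes back clean, which is why the Ethernet IPs are puzzling.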

The CES nodes use the IP defined on IB for the GPFS configuration, and we also have Ethernet, which carries the default route. To ensure that traffic for the IB network passes via IPoIB, we've even defined a static route so that all GPFS communication uses IPoIB (needed since we are dealing with a different fabric). 'mmfsadm dump tscomm' reports multiple IPs for the CES nodes, including both the Ethernet and the IPoIB addresses. I'm unsure whether there is a way to drop some connections in GPFS (cluster-wide) after stopping a specific CES node, so that only the IB address is listed. I realize that one option would be to define the subnets parameter for the remote cluster, but that will require downtime (a solution to be explored at a later date).
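
To make it easier to spot the unwanted addresses in a long dump, something like the following sketch could be used. It makes no assumption about the layout of the 'mmfsadm dump tscomm' output; it simply pulls out anything that looks like an IPv4 address from whatever text it is given and splits the addresses by whether they belong to the IPoIB subnet. The function name and the subnet are placeholders of my own.

```python
import ipaddress
import re

# Matches dotted-quad candidates; validity is checked separately below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def classify_addresses(text, ipoib_subnet):
    """Split the IPv4 addresses found in text by membership in ipoib_subnet.

    Returns (on_fabric, off_fabric), each a sorted list of unique
    address strings.
    """
    subnet = ipaddress.ip_network(ipoib_subnet)
    on_fabric, off_fabric = set(), set()
    for token in IPV4_CANDIDATE.findall(text):
        try:
            addr = ipaddress.ip_address(token)
        except ValueError:
            # Looked like an address but is not one (e.g. 999.1.1.1).
            continue
        if addr in subnet:
            on_fabric.add(str(addr))
        else:
            off_fabric.add(str(addr))
    return sorted(on_fabric), sorted(off_fabric)
```

The dump can be saved to a file and its contents passed in as text together with the IPoIB subnet (for example "10.10.0.0/16", a placeholder). Any address that lands in the second list is one the daemon knows about that is not on the IB fabric.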

I hope someone can explain how or why the remote cluster is picking up IPs that are not used in the GPFS configuration for the remote nodes, and how to ensure those IPs are not used in the future.

Thank you,

Tarak


--

Tarak Patel

Team Lead, HPC Integration, E-Science Computing Solution
Shared Services Canada, Government of Canada
tarak.patel at canada.ca
1-514-421-7299


