[gpfsug-discuss] subnets confusion

Brian Marshall mimarsh2 at vt.edu
Tue Nov 8 13:40:43 GMT 2016


All,

I have a tricky (at least to me) subnets question.

I have 2 NSD server clusters:
Serv1 -> daemon on 10.51 with high speed network on 10.82
Serv2 -> daemon on 10.42, which is a high speed network

and 2 client clusters:
Cli1 -> daemon on 10.81 with high speed network on 10.82
Cli2 -> daemon on 10.41 with high speed network on 10.42

Serv1 has the following subnets operand:
subnets 10.82.0.0/Serv1;Cli1 10.41.0.0/Cli2

Cli1 has the following subnets operand:
subnets 10.82.0.0/Serv1;Cli1

Cli2 has the following subnets operand:
subnets 10.51.0.0/Serv1 10.41.0.0/Cli2 10.42.0.0/Serv2


Problem:
Sometimes Serv1 will try to contact Cli2 nodes on their 10.42 addresses,
which Serv1 cannot reach.  I get errors like
Close connection to 10.42.10.1 hs001.cluster.ib (Connection timed out)
Cli2 nodes can connect/re-connect to Serv1 once the server cluster kicks
them out.

Serv1 has Cli2 listed on its 10.41 subnets operand, so I don't fully
understand why Serv1 does not use 10.41 to connect.
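
For anyone who wants to sanity-check the operands above, here is a rough
Python sketch that parses them and reports which listed subnet (if any)
covers a given peer address.  Assumptions: the /16 prefix is a guess (the
operands only give network addresses), the helper names and the sample
addresses are hypothetical, and this only checks what is written in the
config.  It does not model how the GPFS daemon actually chooses an address.

```python
import ipaddress

# Assumption: the operands carry no prefix length, so /16 is a guess
# made purely for illustration.
ASSUMED_PREFIX = 16

# Subnets operands as quoted in this message.
CONFIG = {
    "Serv1": "10.82.0.0/Serv1;Cli1 10.41.0.0/Cli2",
    "Cli1": "10.82.0.0/Serv1;Cli1",
    "Cli2": "10.51.0.0/Serv1 10.41.0.0/Cli2 10.42.0.0/Serv2",
}


def parse_subnets(operand):
    """Split 'net/ClusterA;ClusterB net2/ClusterC' into (network, names) pairs."""
    entries = []
    for item in operand.split():
        net, _, clusters = item.partition("/")
        network = ipaddress.ip_network(f"{net}/{ASSUMED_PREFIX}")
        names = clusters.split(";") if clusters else []
        entries.append((network, names))
    return entries


def listed_for(address, cluster, peer):
    """Networks in `cluster`'s operand that name `peer` and contain `address`."""
    addr = ipaddress.ip_address(address)
    return [
        str(network)
        for network, names in parse_subnets(CONFIG[cluster])
        if peer in names and addr in network
    ]


if __name__ == "__main__":
    # Hypothetical Cli2 node addresses on each network.
    for sample in ("10.41.0.5", "10.42.0.5"):
        print(sample, "->", listed_for(sample, "Serv1", "Cli2"))
```

With these operands, Serv1 lists Cli2 only on the 10.41 network, so a 10.42
address for a Cli2 node matches nothing in Serv1's own operand.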

Possible Solution??
I think to fix this I need to either add Serv1 to the 10.41 subnet entry
on Cli2 or move the 10.42 entry on Cli2 to the front of the list.


I am working from this link:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/GPFS+Network+Communication+Overview


Please let me know if you need more info.  I have tried to strip this down
to the bare minimum and in doing so may have left out good details.

Thank you,
Brian Marshall