[gpfsug-discuss] Odd networking/name resolution issue

Jonathan Buzzard jonathan.buzzard at strath.ac.uk
Mon May 11 11:01:49 BST 2020


On 10/05/2020 14:28, Jaime Pinto wrote:
> The rationale for my suggestion doesn't have much to do with the central 
> DNS server, but everything to do with the DNS client side of the service.
> If you have a very busy cluster at times, and a number of nodes really 
> busy with 1000+ IOPs for instance, so much that the OS on the client 
> can barely spare a cycle to query the DNS server on what the IP 
> associated with the name of interface leading to the GPFS infrastructure 
> is, or even process that response when it returns, on the same interface 
> where it's having contentions and trying to process all the gpfs data 
> transactions, you can have temporary catch 22 situations. This can 
> generate a backlog of waiters, and eventual expelling of some nodes when 
> the cluster managers don't hear from them in reasonable time.
> 
> It doesn't really matter if you have a central DNS server on steroids.
> 

If that is the scenario, it will struggle just as much to look up the 
hostname to IP lists in /etc/hosts. GPFS itself will struggle to get 
scheduled. Besides which, systemd is caching it all locally anyway, and 
that cache will get hit *before* /etc/hosts does. In a non-systemd 
install, nscd is most likely running and provides a similar service. You 
don't appear to have a full grasp of how hostname to IP resolution works.
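
The lookup order on a given node can be checked rather than argued about. 
A minimal sketch in Python, assuming a glibc system where the order comes 
from the "hosts:" line of /etc/nsswitch.conf. The function name and the 
sample line are purely illustrative and not taken from any particular 
distribution:

```python
import re

def hosts_lookup_order(nsswitch_text):
    """Return the NSS sources on the 'hosts:' line, in lookup order."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            spec = line[len("hosts:"):]
            # Drop action items such as [!UNAVAIL=return]; keep source names.
            spec = re.sub(r"\[[^\]]*\]", " ", spec)
            return spec.split()
    return []  # no hosts: line found

# Illustrative sample only; a real node may list different sources.
sample = "hosts: resolve [!UNAVAIL=return] files dns\n"
print(hosts_lookup_order(sample))
```

On a real node, pass in the contents of /etc/nsswitch.conf instead of the 
sample. Whichever source is listed first is consulted first, and 
"getent hosts <name>" goes through the same path.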

Suggesting a system that was deprecated over 30 years ago is an outdated 
notion that needs stomping on, because IMHO it comes from a lack of 
seeing the whole picture.

Finally, as I understand it, GPFS maintains its own hostname to IP lookup 
table which makes it a completely moot point. You can see this from the 
fact you cannot change the IP address of a node in a cluster. You must 
remove it and then rejoin with the new IP address. If it was storing 
client information as hostnames and using hostname to IP resolution to 
work out the IP addresses that would not be the case.

As such your suggestion is dreaming up a solution to a non-existent 
problem, which makes it an even worse idea IMHO.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


