[gpfsug-discuss] Wrong nodename after server restart

Tue Sep 12 10:40:35 BST 2017

Hi,

I had to restart two of my GPFS servers (gpfs-n4 and gpfs-quorum), and
after that I was unable to move the CES IP address back. The command
failed with a strange error: "mmces address move: GPFS is down on this
node". After double-checking that the GPFS state is active on all nodes,
I dug deeper and I think I found the problem, but I don't really know how
this could happen.

Look at the node names reported by the different commands:

[root@gpfs-n2 ~]# mmlscluster     # Looks good

GPFS cluster information
========================
   GPFS cluster name:         gpfscl1.img.local
   GPFS cluster id:           17792677515884116443
   GPFS UID domain:           img.local
   Remote shell command:      /usr/bin/ssh
   Remote file copy command:  /usr/bin/scp
   Repository type:           CCR

  Node  Daemon node name       IP address       Admin node name        Designation
----------------------------------------------------------------------------------
    1   gpfs-n4.img.local      192.168.20.64    gpfs-n4.img.local      quorum-manager
    2   gpfs-quorum.img.local  192.168.20.60    gpfs-quorum.img.local  quorum
    3   gpfs-n3.img.local      192.168.20.63    gpfs-n3.img.local      quorum-manager
    4   tau.img.local          192.168.1.248    tau.img.local
    5   gpfs-n1.img.local      192.168.20.61    gpfs-n1.img.local      quorum-manager
    6   gpfs-n2.img.local      192.168.20.62    gpfs-n2.img.local      quorum-manager
    8   whale.img.cas.cz       147.231.150.108  whale.img.cas.cz

[root@gpfs-n2 ~]# mmlsmount gpfs01 -L   # not so good

File system gpfs01 is mounted on 7 nodes:
   192.168.20.63   gpfs-n3
   192.168.20.61   gpfs-n1
   192.168.20.62   gpfs-n2
   192.168.1.248   tau
   192.168.20.64   gpfs-n4.img.local
   192.168.20.60   gpfs-quorum.img.local
   147.231.150.108 whale.img.cas.cz

[root@gpfs-n2 ~]# tsctl shownodes up | tr ','  '\n'   # very wrong
whale.img.cas.cz.img.local
tau.img.local
gpfs-quorum.img.local.img.local
gpfs-n1.img.local
gpfs-n2.img.local
gpfs-n3.img.local
gpfs-n4.img.local.img.local
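
The odd entries can be picked out mechanically by comparing this list
against the daemon node names from mmlscluster. This is only a minimal
sketch: unexpected_names is a hypothetical helper (not a GPFS command),
and expected.txt is an assumed file holding the correct daemon node
names, one per line.

```shell
# Hypothetical helper, not part of GPFS.
# Reads node names on stdin (one per line) and prints every name that
# does not appear as an exact line in the file given as $1.
unexpected_names() {
    # -F: treat names as fixed strings, -x: match whole lines only,
    # -v: invert the match, -f: read the expected names from a file.
    # grep exits non-zero when nothing is printed, so mask that status.
    grep -vxF -f "$1" || true
}
```

Assumed usage: tsctl shownodes up | tr ',' '\n' | unexpected_names expected.txt
With the output above, this should list the three names that carry an
extra ".img.local" suffix.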

The "tsctl shownodes up" output is the reason why I'm not able to move
the CES address back to the gpfs-n4 node, but the real problem is the
inconsistent node names. I think the OS is configured correctly:

[root@gpfs-n4 /]# hostname
gpfs-n4

[root@gpfs-n4 /]# hostname -f
gpfs-n4.img.local

[root@gpfs-n4 /]# cat /etc/resolv.conf
nameserver 192.168.20.30
nameserver 147.231.150.2
search img.local
domain img.local

[root@gpfs-n4 /]# cat /etc/hosts | grep gpfs-n4
192.168.20.64    gpfs-n4.img.local gpfs-n4

[root@gpfs-n4 /]# host gpfs-n4
gpfs-n4.img.local has address 192.168.20.64

[root@gpfs-n4 /]# host 192.168.20.64
64.20.168.192.in-addr.arpa domain name pointer gpfs-n4.img.local.

Can someone help me with this?

Thanks,
Michal

P.S. GPFS version: 4.2.3-2 (CentOS 7)