<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<div>Hi Luke,<br>
<br>
When remote clusters use RFC 1918 address space, GPFS assumes that each cluster's privately addressed networks are not reachable from the other, so you must declare the shared subnets explicitly with mmchconfig. Try setting subnets as follows:<br>
<br>
gpfs.oerc.local:<br>
subnets="10.200.0.0 10.200.0.0/cpdn.oerc.local"<br>
<br>
cpdn.oerc.local:<br>
subnets="10.200.0.0 10.200.0.0/gpfs.oerc.local"<br>
<br>
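That is, on each cluster, run something like the following (if I remember the mmchconfig syntax right; the new subnets value only takes effect after the daemons are restarted):<br>
<br>

```shell
# On gpfs.oerc.local: reach cpdn's nodes over the shared 10.200.0.0 subnet
mmchconfig subnets="10.200.0.0 10.200.0.0/cpdn.oerc.local"

# On cpdn.oerc.local: the mirror-image setting
mmchconfig subnets="10.200.0.0 10.200.0.0/gpfs.oerc.local"
```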
I think you also need to set the cipher list locally on each cluster to AUTHONLY via mmauth: in your mmauth output, each cluster shows its own cipher list as "(none specified)". On my clusters these match on both sides; no cluster says "none specified".
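If memory serves, the commands are something like this (changing the local cluster's cipher list needs GPFS down across the cluster, so plan a short outage):<br>
<br>

```shell
# Run on every cluster whose own cipher list shows "(none specified)"
mmshutdown -a                  # cipher list changes need the daemon stopped
mmauth update . -l AUTHONLY    # "." means the local cluster
mmstartup -a
```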
<br>
<br>
Hope that helps,<br>
Paul<br>
<br>
<br>
<br>
Sent with Good (www.good.com)<br>
<hr tabindex="-1">
<font face="Tahoma" size="2"><b>From:</b> gpfsug-discuss-bounces@gpfsug.org on behalf of Luke Raimbach<br>
<b>Sent:</b> Wednesday, May 07, 2014 5:28:59 AM<br>
<b>To:</b> gpfsug-discuss@gpfsug.org<br>
<b>Subject:</b> [gpfsug-discuss] GPFS Remote Mount Fails<br>
</font><br>
<div></div>
</div>
<font size="2"><span style="font-size:10pt;">
<div class="PlainText">Dear All,<br>
<br>
I'm having a problem remote mounting a file system. I have two clusters:<br>
<br>
gpfs.oerc.local which owns file system 'gpfs'<br>
cpdn.oerc.local which owns no file systems<br>
<br>
I want to remote mount file system 'gpfs' from cluster cpdn.oerc.local. I'll post the configuration for both clusters further down. The error I receive on a node in cluster cpdn.oerc.local is:<br>
<br>
<br>
Wed May 7 10:05:19.595 2014: Waiting to join remote cluster gpfs.oerc.local<br>
Wed May 7 10:05:20.598 2014: Remote mounts are not enabled within this cluster.<br>
Wed May 7 10:05:20.599 2014: Remote mounts are not enabled within this cluster.<br>
Wed May 7 10:05:20.598 2014: A node join was rejected. This could be due to<br>
incompatible daemon versions, failure to find the node<br>
in the configuration database, or no configuration manager found.<br>
Wed May 7 10:05:20.600 2014: Failed to join remote cluster gpfs.oerc.local<br>
Wed May 7 10:05:20.601 2014: Command: err 693: mount gpfs.oerc.local:gpfs<br>
Wed May 7 10:05:20.600 2014: Message failed because the destination node refused the connection.<br>
<br>
<br>
I'm concerned about the "Remote mounts are not enabled within this cluster" messages. Having followed the configuration steps in the GPFS Advanced Administration Guide, I end up with the following configurations:<br>
<br>
## GPFS Cluster 'gpfs.oerc.local' ##<br>
<br>
[root@gpfs01 ~]# mmlscluster<br>
<br>
GPFS cluster information<br>
========================<br>
GPFS cluster name: gpfs.oerc.local<br>
GPFS cluster id: 748734524680043237<br>
GPFS UID domain: gpfs.oerc.local<br>
Remote shell command: /usr/bin/ssh<br>
Remote file copy command: /usr/bin/scp<br>
<br>
GPFS cluster configuration servers:<br>
-----------------------------------<br>
Primary server: gpfs01.oerc.local<br>
Secondary server: gpfs02.oerc.local<br>
<br>
Node Daemon node name IP address Admin node name Designation<br>
--------------------------------------------------------------------------<br>
1 gpfs01.oerc.local 10.100.10.21 gpfs01.oerc.local quorum-manager<br>
2 gpfs02.oerc.local 10.100.10.22 gpfs02.oerc.local quorum-manager<br>
3 linux.oerc.local 10.100.10.1 linux.oerc.local<br>
4 jupiter.oerc.local 10.100.10.2 jupiter.oerc.local<br>
5 cnfs0.oerc.local 10.100.10.100 cnfs0.oerc.local<br>
6 cnfs1.oerc.local 10.100.10.101 cnfs1.oerc.local<br>
7 cnfs2.oerc.local 10.100.10.102 cnfs2.oerc.local<br>
8 cnfs3.oerc.local 10.100.10.103 cnfs3.oerc.local<br>
9 tsm01.oerc.local 10.100.10.51 tsm01.oerc.local quorum-manager<br>
<br>
<br>
[root@gpfs01 ~]# mmremotecluster show all<br>
Cluster name: cpdn.oerc.local<br>
Contact nodes: 10.100.10.60,10.100.10.61,10.100.10.62<br>
SHA digest: e9a2dc678a62d6c581de0b89b49a90f28f401327<br>
File systems: (none defined)<br>
<br>
<br>
[root@gpfs01 ~]# mmauth show all<br>
Cluster name: cpdn.oerc.local<br>
Cipher list: AUTHONLY<br>
SHA digest: e9a2dc678a62d6c581de0b89b49a90f28f401327<br>
File system access: gpfs (rw, root remapped to 99:99)<br>
<br>
Cluster name: gpfs.oerc.local (this cluster)<br>
Cipher list: (none specified)<br>
SHA digest: e7a68ff688d6ef055eb40fe74677b272d6c60879<br>
File system access: (all rw)<br>
<br>
<br>
[root@gpfs01 ~]# mmlsconfig<br>
Configuration data for cluster gpfs.oerc.local:<br>
-----------------------------------------------<br>
myNodeConfigNumber 1<br>
clusterName gpfs.oerc.local<br>
clusterId 748734524680043237<br>
autoload yes<br>
minReleaseLevel 3.4.0.7<br>
dmapiFileHandleSize 32<br>
maxMBpS 6400<br>
maxblocksize 2M<br>
pagepool 4G<br>
[cnfs0,cnfs1,cnfs2,cnfs3]<br>
pagepool 2G<br>
[common]<br>
tiebreakerDisks vd0_0;vd2_2;vd5_5<br>
cnfsSharedRoot /gpfs/.ha<br>
nfsPrefetchStrategy 1<br>
cnfsVIP gpfs-nfs<br>
subnets 10.200.0.0<br>
cnfsMountdPort 4000<br>
cnfsNFSDprocs 128<br>
[common]<br>
adminMode central<br>
<br>
File systems in cluster gpfs.oerc.local:<br>
----------------------------------------<br>
/dev/gpfs<br>
<br>
<br>
## GPFS Cluster 'cpdn.oerc.local' ##<br>
<br>
[root@cpdn-ppc01 ~]# mmlscluster<br>
<br>
GPFS cluster information<br>
========================<br>
GPFS cluster name: cpdn.oerc.local<br>
GPFS cluster id: 10699506775530551223<br>
GPFS UID domain: cpdn.oerc.local<br>
Remote shell command: /usr/bin/ssh<br>
Remote file copy command: /usr/bin/scp<br>
<br>
GPFS cluster configuration servers:<br>
-----------------------------------<br>
Primary server: cpdn-ppc02.oerc.local<br>
Secondary server: cpdn-ppc03.oerc.local<br>
<br>
Node Daemon node name IP address Admin node name Designation<br>
-------------------------------------------------------------------------------<br>
1 cpdn-ppc01.oerc.local 10.100.10.60 cpdn-ppc01.oerc.local quorum<br>
2 cpdn-ppc02.oerc.local 10.100.10.61 cpdn-ppc02.oerc.local quorum-manager<br>
3 cpdn-ppc03.oerc.local 10.100.10.62 cpdn-ppc03.oerc.local quorum-manager<br>
<br>
[root@cpdn-ppc01 ~]# mmremotecluster show all<br>
Cluster name: gpfs.oerc.local<br>
Contact nodes: 10.100.10.21,10.100.10.22<br>
SHA digest: e7a68ff688d6ef055eb40fe74677b272d6c60879<br>
File systems: gpfs (gpfs)<br>
<br>
<br>
[root@cpdn-ppc01 ~]# mmauth show all<br>
Cluster name: gpfs.oerc.local<br>
Cipher list: AUTHONLY<br>
SHA digest: e7a68ff688d6ef055eb40fe74677b272d6c60879<br>
File system access: (none authorized)<br>
<br>
Cluster name: cpdn.oerc.local (this cluster)<br>
Cipher list: (none specified)<br>
SHA digest: e9a2dc678a62d6c581de0b89b49a90f28f401327<br>
File system access: (all rw)<br>
<br>
<br>
[root@cpdn-ppc01 ~]# mmremotefs show all<br>
Local Name Remote Name Cluster name Mount Point Mount Options Automount Drive Priority<br>
gpfs gpfs gpfs.oerc.local /gpfs rw yes - 0<br>
<br>
<br>
[root@cpdn-ppc01 ~]# mmlsconfig<br>
Configuration data for cluster cpdn.oerc.local:<br>
-----------------------------------------------<br>
myNodeConfigNumber 1<br>
clusterName cpdn.oerc.local<br>
clusterId 10699506775530551223<br>
autoload yes<br>
dmapiFileHandleSize 32<br>
minReleaseLevel 3.4.0.7<br>
subnets 10.200.0.0<br>
pagepool 4G<br>
[cpdn-ppc02,cpdn-ppc03]<br>
pagepool 2G<br>
[common]<br>
traceRecycle local<br>
trace all 4 tm 2 thread 1 mutex 1 vnode 2 ksvfs 3 klockl 2 io 3 pgalloc 1 mb 1 lock 2 fsck 3<br>
adminMode central<br>
<br>
File systems in cluster cpdn.oerc.local:<br>
----------------------------------------<br>
(none)<br>
<br>
<br>
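For reference, the sequence I followed to set this up was roughly the following (per the Advanced Administration Guide; the key file names are just whatever I saved the copied id_rsa.pub files as):<br>
<br>

```shell
# On gpfs.oerc.local (the owning cluster):
mmauth genkey new                                # then copy /var/mmfs/ssl/id_rsa.pub across
mmauth add cpdn.oerc.local -k cpdn_id_rsa.pub    # install cpdn's public key
mmauth grant cpdn.oerc.local -f gpfs             # authorize access to file system 'gpfs'

# On cpdn.oerc.local (the accessing cluster):
mmauth genkey new
mmremotecluster add gpfs.oerc.local -n gpfs01.oerc.local,gpfs02.oerc.local -k gpfs_id_rsa.pub
mmremotefs add gpfs -f gpfs -C gpfs.oerc.local -T /gpfs
```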
As far as I can see I have everything set up: I've exchanged the public keys between the clusters and installed them using the -k switch to mmremotecluster and mmauth on the respective clusters. I've also tried reconfiguring the admin-interface and daemon-interface
names on the cpdn.oerc.local cluster (a stab in the dark after seeing IP address inconsistencies in some trace dumps), but I get the same error. Now I'm worried I've missed something really obvious! Any help greatly appreciated. Here's some trace output
from the mmmount gpfs command when run from the cpdn.oerc.local cluster:<br>
<br>
<br>
35.736808 2506 TRACE_MUTEX: Thread 0x320031 (MountHandlerThread) signalling condvar 0x7F8968092D90 (0x7F8968092D90) (ThreadSuspendResumeCondvar) waitCount 1<br>
35.736811 2506 TRACE_MUTEX: internalSignalSave: Created event word 0xFFFF88023AEE1108 for mutex ThreadSuspendResumeMutex<br>
35.736812 2506 TRACE_MUTEX: Releasing mutex 0x1489F28 (0x1489F28) (ThreadSuspendResumeMutex) in daemon (threads waiting)<br>
35.736894 2506 TRACE_BASIC: Wed May 7 08:24:15.991 2014: Waiting to join remote cluster gpfs.oerc.local<br>
35.736927 2506 TRACE_MUTEX: Thread 0x320031 (MountHandlerThread) waiting on condvar 0x14BAB50 (0x14BAB50) (ClusterConfigurationBCHCond): waiting to join remote cluster<br>
35.737369 2643 TRACE_SP: RunProbeCluster: enter. EligibleQuorumNode 0 maxPingIterations 10<br>
35.737371 2643 TRACE_SP: RunProbeCluster: cl 1 gpnStatus none prevLeaseSeconds 0 loopIteration 1 pingIteration 1/10 nToTry 2 nResponses 0 nProbed 0<br>
35.739561 2643 TRACE_DLEASE: Pinger::send: node <c1p2> err 0<br>
35.739620 2643 TRACE_DLEASE: Pinger::send: node <c1p1> err 0<br>
35.739624 2643 TRACE_THREAD: Thread 0x324050 (ProbeRemoteClusterThread) delaying until 1399447456.994516000: waiting for ProbeCluster ping response<br>
35.739726 2579 TRACE_DLEASE: Pinger::receiveLoop: echoreply from <c1p2> 10.100.10.22<br>
35.739728 2579 TRACE_DLEASE: Pinger::receiveLoop: echoreply from <c1p1> 10.100.10.21<br>
35.739730 2579 TRACE_BASIC: cxiRecvfrom: sock 9 buf 0x7F896CB64960 len 128 flags 0 failed with err 11<br>
35.824879 2596 TRACE_DLEASE: checkAndRenewLease: cluster 0 leader <c0n1> (me 0) remountRetryNeeded 0<br>
35.824885 2596 TRACE_DLEASE: renewLease: leaseage 10 (100 ticks/sec) now 429499910 lastLeaseReplyReceived 429498823<br>
35.824891 2596 TRACE_TS: tscSend: service 00010001 msg 'ccMsgDiskLease' n_dest 1 data_len 4 msg_id 94 msg 0x7F89500098B0 mr 0x7F89500096E0<br>
35.824894 2596 TRACE_TS: acquireConn enter: addr <c0n1><br>
35.824895 2596 TRACE_TS: acquireConn exit: err 0 connP 0x7F8948025210<br>
35.824898 2596 TRACE_TS: sendMessage dest <c0n1> 10.200.61.1 cpdn-ppc02: msg_id 94 type 14 tagP 0x7F8950009CB8 seq 89, state initial<br>
35.824957 2596 TRACE_TS: llc_send_msg: returning 0<br>
35.824958 2596 TRACE_TS: tscSend: replies[0] dest <c0n1>, status pending, err 0<br>
35.824960 2596 TRACE_TS: tscSend: rc = 0x0<br>
35.824961 2596 TRACE_DLEASE: checkAndRenewLease: cluster 0 nextLeaseCheck in 2 sec<br>
35.824989 2596 TRACE_THREAD: Thread 0x20C04D (DiskLeaseThread) delaying until 1399447458.079879000: RunLeaseChecks waiting for next check time<br>
35.825509 2642 TRACE_TS: socket_dequeue_next: returns 8<br>
35.825511 2642 TRACE_TS: socket_dequeue_next: returns -1<br>
35.825513 2642 TRACE_TS: receiverEvent enter: sock 8 event 0x5 state reading header<br>
35.825527 2642 TRACE_TS: service_message: enter: msg 'reply', msg_id 94 seq 88 ackseq 89, from <c0n1> 10.200.61.1, active 0<br>
35.825531 2642 TRACE_TS: tscHandleMsgDirectly: service 00010001, msg 'reply', msg_id 94, len 4, from <c0n1> 10.100.10.61<br>
35.825533 2642 TRACE_TS: HandleReply: status success, err 0; 0 msgs pending after this reply<br>
35.825534 2642 TRACE_MUTEX: Acquired mutex 0x7F896805AC68 (0x7F896805AC68) (PendMsgTabMutex) in daemon using trylock<br>
35.825537 2642 TRACE_DLEASE: renewLease: ccMsgDiskLease reply.status 6 err 0 from <c0n1> (expected 10.100.10.61) current leader 10.100.10.61<br>
35.825545 2642 TRACE_DLEASE: DMS timer [0] started, delay 58, time 4295652<br>
35.825546 2642 TRACE_DLEASE: updateMyLease: oldLease 4294988 newLease 4294999 (35 sec left) leaseLost 0<br>
35.825556 2642 TRACE_BASIC: cxiRecv: sock 8 buf 0x7F8954010BE8 len 32 flags 0 failed with err 11<br>
35.825557 2642 TRACE_TS: receiverEvent exit: sock 8 err 54 newTypes 1 state reading header<br>
36.739811 2643 TRACE_TS: llc_pick_dest_addr: use default addrs from 10.100.10.60 to 10.100.10.22 (primary listed 1 0)<br>
36.739814 2643 TRACE_SP: RunProbeCluster: sending probe 1 to <c1p2> gid 00000000:00000000 flags 01<br>
36.739824 2643 TRACE_TS: tscSend: service 00010001 msg 'ccMsgProbeCluster2' n_dest 1 data_len 100 msg_id 95 msg 0x7F8950009F20 mr 0x7F8950009D50<br>
36.739829 2643 TRACE_TS: acquireConn enter: addr <c1p2><br>
36.739831 2643 TRACE_TS: acquireConn exit: err 0 connP 0x7F8964025040<br>
36.739835 2643 TRACE_TS: sendMessage dest <c1p2> 10.100.10.22 10.100.10.22: msg_id 95 type 36 tagP 0x7F895000A328 seq 1, state initial<br>
36.739838 2643 TRACE_TS: llc_pick_dest_addr: use default addrs from 10.100.10.60 to 10.100.10.22 (primary listed 1 0)<br>
36.739914 2643 TRACE_BASIC: Wed May 7 08:24:16.994 2014: Remote mounts are not enabled within this cluster.<br>
36.739963 2643 TRACE_TS: TcpConn::make_connection: status=init, err=720, dest 10.100.10.22<br>
36.739965 2643 TRACE_TS: llc_send_msg: returning 693<br>
36.739966 2643 TRACE_TS: tscSend: replies[0] dest <c1p2>, status node_failed, err 693<br>
36.739968 2643 TRACE_MUTEX: Acquired mutex 0x7F896805AC90 (0x7F896805AC90) (PendMsgTabMutex) in daemon using trylock<br>
36.739969 2643 TRACE_TS: tscSend: rc = 0x1<br>
36.739970 2643 TRACE_SP: RunProbeCluster: reply rc 693 tryHere <none>, flags 0<br>
36.739972 2643 TRACE_SP: RunProbeCluster: cl 1 gpnStatus none prevLeaseSeconds 0 loopIteration 1 pingIteration 2/10 nToTry 2 nResponses 2 nProbed 1<br>
36.739973 2643 TRACE_TS: llc_pick_dest_addr: use default addrs from 10.100.10.60 to 10.100.10.21 (primary listed 1 0)<br>
36.739974 2643 TRACE_SP: RunProbeCluster: sending probe 1 to <c1p1> gid 00000000:00000000 flags 01<br>
36.739977 2643 TRACE_TS: tscSend: service 00010001 msg 'ccMsgProbeCluster2' n_dest 1 data_len 100 msg_id 96 msg 0x7F895000A590 mr 0x7F895000A3C0<br>
36.739978 2643 TRACE_TS: acquireConn enter: addr <c1p1><br>
36.739979 2643 TRACE_TS: acquireConn exit: err 0 connP 0x7F89640258D0<br>
36.739980 2643 TRACE_TS: sendMessage dest <c1p1> 10.100.10.21 10.100.10.21: msg_id 96 type 36 tagP 0x7F895000A998 seq 1, state initial<br>
36.739982 2643 TRACE_TS: llc_pick_dest_addr: use default addrs from 10.100.10.60 to 10.100.10.21 (primary listed 1 0)<br>
36.739993 2643 TRACE_BASIC: Wed May 7 08:24:16.995 2014: Remote mounts are not enabled within this cluster.<br>
36.740003 2643 TRACE_TS: TcpConn::make_connection: status=init, err=720, dest 10.100.10.21<br>
36.740005 2643 TRACE_TS: llc_send_msg: returning 693<br>
36.740005 2643 TRACE_TS: tscSend: replies[0] dest <c1p1>, status node_failed, err 693<br>
<br>
<br>
Sorry if the formatting above gets horribly screwed. Thanks for any assistance,<br>
<br>
Luke<br>
<br>
--<br>
<br>
Luke Raimbach<br>
IT Manager<br>
Oxford e-Research Centre<br>
7 Keble Road,<br>
Oxford,<br>
OX1 3QG<br>
<br>
+44(0)1865 610639<br>
<br>
<br>
_______________________________________________<br>
gpfsug-discuss mailing list<br>
gpfsug-discuss at gpfsug.org<br>
<a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a><br>
</div>
</span></font>
</body>
</html>