<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<div>Hi Luke,<br>
<br>
When remote clusters use RFC 1918 address space, GPFS assumes that each cluster's privately addressed networks are not reachable from the other, so you must declare the shared subnets explicitly with mmchconfig. Try setting subnets as follows:<br>
<br>
gpfs.oerc.local:<br>
subnets="10.200.0.0 10.200.0.0/cpdn.oerc.local"<br>
<br>
cpdn.oerc.local:<br>
subnets="10.200.0.0 10.200.0.0/gpfs.oerc.local"<br>
<br>
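That is, on each cluster, run something like the following (if I remember the mmchconfig syntax right; the new subnets value only takes effect after the daemons are restarted):<br>
<br>

```shell
# On gpfs.oerc.local: reach cpdn's nodes over the shared 10.200.0.0 subnet
mmchconfig subnets="10.200.0.0 10.200.0.0/cpdn.oerc.local"

# On cpdn.oerc.local: the mirror-image setting
mmchconfig subnets="10.200.0.0 10.200.0.0/gpfs.oerc.local"
```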
I think you also need to set the cipher list locally on each cluster to AUTHONLY via mmauth: in your mmauth output, each cluster shows its own cipher list as "(none specified)". On my clusters these match on both sides; no cluster says "none specified".
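If memory serves, the commands are something like this (changing the local cluster's cipher list needs GPFS down across the cluster, so plan a short outage):<br>
<br>

```shell
# Run on every cluster whose own cipher list shows "(none specified)"
mmshutdown -a                  # cipher list changes need the daemon stopped
mmauth update . -l AUTHONLY    # "." means the local cluster
mmstartup -a
```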
<br>
<br>
Hope that helps,<br>
Paul<br>
<br>
<br>
<br>
Sent with Good (www.good.com)<br>
<hr tabindex="-1">
<font face="Tahoma" size="2"><b>From:</b> gpfsug-discuss-bounces@gpfsug.org on behalf of Luke Raimbach<br>
<b>Sent:</b> Wednesday, May 07, 2014 5:28:59 AM<br>
<b>To:</b> gpfsug-discuss@gpfsug.org<br>
<b>Subject:</b> [gpfsug-discuss] GPFS Remote Mount Fails<br>
</font><br>
<div></div>
</div>
<font size="2"><span style="font-size:10pt;">
<div class="PlainText">Dear All,<br>
<br>
I'm having a problem remote mounting a file system. I have two clusters:<br>
<br>
gpfs.oerc.local which owns file system 'gpfs'<br>
cpdn.oerc.local which owns no file systems<br>
<br>
I want to remote mount file system 'gpfs' from cluster cpdn.oerc.local. I'll post the configuration for both clusters further down. The error I receive on a node in cluster cpdn.oerc.local is:<br>
<br>
<br>
Wed May 7 10:05:19.595 2014: Waiting to join remote cluster gpfs.oerc.local<br>
Wed May 7 10:05:20.598 2014: Remote mounts are not enabled within this cluster.<br>
Wed May 7 10:05:20.599 2014: Remote mounts are not enabled within this cluster.<br>
Wed May 7 10:05:20.598 2014: A node join was rejected. This could be due to<br>
incompatible daemon versions, failure to find the node<br>
in the configuration database, or no configuration manager found.<br>
Wed May 7 10:05:20.600 2014: Failed to join remote cluster gpfs.oerc.local<br>
Wed May 7 10:05:20.601 2014: Command: err 693: mount gpfs.oerc.local:gpfs<br>
Wed May 7 10:05:20.600 2014: Message failed because the destination node refused the connection.<br>
<br>
<br>
I'm concerned about the "Remote mounts are not enabled within this cluster" messages. Having followed the configuration steps in the GPFS Advanced Administration Guide, I end up with the following configurations:<br>
<br>
## GPFS Cluster 'gpfs.oerc.local' ##<br>
<br>
[root@gpfs01 ~]# mmlscluster<br>
<br>
GPFS cluster information<br>
========================<br>
GPFS cluster name: gpfs.oerc.local<br>
GPFS cluster id: 748734524680043237<br>
GPFS UID domain: gpfs.oerc.local<br>
Remote shell command: /usr/bin/ssh<br>
Remote file copy command: /usr/bin/scp<br>
<br>
GPFS cluster configuration servers:<br>
-----------------------------------<br>
Primary server: gpfs01.oerc.local<br>
Secondary server: gpfs02.oerc.local<br>
<br>
Node Daemon node name IP address Admin node name Designation<br>
--------------------------------------------------------------------------<br>
1 gpfs01.oerc.local 10.100.10.21 gpfs01.oerc.local quorum-manager<br>
2 gpfs02.oerc.local 10.100.10.22 gpfs02.oerc.local quorum-manager<br>
3 linux.oerc.local 10.100.10.1 linux.oerc.local<br>
4 jupiter.oerc.local 10.100.10.2 jupiter.oerc.local<br>
5 cnfs0.oerc.local 10.100.10.100 cnfs0.oerc.local<br>
6 cnfs1.oerc.local 10.100.10.101 cnfs1.oerc.local<br>
7 cnfs2.oerc.local 10.100.10.102 cnfs2.oerc.local<br>
8 cnfs3.oerc.local 10.100.10.103 cnfs3.oerc.local<br>
9 tsm01.oerc.local 10.100.10.51 tsm01.oerc.local quorum-manager<br>
<br>
<br>
[root@gpfs01 ~]# mmremotecluster show all<br>
Cluster name: cpdn.oerc.local<br>
Contact nodes: 10.100.10.60,10.100.10.61,10.100.10.62<br>
SHA digest: e9a2dc678a62d6c581de0b89b49a90f28f401327<br>
File systems: (none defined)<br>
<br>
<br>
[root@gpfs01 ~]# mmauth show all<br>
Cluster name: cpdn.oerc.local<br>
Cipher list: AUTHONLY<br>
SHA digest: e9a2dc678a62d6c581de0b89b49a90f28f401327<br>
File system access: gpfs (rw, root remapped to 99:99)<br>
<br>
Cluster name: gpfs.oerc.local (this cluster)<br>
Cipher list: (none specified)<br>
SHA digest: e7a68ff688d6ef055eb40fe74677b272d6c60879<br>
File system access: (all rw)<br>
<br>
<br>
[root@gpfs01 ~]# mmlsconfig<br>
Configuration data for cluster gpfs.oerc.local:<br>
-----------------------------------------------<br>
myNodeConfigNumber 1<br>
clusterName gpfs.oerc.local<br>
clusterId 748734524680043237<br>
autoload yes<br>
minReleaseLevel 3.4.0.7<br>
dmapiFileHandleSize 32<br>
maxMBpS 6400<br>
maxblocksize 2M<br>
pagepool 4G<br>
[cnfs0,cnfs1,cnfs2,cnfs3]<br>
pagepool 2G<br>
[common]<br>
tiebreakerDisks vd0_0;vd2_2;vd5_5<br>
cnfsSharedRoot /gpfs/.ha<br>
nfsPrefetchStrategy 1<br>
cnfsVIP gpfs-nfs<br>
subnets 10.200.0.0<br>
cnfsMountdPort 4000<br>
cnfsNFSDprocs 128<br>
[common]<br>
adminMode central<br>
<br>
File systems in cluster gpfs.oerc.local:<br>
----------------------------------------<br>
/dev/gpfs<br>
<br>
<br>
## GPFS Cluster 'cpdn.oerc.local' ##<br>
<br>
[root@cpdn-ppc01 ~]# mmlscluster<br>
<br>
GPFS cluster information<br>
========================<br>
GPFS cluster name: cpdn.oerc.local<br>
GPFS cluster id: 10699506775530551223<br>
GPFS UID domain: cpdn.oerc.local<br>
Remote shell command: /usr/bin/ssh<br>
Remote file copy command: /usr/bin/scp<br>
<br>
GPFS cluster configuration servers:<br>
-----------------------------------<br>
Primary server: cpdn-ppc02.oerc.local<br>
Secondary server: cpdn-ppc03.oerc.local<br>
<br>
Node Daemon node name IP address Admin node name Designation<br>
-------------------------------------------------------------------------------<br>
1 cpdn-ppc01.oerc.local 10.100.10.60 cpdn-ppc01.oerc.local quorum<br>
2 cpdn-ppc02.oerc.local 10.100.10.61 cpdn-ppc02.oerc.local quorum-manager<br>
3 cpdn-ppc03.oerc.local 10.100.10.62 cpdn-ppc03.oerc.local quorum-manager<br>
<br>
[root@cpdn-ppc01 ~]# mmremotecluster show all<br>
Cluster name: gpfs.oerc.local<br>
Contact nodes: 10.100.10.21,10.100.10.22<br>
SHA digest: e7a68ff688d6ef055eb40fe74677b272d6c60879<br>
File systems: gpfs (gpfs)<br>
<br>
<br>
[root@cpdn-ppc01 ~]# mmauth show all<br>
Cluster name: gpfs.oerc.local<br>
Cipher list: AUTHONLY<br>
SHA digest: e7a68ff688d6ef055eb40fe74677b272d6c60879<br>
File system access: (none authorized)<br>
<br>
Cluster name: cpdn.oerc.local (this cluster)<br>
Cipher list: (none specified)<br>
SHA digest: e9a2dc678a62d6c581de0b89b49a90f28f401327<br>
File system access: (all rw)<br>
<br>
<br>
[root@cpdn-ppc01 ~]# mmremotefs show all<br>
Local Name Remote Name Cluster name Mount Point Mount Options Automount Drive Priority<br>
gpfs gpfs gpfs.oerc.local /gpfs rw yes - 0<br>
<br>
<br>
[root@cpdn-ppc01 ~]# mmlsconfig<br>
Configuration data for cluster cpdn.oerc.local:<br>
-----------------------------------------------<br>
myNodeConfigNumber 1<br>
clusterName cpdn.oerc.local<br>
clusterId 10699506775530551223<br>
autoload yes<br>
dmapiFileHandleSize 32<br>
minReleaseLevel 3.4.0.7<br>
subnets 10.200.0.0<br>
pagepool 4G<br>
[cpdn-ppc02,cpdn-ppc03]<br>
pagepool 2G<br>
[common]<br>
traceRecycle local<br>
trace all 4 tm 2 thread 1 mutex 1 vnode 2 ksvfs 3 klockl 2 io 3 pgalloc 1 mb 1 lock 2 fsck 3<br>
adminMode central<br>
<br>
File systems in cluster cpdn.oerc.local:<br>
----------------------------------------<br>
(none)<br>
<br>
<br>
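For reference, the sequence I followed to set this up was roughly the following (per the Advanced Administration Guide; the key file names are just whatever I saved the copied id_rsa.pub files as):<br>
<br>

```shell
# On gpfs.oerc.local (the owning cluster):
mmauth genkey new                                # then copy /var/mmfs/ssl/id_rsa.pub across
mmauth add cpdn.oerc.local -k cpdn_id_rsa.pub    # install cpdn's public key
mmauth grant cpdn.oerc.local -f gpfs             # authorize access to file system 'gpfs'

# On cpdn.oerc.local (the accessing cluster):
mmauth genkey new
mmremotecluster add gpfs.oerc.local -n gpfs01.oerc.local,gpfs02.oerc.local -k gpfs_id_rsa.pub
mmremotefs add gpfs -f gpfs -C gpfs.oerc.local -T /gpfs
```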
As far as I can see I have everything set up: I've exchanged the public keys between the clusters and installed them using the -k switch to mmremotecluster and mmauth on the respective clusters. I've also tried reconfiguring the admin-interface and daemon-interface
names on the cpdn.oerc.local cluster (a stab in the dark after seeing IP address inconsistencies in some trace dumps), but I get the same error. Now I'm worried I've missed something really obvious! Any help greatly appreciated. Here's some trace output
from the mmmount gpfs command when run from the cpdn.oerc.local cluster:<br>
<br>
<br>
35.736808 2506 TRACE_MUTEX: Thread 0x320031 (MountHandlerThread) signalling condvar 0x7F8968092D90 (0x7F8968092D90) (ThreadSuspendResumeCondvar) waitCount 1<br>
35.736811 2506 TRACE_MUTEX: internalSignalSave: Created event word 0xFFFF88023AEE1108 for mutex ThreadSuspendResumeMutex<br>
35.736812 2506 TRACE_MUTEX: Releasing mutex 0x1489F28 (0x1489F28) (ThreadSuspendResumeMutex) in daemon (threads waiting)<br>
35.736894 2506 TRACE_BASIC: Wed May 7 08:24:15.991 2014: Waiting to join remote cluster gpfs.oerc.local<br>
35.736927 2506 TRACE_MUTEX: Thread 0x320031 (MountHandlerThread) waiting on condvar 0x14BAB50 (0x14BAB50) (ClusterConfigurationBCHCond): waiting to join remote cluster<br>
35.737369 2643 TRACE_SP: RunProbeCluster: enter. EligibleQuorumNode 0 maxPingIterations 10<br>
35.737371 2643 TRACE_SP: RunProbeCluster: cl 1 gpnStatus none prevLeaseSeconds 0 loopIteration 1 pingIteration 1/10 nToTry 2 nResponses 0 nProbed 0<br>
35.739561 2643 TRACE_DLEASE: Pinger::send: node <c1p2> err 0<br>
35.739620 2643 TRACE_DLEASE: Pinger::send: node <c1p1> err 0<br>
35.739624 2643 TRACE_THREAD: Thread 0x324050 (ProbeRemoteClusterThread) delaying until 1399447456.994516000: waiting for ProbeCluster ping response<br>
35.739726 2579 TRACE_DLEASE: Pinger::receiveLoop: echoreply from <c1p2> 10.100.10.22<br>
35.739728 2579 TRACE_DLEASE: Pinger::receiveLoop: echoreply from <c1p1> 10.100.10.21<br>
35.739730 2579 TRACE_BASIC: cxiRecvfrom: sock 9 buf 0x7F896CB64960 len 128 flags 0 failed with err 11<br>
35.824879 2596 TRACE_DLEASE: checkAndRenewLease: cluster 0 leader <c0n1> (me 0) remountRetryNeeded 0<br>
35.824885 2596 TRACE_DLEASE: renewLease: leaseage 10 (100 ticks/sec) now 429499910 lastLeaseReplyReceived 429498823<br>
35.824891 2596 TRACE_TS: tscSend: service 00010001 msg 'ccMsgDiskLease' n_dest 1 data_len 4 msg_id 94 msg 0x7F89500098B0 mr 0x7F89500096E0<br>
35.824894 2596 TRACE_TS: acquireConn enter: addr <c0n1><br>
35.824895 2596 TRACE_TS: acquireConn exit: err 0 connP 0x7F8948025210<br>
35.824898 2596 TRACE_TS: sendMessage dest <c0n1> 10.200.61.1 cpdn-ppc02: msg_id 94 type 14 tagP 0x7F8950009CB8 seq 89, state initial<br>
35.824957 2596 TRACE_TS: llc_send_msg: returning 0<br>
35.824958 2596 TRACE_TS: tscSend: replies[0] dest <c0n1>, status pending, err 0<br>
35.824960 2596 TRACE_TS: tscSend: rc = 0x0<br>
35.824961 2596 TRACE_DLEASE: checkAndRenewLease: cluster 0 nextLeaseCheck in 2 sec<br>
35.824989 2596 TRACE_THREAD: Thread 0x20C04D (DiskLeaseThread) delaying until 1399447458.079879000: RunLeaseChecks waiting for next check time<br>
35.825509 2642 TRACE_TS: socket_dequeue_next: returns 8<br>
35.825511 2642 TRACE_TS: socket_dequeue_next: returns -1<br>
35.825513 2642 TRACE_TS: receiverEvent enter: sock 8 event 0x5 state reading header<br>
35.825527 2642 TRACE_TS: service_message: enter: msg 'reply', msg_id 94 seq 88 ackseq 89, from <c0n1> 10.200.61.1, active 0<br>
35.825531 2642 TRACE_TS: tscHandleMsgDirectly: service 00010001, msg 'reply', msg_id 94, len 4, from <c0n1> 10.100.10.61<br>
35.825533 2642 TRACE_TS: HandleReply: status success, err 0; 0 msgs pending after this reply<br>
35.825534 2642 TRACE_MUTEX: Acquired mutex 0x7F896805AC68 (0x7F896805AC68) (PendMsgTabMutex) in daemon using trylock<br>
35.825537 2642 TRACE_DLEASE: renewLease: ccMsgDiskLease reply.status 6 err 0 from <c0n1> (expected 10.100.10.61) current leader 10.100.10.61<br>
35.825545 2642 TRACE_DLEASE: DMS timer [0] started, delay 58, time 4295652<br>
35.825546 2642 TRACE_DLEASE: updateMyLease: oldLease 4294988 newLease 4294999 (35 sec left) leaseLost 0<br>
35.825556 2642 TRACE_BASIC: cxiRecv: sock 8 buf 0x7F8954010BE8 len 32 flags 0 failed with err 11<br>
35.825557 2642 TRACE_TS: receiverEvent exit: sock 8 err 54 newTypes 1 state reading header<br>
36.739811 2643 TRACE_TS: llc_pick_dest_addr: use default addrs from 10.100.10.60 to 10.100.10.22 (primary listed 1 0)<br>
36.739814 2643 TRACE_SP: RunProbeCluster: sending probe 1 to <c1p2> gid 00000000:00000000 flags 01<br>
36.739824 2643 TRACE_TS: tscSend: service 00010001 msg 'ccMsgProbeCluster2' n_dest 1 data_len 100 msg_id 95 msg 0x7F8950009F20 mr 0x7F8950009D50<br>
36.739829 2643 TRACE_TS: acquireConn enter: addr <c1p2><br>
36.739831 2643 TRACE_TS: acquireConn exit: err 0 connP 0x7F8964025040<br>
36.739835 2643 TRACE_TS: sendMessage dest <c1p2> 10.100.10.22 10.100.10.22: msg_id 95 type 36 tagP 0x7F895000A328 seq 1, state initial<br>
36.739838 2643 TRACE_TS: llc_pick_dest_addr: use default addrs from 10.100.10.60 to 10.100.10.22 (primary listed 1 0)<br>
36.739914 2643 TRACE_BASIC: Wed May 7 08:24:16.994 2014: Remote mounts are not enabled within this cluster.<br>
36.739963 2643 TRACE_TS: TcpConn::make_connection: status=init, err=720, dest 10.100.10.22<br>
36.739965 2643 TRACE_TS: llc_send_msg: returning 693<br>
36.739966 2643 TRACE_TS: tscSend: replies[0] dest <c1p2>, status node_failed, err 693<br>
36.739968 2643 TRACE_MUTEX: Acquired mutex 0x7F896805AC90 (0x7F896805AC90) (PendMsgTabMutex) in daemon using trylock<br>
36.739969 2643 TRACE_TS: tscSend: rc = 0x1<br>
36.739970 2643 TRACE_SP: RunProbeCluster: reply rc 693 tryHere <none>, flags 0<br>
36.739972 2643 TRACE_SP: RunProbeCluster: cl 1 gpnStatus none prevLeaseSeconds 0 loopIteration 1 pingIteration 2/10 nToTry 2 nResponses 2 nProbed 1<br>
36.739973 2643 TRACE_TS: llc_pick_dest_addr: use default addrs from 10.100.10.60 to 10.100.10.21 (primary listed 1 0)<br>
36.739974 2643 TRACE_SP: RunProbeCluster: sending probe 1 to <c1p1> gid 00000000:00000000 flags 01<br>
36.739977 2643 TRACE_TS: tscSend: service 00010001 msg 'ccMsgProbeCluster2' n_dest 1 data_len 100 msg_id 96 msg 0x7F895000A590 mr 0x7F895000A3C0<br>
36.739978 2643 TRACE_TS: acquireConn enter: addr <c1p1><br>
36.739979 2643 TRACE_TS: acquireConn exit: err 0 connP 0x7F89640258D0<br>
36.739980 2643 TRACE_TS: sendMessage dest <c1p1> 10.100.10.21 10.100.10.21: msg_id 96 type 36 tagP 0x7F895000A998 seq 1, state initial<br>
36.739982 2643 TRACE_TS: llc_pick_dest_addr: use default addrs from 10.100.10.60 to 10.100.10.21 (primary listed 1 0)<br>
36.739993 2643 TRACE_BASIC: Wed May 7 08:24:16.995 2014: Remote mounts are not enabled within this cluster.<br>
36.740003 2643 TRACE_TS: TcpConn::make_connection: status=init, err=720, dest 10.100.10.21<br>
36.740005 2643 TRACE_TS: llc_send_msg: returning 693<br>
36.740005 2643 TRACE_TS: tscSend: replies[0] dest <c1p1>, status node_failed, err 693<br>
<br>
<br>
Sorry if the formatting above gets horribly screwed. Thanks for any assistance,<br>
<br>
Luke<br>
<br>
--<br>
<br>
Luke Raimbach<br>
IT Manager<br>
Oxford e-Research Centre<br>
7 Keble Road,<br>
Oxford,<br>
OX1 3QG<br>
<br>
+44(0)1865 610639<br>
<br>
<br>
_______________________________________________<br>
gpfsug-discuss mailing list<br>
gpfsug-discuss at gpfsug.org<br>
<a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a><br>
</div>
</span></font>
</body>
</html>