[gpfsug-discuss] help with multi-cluster setup: Network is unreachable

Jaime Pinto pinto at scinet.utoronto.ca
Wed May 10 02:26:19 BST 2017


As it turned out, the 'authorized_keys' file placed in the  
/var/mmfs/ssl directory of the NSD servers for the new storage cluster 4  
(4.1.1-14) needed an explicit entry of the following form in the  
stanza (bracket) associated with the clients on cluster 0:
nistCompliance=off

Apparently the default for 4.1.x is:
nistCompliance=SP800-131A
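
For anyone checking their own setup, the cluster-wide value can be  
inspected with the usual config commands. This is just a sketch of what  
I would run; in our case the per-cluster entry in  
/var/mmfs/ssl/authorized_keys still had to be edited by hand:

   mmlsconfig nistCompliance     # show the current cluster-wide setting
   mmauth show all               # list authorized remote clusters and their cipher list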

I just noticed that on cluster 3 (4.1.1-7) that entry is also present  
in the stanza associated with client cluster 0. I guess the Seagate  
folks who helped us install the G200 in our facility had that  
figured out.

The original "TLS handshake" error message gave me a hint of  
the problem; however, the 4.1 installation manual specifically  
mentions that this should only be an issue from 4.2 onward. The  
troubleshooting guide for 4.2 has this excerpt:

"Ensure that the configurations of GPFS and the remote key management  
(RKM) server are
compatible when it comes to the version of the TLS protocol used upon  
key retrieval (GPFS uses the nistCompliance configuration variable to  
control that). In particular, if nistCompliance=SP800-131A is set in  
GPFS, ensure that the TLS v1.2 protocol is enabled in
the RKM server. If this does not resolve the issue, contact the IBM  
Support Center.". So, how am I to know that nistCompliance=off is even  
an option?


For backward compatibility with the older storage clusters on 3.5, the  
client cluster needs to have nistCompliance=off.
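
If you want to set that cluster-wide on the client cluster, a minimal  
sketch (assuming you are fine relaxing it for the whole cluster) would be:

   mmchconfig nistCompliance=off
   mmlsconfig nistCompliance     # should now report 'off'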

I hope this helps those of you in mixed-version environments, since  
it's not obvious from the 3.5/4.1 installation manuals or the  
troubleshooting guides what we should do.

Thanks everyone for the help.
Jaime





Quoting "Uwe Falke" <UWEFALKE at de.ibm.com>:

> Hi, Jaime,
> I'd suggest you trace a client while it is trying to connect and check what
> addresses it actually talks to. It is a bit tedious, but you
> will be able to find this in the trace report file. You might also get an
> idea of what's going wrong...
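>
> A minimal sketch of what I mean (the node name is just a placeholder):
>
>    mmtracectl --start -N clientnode
>    mmmount wosgpfs -N clientnode      # reproduce the failing mount
>    mmtracectl --stop -N clientnode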
>
>
>
> Mit freundlichen Grüßen / Kind regards
>
>
> Dr. Uwe Falke
>
> IT Specialist
> High Performance Computing Services / Integrated Technology Services /
> Data Center Services
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland
> Rathausstr. 7
> 09111 Chemnitz
> Phone: +49 371 6978 2165
> Mobile: +49 175 575 2877
> E-Mail: uwefalke at de.ibm.com
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland Business & Technology Services GmbH / Management:
> Andreas Hasse, Thomas Wolter
> Registered office: Ehningen / Register court: Amtsgericht Stuttgart,
> HRB 17122
>
>
>
>
> From:   "Jaime Pinto" <pinto at scinet.utoronto.ca>
> To:     "gpfsug main discussion list" <gpfsug-discuss at spectrumscale.org>
> Date:   05/08/2017 06:06 PM
> Subject:        [gpfsug-discuss] help with multi-cluster setup: Network is
> unreachable
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
>
>
>
> We have a setup in which "cluster 0" is made up of clients only, on
> GPFS v4.1, i.e., no NSD servers or formal storage in this primary membership.
>
> All storage for those clients comes in a multi-cluster fashion, from
> clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7).
>
> We recently added a new storage cluster 4 (4.1.1-14), and for some
> obscure reason we keep getting "Network is unreachable" during mount
> by clients, even though there were no issues or errors with the
> multi-cluster setup, i.e., 'mmremotecluster add' and 'mmremotefs add'
> worked fine, and all clients have an entry in /etc/fstab for the file
> system associated with the new cluster 4. The weird thing is that we
> can mount cluster 3 fine (also 4.1).
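>
> For reference, the client-side setup was done along these lines (the
> contact node, key file name and mount point here are from memory, so
> treat this as a sketch rather than the exact commands):
>
>    mmremotecluster add wosgpfs.wos-gateway01-ib0 -n wos-gateway01-ib0 -k wosgpfs_cluster4.pub
>    mmremotefs add wosgpfs -f wosgpfs -C wosgpfs.wos-gateway01-ib0 -T /wosgpfs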
>
> Another piece of information is that, as far as GPFS goes, all clusters
> are configured to communicate exclusively over InfiniBand, each on a
> different 10.20.x.x network, but with broadcast 10.20.255.255. As far as
> the IB network goes there are no problems routing/pinging around all
> the clusters, so this must be internal to GPFS.
>
> None of the clusters have the subnets parameter set explicitly at
> configuration time, and from reading the 3.5 and 4.1 manuals it doesn't seem
> we need to. All have cipherList AUTHONLY. One difference is that
> cluster 4 has DMAPI enabled (I don't think it matters).
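>
> In case it helps, those settings can be double-checked with something
> like the following (just a sketch of the obvious knobs):
>
>    mmlsconfig subnets        # not set on any of our clusters
>    mmlsconfig cipherList     # AUTHONLY everywhere
>    mmauth show all           # authorized remote clusters and their cipher list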
>
> Below is an excerpt of the /var/mmfs/gen/mmfslog on one of the clients
> during mount (10.20.179.1 is one of the NSD servers on cluster 4):
> Mon May  8 11:35:27.773 2017: [I] Waiting to join remote cluster
> wosgpfs.wos-gateway01-ib0
> Mon May  8 11:35:28.777 2017: [W] The TLS handshake with node
> 10.20.179.1 failed with error 447 (client side).
> Mon May  8 11:35:28.781 2017: [E] Failed to join remote cluster
> wosgpfs.wos-gateway01-ib0
> Mon May  8 11:35:28.782 2017: [W] Command: err 719: mount
> wosgpfs.wos-gateway01-ib0:wosgpfs
> Mon May  8 11:35:28.783 2017: Network is unreachable
>
>
> I see this reference to "TLS handshake" and error 447; however,
> according to the manual, TLS is only set to be the default on 4.2
> onwards, not on the 4.1.1-14 we have now, where it's supposed to be EMPTY.
>
> mmdiag --network for some of the clients gives this excerpt (broken
> status):
>      tapenode-ib0        <c4p1>   10.20.83.5     broken   233  -1   0   0   Linux/L
>      gpc-f114n014-ib0    <c4p2>   10.20.114.14   broken   233  -1   0   0   Linux/L
>      gpc-f114n015-ib0    <c4p3>   10.20.114.15   broken   233  -1   0   0   Linux/L
>      gpc-f114n016-ib0    <c4p4>   10.20.114.16   broken   233  -1   0   0   Linux/L
>      wos-gateway01-ib0   <c4p5>   10.20.179.1    broken   233  -1   0   0   Linux/L
>
>
>
> I guess I just need a hint on how to troubleshoot this situation (the
> 4.1 troubleshooting guide is not helping).
>
> Thanks
> Jaime
>
>
>
> ---
> Jaime Pinto
> SciNet HPC Consortium - Compute/Calcul Canada
> www.scinet.utoronto.ca - www.computecanada.ca
> University of Toronto
> 661 University Ave. (MaRS), Suite 1140
> Toronto, ON, M5G1M1
> P: 416-978-2755
> C: 416-505-1477
>
> ----------------------------------------------------------------
> This message was sent using IMP at SciNet Consortium, University of
> Toronto.
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>






          ************************************
           TELL US ABOUT YOUR SUCCESS STORIES
          http://www.scinethpc.ca/testimonials
          ************************************
---
Jaime Pinto
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140
Toronto, ON, M5G1M1
P: 416-978-2755
C: 416-505-1477

----------------------------------------------------------------
This message was sent using IMP at SciNet Consortium, University of Toronto.



