[gpfsug-discuss] Edge case failure mode

Simon Thompson (IT Research Support) S.J.Thompson at bham.ac.uk
Thu May 11 19:05:08 BST 2017


Cheers Bryan ...

http://goo.gl/YXitIF

Points to (Outlook/the mailing list is line-breaking the URL and cutting the trailing 0):

https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=105030

Simon

From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of "bbanister at jumptrading.com" <bbanister at jumptrading.com>
Reply-To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date: Thursday, 11 May 2017 at 18:58
To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Edge case failure mode


Hey Simon,



I clicked your link but I think it went to a page that is not about this RFE:



[inline screenshot: image001.png]



Cheers,

-Bryan



-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson (IT Research Support)
Sent: Thursday, May 11, 2017 12:49 PM
To: gpfsug-discuss at spectrumscale.org
Subject: [gpfsug-discuss] Edge case failure mode



Just following up on some discussions we had at the UG this week. I mentioned a few weeks back that we were having issues with failover of NFS; we figured out a workaround for our clients, so failover works great now (plus there are some code fixes coming down the line as well to help).



Here's my story of fun with protocol nodes ...



Since then we've occasionally been seeing the load average of one CES node rise to over 400, at which point it's SOOO SLOW to respond to NFS and SMB clients. After a lot of digging we found that CTDB was reporting > 80% memory used, so we tweaked the pagepool down to solve this.



Great, we thought ... but alas, that wasn't the cause.



Just to be clear: 95% of the time the CES node is fine, I can do an ls in the mounted file-systems and all is good. When the load rises to 400, an ls takes 20-30 seconds, so they are related, but what is the initial cause? Other CES nodes are 100% fine, and if we do mmces node suspend and then resume, all is well on the node (and no other CES node inherits the problem as the IP moves). It's not always the same CES IP, node or even data centre, and most of the time it looks fine.



I logged a ticket with OCF today, and one thing they suggested was to disable NFSv3, as they've seen similar behaviour at another site. As far as I know all my NFS clients are v4, but sure, we disabled v3 anyway as it's not actually needed (both at the ganesha layer, changing the default for exports, and reconfiguring all existing exports to v4 only for good measure). That didn't help, but it was certainly worth a try!
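
(If you want to double-check your own clients, a quick Python sketch of my own along these lines - not a Scale tool, output format is arbitrary - run on each client will show whether anything is still mounted over v3:)

#!/usr/bin/env python
# Rough client-side check (not a Scale tool): list the NFS mounts on this box
# and flag anything still negotiated as v3 after the exports went v4-only.
import re

with open("/proc/mounts") as f:
    for line in f:
        fields = line.split()
        mountpoint, fstype, options = fields[1], fields[2], fields[3]
        if fstype not in ("nfs", "nfs4"):
            continue
        m = re.search(r"vers=([\d.]+)", options)
        version = m.group(1) if m else ("4" if fstype == "nfs4" else "unknown")
        flag = "" if version.startswith("4") else "   <-- still v3"
        print("%-40s vers=%s%s" % (mountpoint, version, flag))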



Note that my CES cluster is multi-cluster mounting the file-systems, and from the POSIX side it's fine most of the time.



We've also used the mmnetverify command to check that all is well. Of course this only checks the local cluster, not remote nodes, but as we aren't seeing expels and can access the FS, we assume that the GPFS layer is working fine.
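
(A crude stand-in I can suggest - just a quick Python sketch of my own, not mmnetverify and not a Scale tool, with placeholder hostnames - is to check that the CES node can at least open a TCP connection to the GPFS daemon port, 1191 unless tscTcpPort has been changed, on each node of the remote cluster:)

#!/usr/bin/env python
# Crude cross-cluster connectivity check (a stand-in, not mmnetverify):
# can this CES node open a TCP connection to the GPFS daemon port on each
# node of the remote storage cluster?  Hostnames below are placeholders.
import socket

REMOTE_NODES = ["storage-nsd01.example.com", "storage-nsd02.example.com"]
GPFS_PORT = 1191   # default mmfsd port (tscTcpPort) - adjust if changed
TIMEOUT = 5        # seconds

for node in REMOTE_NODES:
    try:
        sock = socket.create_connection((node, GPFS_PORT), TIMEOUT)
        # Printing the local source address too, since on a multi-homed box
        # it shows which interface the kernel chose for the connection.
        print("OK   %-35s (local address %s)" % (node, sock.getsockname()[0]))
        sock.close()
    except (socket.error, socket.timeout) as err:
        print("FAIL %-35s %s" % (node, err))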



So we finally log a PMR with IBM. I catch a node in a broken state, pull a trace from it and upload that, and ask what other traces they might want (apparently there is no protocol trace for NFS in 4.2.2-3).
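
(Catching a node in the act is half the battle given how random this is; something as dumb as the following watcher - again just my own sketch, with an arbitrary threshold - at least gives a shout when it's time to pull a trace:)

#!/usr/bin/env python
# Trivial watcher (my own sketch): poll /proc/loadavg and shout when the
# 1-minute load crosses a threshold, as the cue to pull a trace while the
# node is still in its broken state.  Threshold and interval are arbitrary.
import time

THRESHOLD = 100.0   # well below the ~400 we see, well above normal
INTERVAL = 30       # seconds between samples

while True:
    with open("/proc/loadavg") as f:
        load1 = float(f.read().split()[0])   # first field = 1-minute load
    if load1 > THRESHOLD:
        print("%s  1-min load %.1f > %.1f - grab a trace now"
              % (time.strftime("%H:%M:%S"), load1, THRESHOLD))
    time.sleep(INTERVAL)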



Now, when we run this, I note that it's doing things like mmlsfileset to the remote storage, coming from two clusters, and some of this is timing out. We've already had issues with rp_filter on remote nodes causing expels, but the storage backend here has only one NIC, and we can mount and access it all fine.



So why doesn't mmlsfileset work to this node? I can ping it (ICMP, not a GPFS-level ping, of course) and ssh to it works fine as well, BTW, yet the "admin" calls to it fail.



So I check on my CES nodes: they are multi-homed and rp_filter is enabled. Setting it to a value of 2 (loose mode) seems to make mmlsfileset work. So yes, I'm sure I'm an edge case, but it would be REALLY REALLY helpful to get mmnetverify to work across a cluster (e.g. I say "this is a remote node and here's its FQDN, can you talk to it?"), which would have helped with diagnosis here. I'm not entirely sure why ssh etc. would pass rp_filter but GPFS traffic (in some cases, apparently) would not, but I guess it's something to do with how GPFS is binding and then the kernel routing layer.
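
(For anyone else multi-homed, a quick way to see where you stand - my own sketch again; note the effective setting per interface is the higher of the "all" and per-interface values, and 2 is the loose mode:)

#!/usr/bin/env python
# Survey rp_filter on a multi-homed node (my own sketch, not a Scale tool).
# 0 = no source validation, 1 = strict reverse-path filtering, 2 = loose.
# The effective setting per interface is the higher of the "all" value and
# the per-interface value.
import glob

MEANING = {"0": "off", "1": "STRICT", "2": "loose"}

for path in sorted(glob.glob("/proc/sys/net/ipv4/conf/*/rp_filter")):
    iface = path.split("/")[-2]
    with open(path) as f:
        value = f.read().strip()
    print("%-16s rp_filter=%s (%s)" % (iface, value, MEANING.get(value, "?")))

# Loose mode is the equivalent of
#   sysctl -w net.ipv4.conf.all.rp_filter=2
# (plus making it persistent in /etc/sysctl.conf or /etc/sysctl.d/).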



I'm still not sure if this is my root cause, as the occurrences of the high load are a bit random (anything from every hour to being stable for 2-3 days), but since making the rp_filter change this afternoon, so far ...?



I've created an RFE for mmnetverify to be able to test across a cluster...
https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=105030





Simon



_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


