[gpfsug-discuss] gpfs client expels

Salvatore Di Nardo sdinardo at ebi.ac.uk
Thu Aug 21 14:18:19 BST 2014


This is an interesting point!

We use Ethernet (10G links on the clients), but we don't have a separate 
admin network.

Could you explain this a bit further? The clients and the servers are on 
different subnets, so the packets are routed, and I don't see a practical 
way to separate the traffic. The clients are blades in a chassis, so even 
if I create two interfaces they will physically use the same "cable" to 
reach the first switch. The clients themselves (600 of them) are also 
spread across different subnets.

I will forward this consideration to our network admins, to see if we 
can work on a dedicated network.
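
If we do get one, my understanding is that GPFS can be pointed at it by 
giving each node a separate admin interface, so that admin traffic no 
longer competes with the data path. A minimal sketch, assuming a second 
interface per node on the dedicated network (the "-admin" hostname below 
is made up for illustration):

    # show the current daemon and admin node names
    mmlscluster
    # move one node's admin traffic onto the dedicated interface
    # (hypothetical hostname; repeat per node)
    mmchnode --admin-interface=gss02a-admin.ebi.ac.uk -N gss02a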

Thanks for the tip.

Regards,
Salvatore




On 21/08/14 14:03, Vic Cornell wrote:
> Hi Salvatore,
>
> Are you using ethernet or infiniband as the GPFS interconnect to your 
> clients?
>
> If 10/40GbE - do you have a separate admin network?
>
> I have seen behaviour similar to this where storage traffic causes 
> congestion and the "admin" traffic gets lost or delayed, causing expels.
>
> Vic
>
>
>
> On 21 Aug 2014, at 10:04, Salvatore Di Nardo <sdinardo at ebi.ac.uk> wrote:
>
>> Thanks for the feedback, but we managed to find a scenario that 
>> excludes network problems.
>>
>> We have a file called *input_file* of nearly 100GB.
>>
>> If from *client A* we do:
>>
>> cat input_file >> output_file
>>
>> it starts copying, and we see the waiters go up a bit (a few seconds), 
>> but then they flush back to 0, so we can say the copy proceeds well.
>>
>>
>> If we now do the same from another client (or just another shell on 
>> the same client), *client B*:
>>
>> cat input_file >> output_file
>>
>>
>> (in other words, we are trying to write to the same destination), all 
>> the waiters go up until one node gets expelled.
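>>
>> (While the two cats run, we can watch the waiters with something like 
>> the loop below; mmdiag ships with GPFS, and the 5-second interval is 
>> just what we happened to pick:)
>>
>> while true; do date; mmdiag --waiters; sleep 5; done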
>>
>>
>> Now, while it's understandable that the destination file is locked by 
>> one of the "cat" commands, so the other has to wait (and since the 
>> file is BIG, it has to wait a while), it's not understandable why the 
>> lease renewal stops.
>> Why doesn't it just return a timeout error on the copy instead of 
>> expelling the node? We can reproduce this every time, and since our 
>> users do operations like this on files over 100GB each, you can 
>> imagine the result.
>>
>>
>>
>> As you can imagine, even if it's a bit silly to write to the same 
>> destination at the same time, it's also quite common, for example when 
>> several jobs dump logs to the same file and one of the writers keeps 
>> writing for a long time, holding the file locked.
>> Our expels are not due to network congestion, but to one write attempt 
>> having to wait for another. What I really don't understand is why such 
>> an extreme measure as an expel is taken just because a process has been 
>> waiting "too much time".
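>>
>> (For reference, the lease/expel timing in effect can be inspected as 
>> below; the parameter names come from the GPFS documentation, and the 
>> mmchconfig line is only an illustration of the kind of tuning support 
>> might suggest, not something we have tried:)
>>
>> # show the lease and failure-detection tunables on this node
>> mmdiag --config | egrep -i 'lease|failuredetection|missedping'
>> # a longer failureDetectionTime makes expels less hair-triggered
>> # (example value; changing it requires GPFS to be down cluster-wide)
>> mmchconfig failureDetectionTime=60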
>>
>>
>> I have a ticket open with IBM for this and the issue is under 
>> investigation, but no luck so far.
>>
>> Regards,
>> Salvatore
>>
>>
>>
>> On 21/08/14 09:20, Jez Tucker (Chair) wrote:
>>> Hi there,
>>>
>>>   I've seen this on several 'stock'? 'core'? GPFS systems (we need a 
>>> better term now that GSS is out): ping 'working', but alongside 
>>> ejections from the cluster.
>>> The GPFS internode 'ping' is somewhat more circumspect than unix 
>>> ping - and rightly so.
>>>
>>> In my experience this has _always_ been a network issue of one sort 
>>> or another.  If the network is experiencing issues, nodes will be 
>>> ejected.
>>> Of course it could be an unresponsive mmfsd or high loadavg, but I've 
>>> seen that only twice in 10 years over many versions of GPFS.
>>>
>>> You need to follow the logs through from each machine in time order 
>>> to determine who could not see who and in what order.
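>>>
>>> A crude first pass, assuming the standard log location and that mmdsh 
>>> works from an admin node (the grep time window is just an example):
>>>
>>> # pull every node's GPFS log; mmdsh prefixes each line with its hostname
>>> mmdsh -N all cat /var/adm/ras/mmfs.log.latest > /tmp/cluster-mmfs.log
>>> # then read what every node logged around the expel, side by side
>>> grep 'Aug 19 11:03' /tmp/cluster-mmfs.log
>>>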
>>> Your best way forward is to log a SEV2 case with IBM support, 
>>> directly or via your OEM, and collect and supply a snap and traces 
>>> as required by support.
>>>
>>> Without knowing your full setup, it's hard to help further.
>>>
>>> Jez
>>>
>>> On 20/08/14 08:57, Salvatore Di Nardo wrote:
>>>> Still problems. Here are some more detailed examples:
>>>>
>>>> *EXAMPLE 1:*
>>>>
>>>>     *EBI5-220* (CLIENT)
>>>>     Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a reply from node <GSS02B IP> gss02b*
>>>>     Tue Aug 19 11:03:04.981 2014: Request sent to <GSS02A IP> (gss02a in GSS.ebi.ac.uk) to expel <GSS02B IP> (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk
>>>>     Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from <EBI5-220 IP> (ebi5-220)
>>>>     Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk
>>>>     Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe.
>>>>     Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.
>>>>     Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked.  File system: gpfs1 Reason: SGPanic
>>>>     Tue Aug 19 11:03:12.066 2014: Connecting to <GSS02A IP> gss02a <c1p687>
>>>>     Tue Aug 19 11:03:12.070 2014: Connected to <GSS02A IP> gss02a <c1p687>
>>>>     Tue Aug 19 11:03:17.071 2014: Connecting to <GSS02B IP> gss02b <c1p686>
>>>>     Tue Aug 19 11:03:17.072 2014: Connecting to <GSS03B IP> gss03b <c1p685>
>>>>     Tue Aug 19 11:03:17.079 2014: Connecting to <GSS03A IP> gss03a <c1p684>
>>>>     Tue Aug 19 11:03:17.080 2014: Connecting to <GSS01B IP> gss01b <c1p683>
>>>>     Tue Aug 19 11:03:17.079 2014: Connecting to <GSS01A IP> gss01a <c1p1>
>>>>     Tue Aug 19 11:04:23.105 2014: Connected to <GSS02B IP> gss02b <c1p686>
>>>>     Tue Aug 19 11:04:23.107 2014: Connected to <GSS03B IP> gss03b <c1p685>
>>>>     Tue Aug 19 11:04:23.112 2014: Connected to <GSS03A IP> gss03a <c1p684>
>>>>     Tue Aug 19 11:04:23.115 2014: Connected to <GSS01B IP> gss01b <c1p683>
>>>>     Tue Aug 19 11:04:23.121 2014: Connected to <GSS01A IP> gss01a <c1p1>
>>>>     Tue Aug 19 11:12:28.992 2014: Node <GSS02A IP> (gss02a in GSS.ebi.ac.uk) is now the Group Leader.
>>>>
>>>>     *GSS02B* (NSD SERVER)
>>>>     ...
>>>>     Tue Aug 19 11:03:17.070 2014: Killing connection from *<EBI5-220 IP>* because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:25.016 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:28.080 2014: Killing connection from *<EBI5-220 IP>* because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:36.019 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:39.083 2014: Killing connection from *<EBI5-220 IP>* because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:47.023 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:50.088 2014: Killing connection from *<EBI5-220 IP>* because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:52.218 2014: Killing connection from <EBI5-043 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:58.030 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:04:01.092 2014: Killing connection from *<EBI5-220 IP>* because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:04:03.220 2014: Killing connection from <EBI5-043 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:04:09.034 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:04:12.096 2014: Killing connection from *<EBI5-220 IP>* because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:04:14.224 2014: Killing connection from <EBI5-043 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:04:20.037 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:04:23.103 2014: Accepted and connected to *<EBI5-220 IP>* ebi5-220 <c0n618>
>>>>     ...
>>>>
>>>>     *GSS02A* (NSD SERVER)
>>>>     Tue Aug 19 11:03:04.980 2014: Expel <GSS02B IP> (gss02b) request from <EBI5-220 IP> (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: <EBI5-220 IP> (ebi5-220 in ebi-cluster.ebi.ac.uk)
>>>>     Tue Aug 19 11:03:12.069 2014: Accepted and connected to <EBI5-220 IP> ebi5-220 <c0n618>
>>>>
>>>>
>>>> ===============================================
>>>> *EXAMPLE 2*:
>>>>
>>>>     *EBI5-038* (CLIENT)
>>>>     Tue Aug 19 11:32:34.227 2014: *Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.*
>>>>     Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing cluster GSS.ebi.ac.uk*
>>>>     Tue Aug 19 11:35:24.265 2014: Close connection to <GSS02A IP> gss02a <c1n2> (Connection reset by peer). Attempting reconnect.
>>>>     Tue Aug 19 11:35:24.865 2014: Close connection to <EBI5-014 IP> ebi5-014 <c1n457> (Connection reset by peer). Attempting reconnect.
>>>>     ...
>>>>     LOTS MORE RESETS BY PEER
>>>>     ...
>>>>     Tue Aug 19 11:35:25.096 2014: Close connection to <EBI5-167 IP> ebi5-167 <c1n155> (Connection reset by peer). Attempting reconnect.
>>>>     Tue Aug 19 11:35:25.267 2014: Connecting to <GSS02A IP> gss02a <c1n2>
>>>>     Tue Aug 19 11:35:25.268 2014: Close connection to <GSS02A IP> gss02a <c1n2> (Connection failed because destination is still processing previous node failure)
>>>>     Tue Aug 19 11:35:26.267 2014: Retry connection to <GSS02A IP> gss02a <c1n2>
>>>>     Tue Aug 19 11:35:26.268 2014: Close connection to <GSS02A IP> gss02a <c1n2> (Connection failed because destination is still processing previous node failure)
>>>>     Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe.
>>>>     Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.*
>>>>
>>>>     *GSS02A* (NSD SERVER)
>>>>     Tue Aug 19 11:35:24.263 2014: Node <EBI5-038 IP> (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled because of an expired lease.* Pings sent: 60. Replies received: 60.
>>>>
>>>>
>>>>
>>>> In example 1 it seems that an NSD server was not replying to the 
>>>> client, while the servers themselves seem to be working fine. How can 
>>>> I trace this better, in order to solve the problem?
>>>>
>>>> In example 2 it seems to me that for some reason the managers are 
>>>> not renewing the leases in time, and when this happens it is not a 
>>>> single client: loads of them fail to get their leases renewed. Why is 
>>>> this happening? How can I trace it back to the source of the problem?
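>>>>
>>>> (In the meantime we are thinking of leaving something like the loop 
>>>> below running on a couple of clients, to rule the network in or out; 
>>>> a rough sketch using our NSD server hostnames and Linux iputils ping:)
>>>>
>>>> # one timestamped ping per NSD server per second, so any gaps can be
>>>> # lined up against the expel times in mmfs.log
>>>> while true; do
>>>>     for h in gss01a gss01b gss02a gss02b gss03a gss03b; do
>>>>         rtt=$(ping -c1 -W1 $h | awk -F'time=' '/time=/{print $2}')
>>>>         echo "$(date '+%F %T') $h ${rtt:-TIMEOUT}"
>>>>     done
>>>>     sleep 1
>>>> done >> /tmp/nsd-ping.log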
>>>>
>>>>
>>>>
>>>> Thanks in advance for any tips.
>>>>
>>>> Regards,
>>>> Salvatore
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
