From zander at ebi.ac.uk Fri Aug 1 14:44:49 2014 From: zander at ebi.ac.uk (Zander Mears) Date: Fri, 01 Aug 2014 14:44:49 +0100 Subject: [gpfsug-discuss] Hello! In-Reply-To: <53D981EF.3020000@gpfsug.org> References: <53D8C897.9000902@ebi.ac.uk> <53D981EF.3020000@gpfsug.org> Message-ID: <53DB99D1.8050304@ebi.ac.uk> Hi Jez We're just monitoring the standard OS stuff, some interface errors, throughput, number of network and gpfs connections due to previous issues. We don't really know as yet what is good to monitor GPFS wise. cheers Zander On 31/07/2014 00:38, Jez Tucker (Chair) wrote: > Hi Zander, > > We have a git repository. Would you be interested in adding any > Zabbix custom metrics gathering to GPFS to it? > > https://github.com/gpfsug/gpfsug-tools > > Best, > > Jez From sfadden at us.ibm.com Tue Aug 5 18:55:20 2014 From: sfadden at us.ibm.com (Scott Fadden) Date: Tue, 5 Aug 2014 10:55:20 -0700 Subject: [gpfsug-discuss] GPFS and Lustre on same node Message-ID: Is anyone running GPFS and Lustre on the same nodes. I have seen it work, I have heard people are doing it, I am looking for some confirmation. Thanks Scott Fadden GPFS Technical Marketing Phone: (503) 880-5833 sfadden at us.ibm.com http://www.ibm.com/systems/gpfs -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Wed Aug 6 08:46:31 2014 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Wed, 06 Aug 2014 09:46:31 +0200 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: References: Message-ID: <53E1DD57.90103@science-computing.de> Am 05.08.2014 19:55, schrieb Scott Fadden: > Is anyone running GPFS and Lustre on the same nodes. I have seen it work, I have heard people are > doing it, I am looking for some confirmation. I have some nodes running lustre 2.1.6 or 2.5.58 and gpfs 3.5.0.17 on RHEL5.8 and RHEL6.5. None of them are servers. Kind regards, Ulrich Sibiller -- ______________________________________creating IT solutions Dipl.-Inf. Ulrich Sibiller science + computing ag System Administration Hagellocher Weg 73 mail nfz at science-computing.de 72070 Tuebingen, Germany hotline +49 7071 9457 674 http://www.science-computing.de -- Vorstandsvorsitzender/Chairman of the board of management: Gerd-Lothar Leonhart Vorstand/Board of Management: Dr. Bernd Finkbeiner, Michael Heinrichs, Dr. Arno Steitz Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From frederik.ferner at diamond.ac.uk Wed Aug 6 10:19:35 2014 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Wed, 6 Aug 2014 10:19:35 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: References: Message-ID: <53E1F327.1000605@diamond.ac.uk> On 05/08/14 18:55, Scott Fadden wrote: > Is anyone running GPFS and Lustre on the same nodes. I have seen it > work, I have heard people are doing it, I am looking for some confirmation. Most of our compute cluster nodes are clients for Lustre and GPFS at the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on servers (GPFS NSD server or Lustre OSS/MDS servers). HTH, Frederik -- Frederik Ferner Senior Computer Systems Administrator phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) 
-- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail.
Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd.
Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message.
Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom

From sdinardo at ebi.ac.uk Wed Aug 6 10:57:44 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Wed, 06 Aug 2014 10:57:44 +0100
Subject: [gpfsug-discuss] GPFS and Lustre on same node
In-Reply-To: <53E1F327.1000605@diamond.ac.uk>
References: <53E1F327.1000605@diamond.ac.uk>
Message-ID: <53E1FC18.6080707@ebi.ac.uk>

Sorry for this little OT, but recently I have been looking at Lustre to understand how it compares to GPFS in terms of performance, reliability and ease of use. Could anyone share their experience?

My company recently got its first GPFS system, based on IBM GSS, but while it is good performance-wise, there are a few unresolved problems and the IBM support is almost nonexistent, so I'm starting to wonder whether it is worth looking elsewhere for eventual future purchases.

Salvatore

On 06/08/14 10:19, Frederik Ferner wrote:
> On 05/08/14 18:55, Scott Fadden wrote:
>> Is anyone running GPFS and Lustre on the same nodes. I have seen it
>> work, I have heard people are doing it, I am looking for some
>> confirmation.
>
> Most of our compute cluster nodes are clients for Lustre and GPFS at
> the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on
> servers (GPFS NSD server or Lustre OSS/MDS servers).
>
> HTH,
> Frederik
>

From chair at gpfsug.org Wed Aug 6 11:19:24 2014
From: chair at gpfsug.org (Jez Tucker (Chair))
Date: Wed, 06 Aug 2014 11:19:24 +0100
Subject: [gpfsug-discuss] GPFS and Lustre on same node
In-Reply-To: <53E1FC18.6080707@ebi.ac.uk>
References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk>
Message-ID: <53E2012C.9040402@gpfsug.org>

"IBM support is almost nonexistent"

I don't find that at all.
Do you log directly via ESC or via your OEM/integrator, or are you only referring to GSS support rather than pure GPFS?

If you are having response issues, your IBM rep (or a few folks on here) can accelerate issues for you.

Jez

On 06/08/14 10:57, Salvatore Di Nardo wrote:
> Sorry for this little OT, but recently I have been looking at Lustre to
> understand how it compares to GPFS in terms of performance,
> reliability and ease of use.
> Could anyone share their experience?
>
> My company recently got its first GPFS system, based on IBM GSS,
> but while it is good performance-wise, there are a few unresolved problems
> and the IBM support is almost nonexistent, so I'm starting to wonder
> whether it is worth looking elsewhere for eventual future purchases.
>
>
> Salvatore
>
> On 06/08/14 10:19, Frederik Ferner wrote:
>> On 05/08/14 18:55, Scott Fadden wrote:
>>> Is anyone running GPFS and Lustre on the same nodes. I have seen it
>>> work, I have heard people are doing it, I am looking for some
>>> confirmation.
>>
>> Most of our compute cluster nodes are clients for Lustre and GPFS at
>> the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on
>> servers (GPFS NSD server or Lustre OSS/MDS servers).
>>
>> HTH,
>> Frederik
>>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From service at metamodul.com Wed Aug 6 14:26:47 2014
From: service at metamodul.com (service at metamodul.com)
Date: Wed, 6 Aug 2014 15:26:47 +0200 (CEST)
Subject: [gpfsug-discuss] Hi , i am new to this list
Message-ID: <1366482624.222989.1407331607965.open-xchange@oxbaltgw55.schlund.de>

Hi @ALL, I am Hajo Ehlers, an AIX and GPFS specialist (Unix System Engineer). You can find me at the IBM GPFS forum and sometimes at news:c.u.a, and I am addicted to cluster filesystems.

My latest idea is an SAP HANA-light system (a DBMS on an in-memory cluster POSIX FS) which could be extended to a "reinvented" cluster-based AS/400 ^_^

I also wrote a small script to do a sequential backup of GPFS filesystems, since I never got used to mmbackup - I named it "pdsmc" for "parallel dsmc".

Cheers
Hajo

BTW: Please let me know - service (at) metamodul (dot) com - in case somebody is looking for a GPFS specialist.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sdinardo at ebi.ac.uk Fri Aug 8 10:53:36 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Fri, 08 Aug 2014 10:53:36 +0100
Subject: [gpfsug-discuss] GPFS and Lustre on same node
In-Reply-To: <53E2012C.9040402@gpfsug.org>
References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org>
Message-ID: <53E49E20.1090905@ebi.ac.uk>

Well, I didn't want to start a rant against IBM, and I'm referring specifically to GSS. Since GSS is an appliance, we have to go through GSS support for both hardware and software issues.

Hardware support is total crap. It took a month of chasing and shouting to get a replacement for a drawer that was causing some issues. Meanwhile 10 disks in that drawer went faulty. We finally got the drawer replaced, but the disks are still faulty. I have now spent three days trying to get them fixed or replaced (it's not clear whether the disks are actually broken or were just marked for replacement because of the drawer). Right now I don't have any answer on how to put them back online (mmchcarrier doesn't work because it recognizes that the disks were not replaced).

There are also a few other cases (GPFS related) open that are still unanswered. I have no experience with direct GPFS support, but if I open a case with GSS support for a GPFS problem, the case never seems to get an answer. The only reason that GSS is working at all is because *I* installed it, spending a few months studying GPFS. So now I'm wondering whether it is worth relying on the whole appliance concept at all in the future. I'm wondering whether in future it is better to just purchase the hardware and install GPFS on our own, or alternatively even try Lustre.

Now, skipping all this GSS rant, which has nothing to do with the file system anyway, and going back to my question:

Could someone point out the main differences between GPFS and Lustre? I found some documentation about Lustre and I'm going to have a look, but oddly enough I have not found any practical comparison between them.
On 06/08/14 11:19, Jez Tucker (Chair) wrote: > "IBM support is almost unexistent" > > I don't find that at all. > Do you log directly via ESC or via your OEM/integrator or are you only > referring to GSS support rather than pure GPFS? > > If you are having response issues, your IBM rep (or a few folks on > here) can accelerate issues for you. > > Jez > > > On 06/08/14 10:57, Salvatore Di Nardo wrote: >> Sorry for this little ot, but recetly i'm looking to Lustre to >> understand how it is comparable to GPFS in terms of performance, >> reliability and easy to use. >> Could anyone share their experience ? >> >> My company just recently got a first GPFS system , based on IBM GSS, >> but while its good performance wise, there are few unresolved >> problems and the IBM support is almost unexistent, so I'm starting to >> wonder if its work to look somewhere else eventual future purchases. >> >> >> Salvatore >> >> On 06/08/14 10:19, Frederik Ferner wrote: >>> On 05/08/14 18:55, Scott Fadden wrote: >>>> Is anyone running GPFS and Lustre on the same nodes. I have seen it >>>> work, I have heard people are doing it, I am looking for some >>>> confirmation. >>> >>> Most of our compute cluster nodes are clients for Lustre and GPFS at >>> the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on >>> servers (GPFS NSD server or Lustre OSS/MDS servers). >>> >>> HTH, >>> Frederik >>> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jpro at bas.ac.uk Fri Aug 8 12:40:00 2014 From: jpro at bas.ac.uk (Jeremy Robst) Date: Fri, 8 Aug 2014 12:40:00 +0100 (BST) Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E49E20.1090905@ebi.ac.uk> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk> Message-ID: On Fri, 8 Aug 2014, Salvatore Di Nardo wrote: > Now, skipping all this GSS rant, which have nothing to do with the file > system anyway? and? going back to my question: > > Could someone point the main differences between GPFS and Lustre? I'm looking at making the same decision here - to buy GPFS or to roll our own Lustre configuration. I'm in the process of setting up test systems, and so far the main difference seems to be in the that in GPFS each server sees the full filesystem, and so you can run other applications (e.g backup) on a GPFS server whereas the Luste OSS (object storage servers) see only a portion of the storage (the filesystem is striped across the OSSes), so you need a Lustre client to mount the full filesystem for things like backup. However I have very little practical experience of either and would also be interested in any comments. Thanks Jeremy -- jpro at bas.ac.uk | (work) 01223 221402 (fax) 01223 362616 Unix System Administrator - British Antarctic Survey #include From keith at ocf.co.uk Fri Aug 8 14:12:39 2014 From: keith at ocf.co.uk (Keith Vickers) Date: Fri, 8 Aug 2014 14:12:39 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node Message-ID: http://www.pdsw.org/pdsw10/resources/posters/parallelNASFSs.pdf Has a good direct apples to apples comparison between Lustre and GPFS. 
It's pretty much abstractable from the hardware used.

Keith Vickers
Business Development Manager
OCF plc
Mobile: 07974 397863

From sergi.more at bsc.es Fri Aug 8 14:14:33 2014
From: sergi.more at bsc.es (=?ISO-8859-1?Q?Sergi_Mor=E9_Codina?=)
Date: Fri, 08 Aug 2014 15:14:33 +0200
Subject: [gpfsug-discuss] GPFS and Lustre on same node
In-Reply-To:
References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk>
Message-ID: <53E4CD39.7080808@bsc.es>

Hi all,

Regarding the main differences between GPFS and Lustre, here are some bits from our experience:

-Reliability: GPFS has proven to be more stable and reliable. It also offers more flexibility in terms of fail-over, with no restriction on the number of servers. As far as I know, an NSD can have as many secondary servers as you want (we are using 8).

-Metadata: In Lustre each file system is restricted to two servers. No restriction in GPFS.

-Updates: In GPFS you can update the whole storage cluster without stopping production, one server at a time.

-Server/Client role: As Jeremy said, in GPFS every server can act as a client as well. Useful for administrative tasks.

-Troubleshooting: Problems with GPFS are easier to track down. The logs are clearer, and GPFS offers better tools than Lustre.

-Support: No problems at all with GPFS support. It is true that it can take time to work up through all the support levels, but we always got a good solution. Quite a different story in terms of hardware: IBM support quality has dropped a lot over the last year and a half. It is a really slow and tedious process to get replacements, and we keep receiving bad "certified reutilized parts" hardware, which slows the whole process down even more.

These are the main differences I would point out after some years of experience with both file systems, but do not take it as fact.

PS: Salvatore, I would suggest you contact Jordi Valls. He joined EBI a couple of months ago and has experience working with both file systems here at BSC.

Best Regards,
Sergi.

On 08/08/2014 01:40 PM, Jeremy Robst wrote:
> On Fri, 8 Aug 2014, Salvatore Di Nardo wrote:
>
>> Now, skipping all this GSS rant, which have nothing to do with the file
>> system anyway and going back to my question:
>>
>> Could someone point the main differences between GPFS and Lustre?
>
> I'm looking at making the same decision here - to buy GPFS or to roll
> our own Lustre configuration. I'm in the process of setting up test
> systems, and so far the main difference seems to be in the that in GPFS
> each server sees the full filesystem, and so you can run other
> applications (e.g backup) on a GPFS server whereas the Luste OSS (object
> storage servers) see only a portion of the storage (the filesystem is
> striped across the OSSes), so you need a Lustre client to mount the full
> filesystem for things like backup.
>
> However I have very little practical experience of either and would also
> be interested in any comments.
>
> Thanks
>
> Jeremy
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

--

------------------------------------------------------------------------

Sergi More Codina
Barcelona Supercomputing Center
Centro Nacional de Supercomputacion
WWW: http://www.bsc.es Tel: +34-93-405 42 27
e-mail: sergi.more at bsc.es Fax: +34-93-413 77 21

------------------------------------------------------------------------

WARNING / LEGAL TEXT: This message is intended only for the use of the individual or entity to which it is addressed and may contain information which is privileged, confidential, proprietary, or exempt from disclosure under applicable law. If you are not the intended recipient or the person responsible for delivering the message to the intended recipient, you are strictly prohibited from disclosing, distributing, copying, or in any way using this message. If you have received this communication in error, please notify the sender and destroy and delete any copies you may have received.

http://www.bsc.es/disclaimer.htm

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3242 bytes
Desc: S/MIME Cryptographic Signature
URL:

From viccornell at gmail.com Fri Aug 8 18:15:30 2014
From: viccornell at gmail.com (Vic Cornell)
Date: Fri, 8 Aug 2014 18:15:30 +0100
Subject: [gpfsug-discuss] GPFS and Lustre on same node
In-Reply-To: <53E4CD39.7080808@bsc.es>
References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk> <53E4CD39.7080808@bsc.es>
Message-ID: <4001D2D9-5E74-4EF9-908F-5B0E3443EA5B@gmail.com>

Disclaimers - I work for DDN - we sell lustre and GPFS. I know GPFS much better than I know Lustre.

The biggest difference we find between GPFS and Lustre is that GPFS can usually achieve 90% of the bandwidth available to a single client with a single thread. Lustre needs multiple parallel streams to saturate - say an Infiniband connection.

Lustre is often faster than GPFS and often has superior metadata performance - particularly where lots of files are created in a single directory.

GPFS can support Windows - Lustre cannot.

I think GPFS is better integrated and easier to deploy than Lustre - some people disagree with me.

Regards,

Vic

On 8 Aug 2014, at 14:14, Sergi Moré Codina wrote:

> Hi all,
>
> About main differences between GPFS and Lustre, here you have some bits from our experience:
>
> -Reliability: GPFS its been proved to be more stable and reliable. Also offers more flexibility in terms of fail-over. It have no restriction in number of servers. As far as I know, an NSD can have as many secondary servers as you want (we are using 8).
>
> -Metadata: In Lustre each file system is restricted to two servers. No restriction in GPFS.
>
> -Updates: In GPFS you can update the whole storage cluster without stopping production, one server at a time.
>
> -Server/Client role: As Jeremy said, in GPFS every server act as a client as well. Useful for administrative tasks.
>
> -Troubleshooting: Problems with GPFS are easier to track down. Logs are more clear, and offers better tools than Lustre.
>
> -Support: No problems at all with GPFS support. It is true that it could take time to go up within all support levels, but we always got a good solution. Quite different in terms of hardware.
IBM support quality has drop a lot since about last year an a half. Really slow and tedious process to get replacements. Moreover, we keep receiving bad "certified reutilitzed parts" hardware, which slow the whole process even more. > > > These are the main differences I would stand out after some years of experience with both file systems, but do not take it as a fact. > > PD: Salvatore, I would suggest you to contact Jordi Valls. He joined EBI a couple of months ago, and has experience working with both file systems here at BSC. > > Best Regards, > Sergi. > > > On 08/08/2014 01:40 PM, Jeremy Robst wrote: >> On Fri, 8 Aug 2014, Salvatore Di Nardo wrote: >> >>> Now, skipping all this GSS rant, which have nothing to do with the file >>> system anyway and going back to my question: >>> >>> Could someone point the main differences between GPFS and Lustre? >> >> I'm looking at making the same decision here - to buy GPFS or to roll >> our own Lustre configuration. I'm in the process of setting up test >> systems, and so far the main difference seems to be in the that in GPFS >> each server sees the full filesystem, and so you can run other >> applications (e.g backup) on a GPFS server whereas the Luste OSS (object >> storage servers) see only a portion of the storage (the filesystem is >> striped across the OSSes), so you need a Lustre client to mount the full >> filesystem for things like backup. >> >> However I have very little practical experience of either and would also >> be interested in any comments. >> >> Thanks >> >> Jeremy >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > -- > > ------------------------------------------------------------------------ > > Sergi More Codina > Barcelona Supercomputing Center > Centro Nacional de Supercomputacion > WWW: http://www.bsc.es Tel: +34-93-405 42 27 > e-mail: sergi.more at bsc.es Fax: +34-93-413 77 21 > > ------------------------------------------------------------------------ > > WARNING / LEGAL TEXT: This message is intended only for the use of the > individual or entity to which it is addressed and may contain > information which is privileged, confidential, proprietary, or exempt > from disclosure under applicable law. If you are not the intended > recipient or the person responsible for delivering the message to the > intended recipient, you are strictly prohibited from disclosing, > distributing, copying, or in any way using this message. If you have > received this communication in error, please notify the sender and > destroy and delete any copies you may have received. > > http://www.bsc.es/disclaimer.htm > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at us.ibm.com Fri Aug 8 20:09:44 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Fri, 8 Aug 2014 12:09:44 -0700 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <4001D2D9-5E74-4EF9-908F-5B0E3443EA5B@gmail.com> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk> <53E4CD39.7080808@bsc.es> <4001D2D9-5E74-4EF9-908F-5B0E3443EA5B@gmail.com> Message-ID: Vic, Sergi, you can not compare Lustre and GPFS without providing a clear usecase as otherwise you compare apple with oranges. 
the reason for this is quite simple, Lustre plays well in pretty much one usecase - HPC, GPFS on the other hand is used in many forms of deployments from Storage for Virtual Machines, HPC, Scale-Out NAS, Solutions in digital media, to hosting some of the biggest, most business critical Transactional database installations in the world. you look at 2 products with completely different usability spectrum, functions and features unless as said above you narrow it down to a very specific usecase with a lot of details. even just HPC has a very large spectrum and not everybody is working in a single directory, which is the main scale point for Lustre compared to GPFS and the reason is obvious, if you have only 1 active metadata server (which is what 99% of all lustre systems run) some operations like single directory contention is simpler to make fast, but only up to the limit of your one node, but what happens when you need to go beyond that and only a real distributed architecture can support your workload ? for example look at most chip design workloads, which is a form of HPC, it is something thats extremely metadata and small file dominated, you talk about 100's of millions (in some cases even billions) of files, majority of them <4k, the rest larger files , majority of it with random access patterns that benefit from massive client side caching and distributed data coherency models supported by GPFS token manager infrastructure across 10's or 100's of metadata server and 1000's of compute nodes. you also need to look at the rich feature set GPFS provides, which not all may be important for some environments but are for others like Snapshot, Clones, Hierarchical Storage Management (ILM) , Local Cache acceleration (LROC), Global Namespace Wan Integration (AFM), Encryption, etc just to name a few. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Vic Cornell To: gpfsug main discussion list Date: 08/08/2014 10:16 AM Subject: Re: [gpfsug-discuss] GPFS and Lustre on same node Sent by: gpfsug-discuss-bounces at gpfsug.org Disclaimers - I work for DDN - we sell lustre and GPFS. I know GPFS much better than I know Lustre. The biggest difference we find between GPFS and Lustre is that GPFS - can usually achieve 90% of the bandwidth available to a single client with a single thread. Lustre needs multiple parallel streams to saturate - say an Infiniband connection. Lustre is often faster than GPFS and often has superior metadata performance - particularly where lots of files are created in a single directory. GPFS can support Windows - Lustre cannot. I think GPFS is better integrated and easier to deploy than Lustre - some people disagree with me. Regards, Vic On 8 Aug 2014, at 14:14, Sergi Mor? Codina wrote: > Hi all, > > About main differences between GPFS and Lustre, here you have some bits from our experience: > > -Reliability: GPFS its been proved to be more stable and reliable. Also offers more flexibility in terms of fail-over. It have no restriction in number of servers. As far as I know, an NSD can have as many secondary servers as you want (we are using 8). > > -Metadata: In Lustre each file system is restricted to two servers. No restriction in GPFS. > > -Updates: In GPFS you can update the whole storage cluster without stopping production, one server at a time. 
> > -Server/Client role: As Jeremy said, in GPFS every server act as a client as well. Useful for administrative tasks. > > -Troubleshooting: Problems with GPFS are easier to track down. Logs are more clear, and offers better tools than Lustre. > > -Support: No problems at all with GPFS support. It is true that it could take time to go up within all support levels, but we always got a good solution. Quite different in terms of hardware. IBM support quality has drop a lot since about last year an a half. Really slow and tedious process to get replacements. Moreover, we keep receiving bad "certified reutilitzed parts" hardware, which slow the whole process even more. > > > These are the main differences I would stand out after some years of experience with both file systems, but do not take it as a fact. > > PD: Salvatore, I would suggest you to contact Jordi Valls. He joined EBI a couple of months ago, and has experience working with both file systems here at BSC. > > Best Regards, > Sergi. > > > On 08/08/2014 01:40 PM, Jeremy Robst wrote: >> On Fri, 8 Aug 2014, Salvatore Di Nardo wrote: >> >>> Now, skipping all this GSS rant, which have nothing to do with the file >>> system anyway and going back to my question: >>> >>> Could someone point the main differences between GPFS and Lustre? >> >> I'm looking at making the same decision here - to buy GPFS or to roll >> our own Lustre configuration. I'm in the process of setting up test >> systems, and so far the main difference seems to be in the that in GPFS >> each server sees the full filesystem, and so you can run other >> applications (e.g backup) on a GPFS server whereas the Luste OSS (object >> storage servers) see only a portion of the storage (the filesystem is >> striped across the OSSes), so you need a Lustre client to mount the full >> filesystem for things like backup. >> >> However I have very little practical experience of either and would also >> be interested in any comments. >> >> Thanks >> >> Jeremy >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > -- > > ------------------------------------------------------------------------ > > Sergi More Codina > Barcelona Supercomputing Center > Centro Nacional de Supercomputacion > WWW: http://www.bsc.es Tel: +34-93-405 42 27 > e-mail: sergi.more at bsc.es Fax: +34-93-413 77 21 > > ------------------------------------------------------------------------ > > WARNING / LEGAL TEXT: This message is intended only for the use of the > individual or entity to which it is addressed and may contain > information which is privileged, confidential, proprietary, or exempt > from disclosure under applicable law. If you are not the intended > recipient or the person responsible for delivering the message to the > intended recipient, you are strictly prohibited from disclosing, > distributing, copying, or in any way using this message. If you have > received this communication in error, please notify the sender and > destroy and delete any copies you may have received. 
> > http://www.bsc.es/disclaimer.htm > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Sat Aug 9 15:03:02 2014 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Sat, 9 Aug 2014 16:03:02 +0200 Subject: [gpfsug-discuss] GPFS and Lustre In-Reply-To: References: Message-ID: Vic, Sergi, from my point of view for real High-End workloads the complete I/O stack needs to be fine tuned and well understood in order to provide a good system to the users. - Application(s) + I/O Lib(s) + MPI + Parallel Filesystem (e.g. GPFS) + Hardware (Networks, Servers, Disks, etc.) One of the best solutions to bring your application very efficently to work with a Parallel FS is Sionlib from FZ Juelich: Sionlib is a scalable I/O library for the parallel access to task-local files. The library not only supports writing and reading binary data to or from from several thousands of processors into a single or a small number of physical files but also provides for global open and close functions to access SIONlib file in parallel. SIONlib provides different interfaces: parallel access using MPI, OpenMp, or their combination and sequential access for post-processing utilities. http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/SIONlib/_node.html http://apps.fz-juelich.de/jsc/sionlib/html/sionlib_tutorial_2013.pdf -frank- P.S. Nice blog from Nils https://www.ibm.com/developerworks/community/blogs/storageneers/entry/scale_out_backup_with_tsm_and_gss_performance_test_results?lang=en Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From ewahl at osc.edu Mon Aug 11 14:55:48 2014 From: ewahl at osc.edu (Ed Wahl) Date: Mon, 11 Aug 2014 13:55:48 +0000 Subject: [gpfsug-discuss] GPFS and Lustre In-Reply-To: References: , Message-ID: In a similar vein, IBM has an application transparent "File Cache Library" as well. I believe it IS licensed and the only requirement is that it is for use on IBM hardware only. Saw some presentations that mention it in some BioSci talks @SC13 and the numbers for a couple of selected small read applications were awesome. I probably have the contact info for it around here somewhere. In addition to the pdf/user manual. Ed Wahl Ohio Supercomputer Center ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Frank Kraemer [kraemerf at de.ibm.com] Sent: Saturday, August 09, 2014 10:03 AM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] GPFS and Lustre Vic, Sergi, from my point of view for real High-End workloads the complete I/O stack needs to be fine tuned and well understood in order to provide a good system to the users. - Application(s) + I/O Lib(s) + MPI + Parallel Filesystem (e.g. GPFS) + Hardware (Networks, Servers, Disks, etc.) One of the best solutions to bring your application very efficently to work with a Parallel FS is Sionlib from FZ Juelich: Sionlib is a scalable I/O library for the parallel access to task-local files. 
The library not only supports writing and reading binary data to or from from several thousands of processors into a single or a small number of physical files but also provides for global open and close functions to access SIONlib file in parallel. SIONlib provides different interfaces: parallel access using MPI, OpenMp, or their combination and sequential access for post-processing utilities.

http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/SIONlib/_node.html
http://apps.fz-juelich.de/jsc/sionlib/html/sionlib_tutorial_2013.pdf

-frank-

P.S. Nice blog from Nils
https://www.ibm.com/developerworks/community/blogs/storageneers/entry/scale_out_backup_with_tsm_and_gss_performance_test_results?lang=en

Frank Kraemer
IBM Consulting IT Specialist / Client Technical Architect
Hechtsheimer Str. 2, 55131 Mainz
mailto:kraemerf at de.ibm.com
voice: +49171-3043699
IBM Germany
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From sabujp at gmail.com Tue Aug 12 23:16:22 2014
From: sabujp at gmail.com (Sabuj Pattanayek)
Date: Tue, 12 Aug 2014 17:16:22 -0500
Subject: [gpfsug-discuss] reduce cnfs failover time to a few seconds
Message-ID:

Hi all,

Is there any way to reduce CNFS failover time to just a few seconds? Currently it seems like it's taking 5 - 10 minutes. We're using virtual IPs, i.e. interface bond1.1550:0 has one of the CNFS VIPs, so it should be fast, but it takes a long time and sometimes causes processes to crash due to NFS timeouts (some have 600 second soft mount timeouts). We've also noticed that it sometimes takes even longer unless the CNFS system on which we're calling mmshutdown is completely shut down and isn't returning pings. Even 1 minute seems too long. For comparison, I'm running ctdb + samba on the other NSDs and it's able to fail over in a few seconds after mmshutdown completes.

Thanks,
Sabuj
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sdinardo at ebi.ac.uk Fri Aug 15 14:31:29 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Fri, 15 Aug 2014 14:31:29 +0100
Subject: [gpfsug-discuss] gpfs client expels, fs hangind and waiters
Message-ID: <53EE0BB1.8000005@ebi.ac.uk>

Hello people,

I have been trying to solve a problem with our GPFS system for quite a while now, without much luck, so I think it's time to ask for some help.

First of all, a bit of introduction:

Our GPFS system is made of 3x GSS-26; in other words it consists of 6x servers (4x 10G links each) and several SAS-attached disk enclosures. The total amount of space is roughly 2PB, and the disks are SATA (except a few SSDs dedicated to the logtip). My metadata are on dedicated vdisks, but both data and metadata vdisks are in the same declustered arrays and recovery groups, so in the end they share the same spindles. The client side is an LSF farm configured as another cluster (standard multicluster configuration) of roughly 600 nodes.

The issue:

Recently we became aware that when some massive I/O requests are made we experience a lot of client expels. Here's an example from our logs:

Fri Aug 15 12:40:24.680 2014: Expel 10.7.28.34 (gss03a) request from 172.16.4.138 (ebi3-138 in ebi-cluster.ebi.ac.uk).
Expelling: 172.16.4.138 (ebi3-138 in ebi-cluster.ebi.ac.uk)
Fri Aug 15 12:40:41.652 2014: Expel 10.7.28.66 (gss02b) request from 10.7.34.38 (ebi5-037 in ebi-cluster.ebi.ac.uk).
Expelling: 10.7.34.38 (ebi5-037 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:40:45.754 2014: Expel 10.7.28.3 (gss01b) request from 172.16.4.58 (ebi3-058 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.58 (ebi3-058 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:40:52.305 2014: Expel 10.7.28.66 (gss02b) request from 10.7.34.68 (ebi5-067 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.68 (ebi5-067 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:17.069 2014: Expel 10.7.28.35 (gss03b) request from 172.16.4.161 (ebi3-161 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.161 (ebi3-161 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:23.555 2014: Expel 10.7.28.67 (gss02a) request from 172.16.4.136 (ebi3-136 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.136 (ebi3-136 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:54.258 2014: Expel 10.7.28.34 (gss03a) request from 10.7.34.22 (ebi5-021 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.22 (ebi5-021 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:54.540 2014: Expel 10.7.28.66 (gss02b) request from 10.7.34.57 (ebi5-056 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.57 (ebi5-056 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:42:57.288 2014: Expel 10.7.35.5 (ebi5-132 in ebi-cluster.ebi.ac.uk) request from 10.7.28.34 (gss03a). Expelling: 10.7.35.5 (ebi5-132 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:43:24.327 2014: Expel 10.7.28.34 (gss03a) request from 10.7.37.99 (ebi5-226 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.37.99 (ebi5-226 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:44:54.202 2014: Expel 10.7.28.67 (gss02a) request from 172.16.4.165 (ebi3-165 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.165 (ebi3-165 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:15:54.450 2014: Expel 10.7.28.34 (gss03a) request from 10.7.37.89 (ebi5-216 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.37.89 (ebi5-216 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:20:16.524 2014: Expel 10.7.28.3 (gss01b) request from 172.16.4.55 (ebi3-055 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.55 (ebi3-055 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:26:54.177 2014: Expel 10.7.28.34 (gss03a) request from 10.7.34.64 (ebi5-063 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.64 (ebi5-063 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:27:53.900 2014: Expel 10.7.28.3 (gss01b) request from 10.7.35.15 (ebi5-142 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.35.15 (ebi5-142 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:28:24.297 2014: Expel 10.7.28.67 (gss02a) request from 172.16.4.50 (ebi3-050 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.50 (ebi3-050 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:29:23.913 2014: Expel 10.7.28.3 (gss01b) request from 172.16.4.156 (ebi3-156 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.156 (ebi3-156 in ebi-cluster.ebi.ac.uk) at the same time we experience also long waiters queue (1000+ lines). 
An example in case of massive writes ( dd ) : 0x7F522E1EEF90 waiting 1.861233182 seconds, NSDThread: on ThCond 0x7F5158019B08 (0x7F5158019B08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.101 0x7F522E1EC9B0 waiting 1.490567470 seconds, NSDThread: on ThCond 0x7F50F4038BA8 (0x7F50F4038BA8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.45 0x7F522E1EB6C0 waiting 1.077098046 seconds, NSDThread: on ThCond 0x7F50B40011F8 (0x7F50B40011F8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.156 0x7F522E1EA3D0 waiting 7.714968554 seconds, NSDThread: on ThCond 0x7F50BC0078B8 (0x7F50BC0078B8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.107 0x7F522E1E90E0 waiting 4.774379417 seconds, NSDThread: on ThCond 0x7F506801B1F8 (0x7F506801B1F8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.23 0x7F522E1E7DF0 waiting 0.746172444 seconds, NSDThread: on ThCond 0x7F5094007D78 (0x7F5094007D78) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.84 0x7F522E1E6B00 waiting 1.553030487 seconds, NSDThread: on ThCond 0x7F51C0004C78 (0x7F51C0004C78) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.63 0x7F522E1E5810 waiting 2.165307633 seconds, NSDThread: on ThCond 0x7F5178016A08 (0x7F5178016A08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.29 0x7F522E1E4520 waiting 1.128089273 seconds, NSDThread: on ThCond 0x7F5074004D98 (0x7F5074004D98) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.61 0x7F522E1E3230 waiting 2.515214328 seconds, NSDThread: on ThCond 0x7F51F400EF08 (0x7F51F400EF08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.90 0x7F522E1E1F40 waiting*162.966840834* seconds, NSDThread: on ThCond 0x7F51840207A8 (0x7F51840207A8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.97 0x7F522E1E0C50 waiting 1.140787288 seconds, NSDThread: on ThCond 0x7F51AC005C08 (0x7F51AC005C08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.94 0x7F522E1DF960 waiting 41.907415248 seconds, NSDThread: on ThCond 0x7F5160019038 (0x7F5160019038) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.143 0x7F522E1DE670 waiting 0.466560418 seconds, NSDThread: on ThCond 0x7F513802B258 (0x7F513802B258) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.168 0x7F522E1DD380 waiting 3.102803621 seconds, NSDThread: on ThCond 0x7F516C0106C8 (0x7F516C0106C8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.91 0x7F522E1DC090 waiting 2.751614295 seconds, NSDThread: on ThCond 0x7F504C0011F8 (0x7F504C0011F8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.25 0x7F522E1DADA0 waiting 5.083691891 seconds, NSDThread: on ThCond 0x7F507401BE88 (0x7F507401BE88) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.61 0x7F522E1D9AB0 waiting 2.263374184 seconds, NSDThread: on ThCond 0x7F5080003B98 (0x7F5080003B98) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.36 0x7F522E1D87C0 waiting 0.206989639 seconds, NSDThread: on ThCond 0x7F505801F0D8 (0x7F505801F0D8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.55 0x7F522E1D74D0 waiting *41.841279897* seconds, NSDThread: on ThCond 0x7F5194008B88 (0x7F5194008B88) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.143 0x7F522E1D61E0 waiting 5.618652361 seconds, NSDThread: on ThCond 0x1BAB868 (0x1BAB868) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.59 0x7F522E1D4EF0 
waiting 6.185658427 seconds, NSDThread: on ThCond 0x7F513802AAE8 (0x7F513802AAE8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.6 0x7F522E1D3C00 waiting 2.652370892 seconds, NSDThread: on ThCond 0x7F5130004C78 (0x7F5130004C78) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.45 0x7F522E1D2910 waiting 11.396142225 seconds, NSDThread: on ThCond 0x7F51A401C0C8 (0x7F51A401C0C8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.169 0x7F522E1D1620 waiting 63.710723043 seconds, NSDThread: on ThCond 0x7F5038004D08 (0x7F5038004D08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.120 or for massive reads: 0x7FBCE69A8C20 waiting 29.262629530 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE699CEC0 waiting 29.260869141 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE698C5A0 waiting 29.124824888 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6984110 waiting 22.729479654 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE69512C0 waiting 29.272805926 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE69409A0 waiting 28.833650198 seconds, NSDThread: on ThCond 0x18033B74D48 (0xFFFFC90033B74D48) (LeaseWaitCondvar), reason 'Waiting to acquire disklease' 0x7FBCE6924320 waiting 29.237067128 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6921D40 waiting 29.237953228 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6915FE0 waiting 29.046721161 seconds, NSDThread: on ThCond 0x18033B74D48 (0xFFFFC90033B74D48) (LeaseWaitCondvar), reason 'Waiting to acquire disklease' 0x7FBCE6913A00 waiting 29.264534710 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6900B00 waiting 29.267691105 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68F7380 waiting 29.266402464 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68D2870 waiting 29.276298231 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68BADB0 waiting 28.665700576 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68B61F0 waiting 29.236878611 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6885980 waiting *144*.530487248 seconds, NSDThread: on ThMutex 0x1803396A670 (0xFFFFC9003396A670) (DiskSchedulingMutex) 0x7FBCE68833A0 waiting 29.231066610 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68820B0 waiting 29.269954514 seconds, NSDThread: on ThCond 
0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE686A5F0 waiting *140*.662994256 seconds, NSDThread: on ThMutex 0x180339A3140 (0xFFFFC900339A3140) (DiskSchedulingMutex) 0x7FBCE6864740 waiting 29.254180742 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE683FC30 waiting 29.271840565 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE682E020 waiting 29.200969209 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6825B90 waiting 19.136732919 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6805C40 waiting 29.236055550 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67FEAA0 waiting 29.283264161 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67FC4C0 waiting 29.268992663 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67DFE40 waiting 29.150900786 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67D2DF0 waiting 29.199058463 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67D1B00 waiting 29.203199738 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67768D0 waiting 29.208231742 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6768590 waiting 5.228192589 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67672A0 waiting 29.252839376 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6757C70 waiting 28.869359044 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6748640 waiting 29.289284179 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6734450 waiting 29.253591817 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6730B80 waiting 29.289987273 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6720260 waiting 26.597589551 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66F32C0 waiting 29.177692849 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66E3C90 waiting 29.160268518 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) 
(VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66CC1D0 waiting 5.334330188 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66B3420 waiting 34.274433161 seconds, NSDThread: on ThMutex 0x180339A3140 (0xFFFFC900339A3140) (DiskSchedulingMutex) 0x7FBCE668E910 waiting 27.699999488 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6689D50 waiting 34.279090465 seconds, NSDThread: on ThMutex 0x180339A3140 (0xFFFFC900339A3140) (DiskSchedulingMutex) 0x7FBCE66805D0 waiting 24.688626241 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6675B60 waiting 35.367745840 seconds, NSDThread: on ThCond 0x18033B74D48 (0xFFFFC90033B74D48) (LeaseWaitCondvar), reason 'Waiting to acquire disklease' 0x7FBCE665E0A0 waiting 29.235994598 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE663CE60 waiting 29.162911979 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Another example with mmfsadm in case of massive reads: [root at gss02b ~]# mmfsadm dump waiters 0x7F519000AEA0 waiting 28.915010347 seconds, replyCleanupThread: on ThCond 0x7F51101B27B8 (0x7F51101B27B8) (MsgRecordCondvar), reason 'RPC wait' 0x7F511C012A10 waiting 279.522206863 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F52000095F8 (0x7F52000095F8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F5120000B80 waiting 279.524782437 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F5214000EE8 (0x7F5214000EE8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F5154006310 waiting 138.164386224 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F5174003F08 (0x7F5174003F08) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1EB6C0 waiting 23.060703000 seconds, NSDThread: for poll on sock 85 0x7F522E1E6B00 waiting 0.068456104 seconds, NSDThread: on ThCond 0x7F50CC00E478 (0x7F50CC00E478) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1D0330 waiting 17.207907857 seconds, NSDThread: on ThCond 0x7F5078001688 (0x7F5078001688) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1BFA10 waiting 0.181011711 seconds, NSDThread: on ThCond 0x7F504000E558 (0x7F504000E558) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1B4FA0 waiting 0.021780338 seconds, NSDThread: on ThCond 0x7F522000E488 (0x7F522000E488) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1B3CB0 waiting 0.794718000 seconds, NSDThread: for poll on sock 799 0x7F522E186D10 waiting 0.191606803 seconds, NSDThread: on ThCond 0x7F5184015D58 (0x7F5184015D58) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E184730 waiting 0.025562000 seconds, NSDThread: for poll on sock 867 0x7F522E12CDD0 waiting 0.008921000 seconds, NSDThread: for poll on sock 543 0x7F522E126F20 waiting 1.459531000 seconds, NSDThread: for poll on sock 983 0x7F522E10F460 waiting 17.177936972 seconds, NSDThread: on ThCond 0x7F51EC002CE8 (0x7F51EC002CE8) 
(InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E101120 waiting 17.232580316 seconds, NSDThread: on ThCond 0x7F51BC005BB8 (0x7F51BC005BB8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E0F1AF0 waiting 438.556030000 seconds, NSDThread: for poll on sock 496 0x7F522E0E7080 waiting 393.702839774 seconds, NSDThread: on ThCond 0x7F5164013668 (0x7F5164013668) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E09DA60 waiting 52.746984660 seconds, NSDThread: on ThCond 0x7F506C008858 (0x7F506C008858) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E084CB0 waiting 23.096688206 seconds, NSDThread: on ThCond 0x7F521C008E18 (0x7F521C008E18) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E0839C0 waiting 0.093456000 seconds, NSDThread: for poll on sock 962 0x7F522E076970 waiting 2.236659731 seconds, NSDThread: on ThCond 0x7F51E0027538 (0x7F51E0027538) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E044E10 waiting 52.752497765 seconds, NSDThread: on ThCond 0x7F513802BDD8 (0x7F513802BDD8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E033200 waiting 16.157355796 seconds, NSDThread: on ThCond 0x7F5104240D58 (0x7F5104240D58) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E02AD70 waiting 436.025203220 seconds, NSDThread: on ThCond 0x7F50E0016C28 (0x7F50E0016C28) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E01A450 waiting 393.673252777 seconds, NSDThread: on ThCond 0x7F50A8009C18 (0x7F50A8009C18) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DFE0460 waiting 1.781358358 seconds, NSDThread: on ThCond 0x7F51E0027638 (0x7F51E0027638) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF99420 waiting 0.038405427 seconds, NSDThread: on ThCond 0x7F50F0172B18 (0x7F50F0172B18) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF7CDA0 waiting 438.204625355 seconds, NSDThread: on ThCond 0x7F50900023D8 (0x7F50900023D8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF76EF0 waiting 435.903645734 seconds, NSDThread: on ThCond 0x7F5084004BC8 (0x7F5084004BC8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF74910 waiting 21.749325022 seconds, NSDThread: on ThCond 0x7F507C011F48 (0x7F507C011F48) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF71040 waiting 1.027274000 seconds, NSDThread: for poll on sock 866 0x7F522DF536D0 waiting 52.953847324 seconds, NSDThread: on ThCond 0x7F5200006FF8 (0x7F5200006FF8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF510F0 waiting 0.039278000 seconds, NSDThread: for poll on sock 837 0x7F522DF4EB10 waiting 0.085745937 seconds, NSDThread: on ThCond 0x7F51F0006828 (0x7F51F0006828) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF4C530 waiting 21.850733000 seconds, NSDThread: for poll on sock 986 0x7F522DF4B240 waiting 0.054739884 seconds, NSDThread: on ThCond 0x7F51EC0168D8 (0x7F51EC0168D8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF48C60 waiting 0.186409714 seconds, 
NSDThread: on ThCond 0x7F51E4000908 (0x7F51E4000908) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF41AC0 waiting 438.942861290 seconds, NSDThread: on ThCond 0x7F51CC010168 (0x7F51CC010168) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF3F4E0 waiting 0.060235106 seconds, NSDThread: on ThCond 0x7F51C400A438 (0x7F51C400A438) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF22E60 waiting 0.361288000 seconds, NSDThread: for poll on sock 518 0x7F522DF21B70 waiting 0.060722464 seconds, NSDThread: on ThCond 0x7F51580162D8 (0x7F51580162D8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF12540 waiting 23.077564448 seconds, NSDThread: on ThCond 0x7F512C13E1E8 (0x7F512C13E1E8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEFD060 waiting 0.723370000 seconds, NSDThread: for poll on sock 503 0x7F522DEE09E0 waiting 1.565799175 seconds, NSDThread: on ThCond 0x7F5084004D58 (0x7F5084004D58) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEDF6F0 waiting 22.063017342 seconds, NSDThread: on ThCond 0x7F5078003E08 (0x7F5078003E08) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEDD110 waiting 0.049108780 seconds, NSDThread: on ThCond 0x7F5070001D78 (0x7F5070001D78) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEDAB30 waiting 229.603224376 seconds, NSDThread: on ThCond 0x7F50680221B8 (0x7F50680221B8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DED7260 waiting 0.071855457 seconds, NSDThread: on ThCond 0x7F506400A5A8 (0x7F506400A5A8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DED5F70 waiting 0.648324000 seconds, NSDThread: for poll on sock 766 0x7F522DEC3070 waiting 1.809205756 seconds, NSDThread: on ThCond 0x7F522000E518 (0x7F522000E518) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEB1460 waiting 436.017396645 seconds, NSDThread: on ThCond 0x7F51E4000978 (0x7F51E4000978) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEAC8A0 waiting 393.734102000 seconds, NSDThread: for poll on sock 609 0x7F522DEA3120 waiting 17.960778837 seconds, NSDThread: on ThCond 0x7F51B4001708 (0x7F51B4001708) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DE86AA0 waiting 23.112060045 seconds, NSDThread: on ThCond 0x7F5154096118 (0x7F5154096118) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DE64570 waiting 0.076167410 seconds, NSDThread: on ThCond 0x7F50D8005EF8 (0x7F50D8005EF8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DE1AF50 waiting 17.460836000 seconds, NSDThread: for poll on sock 737 0x7F522DE104E0 waiting 0.205037000 seconds, NSDThread: for poll on sock 865 0x7F522DDB8B80 waiting 0.106192000 seconds, NSDThread: for poll on sock 78 0x7F522DDA36A0 waiting 0.738921180 seconds, NSDThread: on ThCond 0x7F505400E048 (0x7F505400E048) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DD9C500 waiting 0.731118367 seconds, NSDThread: on ThCond 0x7F503C00B518 (0x7F503C00B518) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DD89600 waiting 
229.609363000 seconds, NSDThread: for poll on sock 515 0x7F522DD567B0 waiting 1.508489195 seconds, NSDThread: on ThCond 0x7F514C021F88 (0x7F514C021F88) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' Another thing worth to mention is that the filesystem its totaly unresponsive. Even a simple "cd" to a directory or an ls to a directory just hangs for several minutes ( litterally). This happens also if i try from the NSD servers. *Few things i have looked into:* * Our network seems fine, there might be some bottleneck on part of them, and this could explain the waiters, but doesnt explain why ad some poit those client ask to expel the NSD servers. THis also doesn't justify why the FS is slow even on NSD itself. * Disk bottleneck? i dont think so. NSD servers have cpu usage (and io wait ) very low. Also mmdiag --iohist seems condirming that the operation on the disks are reasonable fast: === mmdiag: iohist === I/O history: I/O start time RW Buf type disk:sectorNum nSec time ms Type Device/NSD ID NSD server --------------- -- ----------- ----------------- ----- ------- ---- ------------------ --------------- 13:54:29.209276 W data 34:5066338808 2056 88.307 lcl sdtu 13:54:29.209277 W data 55:5095698936 2056 27.592 lcl sdaab 13:54:29.209278 W data 171:5104087544 2056 22.801 lcl sdtg 13:54:29.209279 W data 116:5011812856 2056 65.983 lcl sdqr 13:54:29.209280 W data 98:4860817912 2056 17.892 lcl sddl 13:54:29.209281 W data 159:4999229944 2056 21.324 lcl sdjg 13:54:29.209282 W data 84:5049561592 2056 31.932 lcl sdqz 13:54:29.209283 W data 8:5003424248 2056 30.912 lcl sdcw 13:54:29.209284 W data 23:4965675512 2056 27.366 lcl sdpt 13:54:29.297715 W vdiskMDLog 2:144008496 1 0.236 lcl sdkr 13:54:29.297717 W vdiskMDLog 0:331703600 1 0.230 lcl sdcm 13:54:29.297718 W vdiskMDLog 1:273769776 1 0.241 lcl sdbp 13:54:29.244902 W data 51:3857589752 2056 35.566 lcl sdyi 13:54:29.244904 W data 10:3773703672 2056 28.512 lcl sdma 13:54:29.244905 W data 48:3639485944 2056 24.124 lcl sdel 13:54:29.244906 W data 25:3777897976 2056 18.691 lcl sdgt 13:54:29.244908 W data 91:3832423928 2056 20.699 lcl sdlc 13:54:29.244909 W data 115:3723372024 2056 30.783 lcl sdho 13:54:29.244910 W data 173:3882755576 2056 53.241 lcl sdti 13:54:29.244911 W data 42:3782092280 2056 22.785 lcl sddz 13:54:29.244912 W data 45:3647874552 2056 24.289 lcl sdei 13:54:29.244913 W data 32:3652068856 2056 17.220 lcl sdbn 13:54:29.244914 W data 39:3677234680 2056 26.017 lcl sddw 13:54:29.298273 W vdiskMDLog 2:144008497 1 2.522 lcl sduf 13:54:29.298274 W vdiskMDLog 0:331703601 1 1.025 lcl sdlo 13:54:29.298275 W vdiskMDLog 1:273769777 1 2.586 lcl sdtt 13:54:29.288275 W data 27:2249588200 2056 20.071 lcl sdhb 13:54:29.288279 W data 33:2224422376 2056 19.682 lcl sdts 13:54:29.288281 W data 47:2115370472 2056 21.667 lcl sdwo 13:54:29.288282 W data 82:2316697064 2056 21.524 lcl sdxy 13:54:29.288283 W data 85:2232810984 2056 17.467 lcl sdra 13:54:29.288285 W data 30:2127953384 2056 18.475 lcl sdqg 13:54:29.288286 W data 67:1876295144 2056 16.383 lcl sdmx 13:54:29.288287 W data 64:2127953384 2056 21.908 lcl sduh 13:54:29.288288 W data 38:2253782504 2056 19.775 lcl sddv 13:54:29.288290 W data 15:2207645160 2056 20.599 lcl sdet 13:54:29.288291 W data 157:2283142632 2056 21.198 lcl sdiy Bonding problem on the interfaces? Mellanox ( interface card prodicer) drivers and firmware updated, and we even tested the system with a single link ( without bonding). Could someone help me with this? 
in particular: * What exactly are client are looking to decide that another node is unresponsive? Ping? i dont think so because both NSD servers and clients can be pinged, so what they look? if comeone can also specify what port are they using i can try to tcpdump what exactly is cauding this expell. * How can i monitor metadata operations to understand where EXACTLY is the bottleneck that causes this: [sdinardo at ebi5-001 ~]$ time ls /gpfs/nobackup/sdinardo 1 ebi3-054.ebi.ac.uk ebi3-154 ebi5-019.ebi.ac.uk ebi5-052 ebi5-101 ebi5-156 ebi5-197 ebi5-228 ebi5-262.ebi.ac.uk 10 ebi3-055 ebi3-155 ebi5-021.ebi.ac.uk ebi5-053 ebi5-104.ebi.ac.uk ebi5-160.ebi.ac.uk ebi5-198 ebi5-229 ebi5-263 2 ebi3-056.ebi.ac.uk ebi3-156 ebi5-022 ebi5-054.ebi.ac.uk ebi5-106 ebi5-161 ebi5-200 ebi5-230.ebi.ac.uk ebi5-264 3 ebi3-057 ebi3-157 ebi5-023 ebi5-056 ebi5-109 ebi5-162.ebi.ac.uk ebi5-201 ebi5-231.ebi.ac.uk ebi5-265 4 ebi3-058 ebi3-158.ebi.ac.uk ebi5-024.ebi.ac.uk ebi5-057 ebi5-110.ebi.ac.uk ebi5-163.ebi.ac.uk ebi5-202.ebi.ac.uk ebi5-232 ebi5-266.ebi.ac.uk 5 ebi3-059.ebi.ac.uk ebi3-160 ebi5-025 ebi5-060 ebi5-111.ebi.ac.uk ebi5-164 ebi5-204 ebi5-233 ebi5-267 6 ebi3-132 ebi3-161.ebi.ac.uk ebi5-026 ebi5-061.ebi.ac.uk ebi5-112.ebi.ac.uk ebi5-165 ebi5-205 ebi5-234 ebi5-269.ebi.ac.uk 7 ebi3-133 ebi3-163.ebi.ac.uk ebi5-028 ebi5-062.ebi.ac.uk ebi5-129.ebi.ac.uk ebi5-166 ebi5-206.ebi.ac.uk ebi5-236 ebi5-270 8 ebi3-134 ebi3-165 ebi5-030 ebi5-064 ebi5-131.ebi.ac.uk ebi5-169.ebi.ac.uk ebi5-207 ebi5-237 ebi5-271 9 ebi3-135 ebi3-166.ebi.ac.uk ebi5-031 ebi5-065 ebi5-132 ebi5-170.ebi.ac.uk ebi5-209 ebi5-239.ebi.ac.uk launcher.sh _*real 21m14.948s*_( WTH ?!?!?!) user 0m0.004s sys 0m0.014s I know that the question are not easy to answer, and i need to dig more, but could be very helpful if someone give me some hints about where to look at. My gpfs skills are limited since this is our first system and is in production for just few months, and the things stated to worsen just recenlty. In past we could get over 200Gb/s ( both read and write) without any issue. Now some clients get expelled even when data thoughuput is ad 4-5Gb/s. Thanks in advance for any help. Regards, Salvatore -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at arif-ali.co.uk Tue Aug 19 11:18:10 2014 From: mail at arif-ali.co.uk (Arif Ali) Date: Tue, 19 Aug 2014 11:18:10 +0100 Subject: [gpfsug-discuss] gpfsug Maintenance Message-ID: Hi all, You may be aware that the website has been down for about a week now. This is due to the amount of traffic to the website and the amount of people on the mailing list, we had seen a few issues on the system. In order to counter the issues, we are moving to a new system to counter any future issues, and ease of management. We are hoping to do this tonight ( between 20:00 - 23:00 BST). If this causes an issue for anyone, then please let me know. I will, as part of the move over, will be sending a few test mails to make sure that mailing list is working correctly. Thanks for your patience -- Arif Ali gpfsug Admin IRC: arif-ali at freenode LinkedIn: http://uk.linkedin.com/in/arifali -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Tue Aug 19 12:11:00 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 19 Aug 2014 12:11:00 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53EE0BB1.8000005@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> Message-ID: <53F330C4.808@ebi.ac.uk> Still problems. 
Here some more detailed examples: *EXAMPLE 1:* *EBI5-220**( CLIENT)** *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a reply from node gss02b* Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic Tue Aug 19 11:03:12.066 2014: Connecting to gss02a Tue Aug 19 11:03:12.070 2014: Connected to gss02a Tue Aug 19 11:03:17.071 2014: Connecting to gss02b Tue Aug 19 11:03:17.072 2014: Connecting to gss03b Tue Aug 19 11:03:17.079 2014: Connecting to gss03a Tue Aug 19 11:03:17.080 2014: Connecting to gss01b Tue Aug 19 11:03:17.079 2014: Connecting to gss01a Tue Aug 19 11:04:23.105 2014: Connected to gss02b Tue Aug 19 11:04:23.107 2014: Connected to gss03b Tue Aug 19 11:04:23.112 2014: Connected to gss03a Tue Aug 19 11:04:23.115 2014: Connected to gss01b Tue Aug 19 11:04:23.121 2014: Connected to gss01a Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. *GSS02B ( NSD SERVER)* ... Tue Aug 19 11:03:17.070 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:28.080 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:39.083 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:50.088 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:01.092 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:12.096 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:23.103 2014: Accepted and connected to ** ebi5-220 ... *GSS02a ( NSD SERVER)* Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). 
Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 =============================================== *EXAMPLE 2*: *EBI5-038* Tue Aug 19 11:32:34.227 2014: *Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing cluster GSS.ebi.ac.uk* Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. ... LOT MORE RESETS BY PEER ... Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:25.267 2014: Connecting to gss02a Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.* *GSS02a* Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled because of an expired lease.* Pings sent: 60. Replies received: 60. In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? Thanks in advance for any tips. Regards, Salvatore -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at arif-ali.co.uk Tue Aug 19 20:59:47 2014 From: mail at arif-ali.co.uk (Arif Ali) Date: Tue, 19 Aug 2014 20:59:47 +0100 Subject: [gpfsug-discuss] gpfsug Maintenance In-Reply-To: References: Message-ID: This is a test mail to the mailing list please do not reply -- Arif Ali IRC: arif-ali at freenode LinkedIn: http://uk.linkedin.com/in/arifali On 19 August 2014 11:18, Arif Ali wrote: > Hi all, > > You may be aware that the website has been down for about a week now. This > is due to the amount of traffic to the website and the amount of people on > the mailing list, we had seen a few issues on the system. > > In order to counter the issues, we are moving to a new system to counter > any future issues, and ease of management. We are hoping to do this tonight > ( between 20:00 - 23:00 BST). If this causes an issue for anyone, then > please let me know. > > I will, as part of the move over, will be sending a few test mails to make > sure that mailing list is working correctly. > > Thanks for your patience > > -- > Arif Ali > gpfsug Admin > > IRC: arif-ali at freenode > LinkedIn: http://uk.linkedin.com/in/arifali > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mail at arif-ali.co.uk Tue Aug 19 23:41:48 2014 From: mail at arif-ali.co.uk (Arif Ali) Date: Tue, 19 Aug 2014 23:41:48 +0100 Subject: [gpfsug-discuss] gpfsug Maintenance In-Reply-To: References: Message-ID: Thanks for all your patience, The service should all be back up again -- Arif Ali IRC: arif-ali at freenode LinkedIn: http://uk.linkedin.com/in/arifali On 19 August 2014 20:59, Arif Ali wrote: > This is a test mail to the mailing list > > please do not reply > > -- > Arif Ali > > IRC: arif-ali at freenode > LinkedIn: http://uk.linkedin.com/in/arifali > > > On 19 August 2014 11:18, Arif Ali wrote: > >> Hi all, >> >> You may be aware that the website has been down for about a week now. >> This is due to the amount of traffic to the website and the amount of >> people on the mailing list, we had seen a few issues on the system. >> >> In order to counter the issues, we are moving to a new system to counter >> any future issues, and ease of management. We are hoping to do this tonight >> ( between 20:00 - 23:00 BST). If this causes an issue for anyone, then >> please let me know. >> >> I will, as part of the move over, will be sending a few test mails to >> make sure that mailing list is working correctly. >> >> Thanks for your patience >> >> -- >> Arif Ali >> gpfsug Admin >> >> IRC: arif-ali at freenode >> LinkedIn: http://uk.linkedin.com/in/arifali >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Wed Aug 20 08:57:23 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 20 Aug 2014 08:57:23 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53EE0BB1.8000005@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> Message-ID: <53F454E3.40803@ebi.ac.uk> Still problems. Here some more detailed examples: *EXAMPLE 1:* *EBI5-220**( CLIENT)** *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a reply from node gss02b* Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic Tue Aug 19 11:03:12.066 2014: Connecting to gss02a Tue Aug 19 11:03:12.070 2014: Connected to gss02a Tue Aug 19 11:03:17.071 2014: Connecting to gss02b Tue Aug 19 11:03:17.072 2014: Connecting to gss03b Tue Aug 19 11:03:17.079 2014: Connecting to gss03a Tue Aug 19 11:03:17.080 2014: Connecting to gss01b Tue Aug 19 11:03:17.079 2014: Connecting to gss01a Tue Aug 19 11:04:23.105 2014: Connected to gss02b Tue Aug 19 11:04:23.107 2014: Connected to gss03b Tue Aug 19 11:04:23.112 2014: Connected to gss03a Tue Aug 19 11:04:23.115 2014: Connected to gss01b Tue Aug 19 11:04:23.121 2014: Connected to gss01a Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. *GSS02B ( NSD SERVER)* ... 
Tue Aug 19 11:03:17.070 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:28.080 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:39.083 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:50.088 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:01.092 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:12.096 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:23.103 2014: Accepted and connected to ** ebi5-220 ... *GSS02a ( NSD SERVER)* Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 =============================================== *EXAMPLE 2*: *EBI5-038* Tue Aug 19 11:32:34.227 2014: *Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing cluster GSS.ebi.ac.uk* Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. ... LOT MORE RESETS BY PEER ... Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:25.267 2014: Connecting to gss02a Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.* *GSS02a* Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled because of an expired lease.* Pings sent: 60. Replies received: 60. In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? 
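For what it is worth, the daemon-to-daemon traffic that the lease pings and expel requests ride on uses TCP port 1191 by default (the tscTcpPort setting), so that is the port to capture. A rough sketch of how the next occurrence could be caught follows; the interface name bond0 is only an assumption based on the bonding mentioned earlier, the node names are just the ones from example 1, and the filter would need swapping for whichever peer is being watched. Run something like this on the client and on the NSD server it times out against, then line the timestamps up with /var/adm/ras/mmfs.log.latest on both sides afterwards:

    # 1) capture only GPFS daemon traffic between the two nodes
    #    (adjust the interface and the host filter for the real setup)
    tcpdump -i bond0 -s 0 -w /tmp/gpfs-1191-$(hostname -s).pcap \
        'tcp port 1191 and host gss02b'

    # 2) once a second, timestamp the local waiters so the capture,
    #    the waiters and mmfs.log.latest can be correlated later
    while true; do
        echo "=== $(date '+%H:%M:%S') $(hostname -s)"
        /usr/lpp/mmfs/bin/mmdiag --waiters
        sleep 1
    done >> /tmp/waiters-$(hostname -s).log 2>&1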
In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? Thanks in advance for any tips. Regards, Salvatore -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Wed Aug 20 09:03:03 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 20 Aug 2014 09:03:03 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F454E3.40803@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> Message-ID: <53F45637.8080000@ebi.ac.uk> Another interesting case about a specific waiter: was looking the waiters on GSS until i found those( i got those info collecting from all the servers with a script i did, so i was able to trace hanging connection while they was happening): gss03b.ebi.ac.uk:*235.373993397*(MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.109 gss03b.ebi.ac.uk:*235.152271998*(MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.109 gss02a.ebi.ac.uk:*214.079093620 *(MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.7.34.109 gss02a.ebi.ac.uk:*213.580199240 *(MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.7.37.109 gss03b.ebi.ac.uk:*132.375138082*(MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.109 gss03b.ebi.ac.uk:*132.374973884 *(MsgRecordCondvar), reason 'RPC wait' for commMsgCheckMessages on node 10.7.37.109 the bolted number are seconds. looking at this page: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/Interpreting+GPFS+Waiter+Information The web page claim that's, probably a network congestion, but i managed to login quick enough to the client and there the waiters was: [root at ebi5-236 ~]# mmdiag --waiters === mmdiag: waiters === 0x7F6690073460 waiting 147.973009173 seconds, RangeRevokeWorkerThread: on ThCond 0x1801E43F6A0 (0xFFFFC9001E43F6A0) (LkObjCondvar), reason 'waiting for LX lock' 0x7F65100036D0 waiting 140.458589856 seconds, WritebehindWorkerThread: on ThCond 0x7F6500000F98 (0x7F6500000F98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F63A0001080 waiting 245.153055801 seconds, WritebehindWorkerThread: on ThCond 0x7F65D8002CF8 (0x7F65D8002CF8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674C03D3D0 waiting 245.750977203 seconds, CleanBufferThread: on ThCond 0x7F64880079E8 (0x7F64880079E8) (LogFileBufferDescriptorCondvar), reason 'force wait for buffer write to complete' 0x7F674802E360 waiting 244.159861966 seconds, WritebehindWorkerThread: on ThCond 0x7F65E0002358 (0x7F65E0002358) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674C038810 waiting 251.086748430 seconds, SGExceptionLogBufferFullThread: on ThCond 0x7F64EC001398 (0x7F64EC001398) (MsgRecordCondvar), reason 'RPC wait' for I/O completion on node 10.7.28.35 0x7F674C036230 waiting 139.556735095 seconds, CleanBufferThread: on ThCond 0x7F65CC004C78 (0x7F65CC004C78) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674C031670 waiting 144.327593052 seconds, WritebehindWorkerThread: on ThCond 0x7F672402D1A8 (0x7F672402D1A8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674C02A4D0 waiting 145.202712821 seconds, 
WritebehindWorkerThread: on ThCond 0x7F65440018F8 (0x7F65440018F8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674C0291E0 waiting 247.131569232 seconds, PrefetchWorkerThread: on ThCond 0x7F65740016C8 (0x7F65740016C8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6748025BD0 waiting 11.631381523 seconds, replyCleanupThread: on ThCond 0x7F65E000A1F8 (0x7F65E000A1F8) (MsgRecordCondvar), reason 'RPC wait' 0x7F6748022300 waiting 245.616267612 seconds, WritebehindWorkerThread: on ThCond 0x7F6470001468 (0x7F6470001468) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6748021010 waiting 230.769670930 seconds, InodeAllocRevokeWorkerThread: on ThCond 0x7F64880079E8 (0x7F64880079E8) (LogFileBufferDescriptorCondvar), reason 'force wait for buffer write to complete' 0x7F674801B160 waiting 245.830554594 seconds, UnusedInodePrefetchThread: on ThCond 0x7F65B8004438 (0x7F65B8004438) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674800A820 waiting 252.332932000 seconds, Msg handler getData: for poll on sock 109 0x7F63F4023090 waiting 253.073535042 seconds, WritebehindWorkerThread: on ThCond 0x7F65C4000CC8 (0x7F65C4000CC8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F64A4000CE0 waiting 145.049659249 seconds, WritebehindWorkerThread: on ThCond 0x7F6560000A98 (0x7F6560000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6778006D00 waiting 142.124664264 seconds, WritebehindWorkerThread: on ThCond 0x7F63DC000C08 (0x7F63DC000C08) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67780046D0 waiting 251.751439453 seconds, WritebehindWorkerThread: on ThCond 0x7F6454000A98 (0x7F6454000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67780E4B70 waiting 142.431051232 seconds, WritebehindWorkerThread: on ThCond 0x7F63C80010D8 (0x7F63C80010D8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67780E50D0 waiting 244.339624817 seconds, WritebehindWorkerThread: on ThCond 0x7F65BC001B98 (0x7F65BC001B98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6434000B40 waiting 145.343700410 seconds, WritebehindWorkerThread: on ThCond 0x7F63B00036E8 (0x7F63B00036E8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F670C0187A0 waiting 244.903963969 seconds, WritebehindWorkerThread: on ThCond 0x7F65F0000FB8 (0x7F65F0000FB8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C04E2F0 waiting 245.837137631 seconds, PrefetchWorkerThread: on ThCond 0x7F65A4000A98 (0x7F65A4000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C04AA20 waiting 139.713993908 seconds, WritebehindWorkerThread: on ThCond 0x7F6454002478 (0x7F6454002478) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C049730 waiting 252.434187472 seconds, WritebehindWorkerThread: on ThCond 0x7F65F4003708 (0x7F65F4003708) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C044B70 waiting 131.515829048 seconds, Msg handler ccMsgPing: on ThCond 0x7F64DC1D4888 (0x7F64DC1D4888) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F6758008DE0 waiting 149.548547226 seconds, Msg handler getData: on ThCond 
0x7F645C002458 (0x7F645C002458) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F67580071D0 waiting 149.548543118 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F6450001C48 (0x7F6450001C48) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F65A40052B0 waiting 11.498507001 seconds, Msg handler ccMsgPing: on ThCond 0x7F644C103F88 (0x7F644C103F88) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F6448001620 waiting 139.844870446 seconds, WritebehindWorkerThread: on ThCond 0x7F65F0003098 (0x7F65F0003098) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F63F4000F80 waiting 245.044791905 seconds, WritebehindWorkerThread: on ThCond 0x7F6450001188 (0x7F6450001188) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F659C0033A0 waiting 243.464399305 seconds, PrefetchWorkerThread: on ThCond 0x7F6554002598 (0x7F6554002598) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6514001690 waiting 245.826160463 seconds, PrefetchWorkerThread: on ThCond 0x7F65A4004558 (0x7F65A4004558) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F64800012B0 waiting 253.174835511 seconds, WritebehindWorkerThread: on ThCond 0x7F65E0000FB8 (0x7F65E0000FB8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6510000EE0 waiting 140.746696039 seconds, WritebehindWorkerThread: on ThCond 0x7F647C000CC8 (0x7F647C000CC8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6754001BB0 waiting 246.336055629 seconds, PrefetchWorkerThread: on ThCond 0x7F6594002498 (0x7F6594002498) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6420000930 waiting 140.606777450 seconds, WritebehindWorkerThread: on ThCond 0x7F6578002498 (0x7F6578002498) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6744009110 waiting 137.466372831 seconds, FileBlockReadFetchHandlerThread: on ThCond 0x7F65F4007158 (0x7F65F4007158) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67280119F0 waiting 144.173427360 seconds, WritebehindWorkerThread: on ThCond 0x7F6504000AE8 (0x7F6504000AE8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F672800BB40 waiting 145.804301887 seconds, WritebehindWorkerThread: on ThCond 0x7F6550001038 (0x7F6550001038) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6728000910 waiting 252.601993452 seconds, WritebehindWorkerThread: on ThCond 0x7F6450000A98 (0x7F6450000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6744007E20 waiting 251.603329204 seconds, WritebehindWorkerThread: on ThCond 0x7F6570004C18 (0x7F6570004C18) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F64AC002EF0 waiting 139.205774422 seconds, FileBlockWriteFetchHandlerThread: on ThCond 0x18020AF0260 (0xFFFFC90020AF0260) (FetchFlowControlCondvar), reason 'wait for buffer for fetch' 0x7F6724013050 waiting 71.501580932 seconds, Msg handler ccMsgPing: on ThCond 0x7F6580006608 (0x7F6580006608) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F661C000DA0 waiting 245.654985276 seconds, PrefetchWorkerThread: on ThCond 0x7F6570005288 (0x7F6570005288) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O 
completion on node 10.7.28.35 0x7F671C00F440 waiting 251.096002003 seconds, FileBlockReadFetchHandlerThread: on ThCond 0x7F65BC002878 (0x7F65BC002878) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C00E150 waiting 144.034006970 seconds, WritebehindWorkerThread: on ThCond 0x7F6528001548 (0x7F6528001548) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67A02FCD20 waiting 142.324070945 seconds, WritebehindWorkerThread: on ThCond 0x7F6580002A98 (0x7F6580002A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67A02FA330 waiting 200.670114385 seconds, EEWatchDogThread: on ThCond 0x7F65B0000A98 (0x7F65B0000A98) (MsgRecordCondvar), reason 'RPC wait' 0x7F67A02BF050 waiting 252.276161189 seconds, WritebehindWorkerThread: on ThCond 0x7F6584003998 (0x7F6584003998) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67A0004160 waiting 251.173651822 seconds, SyncHandlerThread: on ThCond 0x7F64880079E8 (0x7F64880079E8) (LogFileBufferDescriptorCondvar), reason 'force wait on force active buffer write' So from the client side its the client that's waiting the server. I managed also to ping, ssh, and tcpdump each other before the node got expelled and discovered that ping works fine, ssh work fine , beside my tests there are 0 packet passing between them, LITERALLY. So there is no congestion, no network issues, but the server waits for the client and the client waits the server. This happens until we reach 350 secs ( 10 times the lease time) , then client get expelled. There are no local io waiters that indicates that gss is struggling, there is plenty of bandwith and CPU resources and no network congestion. Seems some sort of deadlock to me, but how can this be explained and hopefully fixed? Regards, Salvatore -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Thu Aug 21 09:20:39 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Thu, 21 Aug 2014 09:20:39 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F454E3.40803@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> Message-ID: <53F5ABD7.80107@gpfsug.org> Hi there, I've seen the on several 'stock'? 'core'? GPFS system (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so. In my experience this has _always_ been a network issue of one sort of another. If the network is experiencing issues, nodes will be ejected. Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS. You need to follow the logs through from each machine in time order to determine who could not see who and in what order. Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support. Without knowing your full setup, it's hard to help further. Jez On 20/08/14 08:57, Salvatore Di Nardo wrote: > Still problems. 
Here some more detailed examples: > > *EXAMPLE 1:* > > *EBI5-220**( CLIENT)** > *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a > reply from node gss02b* > Tue Aug 19 11:03:04.981 2014: Request sent to > (gss02a in GSS.ebi.ac.uk) to expel (gss02b in > GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk > Tue Aug 19 11:03:04.982 2014: This node will be expelled > from cluster GSS.ebi.ac.uk due to expel msg from IP> (ebi5-220) > Tue Aug 19 11:03:09.319 2014: Cluster Manager connection > broke. Probing cluster GSS.ebi.ac.uk > Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum > nodes during cluster probe. > Tue Aug 19 11:03:10.322 2014: Lost membership in cluster > GSS.ebi.ac.uk. Unmounting file systems. > Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount > invoked. File system: gpfs1 Reason: SGPanic > Tue Aug 19 11:03:12.066 2014: Connecting to > gss02a > Tue Aug 19 11:03:12.070 2014: Connected to > gss02a > Tue Aug 19 11:03:17.071 2014: Connecting to > gss02b > Tue Aug 19 11:03:17.072 2014: Connecting to > gss03b > Tue Aug 19 11:03:17.079 2014: Connecting to > gss03a > Tue Aug 19 11:03:17.080 2014: Connecting to > gss01b > Tue Aug 19 11:03:17.079 2014: Connecting to > gss01a > Tue Aug 19 11:04:23.105 2014: Connected to > gss02b > Tue Aug 19 11:04:23.107 2014: Connected to > gss03b > Tue Aug 19 11:04:23.112 2014: Connected to > gss03a > Tue Aug 19 11:04:23.115 2014: Connected to > gss01b > Tue Aug 19 11:04:23.121 2014: Connected to > gss01a > Tue Aug 19 11:12:28.992 2014: Node (gss02a in > GSS.ebi.ac.uk) is now the Group Leader. > > *GSS02B ( NSD SERVER)* > ... > Tue Aug 19 11:03:17.070 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:25.016 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:28.080 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:36.019 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:39.083 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:47.023 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:50.088 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:52.218 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:58.030 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:01.092 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:03.220 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:09.034 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:12.096 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:14.224 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:20.037 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:23.103 2014: Accepted and connected to > ** ebi5-220 > ... 
> > *GSS02a ( NSD SERVER)* > Tue Aug 19 11:03:04.980 2014: Expel (gss02b) > request from (ebi5-220 in > ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 > in ebi-cluster.ebi.ac.uk) > Tue Aug 19 11:03:12.069 2014: Accepted and connected to > ebi5-220 > > > =============================================== > *EXAMPLE 2*: > > *EBI5-038* > Tue Aug 19 11:32:34.227 2014: *Disk lease period expired > in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* > Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing > cluster GSS.ebi.ac.uk* > Tue Aug 19 11:35:24.265 2014: Close connection to IP> gss02a (Connection reset by peer). Attempting > reconnect. > Tue Aug 19 11:35:24.865 2014: Close connection to > ebi5-014 (Connection reset by > peer). Attempting reconnect. > ... > LOT MORE RESETS BY PEER > ... > Tue Aug 19 11:35:25.096 2014: Close connection to > ebi5-167 (Connection reset by > peer). Attempting reconnect. > Tue Aug 19 11:35:25.267 2014: Connecting to > gss02a > Tue Aug 19 11:35:25.268 2014: Close connection to IP> gss02a (Connection failed because destination > is still processing previous node failure) > Tue Aug 19 11:35:26.267 2014: Retry connection to IP> gss02a > Tue Aug 19 11:35:26.268 2014: Close connection to IP> gss02a (Connection failed because destination > is still processing previous node failure) > Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum > nodes during cluster probe. > Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster > GSS.ebi.ac.uk. Unmounting file systems.* > > *GSS02a* > Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 > in ebi-cluster.ebi.ac.uk) *is being expelled because of an > expired lease.* Pings sent: 60. Replies received: 60. > > > > > In example 1 seems that an NSD was not repliyng to the client, but the > servers seems working fine.. how can i trace better ( to solve) the > problem? > > In example 2 it seems to me that for some reason the manager are not > renewing the lease in time. when this happens , its not a single client. > Loads of them fail to get the lease renewed. Why this is happening? > how can i trace to the source of the problem? > > > > Thanks in advance for any tips. > > Regards, > Salvatore > > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Thu Aug 21 10:04:47 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Thu, 21 Aug 2014 10:04:47 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5ABD7.80107@gpfsug.org> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> Message-ID: <53F5B62F.1060305@ebi.ac.uk> Thanks for the feedback, but we managed to find a scenario that excludes network problems. we have a file called */input_file/* of nearly 100GB: if from *client A* we do: cat input_file >> output_file it start copying.. and we see waiter goeg a bit up,secs but then they flushes back to 0, so we xcan say that the copy proceed well... if now we do the same from another client ( or just another shell on the same client) *client B* : cat input_file >> output_file ( in other words we are trying to write to the same destination) all the waiters gets up until one node get expelled. 
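A minimal sketch of driving that test while recording enough to see which side stalls first (input_file and output_file are the same files as above; the five second interval and the sorting are only illustration):

    # note which node is the file system manager and which is the
    # cluster manager before starting
    /usr/lpp/mmfs/bin/mmlsmgr
    /usr/lpp/mmfs/bin/mmlsmgr -c

    # client A
    cat input_file >> output_file &

    # client B (or a second shell on the same client), a few seconds later
    cat input_file >> output_file &

    # on both clients and on the manager nodes, keep an eye on the
    # longest waiters until the expel fires
    watch -n 5 '/usr/lpp/mmfs/bin/mmdiag --waiters | sort -k3 -rn | head -20'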
Now, while its understandable that the destination file is locked for one of the "cat", so have to wait ( and since the file is BIG , have to wait for a while), its not understandable why it stop the renewal lease. Why its doen't return just a timeout error on the copy instead to expel the node? We can reproduce this every time, and since our users to operations like this on files over 100GB each you can imagine the result. As you can imagine even if its a bit silly to write at the same time to the same destination, its also quite common if we want to dump to a log file logs and for some reason one of the writers, write for a lot of time keeping the file locked. Our expels are not due to network congestion, but because a write attempts have to wait another one. What i really dont understand is why to take a so expreme mesure to expell jest because a process is waiteing "to too much time". I have ticket opened to IBM for this and the issue is under investigation, but no luck so far.. Regards, Salvatore On 21/08/14 09:20, Jez Tucker (Chair) wrote: > Hi there, > > I've seen the on several 'stock'? 'core'? GPFS system (we need a > better term now GSS is out) and seen ping 'working', but alongside > ejections from the cluster. > The GPFS internode 'ping' is somewhat more circumspect than unix ping > - and rightly so. > > In my experience this has _always_ been a network issue of one sort of > another. If the network is experiencing issues, nodes will be ejected. > Of course it could be unresponsive mmfsd or high loadavg, but I've > seen that only twice in 10 years over many versions of GPFS. > > You need to follow the logs through from each machine in time order to > determine who could not see who and in what order. > Your best way forward is to log a SEV2 case with IBM support, directly > or via your OEM and collect and supply a snap and traces as required > by support. > > Without knowing your full setup, it's hard to help further. > > Jez > > On 20/08/14 08:57, Salvatore Di Nardo wrote: >> Still problems. Here some more detailed examples: >> >> *EXAMPLE 1:* >> >> *EBI5-220**( CLIENT)** >> *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a >> reply from node gss02b* >> Tue Aug 19 11:03:04.981 2014: Request sent to >> (gss02a in GSS.ebi.ac.uk) to expel (gss02b in >> GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk >> Tue Aug 19 11:03:04.982 2014: This node will be expelled >> from cluster GSS.ebi.ac.uk due to expel msg from >> (ebi5-220) >> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection >> broke. Probing cluster GSS.ebi.ac.uk >> Tue Aug 19 11:03:10.321 2014: Unable to contact any >> quorum nodes during cluster probe. >> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster >> GSS.ebi.ac.uk. Unmounting file systems. >> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount >> invoked. 
File system: gpfs1 Reason: SGPanic >> Tue Aug 19 11:03:12.066 2014: Connecting to >> gss02a >> Tue Aug 19 11:03:12.070 2014: Connected to >> gss02a >> Tue Aug 19 11:03:17.071 2014: Connecting to >> gss02b >> Tue Aug 19 11:03:17.072 2014: Connecting to >> gss03b >> Tue Aug 19 11:03:17.079 2014: Connecting to >> gss03a >> Tue Aug 19 11:03:17.080 2014: Connecting to >> gss01b >> Tue Aug 19 11:03:17.079 2014: Connecting to >> gss01a >> Tue Aug 19 11:04:23.105 2014: Connected to >> gss02b >> Tue Aug 19 11:04:23.107 2014: Connected to >> gss03b >> Tue Aug 19 11:04:23.112 2014: Connected to >> gss03a >> Tue Aug 19 11:04:23.115 2014: Connected to >> gss01b >> Tue Aug 19 11:04:23.121 2014: Connected to >> gss01a >> Tue Aug 19 11:12:28.992 2014: Node (gss02a in >> GSS.ebi.ac.uk) is now the Group Leader. >> >> *GSS02B ( NSD SERVER)* >> ... >> Tue Aug 19 11:03:17.070 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:25.016 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:28.080 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:36.019 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:39.083 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:47.023 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:50.088 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:52.218 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:58.030 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:01.092 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:03.220 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:09.034 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:12.096 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:14.224 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:20.037 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:23.103 2014: Accepted and connected to >> ** ebi5-220 >> ... >> >> *GSS02a ( NSD SERVER)* >> Tue Aug 19 11:03:04.980 2014: Expel (gss02b) >> request from (ebi5-220 in >> ebi-cluster.ebi.ac.uk). Expelling: >> (ebi5-220 in ebi-cluster.ebi.ac.uk) >> Tue Aug 19 11:03:12.069 2014: Accepted and connected to >> ebi5-220 >> >> >> =============================================== >> *EXAMPLE 2*: >> >> *EBI5-038* >> Tue Aug 19 11:32:34.227 2014: *Disk lease period expired >> in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* >> Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing >> cluster GSS.ebi.ac.uk* >> Tue Aug 19 11:35:24.265 2014: Close connection to > IP> gss02a (Connection reset by peer). Attempting >> reconnect. >> Tue Aug 19 11:35:24.865 2014: Close connection to >> ebi5-014 (Connection reset by >> peer). Attempting reconnect. >> ... >> LOT MORE RESETS BY PEER >> ... 
>> Tue Aug 19 11:35:25.096 2014: Close connection to >> ebi5-167 (Connection reset by >> peer). Attempting reconnect. >> Tue Aug 19 11:35:25.267 2014: Connecting to >> gss02a >> Tue Aug 19 11:35:25.268 2014: Close connection to > IP> gss02a (Connection failed because destination >> is still processing previous node failure) >> Tue Aug 19 11:35:26.267 2014: Retry connection to > IP> gss02a >> Tue Aug 19 11:35:26.268 2014: Close connection to > IP> gss02a (Connection failed because destination >> is still processing previous node failure) >> Tue Aug 19 11:36:24.276 2014: Unable to contact any >> quorum nodes during cluster probe. >> Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster >> GSS.ebi.ac.uk. Unmounting file systems.* >> >> *GSS02a* >> Tue Aug 19 11:35:24.263 2014: Node >> (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled >> because of an expired lease.* Pings sent: 60. Replies >> received: 60. >> >> >> >> >> In example 1 seems that an NSD was not repliyng to the client, but >> the servers seems working fine.. how can i trace better ( to solve) >> the problem? >> >> In example 2 it seems to me that for some reason the manager are not >> renewing the lease in time. when this happens , its not a single client. >> Loads of them fail to get the lease renewed. Why this is happening? >> how can i trace to the source of the problem? >> >> >> >> Thanks in advance for any tips. >> >> Regards, >> Salvatore >> >> >> >> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Aug 21 13:48:38 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 21 Aug 2014 12:48:38 +0000 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5B62F.1060305@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org>,<53F5B62F.1060305@ebi.ac.uk> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com> As I understand GPFS distributed locking semantics, GPFS will not allow one node to hold a write lock for a file indefinitely. Once Client B opens the file for writing it would have contacted the File System Manager to obtain the lock. The FS manager would have told Client B that Client A has the lock and that Client B would have to contact Client A and revoke the write lock token. If Client A does not respond to Client B's request to revoke the write token, then Client B will ask that Client A be expelled from the cluster for NOT adhering to the proper protocol for write lock contention. [cid:2fb2253c-3ffb-4ac6-88a8-d019b1a24f66] Have you checked the communication path between the two clients at this point? I could not follow the logs that you provided. You should definitely look at the exact sequence of log events on the two clients and the file system manager (as reported by mmlsmgr). 
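If that is the path being taken here, the stuck revoke should also be visible from the command line. A small sketch (gpfs1 is the file system name from the earlier logs; the --tokenmgr option is hedged because whether it is available depends on the code level):

    # which node is the file system manager for the file system in
    # question (the node that handed out the write token)
    /usr/lpp/mmfs/bin/mmlsmgr gpfs1

    # on the client holding the write token: is it sitting on a revoke
    # or on a lock wait?
    /usr/lpp/mmfs/bin/mmdiag --waiters | grep -i -e revoke -e lock

    # on the file system manager, dump the token manager state if the
    # option exists at this code level
    /usr/lpp/mmfs/bin/mmdiag --tokenmgr

Comparing those three views while the two writers run should show whether the revoke request ever reaches Client A and how long it sits there before the expel is requested.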
Hope that helps, -Bryan ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Thursday, August 21, 2014 4:04 AM To: chair at gpfsug.org; gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfs client expels Thanks for the feedback, but we managed to find a scenario that excludes network problems. we have a file called input_file of nearly 100GB: if from client A we do: cat input_file >> output_file it start copying.. and we see waiter goeg a bit up,secs but then they flushes back to 0, so we xcan say that the copy proceed well... if now we do the same from another client ( or just another shell on the same client) client B : cat input_file >> output_file ( in other words we are trying to write to the same destination) all the waiters gets up until one node get expelled. Now, while its understandable that the destination file is locked for one of the "cat", so have to wait ( and since the file is BIG , have to wait for a while), its not understandable why it stop the renewal lease. Why its doen't return just a timeout error on the copy instead to expel the node? We can reproduce this every time, and since our users to operations like this on files over 100GB each you can imagine the result. As you can imagine even if its a bit silly to write at the same time to the same destination, its also quite common if we want to dump to a log file logs and for some reason one of the writers, write for a lot of time keeping the file locked. Our expels are not due to network congestion, but because a write attempts have to wait another one. What i really dont understand is why to take a so expreme mesure to expell jest because a process is waiteing "to too much time". I have ticket opened to IBM for this and the issue is under investigation, but no luck so far.. Regards, Salvatore On 21/08/14 09:20, Jez Tucker (Chair) wrote: Hi there, I've seen the on several 'stock'? 'core'? GPFS system (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so. In my experience this has _always_ been a network issue of one sort of another. If the network is experiencing issues, nodes will be ejected. Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS. You need to follow the logs through from each machine in time order to determine who could not see who and in what order. Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support. Without knowing your full setup, it's hard to help further. Jez On 20/08/14 08:57, Salvatore Di Nardo wrote: Still problems. Here some more detailed examples: EXAMPLE 1: EBI5-220 ( CLIENT) Tue Aug 19 11:03:04.980 2014: Timed out waiting for a reply from node gss02b Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. 
Unmounting file systems. Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic Tue Aug 19 11:03:12.066 2014: Connecting to gss02a Tue Aug 19 11:03:12.070 2014: Connected to gss02a Tue Aug 19 11:03:17.071 2014: Connecting to gss02b Tue Aug 19 11:03:17.072 2014: Connecting to gss03b Tue Aug 19 11:03:17.079 2014: Connecting to gss03a Tue Aug 19 11:03:17.080 2014: Connecting to gss01b Tue Aug 19 11:03:17.079 2014: Connecting to gss01a Tue Aug 19 11:04:23.105 2014: Connected to gss02b Tue Aug 19 11:04:23.107 2014: Connected to gss03b Tue Aug 19 11:04:23.112 2014: Connected to gss03a Tue Aug 19 11:04:23.115 2014: Connected to gss01b Tue Aug 19 11:04:23.121 2014: Connected to gss01a Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. GSS02B ( NSD SERVER) ... Tue Aug 19 11:03:17.070 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:28.080 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:39.083 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:50.088 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:01.092 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:12.096 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:23.103 2014: Accepted and connected to ebi5-220 ... GSS02a ( NSD SERVER) Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 =============================================== EXAMPLE 2: EBI5-038 Tue Aug 19 11:32:34.227 2014: Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease. Tue Aug 19 11:33:34.258 2014: Lease is overdue. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. ... LOT MORE RESETS BY PEER ... Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. 
Tue Aug 19 11:35:25.267 2014: Connecting to gss02a Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:36:24.277 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. GSS02a Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) is being expelled because of an expired lease. Pings sent: 60. Replies received: 60. In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? Thanks in advance for any tips. Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: GPFS_Token_Protocol.png Type: image/png Size: 249179 bytes Desc: GPFS_Token_Protocol.png URL: From jbernard at jumptrading.com Thu Aug 21 13:52:05 2014 From: jbernard at jumptrading.com (Jon Bernard) Date: Thu, 21 Aug 2014 12:52:05 +0000 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org>, <53F5B62F.1060305@ebi.ac.uk>, <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Where is that from? On Aug 21, 2014, at 7:49, "Bryan Banister" > wrote: As I understand GPFS distributed locking semantics, GPFS will not allow one node to hold a write lock for a file indefinitely. Once Client B opens the file for writing it would have contacted the File System Manager to obtain the lock. The FS manager would have told Client B that Client A has the lock and that Client B would have to contact Client A and revoke the write lock token. 
If Client A does not respond to Client B's request to revoke the write token, then Client B will ask that Client A be expelled from the cluster for NOT adhering to the proper protocol for write lock contention. Have you checked the communication path between the two clients at this point? I could not follow the logs that you provided. You should definitely look at the exact sequence of log events on the two clients and the file system manager (as reported by mmlsmgr). Hope that helps, -Bryan ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Thursday, August 21, 2014 4:04 AM To: chair at gpfsug.org; gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfs client expels Thanks for the feedback, but we managed to find a scenario that excludes network problems. we have a file called input_file of nearly 100GB: if from client A we do: cat input_file >> output_file it start copying.. and we see waiter goeg a bit up,secs but then they flushes back to 0, so we xcan say that the copy proceed well... if now we do the same from another client ( or just another shell on the same client) client B : cat input_file >> output_file ( in other words we are trying to write to the same destination) all the waiters gets up until one node get expelled. Now, while its understandable that the destination file is locked for one of the "cat", so have to wait ( and since the file is BIG , have to wait for a while), its not understandable why it stop the renewal lease. Why its doen't return just a timeout error on the copy instead to expel the node? We can reproduce this every time, and since our users to operations like this on files over 100GB each you can imagine the result. As you can imagine even if its a bit silly to write at the same time to the same destination, its also quite common if we want to dump to a log file logs and for some reason one of the writers, write for a lot of time keeping the file locked. Our expels are not due to network congestion, but because a write attempts have to wait another one. What i really dont understand is why to take a so expreme mesure to expell jest because a process is waiteing "to too much time". I have ticket opened to IBM for this and the issue is under investigation, but no luck so far.. Regards, Salvatore On 21/08/14 09:20, Jez Tucker (Chair) wrote: Hi there, I've seen the on several 'stock'? 'core'? GPFS system (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so. In my experience this has _always_ been a network issue of one sort of another. If the network is experiencing issues, nodes will be ejected. Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS. You need to follow the logs through from each machine in time order to determine who could not see who and in what order. Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support. Without knowing your full setup, it's hard to help further. Jez On 20/08/14 08:57, Salvatore Di Nardo wrote: Still problems. 
Here some more detailed examples: EXAMPLE 1: EBI5-220 ( CLIENT) Tue Aug 19 11:03:04.980 2014: Timed out waiting for a reply from node gss02b Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic Tue Aug 19 11:03:12.066 2014: Connecting to gss02a Tue Aug 19 11:03:12.070 2014: Connected to gss02a Tue Aug 19 11:03:17.071 2014: Connecting to gss02b Tue Aug 19 11:03:17.072 2014: Connecting to gss03b Tue Aug 19 11:03:17.079 2014: Connecting to gss03a Tue Aug 19 11:03:17.080 2014: Connecting to gss01b Tue Aug 19 11:03:17.079 2014: Connecting to gss01a Tue Aug 19 11:04:23.105 2014: Connected to gss02b Tue Aug 19 11:04:23.107 2014: Connected to gss03b Tue Aug 19 11:04:23.112 2014: Connected to gss03a Tue Aug 19 11:04:23.115 2014: Connected to gss01b Tue Aug 19 11:04:23.121 2014: Connected to gss01a Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. GSS02B ( NSD SERVER) ... Tue Aug 19 11:03:17.070 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:28.080 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:39.083 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:50.088 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:01.092 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:12.096 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:23.103 2014: Accepted and connected to ebi5-220 ... GSS02a ( NSD SERVER) Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). 
Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 =============================================== EXAMPLE 2: EBI5-038 Tue Aug 19 11:32:34.227 2014: Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease. Tue Aug 19 11:33:34.258 2014: Lease is overdue. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. ... LOT MORE RESETS BY PEER ... Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:25.267 2014: Connecting to gss02a Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:36:24.277 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. GSS02a Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) is being expelled because of an expired lease. Pings sent: 60. Replies received: 60. In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? Thanks in advance for any tips. Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: GPFS_Token_Protocol.png Type: image/png Size: 249179 bytes Desc: GPFS_Token_Protocol.png URL: From viccornell at gmail.com Thu Aug 21 14:03:14 2014 From: viccornell at gmail.com (Vic Cornell) Date: Thu, 21 Aug 2014 14:03:14 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5B62F.1060305@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk> Message-ID: <9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com> Hi Salvatore, Are you using ethernet or infiniband as the GPFS interconnect to your clients? If 10/40GbE - do you have a separate admin network? I have seen behaviour similar to this where the storage traffic causes congestion and the "admin" traffic gets lost or delayed causing expels. Vic On 21 Aug 2014, at 10:04, Salvatore Di Nardo wrote: > Thanks for the feedback, but we managed to find a scenario that excludes network problems. > > we have a file called input_file of nearly 100GB: > > if from client A we do: > > cat input_file >> output_file > > it start copying.. and we see waiter goeg a bit up,secs but then they flushes back to 0, so we xcan say that the copy proceed well... > > > if now we do the same from another client ( or just another shell on the same client) client B : > > cat input_file >> output_file > > > ( in other words we are trying to write to the same destination) all the waiters gets up until one node get expelled. > > > Now, while its understandable that the destination file is locked for one of the "cat", so have to wait ( and since the file is BIG , have to wait for a while), its not understandable why it stop the renewal lease. > Why its doen't return just a timeout error on the copy instead to expel the node? We can reproduce this every time, and since our users to operations like this on files over 100GB each you can imagine the result. > > > > As you can imagine even if its a bit silly to write at the same time to the same destination, its also quite common if we want to dump to a log file logs and for some reason one of the writers, write for a lot of time keeping the file locked. > Our expels are not due to network congestion, but because a write attempts have to wait another one. What i really dont understand is why to take a so expreme mesure to expell jest because a process is waiteing "to too much time". > > > I have ticket opened to IBM for this and the issue is under investigation, but no luck so far.. > > Regards, > Salvatore > > > > On 21/08/14 09:20, Jez Tucker (Chair) wrote: >> Hi there, >> >> I've seen the on several 'stock'? 'core'? GPFS system (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. 
>> The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so. >> >> In my experience this has _always_ been a network issue of one sort of another. If the network is experiencing issues, nodes will be ejected. >> Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS. >> >> You need to follow the logs through from each machine in time order to determine who could not see who and in what order. >> Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support. >> >> Without knowing your full setup, it's hard to help further. >> >> Jez >> >> On 20/08/14 08:57, Salvatore Di Nardo wrote: >>> Still problems. Here some more detailed examples: >>> >>> EXAMPLE 1: >>> EBI5-220 ( CLIENT) >>> Tue Aug 19 11:03:04.980 2014: Timed out waiting for a reply from node gss02b >>> Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) >>> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. >>> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. >>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic >>> Tue Aug 19 11:03:12.066 2014: Connecting to gss02a >>> Tue Aug 19 11:03:12.070 2014: Connected to gss02a >>> Tue Aug 19 11:03:17.071 2014: Connecting to gss02b >>> Tue Aug 19 11:03:17.072 2014: Connecting to gss03b >>> Tue Aug 19 11:03:17.079 2014: Connecting to gss03a >>> Tue Aug 19 11:03:17.080 2014: Connecting to gss01b >>> Tue Aug 19 11:03:17.079 2014: Connecting to gss01a >>> Tue Aug 19 11:04:23.105 2014: Connected to gss02b >>> Tue Aug 19 11:04:23.107 2014: Connected to gss03b >>> Tue Aug 19 11:04:23.112 2014: Connected to gss03a >>> Tue Aug 19 11:04:23.115 2014: Connected to gss01b >>> Tue Aug 19 11:04:23.121 2014: Connected to gss01a >>> Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. >>> >>> GSS02B ( NSD SERVER) >>> ... 
>>> Tue Aug 19 11:03:17.070 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:28.080 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:39.083 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:50.088 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:01.092 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:12.096 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to ebi5-220 >>> ... >>> >>> GSS02a ( NSD SERVER) >>> Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) >>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 >>> >>> >>> =============================================== >>> EXAMPLE 2: >>> >>> EBI5-038 >>> Tue Aug 19 11:32:34.227 2014: Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease. >>> Tue Aug 19 11:33:34.258 2014: Lease is overdue. Probing cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. >>> Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. >>> ... >>> LOT MORE RESETS BY PEER >>> ... >>> Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. >>> Tue Aug 19 11:35:25.267 2014: Connecting to gss02a >>> Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) >>> Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a >>> Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) >>> Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. >>> Tue Aug 19 11:36:24.277 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. >>> >>> GSS02a >>> Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) is being expelled because of an expired lease. Pings sent: 60. Replies received: 60. 
>>> >>> >>> >>> In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? >>> >>> In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. >>> Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? >>> >>> >>> >>> Thanks in advance for any tips. >>> >>> Regards, >>> Salvatore >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Thu Aug 21 14:04:59 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Thu, 21 Aug 2014 14:04:59 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org>, <53F5B62F.1060305@ebi.ac.uk> <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <53F5EE7B.2080306@ebi.ac.uk> Thanks for the info... it helps a bit understanding whats going on, but i think you missed the part that Node A and Node B could also be the same machine. If for instance i ran 2 cp on the same machine, hence Client B cannot have problems contacting Client A since they are the same machine..... BTW i did the same also using 2 clients and the result its the same. Nonetheless your description is made me understand a bit better what's going on Regards, Salvatore On 21/08/14 13:48, Bryan Banister wrote: > As I understand GPFS distributed locking semantics, GPFS will not > allow one node to hold a write lock for a file indefinitely. Once > Client B opens the file for writing it would have contacted the File > System Manager to obtain the lock. The FS manager would have told > Client B that Client A has the lock and that Client B would have to > contact Client A and revoke the write lock token. If Client A does > not respond to Client B's request to revoke the write token, then > Client B will ask that Client A be expelled from the cluster for NOT > adhering to the proper protocol for write lock contention. > > > > Have you checked the communication path between the two clients at > this point? > > I could not follow the logs that you provided. You should definitely > look at the exact sequence of log events on the two clients and the > file system manager (as reported by mmlsmgr). > > Hope that helps, > -Bryan > > ------------------------------------------------------------------------ > *From:* gpfsug-discuss-bounces at gpfsug.org > [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo > [sdinardo at ebi.ac.uk] > *Sent:* Thursday, August 21, 2014 4:04 AM > *To:* chair at gpfsug.org; gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] gpfs client expels > > Thanks for the feedback, but we managed to find a scenario that > excludes network problems. 
> > we have a file called */input_file/* of nearly 100GB: > > if from *client A* we do: > > cat input_file >> output_file > > it start copying.. and we see waiter goeg a bit up,secs but then they > flushes back to 0, so we xcan say that the copy proceed well... > > > if now we do the same from another client ( or just another shell on > the same client) *client B* : > > cat input_file >> output_file > > > ( in other words we are trying to write to the same destination) all > the waiters gets up until one node get expelled. > > > Now, while its understandable that the destination file is locked for > one of the "cat", so have to wait ( and since the file is BIG , have > to wait for a while), its not understandable why it stop the renewal > lease. > Why its doen't return just a timeout error on the copy instead to > expel the node? We can reproduce this every time, and since our users > to operations like this on files over 100GB each you can imagine the > result. > > > > As you can imagine even if its a bit silly to write at the same time > to the same destination, its also quite common if we want to dump to a > log file logs and for some reason one of the writers, write for a lot > of time keeping the file locked. > Our expels are not due to network congestion, but because a write > attempts have to wait another one. What i really dont understand is > why to take a so expreme mesure to expell jest because a process is > waiteing "to too much time". > > > I have ticket opened to IBM for this and the issue is under > investigation, but no luck so far.. > > Regards, > Salvatore > > > > On 21/08/14 09:20, Jez Tucker (Chair) wrote: >> Hi there, >> >> I've seen the on several 'stock'? 'core'? GPFS system (we need a >> better term now GSS is out) and seen ping 'working', but alongside >> ejections from the cluster. >> The GPFS internode 'ping' is somewhat more circumspect than unix ping >> - and rightly so. >> >> In my experience this has _always_ been a network issue of one sort >> of another. If the network is experiencing issues, nodes will be >> ejected. >> Of course it could be unresponsive mmfsd or high loadavg, but I've >> seen that only twice in 10 years over many versions of GPFS. >> >> You need to follow the logs through from each machine in time order >> to determine who could not see who and in what order. >> Your best way forward is to log a SEV2 case with IBM support, >> directly or via your OEM and collect and supply a snap and traces as >> required by support. >> >> Without knowing your full setup, it's hard to help further. >> >> Jez >> >> On 20/08/14 08:57, Salvatore Di Nardo wrote: >>> Still problems. Here some more detailed examples: >>> >>> *EXAMPLE 1:* >>> >>> *EBI5-220**( CLIENT)** >>> *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a >>> reply from node gss02b* >>> Tue Aug 19 11:03:04.981 2014: Request sent to >> IP> (gss02a in GSS.ebi.ac.uk) to expel >>> (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:03:04.982 2014: This node will be expelled >>> from cluster GSS.ebi.ac.uk due to expel msg from >>> (ebi5-220) >>> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection >>> broke. Probing cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:03:10.321 2014: Unable to contact any >>> quorum nodes during cluster probe. >>> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster >>> GSS.ebi.ac.uk. Unmounting file systems. >>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount >>> invoked. 
File system: gpfs1 Reason: SGPanic >>> Tue Aug 19 11:03:12.066 2014: Connecting to >>> gss02a >>> Tue Aug 19 11:03:12.070 2014: Connected to >>> gss02a >>> Tue Aug 19 11:03:17.071 2014: Connecting to >>> gss02b >>> Tue Aug 19 11:03:17.072 2014: Connecting to >>> gss03b >>> Tue Aug 19 11:03:17.079 2014: Connecting to >>> gss03a >>> Tue Aug 19 11:03:17.080 2014: Connecting to >>> gss01b >>> Tue Aug 19 11:03:17.079 2014: Connecting to >>> gss01a >>> Tue Aug 19 11:04:23.105 2014: Connected to >>> gss02b >>> Tue Aug 19 11:04:23.107 2014: Connected to >>> gss03b >>> Tue Aug 19 11:04:23.112 2014: Connected to >>> gss03a >>> Tue Aug 19 11:04:23.115 2014: Connected to >>> gss01b >>> Tue Aug 19 11:04:23.121 2014: Connected to >>> gss01a >>> Tue Aug 19 11:12:28.992 2014: Node (gss02a >>> in GSS.ebi.ac.uk) is now the Group Leader. >>> >>> *GSS02B ( NSD SERVER)* >>> ... >>> Tue Aug 19 11:03:17.070 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:25.016 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:28.080 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:36.019 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:39.083 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:47.023 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:50.088 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:52.218 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:58.030 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:01.092 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:03.220 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:09.034 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:12.096 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:14.224 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:20.037 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to >>> ** ebi5-220 >>> ... >>> >>> *GSS02a ( NSD SERVER)* >>> Tue Aug 19 11:03:04.980 2014: Expel (gss02b) >>> request from (ebi5-220 in >>> ebi-cluster.ebi.ac.uk). Expelling: >>> (ebi5-220 in ebi-cluster.ebi.ac.uk) >>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to >>> ebi5-220 >>> >>> >>> =============================================== >>> *EXAMPLE 2*: >>> >>> *EBI5-038* >>> Tue Aug 19 11:32:34.227 2014: *Disk lease period expired >>> in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* >>> Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing >>> cluster GSS.ebi.ac.uk* >>> Tue Aug 19 11:35:24.265 2014: Close connection to >>> gss02a (Connection reset by peer). >>> Attempting reconnect. 
>>> Tue Aug 19 11:35:24.865 2014: Close connection to >>> ebi5-014 (Connection reset by >>> peer). Attempting reconnect. >>> ... >>> LOT MORE RESETS BY PEER >>> ... >>> Tue Aug 19 11:35:25.096 2014: Close connection to >>> ebi5-167 (Connection reset by >>> peer). Attempting reconnect. >>> Tue Aug 19 11:35:25.267 2014: Connecting to >>> gss02a >>> Tue Aug 19 11:35:25.268 2014: Close connection to >>> gss02a (Connection failed because >>> destination is still processing previous node failure) >>> Tue Aug 19 11:35:26.267 2014: Retry connection to >>> gss02a >>> Tue Aug 19 11:35:26.268 2014: Close connection to >>> gss02a (Connection failed because >>> destination is still processing previous node failure) >>> Tue Aug 19 11:36:24.276 2014: Unable to contact any >>> quorum nodes during cluster probe. >>> Tue Aug 19 11:36:24.277 2014: *Lost membership in >>> cluster GSS.ebi.ac.uk. Unmounting file systems.* >>> >>> *GSS02a* >>> Tue Aug 19 11:35:24.263 2014: Node >>> (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled >>> because of an expired lease.* Pings sent: 60. Replies >>> received: 60. >>> >>> >>> >>> >>> In example 1 seems that an NSD was not repliyng to the client, but >>> the servers seems working fine.. how can i trace better ( to solve) >>> the problem? >>> >>> In example 2 it seems to me that for some reason the manager are not >>> renewing the lease in time. when this happens , its not a single >>> client. >>> Loads of them fail to get the lease renewed. Why this is happening? >>> how can i trace to the source of the problem? >>> >>> >>> >>> Thanks in advance for any tips. >>> >>> Regards, >>> Salvatore >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ------------------------------------------------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/png Size: 249179 bytes Desc: not available URL: From sdinardo at ebi.ac.uk Thu Aug 21 14:18:19 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Thu, 21 Aug 2014 14:18:19 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk> <9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com> Message-ID: <53F5F19B.1010603@ebi.ac.uk> This is an interesting point! We use ethernet (10g links on the clients), but we don't have a separate network for the admin traffic. Could you explain this a bit further? The clients and the servers are on different subnets, so the packets are routed, and I don't see a practical way to separate them. The clients are blades in a chassis, so even if I create two interfaces they will physically use the same "cable" to reach the first switch. The clients (around 600 of them) are also spread across different subnets. I will forward this consideration to our network admins to see if we can work on a dedicated network. Thanks for your tip. Regards, Salvatore On 21/08/14 14:03, Vic Cornell wrote: > Hi Salvatore, > > Are you using ethernet or infiniband as the GPFS interconnect to your > clients? > > If 10/40GbE - do you have a separate admin network? > > I have seen behaviour similar to this where the storage traffic causes > congestion and the "admin" traffic gets lost or delayed causing expels. > > Vic > > > > On 21 Aug 2014, at 10:04, Salvatore Di Nardo > wrote: > >> Thanks for the feedback, but we managed to find a scenario that >> excludes network problems. >> >> we have a file called */input_file/* of nearly 100GB: >> >> if from *client A* we do: >> >> cat input_file >> output_file >> >> it start copying.. and we see waiter goeg a bit up,secs but then they >> flushes back to 0, so we xcan say that the copy proceed well... >> >> >> if now we do the same from another client ( or just another shell on >> the same client) *client B* : >> >> cat input_file >> output_file >> >> >> ( in other words we are trying to write to the same destination) all >> the waiters gets up until one node get expelled. >> >> >> Now, while its understandable that the destination file is locked for >> one of the "cat", so have to wait ( and since the file is BIG , have >> to wait for a while), its not understandable why it stop the renewal >> lease. >> Why its doen't return just a timeout error on the copy instead to >> expel the node? We can reproduce this every time, and since our users >> to operations like this on files over 100GB each you can imagine the >> result. >> >> >> >> As you can imagine even if its a bit silly to write at the same time >> to the same destination, its also quite common if we want to dump to >> a log file logs and for some reason one of the writers, write for a >> lot of time keeping the file locked. >> Our expels are not due to network congestion, but because a write >> attempts have to wait another one. What i really dont understand is >> why to take a so expreme mesure to expell jest because a process is >> waiteing "to too much time". >> >> >> I have ticket opened to IBM for this and the issue is under >> investigation, but no luck so far.. >> >> Regards, >> Salvatore >> >> >> >> On 21/08/14 09:20, Jez Tucker (Chair) wrote: >>> Hi there, >>> >>> I've seen the on several 'stock'? 'core'?
GPFS system (we need a >>> better term now GSS is out) and seen ping 'working', but alongside >>> ejections from the cluster. >>> The GPFS internode 'ping' is somewhat more circumspect than unix >>> ping - and rightly so. >>> >>> In my experience this has _always_ been a network issue of one sort >>> of another. If the network is experiencing issues, nodes will be >>> ejected. >>> Of course it could be unresponsive mmfsd or high loadavg, but I've >>> seen that only twice in 10 years over many versions of GPFS. >>> >>> You need to follow the logs through from each machine in time order >>> to determine who could not see who and in what order. >>> Your best way forward is to log a SEV2 case with IBM support, >>> directly or via your OEM and collect and supply a snap and traces as >>> required by support. >>> >>> Without knowing your full setup, it's hard to help further. >>> >>> Jez >>> >>> On 20/08/14 08:57, Salvatore Di Nardo wrote: >>>> Still problems. Here some more detailed examples: >>>> >>>> *EXAMPLE 1:* >>>> >>>> *EBI5-220**( CLIENT)** >>>> *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a >>>> reply from node gss02b* >>>> Tue Aug 19 11:03:04.981 2014: Request sent to >>> IP> (gss02a in GSS.ebi.ac.uk ) to >>>> expel (gss02b in GSS.ebi.ac.uk >>>> ) from cluster GSS.ebi.ac.uk >>>> >>>> Tue Aug 19 11:03:04.982 2014: This node will be >>>> expelled from cluster GSS.ebi.ac.uk >>>> due to expel msg from >>> IP> (ebi5-220) >>>> Tue Aug 19 11:03:09.319 2014: Cluster Manager >>>> connection broke. Probing cluster GSS.ebi.ac.uk >>>> >>>> Tue Aug 19 11:03:10.321 2014: Unable to contact any >>>> quorum nodes during cluster probe. >>>> Tue Aug 19 11:03:10.322 2014: Lost membership in >>>> cluster GSS.ebi.ac.uk . >>>> Unmounting file systems. >>>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount >>>> invoked. File system: gpfs1 Reason: SGPanic >>>> Tue Aug 19 11:03:12.066 2014: Connecting to >>>> gss02a >>>> Tue Aug 19 11:03:12.070 2014: Connected to >>>> gss02a >>>> Tue Aug 19 11:03:17.071 2014: Connecting to >>>> gss02b >>>> Tue Aug 19 11:03:17.072 2014: Connecting to >>>> gss03b >>>> Tue Aug 19 11:03:17.079 2014: Connecting to >>>> gss03a >>>> Tue Aug 19 11:03:17.080 2014: Connecting to >>>> gss01b >>>> Tue Aug 19 11:03:17.079 2014: Connecting to >>>> gss01a >>>> Tue Aug 19 11:04:23.105 2014: Connected to >>>> gss02b >>>> Tue Aug 19 11:04:23.107 2014: Connected to >>>> gss03b >>>> Tue Aug 19 11:04:23.112 2014: Connected to >>>> gss03a >>>> Tue Aug 19 11:04:23.115 2014: Connected to >>>> gss01b >>>> Tue Aug 19 11:04:23.121 2014: Connected to >>>> gss01a >>>> Tue Aug 19 11:12:28.992 2014: Node (gss02a >>>> in GSS.ebi.ac.uk ) is now the >>>> Group Leader. >>>> >>>> *GSS02B ( NSD SERVER)* >>>> ... 
>>>> Tue Aug 19 11:03:17.070 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:25.016 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:28.080 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:36.019 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:39.083 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:47.023 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:50.088 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:52.218 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:58.030 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:01.092 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:04:03.220 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:09.034 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:12.096 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:04:14.224 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:20.037 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to >>>> ** ebi5-220 >>>> ... >>>> >>>> *GSS02a ( NSD SERVER)* >>>> Tue Aug 19 11:03:04.980 2014: Expel >>>> (gss02b) request from (ebi5-220 in >>>> ebi-cluster.ebi.ac.uk ). >>>> Expelling: (ebi5-220 in >>>> ebi-cluster.ebi.ac.uk ) >>>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to >>>> ebi5-220 >>>> >>>> >>>> =============================================== >>>> *EXAMPLE 2*: >>>> >>>> *EBI5-038* >>>> Tue Aug 19 11:32:34.227 2014: *Disk lease period >>>> expired in cluster GSS.ebi.ac.uk >>>> . Attempting to reacquire lease.* >>>> Tue Aug 19 11:33:34.258 2014: *Lease is overdue. >>>> Probing cluster GSS.ebi.ac.uk * >>>> Tue Aug 19 11:35:24.265 2014: Close connection to >>>> gss02a (Connection reset by peer). >>>> Attempting reconnect. >>>> Tue Aug 19 11:35:24.865 2014: Close connection to >>>> ebi5-014 (Connection reset by >>>> peer). Attempting reconnect. >>>> ... >>>> LOT MORE RESETS BY PEER >>>> ... >>>> Tue Aug 19 11:35:25.096 2014: Close connection to >>>> ebi5-167 (Connection reset by >>>> peer). Attempting reconnect. >>>> Tue Aug 19 11:35:25.267 2014: Connecting to >>>> gss02a >>>> Tue Aug 19 11:35:25.268 2014: Close connection to >>>> gss02a (Connection failed because >>>> destination is still processing previous node failure) >>>> Tue Aug 19 11:35:26.267 2014: Retry connection to >>>> gss02a >>>> Tue Aug 19 11:35:26.268 2014: Close connection to >>>> gss02a (Connection failed because >>>> destination is still processing previous node failure) >>>> Tue Aug 19 11:36:24.276 2014: Unable to contact any >>>> quorum nodes during cluster probe. 
>>>> Tue Aug 19 11:36:24.277 2014: *Lost membership in >>>> cluster GSS.ebi.ac.uk . >>>> Unmounting file systems.* >>>> >>>> *GSS02a* >>>> Tue Aug 19 11:35:24.263 2014: Node >>>> (ebi5-038 in ebi-cluster.ebi.ac.uk >>>> ) *is being expelled >>>> because of an expired lease.* Pings sent: 60. Replies >>>> received: 60. >>>> >>>> >>>> >>>> >>>> In example 1 seems that an NSD was not repliyng to the client, but >>>> the servers seems working fine.. how can i trace better ( to solve) >>>> the problem? >>>> >>>> In example 2 it seems to me that for some reason the manager are >>>> not renewing the lease in time. when this happens , its not a >>>> single client. >>>> Loads of them fail to get the lease renewed. Why this is happening? >>>> how can i trace to the source of the problem? >>>> >>>> >>>> >>>> Thanks in advance for any tips. >>>> >>>> Regards, >>>> Salvatore >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss atgpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss atgpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From service at metamodul.com Thu Aug 21 14:19:33 2014 From: service at metamodul.com (service at metamodul.com) Date: Thu, 21 Aug 2014 15:19:33 +0200 (CEST) Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5B62F.1060305@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk> Message-ID: <1481989063.92260.1408627173332.open-xchange@oxbaltgw09.schlund.de> > Now, while its understandable that the destination file is locked for one of > the "cat", so have to wait If GPFS is POSIX compatible I do not understand why one cat should block the other completely; on a standard FS you can "cat" from many sources to the same target, even if the result is not predictable. From this point of view I would expect both "cat" processes to start writing immediately, so I would suspect a GPFS bug. All imho. Hajo Note: You might test with the input_file in a different directory, and I would also test the behaviour when the output_file is on a local FS like /tmp. -------------- next part -------------- An HTML attachment was scrubbed... URL: From viccornell at gmail.com Thu Aug 21 14:22:22 2014 From: viccornell at gmail.com (Vic Cornell) Date: Thu, 21 Aug 2014 14:22:22 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5F19B.1010603@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk> <9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com> <53F5F19B.1010603@ebi.ac.uk> Message-ID: <0F03996A-2008-4076-9A2B-B4B2BB89E959@gmail.com> For my system I always use a dedicated admin network - as described in the gpfs manuals - for a gpfs cluster on 10/40GbE where the system will be heavily loaded. The difference in the stability of the system is very noticeable.
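The per-node split itself is only a couple of commands - roughly the sketch below, assuming each node has (or can be given) a second interface; the "-adm" hostname is made up for illustration and ebi5-220 is just one of your client names:

    # current admin and daemon node names
    mmlscluster

    # move administrative (ssh / mm command) traffic to a dedicated interface
    mmchnode --admin-interface=ebi5-220-adm -N ebi5-220

    # daemon/data traffic can be steered separately if needed, e.g. with
    # mmchnode --daemon-interface=... or the 'subnets' configuration option

The idea is simply to keep the administrative and control traffic off the links that carry the bulk NSD data.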
Not sure how/if this would work on GSS - IBM ought to know :-) Vic On 21 Aug 2014, at 14:18, Salvatore Di Nardo wrote: > This is an interesting point! > > We use ethernet ( 10g links on the clients) but we dont have a separate network for the admin network. > > Could you explain this a bit further, because the clients and the servers we have are on different subnet so the packet are routed.. I don't see a practical way to separate them. The clients are blades in a chassis so even if i create 2 interfaces, they will physically use the came "cable" to go to the first switch. even the clients ( 600 clients) have different subsets. > > I will forward this consideration to our network admin , so see if we can work on a dedicated network. > > thanks for your tip. > > Regards, > Salvatore > > > > > On 21/08/14 14:03, Vic Cornell wrote: >> Hi Salvatore, >> >> Are you using ethernet or infiniband as the GPFS interconnect to your clients? >> >> If 10/40GbE - do you have a separate admin network? >> >> I have seen behaviour similar to this where the storage traffic causes congestion and the "admin" traffic gets lost or delayed causing expels. >> >> Vic >> >> >> >> On 21 Aug 2014, at 10:04, Salvatore Di Nardo wrote: >> >>> Thanks for the feedback, but we managed to find a scenario that excludes network problems. >>> >>> we have a file called input_file of nearly 100GB: >>> >>> if from client A we do: >>> >>> cat input_file >> output_file >>> >>> it start copying.. and we see waiter goeg a bit up,secs but then they flushes back to 0, so we xcan say that the copy proceed well... >>> >>> >>> if now we do the same from another client ( or just another shell on the same client) client B : >>> >>> cat input_file >> output_file >>> >>> >>> ( in other words we are trying to write to the same destination) all the waiters gets up until one node get expelled. >>> >>> >>> Now, while its understandable that the destination file is locked for one of the "cat", so have to wait ( and since the file is BIG , have to wait for a while), its not understandable why it stop the renewal lease. >>> Why its doen't return just a timeout error on the copy instead to expel the node? We can reproduce this every time, and since our users to operations like this on files over 100GB each you can imagine the result. >>> >>> >>> >>> As you can imagine even if its a bit silly to write at the same time to the same destination, its also quite common if we want to dump to a log file logs and for some reason one of the writers, write for a lot of time keeping the file locked. >>> Our expels are not due to network congestion, but because a write attempts have to wait another one. What i really dont understand is why to take a so expreme mesure to expell jest because a process is waiteing "to too much time". >>> >>> >>> I have ticket opened to IBM for this and the issue is under investigation, but no luck so far.. >>> >>> Regards, >>> Salvatore >>> >>> >>> >>> On 21/08/14 09:20, Jez Tucker (Chair) wrote: >>>> Hi there, >>>> >>>> I've seen the on several 'stock'? 'core'? GPFS system (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. >>>> The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so. >>>> >>>> In my experience this has _always_ been a network issue of one sort of another. If the network is experiencing issues, nodes will be ejected. 
>>>> Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS. >>>> >>>> You need to follow the logs through from each machine in time order to determine who could not see who and in what order. >>>> Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support. >>>> >>>> Without knowing your full setup, it's hard to help further. >>>> >>>> Jez >>>> >>>> On 20/08/14 08:57, Salvatore Di Nardo wrote: >>>>> Still problems. Here some more detailed examples: >>>>> >>>>> EXAMPLE 1: >>>>> EBI5-220 ( CLIENT) >>>>> Tue Aug 19 11:03:04.980 2014: Timed out waiting for a reply from node gss02b >>>>> Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk >>>>> Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) >>>>> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk >>>>> Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. >>>>> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. >>>>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic >>>>> Tue Aug 19 11:03:12.066 2014: Connecting to gss02a >>>>> Tue Aug 19 11:03:12.070 2014: Connected to gss02a >>>>> Tue Aug 19 11:03:17.071 2014: Connecting to gss02b >>>>> Tue Aug 19 11:03:17.072 2014: Connecting to gss03b >>>>> Tue Aug 19 11:03:17.079 2014: Connecting to gss03a >>>>> Tue Aug 19 11:03:17.080 2014: Connecting to gss01b >>>>> Tue Aug 19 11:03:17.079 2014: Connecting to gss01a >>>>> Tue Aug 19 11:04:23.105 2014: Connected to gss02b >>>>> Tue Aug 19 11:04:23.107 2014: Connected to gss03b >>>>> Tue Aug 19 11:04:23.112 2014: Connected to gss03a >>>>> Tue Aug 19 11:04:23.115 2014: Connected to gss01b >>>>> Tue Aug 19 11:04:23.121 2014: Connected to gss01a >>>>> Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. >>>>> >>>>> GSS02B ( NSD SERVER) >>>>> ... 
>>>>> Tue Aug 19 11:03:17.070 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:28.080 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:39.083 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:50.088 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:01.092 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:12.096 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to ebi5-220 >>>>> ... >>>>> >>>>> GSS02a ( NSD SERVER) >>>>> Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) >>>>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 >>>>> >>>>> >>>>> =============================================== >>>>> EXAMPLE 2: >>>>> >>>>> EBI5-038 >>>>> Tue Aug 19 11:32:34.227 2014: Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease. >>>>> Tue Aug 19 11:33:34.258 2014: Lease is overdue. Probing cluster GSS.ebi.ac.uk >>>>> Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. >>>>> Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. >>>>> ... >>>>> LOT MORE RESETS BY PEER >>>>> ... >>>>> Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. >>>>> Tue Aug 19 11:35:25.267 2014: Connecting to gss02a >>>>> Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) >>>>> Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a >>>>> Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) >>>>> Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. >>>>> Tue Aug 19 11:36:24.277 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. >>>>> >>>>> GSS02a >>>>> Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) is being expelled because of an expired lease. Pings sent: 60. 
Replies received: 60. >>>>> >>>>> >>>>> >>>>> In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? >>>>> >>>>> In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. >>>>> Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? >>>>> >>>>> >>>>> >>>>> Thanks in advance for any tips. >>>>> >>>>> Regards, >>>>> Salvatore >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Fri Aug 22 10:37:42 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Fri, 22 Aug 2014 10:37:42 +0100 Subject: [gpfsug-discuss] gpfs client expels, fs hangind and waiters In-Reply-To: <53EE0BB1.8000005@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> Message-ID: <53F70F66.2010405@ebi.ac.uk> Hello everyone, Just to let you know, we found the cause of our problems. We discovered that not all of the recommend kernel setting was configured on the clients ( on server was everything ok, but the clients had some setting missing ), and IBM support pointed to this document that describes perfectly our issues and the fix wich suggest to raise some parameters even higher than the standard "best practice" : http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=migr-5091222 Thanks to everyone for the replies. Regards, Salvatore From ewahl at osc.edu Mon Aug 25 19:55:08 2014 From: ewahl at osc.edu (Ed Wahl) Date: Mon, 25 Aug 2014 18:55:08 +0000 Subject: [gpfsug-discuss] CNFS using NFS over RDMA? Message-ID: Anyone out there doing CNFS with NFS over RDMA? Is this even possible? We currently have been delivering some CNFS services using TCP over IB, but that layer tends to have a large number of bugs all the time. Like to take a look at moving back down to verbs... Ed Wahl OSC -------------- next part -------------- An HTML attachment was scrubbed... URL: From zander at ebi.ac.uk Fri Aug 1 14:44:49 2014 From: zander at ebi.ac.uk (Zander Mears) Date: Fri, 01 Aug 2014 14:44:49 +0100 Subject: [gpfsug-discuss] Hello! In-Reply-To: <53D981EF.3020000@gpfsug.org> References: <53D8C897.9000902@ebi.ac.uk> <53D981EF.3020000@gpfsug.org> Message-ID: <53DB99D1.8050304@ebi.ac.uk> Hi Jez We're just monitoring the standard OS stuff, some interface errors, throughput, number of network and gpfs connections due to previous issues. We don't really know as yet what is good to monitor GPFS wise. 
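A minimal sketch of the sort of agent-side helper that could feed Zabbix (or any poller) with a few GPFS health numbers; given how much of this thread is about expels and long waiters, those are the obvious things to trend. The metric keys, the 10-second threshold and the script path are assumptions, not GPFS or Zabbix defaults:

    #!/bin/bash
    # Hypothetical helper returning single GPFS health values for a monitoring
    # agent. Run on each NSD server / client you want to watch.
    MMDIAG=/usr/lpp/mmfs/bin/mmdiag
    LOG=/var/adm/ras/mmfs.log.latest

    case "$1" in
      waiters.total)
        # threads currently waiting inside mmfsd on this node
        $MMDIAG --waiters 2>/dev/null | grep -c 'waiting '
        ;;
      waiters.long)
        # waiters older than 10 seconds (illustrative threshold) - these tend
        # to show up well before lease timeouts and expels
        $MMDIAG --waiters 2>/dev/null | awk '$2 == "waiting" && $3 > 10 {n++} END {print n+0}'
        ;;
      expels.logged)
        # expel messages in the current GPFS log (since the last daemon restart)
        c=$(grep -ci 'expel' "$LOG" 2>/dev/null)
        echo "${c:-0}"
        ;;
      *)
        echo "usage: $0 {waiters.total|waiters.long|expels.logged}" >&2
        exit 1
        ;;
    esac

Wired in with a line such as UserParameter=gpfs.waiters.long,/usr/local/bin/gpfs_metrics.sh waiters.long in zabbix_agentd.conf (path and key name are made up for the example), a rising long-waiter count is usually visible well before nodes start being expelled.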
cheers Zander On 31/07/2014 00:38, Jez Tucker (Chair) wrote: > Hi Zander, > > We have a git repository. Would you be interested in adding any > Zabbix custom metrics gathering to GPFS to it? > > https://github.com/gpfsug/gpfsug-tools > > Best, > > Jez From sfadden at us.ibm.com Tue Aug 5 18:55:20 2014 From: sfadden at us.ibm.com (Scott Fadden) Date: Tue, 5 Aug 2014 10:55:20 -0700 Subject: [gpfsug-discuss] GPFS and Lustre on same node Message-ID: Is anyone running GPFS and Lustre on the same nodes. I have seen it work, I have heard people are doing it, I am looking for some confirmation. Thanks Scott Fadden GPFS Technical Marketing Phone: (503) 880-5833 sfadden at us.ibm.com http://www.ibm.com/systems/gpfs -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Wed Aug 6 08:46:31 2014 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Wed, 06 Aug 2014 09:46:31 +0200 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: References: Message-ID: <53E1DD57.90103@science-computing.de> Am 05.08.2014 19:55, schrieb Scott Fadden: > Is anyone running GPFS and Lustre on the same nodes. I have seen it work, I have heard people are > doing it, I am looking for some confirmation. I have some nodes running lustre 2.1.6 or 2.5.58 and gpfs 3.5.0.17 on RHEL5.8 and RHEL6.5. None of them are servers. Kind regards, Ulrich Sibiller -- ______________________________________creating IT solutions Dipl.-Inf. Ulrich Sibiller science + computing ag System Administration Hagellocher Weg 73 mail nfz at science-computing.de 72070 Tuebingen, Germany hotline +49 7071 9457 674 http://www.science-computing.de -- Vorstandsvorsitzender/Chairman of the board of management: Gerd-Lothar Leonhart Vorstand/Board of Management: Dr. Bernd Finkbeiner, Michael Heinrichs, Dr. Arno Steitz Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From frederik.ferner at diamond.ac.uk Wed Aug 6 10:19:35 2014 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Wed, 6 Aug 2014 10:19:35 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: References: Message-ID: <53E1F327.1000605@diamond.ac.uk> On 05/08/14 18:55, Scott Fadden wrote: > Is anyone running GPFS and Lustre on the same nodes. I have seen it > work, I have heard people are doing it, I am looking for some confirmation. Most of our compute cluster nodes are clients for Lustre and GPFS at the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on servers (GPFS NSD server or Lustre OSS/MDS servers). HTH, Frederik -- Frederik Ferner Senior Computer Systems Administrator phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. 
cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From sdinardo at ebi.ac.uk Wed Aug 6 10:57:44 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 06 Aug 2014 10:57:44 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E1F327.1000605@diamond.ac.uk> References: <53E1F327.1000605@diamond.ac.uk> Message-ID: <53E1FC18.6080707@ebi.ac.uk> Sorry for this little ot, but recetly i'm looking to Lustre to understand how it is comparable to GPFS in terms of performance, reliability and easy to use. Could anyone share their experience ? My company just recently got a first GPFS system , based on IBM GSS, but while its good performance wise, there are few unresolved problems and the IBM support is almost unexistent, so I'm starting to wonder if its work to look somewhere else eventual future purchases. Salvatore On 06/08/14 10:19, Frederik Ferner wrote: > On 05/08/14 18:55, Scott Fadden wrote: >> Is anyone running GPFS and Lustre on the same nodes. I have seen it >> work, I have heard people are doing it, I am looking for some >> confirmation. > > Most of our compute cluster nodes are clients for Lustre and GPFS at > the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on > servers (GPFS NSD server or Lustre OSS/MDS servers). > > HTH, > Frederik > From chair at gpfsug.org Wed Aug 6 11:19:24 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Wed, 06 Aug 2014 11:19:24 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E1FC18.6080707@ebi.ac.uk> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> Message-ID: <53E2012C.9040402@gpfsug.org> "IBM support is almost unexistent" I don't find that at all. Do you log directly via ESC or via your OEM/integrator or are you only referring to GSS support rather than pure GPFS? If you are having response issues, your IBM rep (or a few folks on here) can accelerate issues for you. Jez On 06/08/14 10:57, Salvatore Di Nardo wrote: > Sorry for this little ot, but recetly i'm looking to Lustre to > understand how it is comparable to GPFS in terms of performance, > reliability and easy to use. > Could anyone share their experience ? > > My company just recently got a first GPFS system , based on IBM GSS, > but while its good performance wise, there are few unresolved problems > and the IBM support is almost unexistent, so I'm starting to wonder if > its work to look somewhere else eventual future purchases. > > > Salvatore > > On 06/08/14 10:19, Frederik Ferner wrote: >> On 05/08/14 18:55, Scott Fadden wrote: >>> Is anyone running GPFS and Lustre on the same nodes. I have seen it >>> work, I have heard people are doing it, I am looking for some >>> confirmation. >> >> Most of our compute cluster nodes are clients for Lustre and GPFS at >> the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on >> servers (GPFS NSD server or Lustre OSS/MDS servers). 
>> >> HTH, >> Frederik >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From service at metamodul.com Wed Aug 6 14:26:47 2014 From: service at metamodul.com (service at metamodul.com) Date: Wed, 6 Aug 2014 15:26:47 +0200 (CEST) Subject: [gpfsug-discuss] Hi , i am new to this list Message-ID: <1366482624.222989.1407331607965.open-xchange@oxbaltgw55.schlund.de> Hi @ALL i am Hajo Ehlers , an AIX and GPFS specialist ( Unix System Engineer ). You find me at the IBM GPFS Forum and sometimes at news:c.u.a and I am addicted to cluster filesystems My latest idee is an SAP-HANA light system ( DBMS on an in-memory cluster posix FS ) which could be extended to a "reinvented" Cluster based AS/400 ^_^ I wrote also a small script to do a sequential backup of GPFS filesystems since i got never used to mmbackup - i named it "pdsmc" for parallel dsmc". Cheers Hajo BTW: Please let me know - service (at) metamodul (dot) com - In case somebody is looking for a GPFS specialist. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Fri Aug 8 10:53:36 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Fri, 08 Aug 2014 10:53:36 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E2012C.9040402@gpfsug.org> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> Message-ID: <53E49E20.1090905@ebi.ac.uk> Well, i didn't wanted to start a rant against IBM, and I'm referring specifically to GSS. Since GSS its an appliance, we have to refer to GSS support for both hardware and software issues. Hardware support in total crap. It took 1 mounth of chasing and shouting to get a drawer replacement that was causing some issues. Meanwhile 10 disks in that drawer got faulty. Finally we got the drawer replace but the disks are still faulty. Now its 3 days i'm triing to get them fixed or replaced ( its not clear if they disks are broken of they was just marked to be replaced because of the drawer). Right now i dont have any answer about how to put them online ( mmchcarrier don't work because it recognize that the disk where not replaced) There are also few other cases ( gpfs related) open that are still not answered. I have no experience with direct GPFS support, but if i open a case to GSS for a GPFS problem, the cases seems never get an answer. The only reason that GSS is working its because _*I*_**installed it spending few months studying gpfs. So now I'm wondering if its worth at all rely in future on the whole appliance concept. I'm wondering if in future its better just purchase the hardware and install GPFS by our own, or in alternatively even try Lustre. Now, skipping all this GSS rant, which have nothing to do with the file system anyway and going back to my question: Could someone point the main differences between GPFS and Lustre? I found some documentation about Lustre and i'm going to have a look, but oddly enough have not found any practical comparison between them. On 06/08/14 11:19, Jez Tucker (Chair) wrote: > "IBM support is almost unexistent" > > I don't find that at all. > Do you log directly via ESC or via your OEM/integrator or are you only > referring to GSS support rather than pure GPFS? > > If you are having response issues, your IBM rep (or a few folks on > here) can accelerate issues for you. 
> > Jez > > > On 06/08/14 10:57, Salvatore Di Nardo wrote: >> Sorry for this little ot, but recetly i'm looking to Lustre to >> understand how it is comparable to GPFS in terms of performance, >> reliability and easy to use. >> Could anyone share their experience ? >> >> My company just recently got a first GPFS system , based on IBM GSS, >> but while its good performance wise, there are few unresolved >> problems and the IBM support is almost unexistent, so I'm starting to >> wonder if its work to look somewhere else eventual future purchases. >> >> >> Salvatore >> >> On 06/08/14 10:19, Frederik Ferner wrote: >>> On 05/08/14 18:55, Scott Fadden wrote: >>>> Is anyone running GPFS and Lustre on the same nodes. I have seen it >>>> work, I have heard people are doing it, I am looking for some >>>> confirmation. >>> >>> Most of our compute cluster nodes are clients for Lustre and GPFS at >>> the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on >>> servers (GPFS NSD server or Lustre OSS/MDS servers). >>> >>> HTH, >>> Frederik >>> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jpro at bas.ac.uk Fri Aug 8 12:40:00 2014 From: jpro at bas.ac.uk (Jeremy Robst) Date: Fri, 8 Aug 2014 12:40:00 +0100 (BST) Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E49E20.1090905@ebi.ac.uk> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk> Message-ID: On Fri, 8 Aug 2014, Salvatore Di Nardo wrote: > Now, skipping all this GSS rant, which have nothing to do with the file > system anyway? and? going back to my question: > > Could someone point the main differences between GPFS and Lustre? I'm looking at making the same decision here - to buy GPFS or to roll our own Lustre configuration. I'm in the process of setting up test systems, and so far the main difference seems to be in the that in GPFS each server sees the full filesystem, and so you can run other applications (e.g backup) on a GPFS server whereas the Luste OSS (object storage servers) see only a portion of the storage (the filesystem is striped across the OSSes), so you need a Lustre client to mount the full filesystem for things like backup. However I have very little practical experience of either and would also be interested in any comments. Thanks Jeremy -- jpro at bas.ac.uk | (work) 01223 221402 (fax) 01223 362616 Unix System Administrator - British Antarctic Survey #include From keith at ocf.co.uk Fri Aug 8 14:12:39 2014 From: keith at ocf.co.uk (Keith Vickers) Date: Fri, 8 Aug 2014 14:12:39 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node Message-ID: http://www.pdsw.org/pdsw10/resources/posters/parallelNASFSs.pdf Has a good direct apples to apples comparison between Lustre and GPFS. It's pretty much abstractable from the hardware used. 
Keith Vickers Business Development Manager OCF plc Mobile: 07974 397863 From sergi.more at bsc.es Fri Aug 8 14:14:33 2014 From: sergi.more at bsc.es (=?ISO-8859-1?Q?Sergi_Mor=E9_Codina?=) Date: Fri, 08 Aug 2014 15:14:33 +0200 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk> Message-ID: <53E4CD39.7080808@bsc.es> Hi all, About main differences between GPFS and Lustre, here you have some bits from our experience: -Reliability: GPFS its been proved to be more stable and reliable. Also offers more flexibility in terms of fail-over. It have no restriction in number of servers. As far as I know, an NSD can have as many secondary servers as you want (we are using 8). -Metadata: In Lustre each file system is restricted to two servers. No restriction in GPFS. -Updates: In GPFS you can update the whole storage cluster without stopping production, one server at a time. -Server/Client role: As Jeremy said, in GPFS every server act as a client as well. Useful for administrative tasks. -Troubleshooting: Problems with GPFS are easier to track down. Logs are more clear, and offers better tools than Lustre. -Support: No problems at all with GPFS support. It is true that it could take time to go up within all support levels, but we always got a good solution. Quite different in terms of hardware. IBM support quality has drop a lot since about last year an a half. Really slow and tedious process to get replacements. Moreover, we keep receiving bad "certified reutilitzed parts" hardware, which slow the whole process even more. These are the main differences I would stand out after some years of experience with both file systems, but do not take it as a fact. PD: Salvatore, I would suggest you to contact Jordi Valls. He joined EBI a couple of months ago, and has experience working with both file systems here at BSC. Best Regards, Sergi. On 08/08/2014 01:40 PM, Jeremy Robst wrote: > On Fri, 8 Aug 2014, Salvatore Di Nardo wrote: > >> Now, skipping all this GSS rant, which have nothing to do with the file >> system anyway and going back to my question: >> >> Could someone point the main differences between GPFS and Lustre? > > I'm looking at making the same decision here - to buy GPFS or to roll > our own Lustre configuration. I'm in the process of setting up test > systems, and so far the main difference seems to be in the that in GPFS > each server sees the full filesystem, and so you can run other > applications (e.g backup) on a GPFS server whereas the Luste OSS (object > storage servers) see only a portion of the storage (the filesystem is > striped across the OSSes), so you need a Lustre client to mount the full > filesystem for things like backup. > > However I have very little practical experience of either and would also > be interested in any comments. 
> > Thanks > > Jeremy > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- ------------------------------------------------------------------------ Sergi More Codina Barcelona Supercomputing Center Centro Nacional de Supercomputacion WWW: http://www.bsc.es Tel: +34-93-405 42 27 e-mail: sergi.more at bsc.es Fax: +34-93-413 77 21 ------------------------------------------------------------------------ WARNING / LEGAL TEXT: This message is intended only for the use of the individual or entity to which it is addressed and may contain information which is privileged, confidential, proprietary, or exempt from disclosure under applicable law. If you are not the intended recipient or the person responsible for delivering the message to the intended recipient, you are strictly prohibited from disclosing, distributing, copying, or in any way using this message. If you have received this communication in error, please notify the sender and destroy and delete any copies you may have received. http://www.bsc.es/disclaimer.htm -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3242 bytes Desc: S/MIME Cryptographic Signature URL: From viccornell at gmail.com Fri Aug 8 18:15:30 2014 From: viccornell at gmail.com (Vic Cornell) Date: Fri, 8 Aug 2014 18:15:30 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E4CD39.7080808@bsc.es> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk> <53E4CD39.7080808@bsc.es> Message-ID: <4001D2D9-5E74-4EF9-908F-5B0E3443EA5B@gmail.com> Disclaimers - I work for DDN - we sell lustre and GPFS. I know GPFS much better than I know Lustre. The biggest difference we find between GPFS and Lustre is that GPFS - can usually achieve 90% of the bandwidth available to a single client with a single thread. Lustre needs multiple parallel streams to saturate - say an Infiniband connection. Lustre is often faster than GPFS and often has superior metadata performance - particularly where lots of files are created in a single directory. GPFS can support Windows - Lustre cannot. I think GPFS is better integrated and easier to deploy than Lustre - some people disagree with me. Regards, Vic On 8 Aug 2014, at 14:14, Sergi Mor? Codina wrote: > Hi all, > > About main differences between GPFS and Lustre, here you have some bits from our experience: > > -Reliability: GPFS its been proved to be more stable and reliable. Also offers more flexibility in terms of fail-over. It have no restriction in number of servers. As far as I know, an NSD can have as many secondary servers as you want (we are using 8). > > -Metadata: In Lustre each file system is restricted to two servers. No restriction in GPFS. > > -Updates: In GPFS you can update the whole storage cluster without stopping production, one server at a time. > > -Server/Client role: As Jeremy said, in GPFS every server act as a client as well. Useful for administrative tasks. > > -Troubleshooting: Problems with GPFS are easier to track down. Logs are more clear, and offers better tools than Lustre. > > -Support: No problems at all with GPFS support. It is true that it could take time to go up within all support levels, but we always got a good solution. Quite different in terms of hardware. 
IBM support quality has drop a lot since about last year an a half. Really slow and tedious process to get replacements. Moreover, we keep receiving bad "certified reutilitzed parts" hardware, which slow the whole process even more. > > > These are the main differences I would stand out after some years of experience with both file systems, but do not take it as a fact. > > PD: Salvatore, I would suggest you to contact Jordi Valls. He joined EBI a couple of months ago, and has experience working with both file systems here at BSC. > > Best Regards, > Sergi. > > > On 08/08/2014 01:40 PM, Jeremy Robst wrote: >> On Fri, 8 Aug 2014, Salvatore Di Nardo wrote: >> >>> Now, skipping all this GSS rant, which have nothing to do with the file >>> system anyway and going back to my question: >>> >>> Could someone point the main differences between GPFS and Lustre? >> >> I'm looking at making the same decision here - to buy GPFS or to roll >> our own Lustre configuration. I'm in the process of setting up test >> systems, and so far the main difference seems to be in the that in GPFS >> each server sees the full filesystem, and so you can run other >> applications (e.g backup) on a GPFS server whereas the Luste OSS (object >> storage servers) see only a portion of the storage (the filesystem is >> striped across the OSSes), so you need a Lustre client to mount the full >> filesystem for things like backup. >> >> However I have very little practical experience of either and would also >> be interested in any comments. >> >> Thanks >> >> Jeremy >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > -- > > ------------------------------------------------------------------------ > > Sergi More Codina > Barcelona Supercomputing Center > Centro Nacional de Supercomputacion > WWW: http://www.bsc.es Tel: +34-93-405 42 27 > e-mail: sergi.more at bsc.es Fax: +34-93-413 77 21 > > ------------------------------------------------------------------------ > > WARNING / LEGAL TEXT: This message is intended only for the use of the > individual or entity to which it is addressed and may contain > information which is privileged, confidential, proprietary, or exempt > from disclosure under applicable law. If you are not the intended > recipient or the person responsible for delivering the message to the > intended recipient, you are strictly prohibited from disclosing, > distributing, copying, or in any way using this message. If you have > received this communication in error, please notify the sender and > destroy and delete any copies you may have received. > > http://www.bsc.es/disclaimer.htm > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at us.ibm.com Fri Aug 8 20:09:44 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Fri, 8 Aug 2014 12:09:44 -0700 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <4001D2D9-5E74-4EF9-908F-5B0E3443EA5B@gmail.com> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk> <53E4CD39.7080808@bsc.es> <4001D2D9-5E74-4EF9-908F-5B0E3443EA5B@gmail.com> Message-ID: Vic, Sergi, you can not compare Lustre and GPFS without providing a clear usecase as otherwise you compare apple with oranges. 
the reason for this is quite simple, Lustre plays well in pretty much one usecase - HPC, GPFS on the other hand is used in many forms of deployments from Storage for Virtual Machines, HPC, Scale-Out NAS, Solutions in digital media, to hosting some of the biggest, most business critical Transactional database installations in the world. you look at 2 products with completely different usability spectrum, functions and features unless as said above you narrow it down to a very specific usecase with a lot of details. even just HPC has a very large spectrum and not everybody is working in a single directory, which is the main scale point for Lustre compared to GPFS and the reason is obvious, if you have only 1 active metadata server (which is what 99% of all lustre systems run) some operations like single directory contention is simpler to make fast, but only up to the limit of your one node, but what happens when you need to go beyond that and only a real distributed architecture can support your workload ? for example look at most chip design workloads, which is a form of HPC, it is something thats extremely metadata and small file dominated, you talk about 100's of millions (in some cases even billions) of files, majority of them <4k, the rest larger files , majority of it with random access patterns that benefit from massive client side caching and distributed data coherency models supported by GPFS token manager infrastructure across 10's or 100's of metadata server and 1000's of compute nodes. you also need to look at the rich feature set GPFS provides, which not all may be important for some environments but are for others like Snapshot, Clones, Hierarchical Storage Management (ILM) , Local Cache acceleration (LROC), Global Namespace Wan Integration (AFM), Encryption, etc just to name a few. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Vic Cornell To: gpfsug main discussion list Date: 08/08/2014 10:16 AM Subject: Re: [gpfsug-discuss] GPFS and Lustre on same node Sent by: gpfsug-discuss-bounces at gpfsug.org Disclaimers - I work for DDN - we sell lustre and GPFS. I know GPFS much better than I know Lustre. The biggest difference we find between GPFS and Lustre is that GPFS - can usually achieve 90% of the bandwidth available to a single client with a single thread. Lustre needs multiple parallel streams to saturate - say an Infiniband connection. Lustre is often faster than GPFS and often has superior metadata performance - particularly where lots of files are created in a single directory. GPFS can support Windows - Lustre cannot. I think GPFS is better integrated and easier to deploy than Lustre - some people disagree with me. Regards, Vic On 8 Aug 2014, at 14:14, Sergi Mor? Codina wrote: > Hi all, > > About main differences between GPFS and Lustre, here you have some bits from our experience: > > -Reliability: GPFS its been proved to be more stable and reliable. Also offers more flexibility in terms of fail-over. It have no restriction in number of servers. As far as I know, an NSD can have as many secondary servers as you want (we are using 8). > > -Metadata: In Lustre each file system is restricted to two servers. No restriction in GPFS. > > -Updates: In GPFS you can update the whole storage cluster without stopping production, one server at a time. 
> > -Server/Client role: As Jeremy said, in GPFS every server act as a client as well. Useful for administrative tasks. > > -Troubleshooting: Problems with GPFS are easier to track down. Logs are more clear, and offers better tools than Lustre. > > -Support: No problems at all with GPFS support. It is true that it could take time to go up within all support levels, but we always got a good solution. Quite different in terms of hardware. IBM support quality has drop a lot since about last year an a half. Really slow and tedious process to get replacements. Moreover, we keep receiving bad "certified reutilitzed parts" hardware, which slow the whole process even more. > > > These are the main differences I would stand out after some years of experience with both file systems, but do not take it as a fact. > > PD: Salvatore, I would suggest you to contact Jordi Valls. He joined EBI a couple of months ago, and has experience working with both file systems here at BSC. > > Best Regards, > Sergi. > > > On 08/08/2014 01:40 PM, Jeremy Robst wrote: >> On Fri, 8 Aug 2014, Salvatore Di Nardo wrote: >> >>> Now, skipping all this GSS rant, which have nothing to do with the file >>> system anyway and going back to my question: >>> >>> Could someone point the main differences between GPFS and Lustre? >> >> I'm looking at making the same decision here - to buy GPFS or to roll >> our own Lustre configuration. I'm in the process of setting up test >> systems, and so far the main difference seems to be in the that in GPFS >> each server sees the full filesystem, and so you can run other >> applications (e.g backup) on a GPFS server whereas the Luste OSS (object >> storage servers) see only a portion of the storage (the filesystem is >> striped across the OSSes), so you need a Lustre client to mount the full >> filesystem for things like backup. >> >> However I have very little practical experience of either and would also >> be interested in any comments. >> >> Thanks >> >> Jeremy >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > -- > > ------------------------------------------------------------------------ > > Sergi More Codina > Barcelona Supercomputing Center > Centro Nacional de Supercomputacion > WWW: http://www.bsc.es Tel: +34-93-405 42 27 > e-mail: sergi.more at bsc.es Fax: +34-93-413 77 21 > > ------------------------------------------------------------------------ > > WARNING / LEGAL TEXT: This message is intended only for the use of the > individual or entity to which it is addressed and may contain > information which is privileged, confidential, proprietary, or exempt > from disclosure under applicable law. If you are not the intended > recipient or the person responsible for delivering the message to the > intended recipient, you are strictly prohibited from disclosing, > distributing, copying, or in any way using this message. If you have > received this communication in error, please notify the sender and > destroy and delete any copies you may have received. 
> > http://www.bsc.es/disclaimer.htm > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Sat Aug 9 15:03:02 2014 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Sat, 9 Aug 2014 16:03:02 +0200 Subject: [gpfsug-discuss] GPFS and Lustre In-Reply-To: References: Message-ID: Vic, Sergi, from my point of view for real High-End workloads the complete I/O stack needs to be fine tuned and well understood in order to provide a good system to the users. - Application(s) + I/O Lib(s) + MPI + Parallel Filesystem (e.g. GPFS) + Hardware (Networks, Servers, Disks, etc.) One of the best solutions to bring your application very efficently to work with a Parallel FS is Sionlib from FZ Juelich: Sionlib is a scalable I/O library for the parallel access to task-local files. The library not only supports writing and reading binary data to or from from several thousands of processors into a single or a small number of physical files but also provides for global open and close functions to access SIONlib file in parallel. SIONlib provides different interfaces: parallel access using MPI, OpenMp, or their combination and sequential access for post-processing utilities. http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/SIONlib/_node.html http://apps.fz-juelich.de/jsc/sionlib/html/sionlib_tutorial_2013.pdf -frank- P.S. Nice blog from Nils https://www.ibm.com/developerworks/community/blogs/storageneers/entry/scale_out_backup_with_tsm_and_gss_performance_test_results?lang=en Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From ewahl at osc.edu Mon Aug 11 14:55:48 2014 From: ewahl at osc.edu (Ed Wahl) Date: Mon, 11 Aug 2014 13:55:48 +0000 Subject: [gpfsug-discuss] GPFS and Lustre In-Reply-To: References: , Message-ID: In a similar vein, IBM has an application transparent "File Cache Library" as well. I believe it IS licensed and the only requirement is that it is for use on IBM hardware only. Saw some presentations that mention it in some BioSci talks @SC13 and the numbers for a couple of selected small read applications were awesome. I probably have the contact info for it around here somewhere. In addition to the pdf/user manual. Ed Wahl Ohio Supercomputer Center ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Frank Kraemer [kraemerf at de.ibm.com] Sent: Saturday, August 09, 2014 10:03 AM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] GPFS and Lustre Vic, Sergi, from my point of view for real High-End workloads the complete I/O stack needs to be fine tuned and well understood in order to provide a good system to the users. - Application(s) + I/O Lib(s) + MPI + Parallel Filesystem (e.g. GPFS) + Hardware (Networks, Servers, Disks, etc.) One of the best solutions to bring your application very efficently to work with a Parallel FS is Sionlib from FZ Juelich: Sionlib is a scalable I/O library for the parallel access to task-local files. 
The library not only supports writing and reading binary data to or from from several thousands of processors into a single or a small number of physical files but also provides for global open and close functions to access SIONlib file in parallel. SIONlib provides different interfaces: parallel access using MPI, OpenMp, or their combination and sequential access for post-processing utilities. http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/SIONlib/_node.html http://apps.fz-juelich.de/jsc/sionlib/html/sionlib_tutorial_2013.pdf -frank- P.S. Nice blog from Nils https://www.ibm.com/developerworks/community/blogs/storageneers/entry/scale_out_backup_with_tsm_and_gss_performance_test_results?lang=en Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From sabujp at gmail.com Tue Aug 12 23:16:22 2014 From: sabujp at gmail.com (Sabuj Pattanayek) Date: Tue, 12 Aug 2014 17:16:22 -0500 Subject: [gpfsug-discuss] reduce cnfs failover time to a few seconds Message-ID: Hi all, Is there anyway to reduce CNFS failover time to just a few seconds? Currently it seems like it's taking 5 - 10 minutes. We're using virtual ip's, i.e. interface bond1.1550:0 has one of the cnfs vips, so it should be fast, but it takes a long time and sometimes causes processes to crash due to NFS timeouts (some have 600 second soft mount timeouts). We've also noticed that it sometimes takes even longer unless the cnfs system on which we're calling mmshutdown is completely shutdown and isn't returning pings. Even 1 min seems too long. For comparison, I'm running ctdb + samba on the other NSDs and it's able to failover in a few seconds after mmshutdown completes. Thanks, Sabuj -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Fri Aug 15 14:31:29 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Fri, 15 Aug 2014 14:31:29 +0100 Subject: [gpfsug-discuss] gpfs client expels, fs hangind and waiters Message-ID: <53EE0BB1.8000005@ebi.ac.uk> Hello people, Its quite a bit of time that i'm triing to solve a problem to our GPFS system, without much luck so i think its time to ask some help. *First of a bit of introduction:** * Our GPFS system is made by 3xgss-26, In other words its made with 6x servers ( 4x10g links each) and several disk enclosures SAS attacked. The todal amount of spare its roughly 2PB, and the disks are SATA ( except few SSD dedicated to logtip ). My metadata and on dedicated vdisks, but both data and metadata vdiosks are in the same declustered arrays and recovery groups, so in the end they share the same spindles. The clients its a LSF farm configured as another cluster ( standard multiclustering configuration) of roughly 600 nodes . *The issue:** * Recently we became aware that when some massive io request has been done we experience a lot of client expells. Heres an example of our logs: Fri Aug 15 12:40:24.680 2014: Expel 10.7.28.34 (gss03a) request from 172.16.4.138 (ebi3-138 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.138 (ebi3-138 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:40:41.652 2014: Expel 10.7.28.66 (gss02b) request from 10.7.34.38 (ebi5-037 in ebi-cluster.ebi.ac.uk). 
Expelling: 10.7.34.38 (ebi5-037 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:40:45.754 2014: Expel 10.7.28.3 (gss01b) request from 172.16.4.58 (ebi3-058 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.58 (ebi3-058 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:40:52.305 2014: Expel 10.7.28.66 (gss02b) request from 10.7.34.68 (ebi5-067 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.68 (ebi5-067 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:17.069 2014: Expel 10.7.28.35 (gss03b) request from 172.16.4.161 (ebi3-161 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.161 (ebi3-161 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:23.555 2014: Expel 10.7.28.67 (gss02a) request from 172.16.4.136 (ebi3-136 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.136 (ebi3-136 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:54.258 2014: Expel 10.7.28.34 (gss03a) request from 10.7.34.22 (ebi5-021 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.22 (ebi5-021 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:54.540 2014: Expel 10.7.28.66 (gss02b) request from 10.7.34.57 (ebi5-056 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.57 (ebi5-056 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:42:57.288 2014: Expel 10.7.35.5 (ebi5-132 in ebi-cluster.ebi.ac.uk) request from 10.7.28.34 (gss03a). Expelling: 10.7.35.5 (ebi5-132 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:43:24.327 2014: Expel 10.7.28.34 (gss03a) request from 10.7.37.99 (ebi5-226 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.37.99 (ebi5-226 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:44:54.202 2014: Expel 10.7.28.67 (gss02a) request from 172.16.4.165 (ebi3-165 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.165 (ebi3-165 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:15:54.450 2014: Expel 10.7.28.34 (gss03a) request from 10.7.37.89 (ebi5-216 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.37.89 (ebi5-216 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:20:16.524 2014: Expel 10.7.28.3 (gss01b) request from 172.16.4.55 (ebi3-055 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.55 (ebi3-055 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:26:54.177 2014: Expel 10.7.28.34 (gss03a) request from 10.7.34.64 (ebi5-063 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.64 (ebi5-063 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:27:53.900 2014: Expel 10.7.28.3 (gss01b) request from 10.7.35.15 (ebi5-142 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.35.15 (ebi5-142 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:28:24.297 2014: Expel 10.7.28.67 (gss02a) request from 172.16.4.50 (ebi3-050 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.50 (ebi3-050 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:29:23.913 2014: Expel 10.7.28.3 (gss01b) request from 172.16.4.156 (ebi3-156 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.156 (ebi3-156 in ebi-cluster.ebi.ac.uk) at the same time we experience also long waiters queue (1000+ lines). 
An example in case of massive writes ( dd ) : 0x7F522E1EEF90 waiting 1.861233182 seconds, NSDThread: on ThCond 0x7F5158019B08 (0x7F5158019B08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.101 0x7F522E1EC9B0 waiting 1.490567470 seconds, NSDThread: on ThCond 0x7F50F4038BA8 (0x7F50F4038BA8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.45 0x7F522E1EB6C0 waiting 1.077098046 seconds, NSDThread: on ThCond 0x7F50B40011F8 (0x7F50B40011F8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.156 0x7F522E1EA3D0 waiting 7.714968554 seconds, NSDThread: on ThCond 0x7F50BC0078B8 (0x7F50BC0078B8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.107 0x7F522E1E90E0 waiting 4.774379417 seconds, NSDThread: on ThCond 0x7F506801B1F8 (0x7F506801B1F8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.23 0x7F522E1E7DF0 waiting 0.746172444 seconds, NSDThread: on ThCond 0x7F5094007D78 (0x7F5094007D78) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.84 0x7F522E1E6B00 waiting 1.553030487 seconds, NSDThread: on ThCond 0x7F51C0004C78 (0x7F51C0004C78) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.63 0x7F522E1E5810 waiting 2.165307633 seconds, NSDThread: on ThCond 0x7F5178016A08 (0x7F5178016A08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.29 0x7F522E1E4520 waiting 1.128089273 seconds, NSDThread: on ThCond 0x7F5074004D98 (0x7F5074004D98) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.61 0x7F522E1E3230 waiting 2.515214328 seconds, NSDThread: on ThCond 0x7F51F400EF08 (0x7F51F400EF08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.90 0x7F522E1E1F40 waiting*162.966840834* seconds, NSDThread: on ThCond 0x7F51840207A8 (0x7F51840207A8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.97 0x7F522E1E0C50 waiting 1.140787288 seconds, NSDThread: on ThCond 0x7F51AC005C08 (0x7F51AC005C08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.94 0x7F522E1DF960 waiting 41.907415248 seconds, NSDThread: on ThCond 0x7F5160019038 (0x7F5160019038) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.143 0x7F522E1DE670 waiting 0.466560418 seconds, NSDThread: on ThCond 0x7F513802B258 (0x7F513802B258) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.168 0x7F522E1DD380 waiting 3.102803621 seconds, NSDThread: on ThCond 0x7F516C0106C8 (0x7F516C0106C8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.91 0x7F522E1DC090 waiting 2.751614295 seconds, NSDThread: on ThCond 0x7F504C0011F8 (0x7F504C0011F8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.25 0x7F522E1DADA0 waiting 5.083691891 seconds, NSDThread: on ThCond 0x7F507401BE88 (0x7F507401BE88) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.61 0x7F522E1D9AB0 waiting 2.263374184 seconds, NSDThread: on ThCond 0x7F5080003B98 (0x7F5080003B98) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.36 0x7F522E1D87C0 waiting 0.206989639 seconds, NSDThread: on ThCond 0x7F505801F0D8 (0x7F505801F0D8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.55 0x7F522E1D74D0 waiting *41.841279897* seconds, NSDThread: on ThCond 0x7F5194008B88 (0x7F5194008B88) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.143 0x7F522E1D61E0 waiting 5.618652361 seconds, NSDThread: on ThCond 0x1BAB868 (0x1BAB868) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.59 0x7F522E1D4EF0 
waiting 6.185658427 seconds, NSDThread: on ThCond 0x7F513802AAE8 (0x7F513802AAE8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.6 0x7F522E1D3C00 waiting 2.652370892 seconds, NSDThread: on ThCond 0x7F5130004C78 (0x7F5130004C78) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.45 0x7F522E1D2910 waiting 11.396142225 seconds, NSDThread: on ThCond 0x7F51A401C0C8 (0x7F51A401C0C8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.169 0x7F522E1D1620 waiting 63.710723043 seconds, NSDThread: on ThCond 0x7F5038004D08 (0x7F5038004D08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.120 or for massive reads: 0x7FBCE69A8C20 waiting 29.262629530 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE699CEC0 waiting 29.260869141 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE698C5A0 waiting 29.124824888 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6984110 waiting 22.729479654 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE69512C0 waiting 29.272805926 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE69409A0 waiting 28.833650198 seconds, NSDThread: on ThCond 0x18033B74D48 (0xFFFFC90033B74D48) (LeaseWaitCondvar), reason 'Waiting to acquire disklease' 0x7FBCE6924320 waiting 29.237067128 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6921D40 waiting 29.237953228 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6915FE0 waiting 29.046721161 seconds, NSDThread: on ThCond 0x18033B74D48 (0xFFFFC90033B74D48) (LeaseWaitCondvar), reason 'Waiting to acquire disklease' 0x7FBCE6913A00 waiting 29.264534710 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6900B00 waiting 29.267691105 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68F7380 waiting 29.266402464 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68D2870 waiting 29.276298231 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68BADB0 waiting 28.665700576 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68B61F0 waiting 29.236878611 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6885980 waiting *144*.530487248 seconds, NSDThread: on ThMutex 0x1803396A670 (0xFFFFC9003396A670) (DiskSchedulingMutex) 0x7FBCE68833A0 waiting 29.231066610 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68820B0 waiting 29.269954514 seconds, NSDThread: on ThCond 
0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE686A5F0 waiting *140*.662994256 seconds, NSDThread: on ThMutex 0x180339A3140 (0xFFFFC900339A3140) (DiskSchedulingMutex) 0x7FBCE6864740 waiting 29.254180742 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE683FC30 waiting 29.271840565 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE682E020 waiting 29.200969209 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6825B90 waiting 19.136732919 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6805C40 waiting 29.236055550 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67FEAA0 waiting 29.283264161 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67FC4C0 waiting 29.268992663 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67DFE40 waiting 29.150900786 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67D2DF0 waiting 29.199058463 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67D1B00 waiting 29.203199738 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67768D0 waiting 29.208231742 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6768590 waiting 5.228192589 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67672A0 waiting 29.252839376 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6757C70 waiting 28.869359044 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6748640 waiting 29.289284179 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6734450 waiting 29.253591817 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6730B80 waiting 29.289987273 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6720260 waiting 26.597589551 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66F32C0 waiting 29.177692849 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66E3C90 waiting 29.160268518 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) 
(VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66CC1D0 waiting 5.334330188 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66B3420 waiting 34.274433161 seconds, NSDThread: on ThMutex 0x180339A3140 (0xFFFFC900339A3140) (DiskSchedulingMutex) 0x7FBCE668E910 waiting 27.699999488 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6689D50 waiting 34.279090465 seconds, NSDThread: on ThMutex 0x180339A3140 (0xFFFFC900339A3140) (DiskSchedulingMutex) 0x7FBCE66805D0 waiting 24.688626241 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6675B60 waiting 35.367745840 seconds, NSDThread: on ThCond 0x18033B74D48 (0xFFFFC90033B74D48) (LeaseWaitCondvar), reason 'Waiting to acquire disklease' 0x7FBCE665E0A0 waiting 29.235994598 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE663CE60 waiting 29.162911979 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Another example with mmfsadm in case of massive reads: [root at gss02b ~]# mmfsadm dump waiters 0x7F519000AEA0 waiting 28.915010347 seconds, replyCleanupThread: on ThCond 0x7F51101B27B8 (0x7F51101B27B8) (MsgRecordCondvar), reason 'RPC wait' 0x7F511C012A10 waiting 279.522206863 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F52000095F8 (0x7F52000095F8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F5120000B80 waiting 279.524782437 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F5214000EE8 (0x7F5214000EE8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F5154006310 waiting 138.164386224 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F5174003F08 (0x7F5174003F08) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1EB6C0 waiting 23.060703000 seconds, NSDThread: for poll on sock 85 0x7F522E1E6B00 waiting 0.068456104 seconds, NSDThread: on ThCond 0x7F50CC00E478 (0x7F50CC00E478) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1D0330 waiting 17.207907857 seconds, NSDThread: on ThCond 0x7F5078001688 (0x7F5078001688) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1BFA10 waiting 0.181011711 seconds, NSDThread: on ThCond 0x7F504000E558 (0x7F504000E558) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1B4FA0 waiting 0.021780338 seconds, NSDThread: on ThCond 0x7F522000E488 (0x7F522000E488) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1B3CB0 waiting 0.794718000 seconds, NSDThread: for poll on sock 799 0x7F522E186D10 waiting 0.191606803 seconds, NSDThread: on ThCond 0x7F5184015D58 (0x7F5184015D58) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E184730 waiting 0.025562000 seconds, NSDThread: for poll on sock 867 0x7F522E12CDD0 waiting 0.008921000 seconds, NSDThread: for poll on sock 543 0x7F522E126F20 waiting 1.459531000 seconds, NSDThread: for poll on sock 983 0x7F522E10F460 waiting 17.177936972 seconds, NSDThread: on ThCond 0x7F51EC002CE8 (0x7F51EC002CE8) 
(InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E101120 waiting 17.232580316 seconds, NSDThread: on ThCond 0x7F51BC005BB8 (0x7F51BC005BB8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E0F1AF0 waiting 438.556030000 seconds, NSDThread: for poll on sock 496 0x7F522E0E7080 waiting 393.702839774 seconds, NSDThread: on ThCond 0x7F5164013668 (0x7F5164013668) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E09DA60 waiting 52.746984660 seconds, NSDThread: on ThCond 0x7F506C008858 (0x7F506C008858) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E084CB0 waiting 23.096688206 seconds, NSDThread: on ThCond 0x7F521C008E18 (0x7F521C008E18) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E0839C0 waiting 0.093456000 seconds, NSDThread: for poll on sock 962 0x7F522E076970 waiting 2.236659731 seconds, NSDThread: on ThCond 0x7F51E0027538 (0x7F51E0027538) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E044E10 waiting 52.752497765 seconds, NSDThread: on ThCond 0x7F513802BDD8 (0x7F513802BDD8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E033200 waiting 16.157355796 seconds, NSDThread: on ThCond 0x7F5104240D58 (0x7F5104240D58) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E02AD70 waiting 436.025203220 seconds, NSDThread: on ThCond 0x7F50E0016C28 (0x7F50E0016C28) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E01A450 waiting 393.673252777 seconds, NSDThread: on ThCond 0x7F50A8009C18 (0x7F50A8009C18) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DFE0460 waiting 1.781358358 seconds, NSDThread: on ThCond 0x7F51E0027638 (0x7F51E0027638) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF99420 waiting 0.038405427 seconds, NSDThread: on ThCond 0x7F50F0172B18 (0x7F50F0172B18) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF7CDA0 waiting 438.204625355 seconds, NSDThread: on ThCond 0x7F50900023D8 (0x7F50900023D8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF76EF0 waiting 435.903645734 seconds, NSDThread: on ThCond 0x7F5084004BC8 (0x7F5084004BC8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF74910 waiting 21.749325022 seconds, NSDThread: on ThCond 0x7F507C011F48 (0x7F507C011F48) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF71040 waiting 1.027274000 seconds, NSDThread: for poll on sock 866 0x7F522DF536D0 waiting 52.953847324 seconds, NSDThread: on ThCond 0x7F5200006FF8 (0x7F5200006FF8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF510F0 waiting 0.039278000 seconds, NSDThread: for poll on sock 837 0x7F522DF4EB10 waiting 0.085745937 seconds, NSDThread: on ThCond 0x7F51F0006828 (0x7F51F0006828) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF4C530 waiting 21.850733000 seconds, NSDThread: for poll on sock 986 0x7F522DF4B240 waiting 0.054739884 seconds, NSDThread: on ThCond 0x7F51EC0168D8 (0x7F51EC0168D8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF48C60 waiting 0.186409714 seconds, 
NSDThread: on ThCond 0x7F51E4000908 (0x7F51E4000908) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF41AC0 waiting 438.942861290 seconds, NSDThread: on ThCond 0x7F51CC010168 (0x7F51CC010168) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF3F4E0 waiting 0.060235106 seconds, NSDThread: on ThCond 0x7F51C400A438 (0x7F51C400A438) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF22E60 waiting 0.361288000 seconds, NSDThread: for poll on sock 518 0x7F522DF21B70 waiting 0.060722464 seconds, NSDThread: on ThCond 0x7F51580162D8 (0x7F51580162D8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF12540 waiting 23.077564448 seconds, NSDThread: on ThCond 0x7F512C13E1E8 (0x7F512C13E1E8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEFD060 waiting 0.723370000 seconds, NSDThread: for poll on sock 503 0x7F522DEE09E0 waiting 1.565799175 seconds, NSDThread: on ThCond 0x7F5084004D58 (0x7F5084004D58) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEDF6F0 waiting 22.063017342 seconds, NSDThread: on ThCond 0x7F5078003E08 (0x7F5078003E08) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEDD110 waiting 0.049108780 seconds, NSDThread: on ThCond 0x7F5070001D78 (0x7F5070001D78) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEDAB30 waiting 229.603224376 seconds, NSDThread: on ThCond 0x7F50680221B8 (0x7F50680221B8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DED7260 waiting 0.071855457 seconds, NSDThread: on ThCond 0x7F506400A5A8 (0x7F506400A5A8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DED5F70 waiting 0.648324000 seconds, NSDThread: for poll on sock 766 0x7F522DEC3070 waiting 1.809205756 seconds, NSDThread: on ThCond 0x7F522000E518 (0x7F522000E518) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEB1460 waiting 436.017396645 seconds, NSDThread: on ThCond 0x7F51E4000978 (0x7F51E4000978) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEAC8A0 waiting 393.734102000 seconds, NSDThread: for poll on sock 609 0x7F522DEA3120 waiting 17.960778837 seconds, NSDThread: on ThCond 0x7F51B4001708 (0x7F51B4001708) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DE86AA0 waiting 23.112060045 seconds, NSDThread: on ThCond 0x7F5154096118 (0x7F5154096118) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DE64570 waiting 0.076167410 seconds, NSDThread: on ThCond 0x7F50D8005EF8 (0x7F50D8005EF8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DE1AF50 waiting 17.460836000 seconds, NSDThread: for poll on sock 737 0x7F522DE104E0 waiting 0.205037000 seconds, NSDThread: for poll on sock 865 0x7F522DDB8B80 waiting 0.106192000 seconds, NSDThread: for poll on sock 78 0x7F522DDA36A0 waiting 0.738921180 seconds, NSDThread: on ThCond 0x7F505400E048 (0x7F505400E048) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DD9C500 waiting 0.731118367 seconds, NSDThread: on ThCond 0x7F503C00B518 (0x7F503C00B518) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DD89600 waiting 
229.609363000 seconds, NSDThread: for poll on sock 515 0x7F522DD567B0 waiting 1.508489195 seconds, NSDThread: on ThCond 0x7F514C021F88 (0x7F514C021F88) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg'

Another thing worth mentioning is that the filesystem is totally unresponsive. Even a simple "cd" into a directory or an "ls" of a directory hangs for several minutes (literally). This also happens if I try it from the NSD servers themselves.

*A few things I have looked into:*

* Our network seems fine. There might be a bottleneck on parts of it, and that could explain the waiters, but it doesn't explain why at some point the clients ask to expel the NSD servers, nor why the FS is slow even on the NSD servers themselves.

* Disk bottleneck? I don't think so. The NSD servers' CPU usage (and I/O wait) is very low, and mmdiag --iohist also seems to confirm that the operations on the disks are reasonably fast:

=== mmdiag: iohist ===

I/O history:

I/O start time RW Buf type disk:sectorNum nSec time ms Type Device/NSD ID NSD server
--------------- -- ----------- ----------------- ----- ------- ---- ------------------ ---------------
13:54:29.209276 W data 34:5066338808 2056 88.307 lcl sdtu
13:54:29.209277 W data 55:5095698936 2056 27.592 lcl sdaab
13:54:29.209278 W data 171:5104087544 2056 22.801 lcl sdtg
13:54:29.209279 W data 116:5011812856 2056 65.983 lcl sdqr
13:54:29.209280 W data 98:4860817912 2056 17.892 lcl sddl
13:54:29.209281 W data 159:4999229944 2056 21.324 lcl sdjg
13:54:29.209282 W data 84:5049561592 2056 31.932 lcl sdqz
13:54:29.209283 W data 8:5003424248 2056 30.912 lcl sdcw
13:54:29.209284 W data 23:4965675512 2056 27.366 lcl sdpt
13:54:29.297715 W vdiskMDLog 2:144008496 1 0.236 lcl sdkr
13:54:29.297717 W vdiskMDLog 0:331703600 1 0.230 lcl sdcm
13:54:29.297718 W vdiskMDLog 1:273769776 1 0.241 lcl sdbp
13:54:29.244902 W data 51:3857589752 2056 35.566 lcl sdyi
13:54:29.244904 W data 10:3773703672 2056 28.512 lcl sdma
13:54:29.244905 W data 48:3639485944 2056 24.124 lcl sdel
13:54:29.244906 W data 25:3777897976 2056 18.691 lcl sdgt
13:54:29.244908 W data 91:3832423928 2056 20.699 lcl sdlc
13:54:29.244909 W data 115:3723372024 2056 30.783 lcl sdho
13:54:29.244910 W data 173:3882755576 2056 53.241 lcl sdti
13:54:29.244911 W data 42:3782092280 2056 22.785 lcl sddz
13:54:29.244912 W data 45:3647874552 2056 24.289 lcl sdei
13:54:29.244913 W data 32:3652068856 2056 17.220 lcl sdbn
13:54:29.244914 W data 39:3677234680 2056 26.017 lcl sddw
13:54:29.298273 W vdiskMDLog 2:144008497 1 2.522 lcl sduf
13:54:29.298274 W vdiskMDLog 0:331703601 1 1.025 lcl sdlo
13:54:29.298275 W vdiskMDLog 1:273769777 1 2.586 lcl sdtt
13:54:29.288275 W data 27:2249588200 2056 20.071 lcl sdhb
13:54:29.288279 W data 33:2224422376 2056 19.682 lcl sdts
13:54:29.288281 W data 47:2115370472 2056 21.667 lcl sdwo
13:54:29.288282 W data 82:2316697064 2056 21.524 lcl sdxy
13:54:29.288283 W data 85:2232810984 2056 17.467 lcl sdra
13:54:29.288285 W data 30:2127953384 2056 18.475 lcl sdqg
13:54:29.288286 W data 67:1876295144 2056 16.383 lcl sdmx
13:54:29.288287 W data 64:2127953384 2056 21.908 lcl sduh
13:54:29.288288 W data 38:2253782504 2056 19.775 lcl sddv
13:54:29.288290 W data 15:2207645160 2056 20.599 lcl sdet
13:54:29.288291 W data 157:2283142632 2056 21.198 lcl sdiy

* Bonding problem on the interfaces? The Mellanox (interface card vendor) drivers and firmware have been updated, and we even tested the system with a single link (without bonding).
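(A minimal sketch of the kind of periodic waiter sampling described in this thread, so that the samples can later be lined up with mmfs.log timestamps, might look like the following. It is only an illustration: the node names, the output directory and the 5-second interval are assumptions, not details taken from this setup; mmdiag --waiters and the /usr/lpp/mmfs/bin path are the standard GPFS ones.)

#!/bin/bash
# Illustrative only: poll GPFS waiters on a set of nodes and timestamp
# each sample so the buildup before an expel can be reconstructed.
NODES="gss01a gss01b gss02a gss02b gss03a gss03b"   # assumed NSD server names
OUTDIR=/tmp/gpfs-waiters                            # assumed output directory
mkdir -p "$OUTDIR"
while true; do
    ts=$(date +%Y%m%d-%H%M%S)
    for n in $NODES; do
        # mmdiag --waiters prints the current waiters on that node
        ssh -o ConnectTimeout=5 "$n" /usr/lpp/mmfs/bin/mmdiag --waiters \
            > "$OUTDIR/$n.$ts.waiters" 2>&1 &
    done
    wait
    sleep 5
done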
Could someone help me with this? In particular:

* What exactly do clients check to decide that another node is unresponsive? Ping? I don't think so, because both the NSD servers and the clients can be pinged, so what do they look at? If someone can also tell me which port they use, I can try to tcpdump exactly what is causing these expels.

* How can I monitor metadata operations to understand where EXACTLY the bottleneck is that causes this:

[sdinardo at ebi5-001 ~]$ time ls /gpfs/nobackup/sdinardo 1 ebi3-054.ebi.ac.uk ebi3-154 ebi5-019.ebi.ac.uk ebi5-052 ebi5-101 ebi5-156 ebi5-197 ebi5-228 ebi5-262.ebi.ac.uk 10 ebi3-055 ebi3-155 ebi5-021.ebi.ac.uk ebi5-053 ebi5-104.ebi.ac.uk ebi5-160.ebi.ac.uk ebi5-198 ebi5-229 ebi5-263 2 ebi3-056.ebi.ac.uk ebi3-156 ebi5-022 ebi5-054.ebi.ac.uk ebi5-106 ebi5-161 ebi5-200 ebi5-230.ebi.ac.uk ebi5-264 3 ebi3-057 ebi3-157 ebi5-023 ebi5-056 ebi5-109 ebi5-162.ebi.ac.uk ebi5-201 ebi5-231.ebi.ac.uk ebi5-265 4 ebi3-058 ebi3-158.ebi.ac.uk ebi5-024.ebi.ac.uk ebi5-057 ebi5-110.ebi.ac.uk ebi5-163.ebi.ac.uk ebi5-202.ebi.ac.uk ebi5-232 ebi5-266.ebi.ac.uk 5 ebi3-059.ebi.ac.uk ebi3-160 ebi5-025 ebi5-060 ebi5-111.ebi.ac.uk ebi5-164 ebi5-204 ebi5-233 ebi5-267 6 ebi3-132 ebi3-161.ebi.ac.uk ebi5-026 ebi5-061.ebi.ac.uk ebi5-112.ebi.ac.uk ebi5-165 ebi5-205 ebi5-234 ebi5-269.ebi.ac.uk 7 ebi3-133 ebi3-163.ebi.ac.uk ebi5-028 ebi5-062.ebi.ac.uk ebi5-129.ebi.ac.uk ebi5-166 ebi5-206.ebi.ac.uk ebi5-236 ebi5-270 8 ebi3-134 ebi3-165 ebi5-030 ebi5-064 ebi5-131.ebi.ac.uk ebi5-169.ebi.ac.uk ebi5-207 ebi5-237 ebi5-271 9 ebi3-135 ebi3-166.ebi.ac.uk ebi5-031 ebi5-065 ebi5-132 ebi5-170.ebi.ac.uk ebi5-209 ebi5-239.ebi.ac.uk launcher.sh

_*real 21m14.948s*_ ( WTH ?!?!?!)
user 0m0.004s
sys 0m0.014s

I know these questions are not easy to answer, and I need to dig more, but it would be very helpful if someone could give me some hints about where to look. My GPFS skills are limited, since this is our first system and it has only been in production for a few months, and things started to worsen only recently. In the past we could get over 200Gb/s (both read and write) without any issue. Now some clients get expelled even when data throughput is at 4-5Gb/s.

Thanks in advance for any help.

Regards,
Salvatore

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From mail at arif-ali.co.uk Tue Aug 19 11:18:10 2014
From: mail at arif-ali.co.uk (Arif Ali)
Date: Tue, 19 Aug 2014 11:18:10 +0100
Subject: [gpfsug-discuss] gpfsug Maintenance
Message-ID:

Hi all,

You may be aware that the website has been down for about a week now. Due to the amount of traffic to the website and the number of people on the mailing list, we had seen a few issues on the system.

In order to counter those issues, and for ease of management, we are moving to a new system. We are hoping to do this tonight (between 20:00 and 23:00 BST). If this causes an issue for anyone, then please let me know.

I will, as part of the move over, be sending a few test mails to make sure that the mailing list is working correctly.

Thanks for your patience

--
Arif Ali
gpfsug Admin

IRC: arif-ali at freenode
LinkedIn: http://uk.linkedin.com/in/arifali

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From sdinardo at ebi.ac.uk Tue Aug 19 12:11:00 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Tue, 19 Aug 2014 12:11:00 +0100
Subject: [gpfsug-discuss] gpfs client expels
In-Reply-To: <53EE0BB1.8000005@ebi.ac.uk>
References: <53EE0BB1.8000005@ebi.ac.uk>
Message-ID: <53F330C4.808@ebi.ac.uk>

Still problems.
Here some more detailed examples: *EXAMPLE 1:* *EBI5-220**( CLIENT)** *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a reply from node gss02b* Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic Tue Aug 19 11:03:12.066 2014: Connecting to gss02a Tue Aug 19 11:03:12.070 2014: Connected to gss02a Tue Aug 19 11:03:17.071 2014: Connecting to gss02b Tue Aug 19 11:03:17.072 2014: Connecting to gss03b Tue Aug 19 11:03:17.079 2014: Connecting to gss03a Tue Aug 19 11:03:17.080 2014: Connecting to gss01b Tue Aug 19 11:03:17.079 2014: Connecting to gss01a Tue Aug 19 11:04:23.105 2014: Connected to gss02b Tue Aug 19 11:04:23.107 2014: Connected to gss03b Tue Aug 19 11:04:23.112 2014: Connected to gss03a Tue Aug 19 11:04:23.115 2014: Connected to gss01b Tue Aug 19 11:04:23.121 2014: Connected to gss01a Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. *GSS02B ( NSD SERVER)* ... Tue Aug 19 11:03:17.070 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:28.080 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:39.083 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:50.088 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:01.092 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:12.096 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:23.103 2014: Accepted and connected to ** ebi5-220 ... *GSS02a ( NSD SERVER)* Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). 
Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 =============================================== *EXAMPLE 2*: *EBI5-038* Tue Aug 19 11:32:34.227 2014: *Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing cluster GSS.ebi.ac.uk* Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. ... LOT MORE RESETS BY PEER ... Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:25.267 2014: Connecting to gss02a Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.* *GSS02a* Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled because of an expired lease.* Pings sent: 60. Replies received: 60.

In example 1 it seems that an NSD server was not replying to the client, but the servers seem to be working fine. How can I trace this better (to solve the problem)?

In example 2 it seems that, for some reason, the managers are not renewing the lease in time. When this happens it is not a single client: loads of them fail to get their lease renewed. Why is this happening, and how can I trace it back to the source of the problem?

Thanks in advance for any tips.

Regards,
Salvatore

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From mail at arif-ali.co.uk Tue Aug 19 23:41:48 2014 From: mail at arif-ali.co.uk (Arif Ali) Date: Tue, 19 Aug 2014 23:41:48 +0100 Subject: [gpfsug-discuss] gpfsug Maintenance In-Reply-To: References: Message-ID: Thanks for all your patience, The service should all be back up again -- Arif Ali IRC: arif-ali at freenode LinkedIn: http://uk.linkedin.com/in/arifali On 19 August 2014 20:59, Arif Ali wrote: > This is a test mail to the mailing list > > please do not reply > > -- > Arif Ali > > IRC: arif-ali at freenode > LinkedIn: http://uk.linkedin.com/in/arifali > > > On 19 August 2014 11:18, Arif Ali wrote: > >> Hi all, >> >> You may be aware that the website has been down for about a week now. >> This is due to the amount of traffic to the website and the amount of >> people on the mailing list, we had seen a few issues on the system. >> >> In order to counter the issues, we are moving to a new system to counter >> any future issues, and ease of management. We are hoping to do this tonight >> ( between 20:00 - 23:00 BST). If this causes an issue for anyone, then >> please let me know. >> >> I will, as part of the move over, will be sending a few test mails to >> make sure that mailing list is working correctly. >> >> Thanks for your patience >> >> -- >> Arif Ali >> gpfsug Admin >> >> IRC: arif-ali at freenode >> LinkedIn: http://uk.linkedin.com/in/arifali >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Wed Aug 20 08:57:23 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 20 Aug 2014 08:57:23 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53EE0BB1.8000005@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> Message-ID: <53F454E3.40803@ebi.ac.uk> Still problems. Here some more detailed examples: *EXAMPLE 1:* *EBI5-220**( CLIENT)** *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a reply from node gss02b* Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic Tue Aug 19 11:03:12.066 2014: Connecting to gss02a Tue Aug 19 11:03:12.070 2014: Connected to gss02a Tue Aug 19 11:03:17.071 2014: Connecting to gss02b Tue Aug 19 11:03:17.072 2014: Connecting to gss03b Tue Aug 19 11:03:17.079 2014: Connecting to gss03a Tue Aug 19 11:03:17.080 2014: Connecting to gss01b Tue Aug 19 11:03:17.079 2014: Connecting to gss01a Tue Aug 19 11:04:23.105 2014: Connected to gss02b Tue Aug 19 11:04:23.107 2014: Connected to gss03b Tue Aug 19 11:04:23.112 2014: Connected to gss03a Tue Aug 19 11:04:23.115 2014: Connected to gss01b Tue Aug 19 11:04:23.121 2014: Connected to gss01a Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. *GSS02B ( NSD SERVER)* ... 
Tue Aug 19 11:03:17.070 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:28.080 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:39.083 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:50.088 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:01.092 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:12.096 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:23.103 2014: Accepted and connected to ** ebi5-220 ... *GSS02a ( NSD SERVER)* Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 =============================================== *EXAMPLE 2*: *EBI5-038* Tue Aug 19 11:32:34.227 2014: *Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing cluster GSS.ebi.ac.uk* Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. ... LOT MORE RESETS BY PEER ... Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:25.267 2014: Connecting to gss02a Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.* *GSS02a* Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled because of an expired lease.* Pings sent: 60. Replies received: 60. In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? 
In example 2 it seems that, for some reason, the managers are not renewing the lease in time. When this happens it is not a single client: loads of them fail to get their lease renewed. Why is this happening, and how can I trace it back to the source of the problem?

Thanks in advance for any tips.

Regards,
Salvatore

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From sdinardo at ebi.ac.uk Wed Aug 20 09:03:03 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Wed, 20 Aug 2014 09:03:03 +0100
Subject: [gpfsug-discuss] gpfs client expels
In-Reply-To: <53F454E3.40803@ebi.ac.uk>
References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk>
Message-ID: <53F45637.8080000@ebi.ac.uk>

Another interesting case, about one specific waiter: I was looking at the waiters on GSS until I found these (I got this information by collecting the waiters from all the servers with a script I wrote, so I was able to trace the hanging connections while they were happening):

gss03b.ebi.ac.uk:*235.373993397*(MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.109 gss03b.ebi.ac.uk:*235.152271998*(MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.109 gss02a.ebi.ac.uk:*214.079093620 *(MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.7.34.109 gss02a.ebi.ac.uk:*213.580199240 *(MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.7.37.109 gss03b.ebi.ac.uk:*132.375138082*(MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.109 gss03b.ebi.ac.uk:*132.374973884 *(MsgRecordCondvar), reason 'RPC wait' for commMsgCheckMessages on node 10.7.37.109

The bolded numbers are seconds. Looking at this page: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/Interpreting+GPFS+Waiter+Information the page claims that this probably indicates network congestion, but I managed to log in to the client quickly enough, and there the waiters were:

[root at ebi5-236 ~]# mmdiag --waiters === mmdiag: waiters === 0x7F6690073460 waiting 147.973009173 seconds, RangeRevokeWorkerThread: on ThCond 0x1801E43F6A0 (0xFFFFC9001E43F6A0) (LkObjCondvar), reason 'waiting for LX lock' 0x7F65100036D0 waiting 140.458589856 seconds, WritebehindWorkerThread: on ThCond 0x7F6500000F98 (0x7F6500000F98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F63A0001080 waiting 245.153055801 seconds, WritebehindWorkerThread: on ThCond 0x7F65D8002CF8 (0x7F65D8002CF8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674C03D3D0 waiting 245.750977203 seconds, CleanBufferThread: on ThCond 0x7F64880079E8 (0x7F64880079E8) (LogFileBufferDescriptorCondvar), reason 'force wait for buffer write to complete' 0x7F674802E360 waiting 244.159861966 seconds, WritebehindWorkerThread: on ThCond 0x7F65E0002358 (0x7F65E0002358) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674C038810 waiting 251.086748430 seconds, SGExceptionLogBufferFullThread: on ThCond 0x7F64EC001398 (0x7F64EC001398) (MsgRecordCondvar), reason 'RPC wait' for I/O completion on node 10.7.28.35 0x7F674C036230 waiting 139.556735095 seconds, CleanBufferThread: on ThCond 0x7F65CC004C78 (0x7F65CC004C78) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674C031670 waiting 144.327593052 seconds, WritebehindWorkerThread: on ThCond 0x7F672402D1A8 (0x7F672402D1A8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674C02A4D0 waiting 145.202712821 seconds,
WritebehindWorkerThread: on ThCond 0x7F65440018F8 (0x7F65440018F8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674C0291E0 waiting 247.131569232 seconds, PrefetchWorkerThread: on ThCond 0x7F65740016C8 (0x7F65740016C8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6748025BD0 waiting 11.631381523 seconds, replyCleanupThread: on ThCond 0x7F65E000A1F8 (0x7F65E000A1F8) (MsgRecordCondvar), reason 'RPC wait' 0x7F6748022300 waiting 245.616267612 seconds, WritebehindWorkerThread: on ThCond 0x7F6470001468 (0x7F6470001468) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6748021010 waiting 230.769670930 seconds, InodeAllocRevokeWorkerThread: on ThCond 0x7F64880079E8 (0x7F64880079E8) (LogFileBufferDescriptorCondvar), reason 'force wait for buffer write to complete' 0x7F674801B160 waiting 245.830554594 seconds, UnusedInodePrefetchThread: on ThCond 0x7F65B8004438 (0x7F65B8004438) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674800A820 waiting 252.332932000 seconds, Msg handler getData: for poll on sock 109 0x7F63F4023090 waiting 253.073535042 seconds, WritebehindWorkerThread: on ThCond 0x7F65C4000CC8 (0x7F65C4000CC8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F64A4000CE0 waiting 145.049659249 seconds, WritebehindWorkerThread: on ThCond 0x7F6560000A98 (0x7F6560000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6778006D00 waiting 142.124664264 seconds, WritebehindWorkerThread: on ThCond 0x7F63DC000C08 (0x7F63DC000C08) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67780046D0 waiting 251.751439453 seconds, WritebehindWorkerThread: on ThCond 0x7F6454000A98 (0x7F6454000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67780E4B70 waiting 142.431051232 seconds, WritebehindWorkerThread: on ThCond 0x7F63C80010D8 (0x7F63C80010D8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67780E50D0 waiting 244.339624817 seconds, WritebehindWorkerThread: on ThCond 0x7F65BC001B98 (0x7F65BC001B98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6434000B40 waiting 145.343700410 seconds, WritebehindWorkerThread: on ThCond 0x7F63B00036E8 (0x7F63B00036E8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F670C0187A0 waiting 244.903963969 seconds, WritebehindWorkerThread: on ThCond 0x7F65F0000FB8 (0x7F65F0000FB8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C04E2F0 waiting 245.837137631 seconds, PrefetchWorkerThread: on ThCond 0x7F65A4000A98 (0x7F65A4000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C04AA20 waiting 139.713993908 seconds, WritebehindWorkerThread: on ThCond 0x7F6454002478 (0x7F6454002478) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C049730 waiting 252.434187472 seconds, WritebehindWorkerThread: on ThCond 0x7F65F4003708 (0x7F65F4003708) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C044B70 waiting 131.515829048 seconds, Msg handler ccMsgPing: on ThCond 0x7F64DC1D4888 (0x7F64DC1D4888) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F6758008DE0 waiting 149.548547226 seconds, Msg handler getData: on ThCond 
0x7F645C002458 (0x7F645C002458) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F67580071D0 waiting 149.548543118 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F6450001C48 (0x7F6450001C48) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F65A40052B0 waiting 11.498507001 seconds, Msg handler ccMsgPing: on ThCond 0x7F644C103F88 (0x7F644C103F88) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F6448001620 waiting 139.844870446 seconds, WritebehindWorkerThread: on ThCond 0x7F65F0003098 (0x7F65F0003098) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F63F4000F80 waiting 245.044791905 seconds, WritebehindWorkerThread: on ThCond 0x7F6450001188 (0x7F6450001188) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F659C0033A0 waiting 243.464399305 seconds, PrefetchWorkerThread: on ThCond 0x7F6554002598 (0x7F6554002598) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6514001690 waiting 245.826160463 seconds, PrefetchWorkerThread: on ThCond 0x7F65A4004558 (0x7F65A4004558) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F64800012B0 waiting 253.174835511 seconds, WritebehindWorkerThread: on ThCond 0x7F65E0000FB8 (0x7F65E0000FB8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6510000EE0 waiting 140.746696039 seconds, WritebehindWorkerThread: on ThCond 0x7F647C000CC8 (0x7F647C000CC8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6754001BB0 waiting 246.336055629 seconds, PrefetchWorkerThread: on ThCond 0x7F6594002498 (0x7F6594002498) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6420000930 waiting 140.606777450 seconds, WritebehindWorkerThread: on ThCond 0x7F6578002498 (0x7F6578002498) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6744009110 waiting 137.466372831 seconds, FileBlockReadFetchHandlerThread: on ThCond 0x7F65F4007158 (0x7F65F4007158) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67280119F0 waiting 144.173427360 seconds, WritebehindWorkerThread: on ThCond 0x7F6504000AE8 (0x7F6504000AE8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F672800BB40 waiting 145.804301887 seconds, WritebehindWorkerThread: on ThCond 0x7F6550001038 (0x7F6550001038) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6728000910 waiting 252.601993452 seconds, WritebehindWorkerThread: on ThCond 0x7F6450000A98 (0x7F6450000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6744007E20 waiting 251.603329204 seconds, WritebehindWorkerThread: on ThCond 0x7F6570004C18 (0x7F6570004C18) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F64AC002EF0 waiting 139.205774422 seconds, FileBlockWriteFetchHandlerThread: on ThCond 0x18020AF0260 (0xFFFFC90020AF0260) (FetchFlowControlCondvar), reason 'wait for buffer for fetch' 0x7F6724013050 waiting 71.501580932 seconds, Msg handler ccMsgPing: on ThCond 0x7F6580006608 (0x7F6580006608) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F661C000DA0 waiting 245.654985276 seconds, PrefetchWorkerThread: on ThCond 0x7F6570005288 (0x7F6570005288) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O 
completion on node 10.7.28.35 0x7F671C00F440 waiting 251.096002003 seconds, FileBlockReadFetchHandlerThread: on ThCond 0x7F65BC002878 (0x7F65BC002878) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C00E150 waiting 144.034006970 seconds, WritebehindWorkerThread: on ThCond 0x7F6528001548 (0x7F6528001548) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67A02FCD20 waiting 142.324070945 seconds, WritebehindWorkerThread: on ThCond 0x7F6580002A98 (0x7F6580002A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67A02FA330 waiting 200.670114385 seconds, EEWatchDogThread: on ThCond 0x7F65B0000A98 (0x7F65B0000A98) (MsgRecordCondvar), reason 'RPC wait' 0x7F67A02BF050 waiting 252.276161189 seconds, WritebehindWorkerThread: on ThCond 0x7F6584003998 (0x7F6584003998) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67A0004160 waiting 251.173651822 seconds, SyncHandlerThread: on ThCond 0x7F64880079E8 (0x7F64880079E8) (LogFileBufferDescriptorCondvar), reason 'force wait on force active buffer write'

So from the client side it is the client that is waiting for the server. I also managed to ping, ssh, and tcpdump between the nodes before the node got expelled, and found that ping works fine and ssh works fine, yet apart from my own tests there are 0 packets passing between them, LITERALLY. So there is no congestion and no network issue, but the server waits for the client and the client waits for the server. This goes on until we reach 350 seconds (10 times the lease time), and then the client gets expelled. There are no local I/O waiters indicating that GSS is struggling, there is plenty of bandwidth and CPU, and there is no network congestion. It looks like some sort of deadlock to me, but how can this be explained and, hopefully, fixed?

Regards,
Salvatore

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From chair at gpfsug.org Thu Aug 21 09:20:39 2014
From: chair at gpfsug.org (Jez Tucker (Chair))
Date: Thu, 21 Aug 2014 09:20:39 +0100
Subject: [gpfsug-discuss] gpfs client expels
In-Reply-To: <53F454E3.40803@ebi.ac.uk>
References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk>
Message-ID: <53F5ABD7.80107@gpfsug.org>

Hi there,

I've seen this on several 'stock'? 'core'? GPFS systems (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so.

In my experience this has _always_ been a network issue of one sort or another. If the network is experiencing issues, nodes will be ejected. Of course it could be an unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS.

You need to follow the logs through from each machine in time order to determine who could not see who, and in what order. Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM, and collect and supply a snap and traces as required by support.

Without knowing your full setup, it's hard to help further.

Jez

On 20/08/14 08:57, Salvatore Di Nardo wrote: > Still problems.
Here some more detailed examples: > > *EXAMPLE 1:* > > *EBI5-220**( CLIENT)** > *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a > reply from node gss02b* > Tue Aug 19 11:03:04.981 2014: Request sent to > (gss02a in GSS.ebi.ac.uk) to expel (gss02b in > GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk > Tue Aug 19 11:03:04.982 2014: This node will be expelled > from cluster GSS.ebi.ac.uk due to expel msg from IP> (ebi5-220) > Tue Aug 19 11:03:09.319 2014: Cluster Manager connection > broke. Probing cluster GSS.ebi.ac.uk > Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum > nodes during cluster probe. > Tue Aug 19 11:03:10.322 2014: Lost membership in cluster > GSS.ebi.ac.uk. Unmounting file systems. > Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount > invoked. File system: gpfs1 Reason: SGPanic > Tue Aug 19 11:03:12.066 2014: Connecting to > gss02a > Tue Aug 19 11:03:12.070 2014: Connected to > gss02a > Tue Aug 19 11:03:17.071 2014: Connecting to > gss02b > Tue Aug 19 11:03:17.072 2014: Connecting to > gss03b > Tue Aug 19 11:03:17.079 2014: Connecting to > gss03a > Tue Aug 19 11:03:17.080 2014: Connecting to > gss01b > Tue Aug 19 11:03:17.079 2014: Connecting to > gss01a > Tue Aug 19 11:04:23.105 2014: Connected to > gss02b > Tue Aug 19 11:04:23.107 2014: Connected to > gss03b > Tue Aug 19 11:04:23.112 2014: Connected to > gss03a > Tue Aug 19 11:04:23.115 2014: Connected to > gss01b > Tue Aug 19 11:04:23.121 2014: Connected to > gss01a > Tue Aug 19 11:12:28.992 2014: Node (gss02a in > GSS.ebi.ac.uk) is now the Group Leader. > > *GSS02B ( NSD SERVER)* > ... > Tue Aug 19 11:03:17.070 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:25.016 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:28.080 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:36.019 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:39.083 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:47.023 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:50.088 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:52.218 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:58.030 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:01.092 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:03.220 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:09.034 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:12.096 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:14.224 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:20.037 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:23.103 2014: Accepted and connected to > ** ebi5-220 > ... 
> > *GSS02a ( NSD SERVER)* > Tue Aug 19 11:03:04.980 2014: Expel (gss02b) > request from (ebi5-220 in > ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 > in ebi-cluster.ebi.ac.uk) > Tue Aug 19 11:03:12.069 2014: Accepted and connected to > ebi5-220 > > > =============================================== > *EXAMPLE 2*: > > *EBI5-038* > Tue Aug 19 11:32:34.227 2014: *Disk lease period expired > in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* > Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing > cluster GSS.ebi.ac.uk* > Tue Aug 19 11:35:24.265 2014: Close connection to IP> gss02a (Connection reset by peer). Attempting > reconnect. > Tue Aug 19 11:35:24.865 2014: Close connection to > ebi5-014 (Connection reset by > peer). Attempting reconnect. > ... > LOT MORE RESETS BY PEER > ... > Tue Aug 19 11:35:25.096 2014: Close connection to > ebi5-167 (Connection reset by > peer). Attempting reconnect. > Tue Aug 19 11:35:25.267 2014: Connecting to > gss02a > Tue Aug 19 11:35:25.268 2014: Close connection to IP> gss02a (Connection failed because destination > is still processing previous node failure) > Tue Aug 19 11:35:26.267 2014: Retry connection to IP> gss02a > Tue Aug 19 11:35:26.268 2014: Close connection to IP> gss02a (Connection failed because destination > is still processing previous node failure) > Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum > nodes during cluster probe. > Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster > GSS.ebi.ac.uk. Unmounting file systems.* > > *GSS02a* > Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 > in ebi-cluster.ebi.ac.uk) *is being expelled because of an > expired lease.* Pings sent: 60. Replies received: 60. > > > > > In example 1 seems that an NSD was not repliyng to the > client, but the servers seems working fine.. how can i > trace better ( to solve) the problem? > > In example 2 it seems to me that for some reason the > manager are not > renewing the lease in time. when this happens , its not a single client. > Loads of them fail to get the lease renewed. Why this is happening? > how can i trace to the source of the problem? > > > > Thanks in advance for any tips. > > Regards, > Salvatore > > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From sdinardo at ebi.ac.uk Thu Aug 21 10:04:47 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Thu, 21 Aug 2014 10:04:47 +0100
Subject: [gpfsug-discuss] gpfs client expels
In-Reply-To: <53F5ABD7.80107@gpfsug.org>
References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org>
Message-ID: <53F5B62F.1060305@ebi.ac.uk>

Thanks for the feedback, but we managed to find a scenario that excludes network problems. We have a file called */input_file/* of nearly 100GB. If from *client A* we do:

cat input_file >> output_file

it starts copying; we see the waiters go up a bit for a few seconds, but then they flush back to 0, so we can say the copy proceeds well. If we now do the same from another client (or just another shell on the same client), *client B*:

cat input_file >> output_file

(in other words, we are trying to write to the same destination), all the waiters go up until one node gets expelled.
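(For anyone trying to recreate this two-writer scenario while collecting evidence, a rough sketch is below. The file paths, interface name and host roles are illustrative assumptions rather than details from this thread, and 1191 is only the usual default GPFS daemon port (tscTcpPort), so check the cluster configuration before relying on it.)

# Illustrative sketch, not from the original thread.
# Client A: start the first writer in the background.
cat /gpfs/nobackup/test/input_file >> /gpfs/nobackup/test/output_file &

# Client B (or a second shell on the same client): start the competing writer.
cat /gpfs/nobackup/test/input_file >> /gpfs/nobackup/test/output_file &

# On both clients and on the NSD servers, sample the waiters while the
# two writers run, so the buildup towards the expel can be followed:
while true; do date; /usr/lpp/mmfs/bin/mmdiag --waiters; sleep 5; done \
    > /tmp/waiters.$(hostname).log 2>&1 &

# Optionally capture the GPFS daemon traffic between the client and the
# cluster manager / NSD servers (bond0 and port 1191 are assumptions):
tcpdump -i bond0 -s 0 -w /tmp/gpfs.$(hostname).pcap tcp port 1191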
Now, while it is understandable that the destination file is locked by one of the "cat" processes, so the other has to wait (and since the file is BIG, it has to wait for a while), it is not understandable why this stops the lease renewal. Why doesn't it simply return a timeout error on the copy instead of expelling the node? We can reproduce this every time, and since our users do operations like this on files over 100GB each, you can imagine the result. Even if it is a bit silly to write to the same destination at the same time, it is also quite common, for example when several writers dump to the same log file and one of them keeps the file locked by writing for a long time. Our expels are not due to network congestion, but to one write attempt having to wait for another. What I really don't understand is why such an extreme measure as an expel is taken just because a process is waiting "too much time". I have a ticket open with IBM for this and the issue is under investigation, but no luck so far.

Regards,
Salvatore

On 21/08/14 09:20, Jez Tucker (Chair) wrote: > Hi there, > > I've seen the on several 'stock'? 'core'? GPFS system (we need a > better term now GSS is out) and seen ping 'working', but alongside > ejections from the cluster. > The GPFS internode 'ping' is somewhat more circumspect than unix ping > - and rightly so. > > In my experience this has _always_ been a network issue of one sort of > another. If the network is experiencing issues, nodes will be ejected. > Of course it could be unresponsive mmfsd or high loadavg, but I've > seen that only twice in 10 years over many versions of GPFS. > > You need to follow the logs through from each machine in time order to > determine who could not see who and in what order. > Your best way forward is to log a SEV2 case with IBM support, directly > or via your OEM and collect and supply a snap and traces as required > by support. > > Without knowing your full setup, it's hard to help further. > > Jez > > On 20/08/14 08:57, Salvatore Di Nardo wrote: >> Still problems. Here some more detailed examples: >> >> *EXAMPLE 1:* >> >> *EBI5-220**( CLIENT)** >> *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a >> reply from node gss02b* >> Tue Aug 19 11:03:04.981 2014: Request sent to >> (gss02a in GSS.ebi.ac.uk) to expel (gss02b in >> GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk >> Tue Aug 19 11:03:04.982 2014: This node will be expelled >> from cluster GSS.ebi.ac.uk due to expel msg from >> (ebi5-220) >> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection >> broke. Probing cluster GSS.ebi.ac.uk >> Tue Aug 19 11:03:10.321 2014: Unable to contact any >> quorum nodes during cluster probe. >> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster >> GSS.ebi.ac.uk. Unmounting file systems. >> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount >> invoked.
File system: gpfs1 Reason: SGPanic >> Tue Aug 19 11:03:12.066 2014: Connecting to >> gss02a >> Tue Aug 19 11:03:12.070 2014: Connected to >> gss02a >> Tue Aug 19 11:03:17.071 2014: Connecting to >> gss02b >> Tue Aug 19 11:03:17.072 2014: Connecting to >> gss03b >> Tue Aug 19 11:03:17.079 2014: Connecting to >> gss03a >> Tue Aug 19 11:03:17.080 2014: Connecting to >> gss01b >> Tue Aug 19 11:03:17.079 2014: Connecting to >> gss01a >> Tue Aug 19 11:04:23.105 2014: Connected to >> gss02b >> Tue Aug 19 11:04:23.107 2014: Connected to >> gss03b >> Tue Aug 19 11:04:23.112 2014: Connected to >> gss03a >> Tue Aug 19 11:04:23.115 2014: Connected to >> gss01b >> Tue Aug 19 11:04:23.121 2014: Connected to >> gss01a >> Tue Aug 19 11:12:28.992 2014: Node (gss02a in >> GSS.ebi.ac.uk) is now the Group Leader. >> >> *GSS02B ( NSD SERVER)* >> ... >> Tue Aug 19 11:03:17.070 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:25.016 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:28.080 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:36.019 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:39.083 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:47.023 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:50.088 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:52.218 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:58.030 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:01.092 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:03.220 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:09.034 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:12.096 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:14.224 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:20.037 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:23.103 2014: Accepted and connected to >> ** ebi5-220 >> ... >> >> *GSS02a ( NSD SERVER)* >> Tue Aug 19 11:03:04.980 2014: Expel (gss02b) >> request from (ebi5-220 in >> ebi-cluster.ebi.ac.uk). Expelling: >> (ebi5-220 in ebi-cluster.ebi.ac.uk) >> Tue Aug 19 11:03:12.069 2014: Accepted and connected to >> ebi5-220 >> >> >> =============================================== >> *EXAMPLE 2*: >> >> *EBI5-038* >> Tue Aug 19 11:32:34.227 2014: *Disk lease period expired >> in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* >> Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing >> cluster GSS.ebi.ac.uk* >> Tue Aug 19 11:35:24.265 2014: Close connection to > IP> gss02a (Connection reset by peer). Attempting >> reconnect. >> Tue Aug 19 11:35:24.865 2014: Close connection to >> ebi5-014 (Connection reset by >> peer). Attempting reconnect. >> ... >> LOT MORE RESETS BY PEER >> ... 
>> Tue Aug 19 11:35:25.096 2014: Close connection to >> ebi5-167 (Connection reset by >> peer). Attempting reconnect. >> Tue Aug 19 11:35:25.267 2014: Connecting to >> gss02a >> Tue Aug 19 11:35:25.268 2014: Close connection to > IP> gss02a (Connection failed because destination >> is still processing previous node failure) >> Tue Aug 19 11:35:26.267 2014: Retry connection to > IP> gss02a >> Tue Aug 19 11:35:26.268 2014: Close connection to > IP> gss02a (Connection failed because destination >> is still processing previous node failure) >> Tue Aug 19 11:36:24.276 2014: Unable to contact any >> quorum nodes during cluster probe. >> Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster >> GSS.ebi.ac.uk. Unmounting file systems.* >> >> *GSS02a* >> Tue Aug 19 11:35:24.263 2014: Node >> (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled >> because of an expired lease.* Pings sent: 60. Replies >> received: 60. >> >> >> >> >> In example 1 seems that an NSD was not repliyng to the client, but >> the servers seems working fine.. how can i trace better ( to solve) >> the problem? >> >> In example 2 it seems to me that for some reason the manager are not >> renewing the lease in time. when this happens , its not a single client. >> Loads of them fail to get the lease renewed. Why this is happening? >> how can i trace to the source of the problem? >> >> >> >> Thanks in advance for any tips. >> >> Regards, >> Salvatore >> >> >> >> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Aug 21 13:48:38 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 21 Aug 2014 12:48:38 +0000 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5B62F.1060305@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org>,<53F5B62F.1060305@ebi.ac.uk> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com> As I understand GPFS distributed locking semantics, GPFS will not allow one node to hold a write lock for a file indefinitely. Once Client B opens the file for writing it would have contacted the File System Manager to obtain the lock. The FS manager would have told Client B that Client A has the lock and that Client B would have to contact Client A and revoke the write lock token. If Client A does not respond to Client B's request to revoke the write token, then Client B will ask that Client A be expelled from the cluster for NOT adhering to the proper protocol for write lock contention. [cid:2fb2253c-3ffb-4ac6-88a8-d019b1a24f66] Have you checked the communication path between the two clients at this point? I could not follow the logs that you provided. You should definitely look at the exact sequence of log events on the two clients and the file system manager (as reported by mmlsmgr). 
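For reference, a rough sketch of how that evidence can be collected while the two writers are running - this assumes stock GPFS 3.5 command names and the gpfs1 device from your logs, so treat it as a starting point rather than a recipe:

    # which node is currently the file system manager for gpfs1
    mmlsmgr gpfs1

    # on both clients and on the manager node, watch the RPC waiters
    # while the two writes are in flight
    mmdiag --waiters

    # then line the GPFS logs from all three nodes up by timestamp
    grep "Tue Aug 19 11:" /var/adm/ras/mmfs.log.latest

If the waiters on one client show it stuck waiting on a token revoke from the other, that points at the lock/token path rather than the disk lease itself.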
Hope that helps, -Bryan ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Thursday, August 21, 2014 4:04 AM To: chair at gpfsug.org; gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfs client expels Thanks for the feedback, but we managed to find a scenario that excludes network problems. we have a file called input_file of nearly 100GB: if from client A we do: cat input_file >> output_file it start copying.. and we see waiter goeg a bit up,secs but then they flushes back to 0, so we xcan say that the copy proceed well... if now we do the same from another client ( or just another shell on the same client) client B : cat input_file >> output_file ( in other words we are trying to write to the same destination) all the waiters gets up until one node get expelled. Now, while its understandable that the destination file is locked for one of the "cat", so have to wait ( and since the file is BIG , have to wait for a while), its not understandable why it stop the renewal lease. Why its doen't return just a timeout error on the copy instead to expel the node? We can reproduce this every time, and since our users to operations like this on files over 100GB each you can imagine the result. As you can imagine even if its a bit silly to write at the same time to the same destination, its also quite common if we want to dump to a log file logs and for some reason one of the writers, write for a lot of time keeping the file locked. Our expels are not due to network congestion, but because a write attempts have to wait another one. What i really dont understand is why to take a so expreme mesure to expell jest because a process is waiteing "to too much time". I have ticket opened to IBM for this and the issue is under investigation, but no luck so far.. Regards, Salvatore On 21/08/14 09:20, Jez Tucker (Chair) wrote: Hi there, I've seen the on several 'stock'? 'core'? GPFS system (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so. In my experience this has _always_ been a network issue of one sort of another. If the network is experiencing issues, nodes will be ejected. Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS. You need to follow the logs through from each machine in time order to determine who could not see who and in what order. Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support. Without knowing your full setup, it's hard to help further. Jez On 20/08/14 08:57, Salvatore Di Nardo wrote: Still problems. Here some more detailed examples: EXAMPLE 1: EBI5-220 ( CLIENT) Tue Aug 19 11:03:04.980 2014: Timed out waiting for a reply from node gss02b Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. 
Unmounting file systems. Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic Tue Aug 19 11:03:12.066 2014: Connecting to gss02a Tue Aug 19 11:03:12.070 2014: Connected to gss02a Tue Aug 19 11:03:17.071 2014: Connecting to gss02b Tue Aug 19 11:03:17.072 2014: Connecting to gss03b Tue Aug 19 11:03:17.079 2014: Connecting to gss03a Tue Aug 19 11:03:17.080 2014: Connecting to gss01b Tue Aug 19 11:03:17.079 2014: Connecting to gss01a Tue Aug 19 11:04:23.105 2014: Connected to gss02b Tue Aug 19 11:04:23.107 2014: Connected to gss03b Tue Aug 19 11:04:23.112 2014: Connected to gss03a Tue Aug 19 11:04:23.115 2014: Connected to gss01b Tue Aug 19 11:04:23.121 2014: Connected to gss01a Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. GSS02B ( NSD SERVER) ... Tue Aug 19 11:03:17.070 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:28.080 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:39.083 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:50.088 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:01.092 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:12.096 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:23.103 2014: Accepted and connected to ebi5-220 ... GSS02a ( NSD SERVER) Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 =============================================== EXAMPLE 2: EBI5-038 Tue Aug 19 11:32:34.227 2014: Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease. Tue Aug 19 11:33:34.258 2014: Lease is overdue. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. ... LOT MORE RESETS BY PEER ... Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. 
Tue Aug 19 11:35:25.267 2014: Connecting to gss02a Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:36:24.277 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. GSS02a Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) is being expelled because of an expired lease. Pings sent: 60. Replies received: 60. In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? Thanks in advance for any tips. Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: GPFS_Token_Protocol.png Type: image/png Size: 249179 bytes Desc: GPFS_Token_Protocol.png URL: From jbernard at jumptrading.com Thu Aug 21 13:52:05 2014 From: jbernard at jumptrading.com (Jon Bernard) Date: Thu, 21 Aug 2014 12:52:05 +0000 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org>, <53F5B62F.1060305@ebi.ac.uk>, <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Where is that from? On Aug 21, 2014, at 7:49, "Bryan Banister" > wrote: As I understand GPFS distributed locking semantics, GPFS will not allow one node to hold a write lock for a file indefinitely. Once Client B opens the file for writing it would have contacted the File System Manager to obtain the lock. The FS manager would have told Client B that Client A has the lock and that Client B would have to contact Client A and revoke the write lock token. 
If Client A does not respond to Client B's request to revoke the write token, then Client B will ask that Client A be expelled from the cluster for NOT adhering to the proper protocol for write lock contention. Have you checked the communication path between the two clients at this point? I could not follow the logs that you provided. You should definitely look at the exact sequence of log events on the two clients and the file system manager (as reported by mmlsmgr). Hope that helps, -Bryan ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Thursday, August 21, 2014 4:04 AM To: chair at gpfsug.org; gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfs client expels Thanks for the feedback, but we managed to find a scenario that excludes network problems. we have a file called input_file of nearly 100GB: if from client A we do: cat input_file >> output_file it start copying.. and we see waiter goeg a bit up,secs but then they flushes back to 0, so we xcan say that the copy proceed well... if now we do the same from another client ( or just another shell on the same client) client B : cat input_file >> output_file ( in other words we are trying to write to the same destination) all the waiters gets up until one node get expelled. Now, while its understandable that the destination file is locked for one of the "cat", so have to wait ( and since the file is BIG , have to wait for a while), its not understandable why it stop the renewal lease. Why its doen't return just a timeout error on the copy instead to expel the node? We can reproduce this every time, and since our users to operations like this on files over 100GB each you can imagine the result. As you can imagine even if its a bit silly to write at the same time to the same destination, its also quite common if we want to dump to a log file logs and for some reason one of the writers, write for a lot of time keeping the file locked. Our expels are not due to network congestion, but because a write attempts have to wait another one. What i really dont understand is why to take a so expreme mesure to expell jest because a process is waiteing "to too much time". I have ticket opened to IBM for this and the issue is under investigation, but no luck so far.. Regards, Salvatore On 21/08/14 09:20, Jez Tucker (Chair) wrote: Hi there, I've seen the on several 'stock'? 'core'? GPFS system (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so. In my experience this has _always_ been a network issue of one sort of another. If the network is experiencing issues, nodes will be ejected. Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS. You need to follow the logs through from each machine in time order to determine who could not see who and in what order. Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support. Without knowing your full setup, it's hard to help further. Jez On 20/08/14 08:57, Salvatore Di Nardo wrote: Still problems. 
Here some more detailed examples: EXAMPLE 1: EBI5-220 ( CLIENT) Tue Aug 19 11:03:04.980 2014: Timed out waiting for a reply from node gss02b Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic Tue Aug 19 11:03:12.066 2014: Connecting to gss02a Tue Aug 19 11:03:12.070 2014: Connected to gss02a Tue Aug 19 11:03:17.071 2014: Connecting to gss02b Tue Aug 19 11:03:17.072 2014: Connecting to gss03b Tue Aug 19 11:03:17.079 2014: Connecting to gss03a Tue Aug 19 11:03:17.080 2014: Connecting to gss01b Tue Aug 19 11:03:17.079 2014: Connecting to gss01a Tue Aug 19 11:04:23.105 2014: Connected to gss02b Tue Aug 19 11:04:23.107 2014: Connected to gss03b Tue Aug 19 11:04:23.112 2014: Connected to gss03a Tue Aug 19 11:04:23.115 2014: Connected to gss01b Tue Aug 19 11:04:23.121 2014: Connected to gss01a Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. GSS02B ( NSD SERVER) ... Tue Aug 19 11:03:17.070 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:28.080 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:39.083 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:50.088 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:01.092 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:12.096 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:23.103 2014: Accepted and connected to ebi5-220 ... GSS02a ( NSD SERVER) Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). 
Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 =============================================== EXAMPLE 2: EBI5-038 Tue Aug 19 11:32:34.227 2014: Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease. Tue Aug 19 11:33:34.258 2014: Lease is overdue. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. ... LOT MORE RESETS BY PEER ... Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:25.267 2014: Connecting to gss02a Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:36:24.277 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. GSS02a Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) is being expelled because of an expired lease. Pings sent: 60. Replies received: 60. In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? Thanks in advance for any tips. Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: GPFS_Token_Protocol.png Type: image/png Size: 249179 bytes Desc: GPFS_Token_Protocol.png URL: From viccornell at gmail.com Thu Aug 21 14:03:14 2014 From: viccornell at gmail.com (Vic Cornell) Date: Thu, 21 Aug 2014 14:03:14 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5B62F.1060305@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk> Message-ID: <9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com> Hi Salvatore, Are you using ethernet or infiniband as the GPFS interconnect to your clients? If 10/40GbE - do you have a separate admin network? I have seen behaviour similar to this where the storage traffic causes congestion and the "admin" traffic gets lost or delayed causing expels. Vic On 21 Aug 2014, at 10:04, Salvatore Di Nardo wrote: > Thanks for the feedback, but we managed to find a scenario that excludes network problems. > > we have a file called input_file of nearly 100GB: > > if from client A we do: > > cat input_file >> output_file > > it start copying.. and we see waiter goeg a bit up,secs but then they flushes back to 0, so we xcan say that the copy proceed well... > > > if now we do the same from another client ( or just another shell on the same client) client B : > > cat input_file >> output_file > > > ( in other words we are trying to write to the same destination) all the waiters gets up until one node get expelled. > > > Now, while its understandable that the destination file is locked for one of the "cat", so have to wait ( and since the file is BIG , have to wait for a while), its not understandable why it stop the renewal lease. > Why its doen't return just a timeout error on the copy instead to expel the node? We can reproduce this every time, and since our users to operations like this on files over 100GB each you can imagine the result. > > > > As you can imagine even if its a bit silly to write at the same time to the same destination, its also quite common if we want to dump to a log file logs and for some reason one of the writers, write for a lot of time keeping the file locked. > Our expels are not due to network congestion, but because a write attempts have to wait another one. What i really dont understand is why to take a so expreme mesure to expell jest because a process is waiteing "to too much time". > > > I have ticket opened to IBM for this and the issue is under investigation, but no luck so far.. > > Regards, > Salvatore > > > > On 21/08/14 09:20, Jez Tucker (Chair) wrote: >> Hi there, >> >> I've seen the on several 'stock'? 'core'? GPFS system (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. 
>> The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so. >> >> In my experience this has _always_ been a network issue of one sort of another. If the network is experiencing issues, nodes will be ejected. >> Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS. >> >> You need to follow the logs through from each machine in time order to determine who could not see who and in what order. >> Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support. >> >> Without knowing your full setup, it's hard to help further. >> >> Jez >> >> On 20/08/14 08:57, Salvatore Di Nardo wrote: >>> Still problems. Here some more detailed examples: >>> >>> EXAMPLE 1: >>> EBI5-220 ( CLIENT) >>> Tue Aug 19 11:03:04.980 2014: Timed out waiting for a reply from node gss02b >>> Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) >>> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. >>> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. >>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic >>> Tue Aug 19 11:03:12.066 2014: Connecting to gss02a >>> Tue Aug 19 11:03:12.070 2014: Connected to gss02a >>> Tue Aug 19 11:03:17.071 2014: Connecting to gss02b >>> Tue Aug 19 11:03:17.072 2014: Connecting to gss03b >>> Tue Aug 19 11:03:17.079 2014: Connecting to gss03a >>> Tue Aug 19 11:03:17.080 2014: Connecting to gss01b >>> Tue Aug 19 11:03:17.079 2014: Connecting to gss01a >>> Tue Aug 19 11:04:23.105 2014: Connected to gss02b >>> Tue Aug 19 11:04:23.107 2014: Connected to gss03b >>> Tue Aug 19 11:04:23.112 2014: Connected to gss03a >>> Tue Aug 19 11:04:23.115 2014: Connected to gss01b >>> Tue Aug 19 11:04:23.121 2014: Connected to gss01a >>> Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. >>> >>> GSS02B ( NSD SERVER) >>> ... 
>>> Tue Aug 19 11:03:17.070 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:28.080 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:39.083 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:50.088 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:01.092 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:12.096 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to ebi5-220 >>> ... >>> >>> GSS02a ( NSD SERVER) >>> Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) >>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 >>> >>> >>> =============================================== >>> EXAMPLE 2: >>> >>> EBI5-038 >>> Tue Aug 19 11:32:34.227 2014: Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease. >>> Tue Aug 19 11:33:34.258 2014: Lease is overdue. Probing cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. >>> Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. >>> ... >>> LOT MORE RESETS BY PEER >>> ... >>> Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. >>> Tue Aug 19 11:35:25.267 2014: Connecting to gss02a >>> Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) >>> Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a >>> Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) >>> Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. >>> Tue Aug 19 11:36:24.277 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. >>> >>> GSS02a >>> Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) is being expelled because of an expired lease. Pings sent: 60. Replies received: 60. 
>>> >>> >>> >>> In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? >>> >>> In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. >>> Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? >>> >>> >>> >>> Thanks in advance for any tips. >>> >>> Regards, >>> Salvatore >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Thu Aug 21 14:04:59 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Thu, 21 Aug 2014 14:04:59 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org>, <53F5B62F.1060305@ebi.ac.uk> <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <53F5EE7B.2080306@ebi.ac.uk> Thanks for the info... it helps a bit understanding whats going on, but i think you missed the part that Node A and Node B could also be the same machine. If for instance i ran 2 cp on the same machine, hence Client B cannot have problems contacting Client A since they are the same machine..... BTW i did the same also using 2 clients and the result its the same. Nonetheless your description is made me understand a bit better what's going on Regards, Salvatore On 21/08/14 13:48, Bryan Banister wrote: > As I understand GPFS distributed locking semantics, GPFS will not > allow one node to hold a write lock for a file indefinitely. Once > Client B opens the file for writing it would have contacted the File > System Manager to obtain the lock. The FS manager would have told > Client B that Client A has the lock and that Client B would have to > contact Client A and revoke the write lock token. If Client A does > not respond to Client B's request to revoke the write token, then > Client B will ask that Client A be expelled from the cluster for NOT > adhering to the proper protocol for write lock contention. > > > > Have you checked the communication path between the two clients at > this point? > > I could not follow the logs that you provided. You should definitely > look at the exact sequence of log events on the two clients and the > file system manager (as reported by mmlsmgr). > > Hope that helps, > -Bryan > > ------------------------------------------------------------------------ > *From:* gpfsug-discuss-bounces at gpfsug.org > [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo > [sdinardo at ebi.ac.uk] > *Sent:* Thursday, August 21, 2014 4:04 AM > *To:* chair at gpfsug.org; gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] gpfs client expels > > Thanks for the feedback, but we managed to find a scenario that > excludes network problems. 
> > we have a file called */input_file/* of nearly 100GB: > > if from *client A* we do: > > cat input_file >> output_file > > it start copying.. and we see waiter goeg a bit up,secs but then they > flushes back to 0, so we xcan say that the copy proceed well... > > > if now we do the same from another client ( or just another shell on > the same client) *client B* : > > cat input_file >> output_file > > > ( in other words we are trying to write to the same destination) all > the waiters gets up until one node get expelled. > > > Now, while its understandable that the destination file is locked for > one of the "cat", so have to wait ( and since the file is BIG , have > to wait for a while), its not understandable why it stop the renewal > lease. > Why its doen't return just a timeout error on the copy instead to > expel the node? We can reproduce this every time, and since our users > to operations like this on files over 100GB each you can imagine the > result. > > > > As you can imagine even if its a bit silly to write at the same time > to the same destination, its also quite common if we want to dump to a > log file logs and for some reason one of the writers, write for a lot > of time keeping the file locked. > Our expels are not due to network congestion, but because a write > attempts have to wait another one. What i really dont understand is > why to take a so expreme mesure to expell jest because a process is > waiteing "to too much time". > > > I have ticket opened to IBM for this and the issue is under > investigation, but no luck so far.. > > Regards, > Salvatore > > > > On 21/08/14 09:20, Jez Tucker (Chair) wrote: >> Hi there, >> >> I've seen the on several 'stock'? 'core'? GPFS system (we need a >> better term now GSS is out) and seen ping 'working', but alongside >> ejections from the cluster. >> The GPFS internode 'ping' is somewhat more circumspect than unix ping >> - and rightly so. >> >> In my experience this has _always_ been a network issue of one sort >> of another. If the network is experiencing issues, nodes will be >> ejected. >> Of course it could be unresponsive mmfsd or high loadavg, but I've >> seen that only twice in 10 years over many versions of GPFS. >> >> You need to follow the logs through from each machine in time order >> to determine who could not see who and in what order. >> Your best way forward is to log a SEV2 case with IBM support, >> directly or via your OEM and collect and supply a snap and traces as >> required by support. >> >> Without knowing your full setup, it's hard to help further. >> >> Jez >> >> On 20/08/14 08:57, Salvatore Di Nardo wrote: >>> Still problems. Here some more detailed examples: >>> >>> *EXAMPLE 1:* >>> >>> *EBI5-220**( CLIENT)** >>> *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a >>> reply from node gss02b* >>> Tue Aug 19 11:03:04.981 2014: Request sent to >> IP> (gss02a in GSS.ebi.ac.uk) to expel >>> (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:03:04.982 2014: This node will be expelled >>> from cluster GSS.ebi.ac.uk due to expel msg from >>> (ebi5-220) >>> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection >>> broke. Probing cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:03:10.321 2014: Unable to contact any >>> quorum nodes during cluster probe. >>> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster >>> GSS.ebi.ac.uk. Unmounting file systems. >>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount >>> invoked. 
File system: gpfs1 Reason: SGPanic >>> Tue Aug 19 11:03:12.066 2014: Connecting to >>> gss02a >>> Tue Aug 19 11:03:12.070 2014: Connected to >>> gss02a >>> Tue Aug 19 11:03:17.071 2014: Connecting to >>> gss02b >>> Tue Aug 19 11:03:17.072 2014: Connecting to >>> gss03b >>> Tue Aug 19 11:03:17.079 2014: Connecting to >>> gss03a >>> Tue Aug 19 11:03:17.080 2014: Connecting to >>> gss01b >>> Tue Aug 19 11:03:17.079 2014: Connecting to >>> gss01a >>> Tue Aug 19 11:04:23.105 2014: Connected to >>> gss02b >>> Tue Aug 19 11:04:23.107 2014: Connected to >>> gss03b >>> Tue Aug 19 11:04:23.112 2014: Connected to >>> gss03a >>> Tue Aug 19 11:04:23.115 2014: Connected to >>> gss01b >>> Tue Aug 19 11:04:23.121 2014: Connected to >>> gss01a >>> Tue Aug 19 11:12:28.992 2014: Node (gss02a >>> in GSS.ebi.ac.uk) is now the Group Leader. >>> >>> *GSS02B ( NSD SERVER)* >>> ... >>> Tue Aug 19 11:03:17.070 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:25.016 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:28.080 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:36.019 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:39.083 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:47.023 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:50.088 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:52.218 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:58.030 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:01.092 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:03.220 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:09.034 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:12.096 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:14.224 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:20.037 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to >>> ** ebi5-220 >>> ... >>> >>> *GSS02a ( NSD SERVER)* >>> Tue Aug 19 11:03:04.980 2014: Expel (gss02b) >>> request from (ebi5-220 in >>> ebi-cluster.ebi.ac.uk). Expelling: >>> (ebi5-220 in ebi-cluster.ebi.ac.uk) >>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to >>> ebi5-220 >>> >>> >>> =============================================== >>> *EXAMPLE 2*: >>> >>> *EBI5-038* >>> Tue Aug 19 11:32:34.227 2014: *Disk lease period expired >>> in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* >>> Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing >>> cluster GSS.ebi.ac.uk* >>> Tue Aug 19 11:35:24.265 2014: Close connection to >>> gss02a (Connection reset by peer). >>> Attempting reconnect. 
>>> Tue Aug 19 11:35:24.865 2014: Close connection to >>> ebi5-014 (Connection reset by >>> peer). Attempting reconnect. >>> ... >>> LOT MORE RESETS BY PEER >>> ... >>> Tue Aug 19 11:35:25.096 2014: Close connection to >>> ebi5-167 (Connection reset by >>> peer). Attempting reconnect. >>> Tue Aug 19 11:35:25.267 2014: Connecting to >>> gss02a >>> Tue Aug 19 11:35:25.268 2014: Close connection to >>> gss02a (Connection failed because >>> destination is still processing previous node failure) >>> Tue Aug 19 11:35:26.267 2014: Retry connection to >>> gss02a >>> Tue Aug 19 11:35:26.268 2014: Close connection to >>> gss02a (Connection failed because >>> destination is still processing previous node failure) >>> Tue Aug 19 11:36:24.276 2014: Unable to contact any >>> quorum nodes during cluster probe. >>> Tue Aug 19 11:36:24.277 2014: *Lost membership in >>> cluster GSS.ebi.ac.uk. Unmounting file systems.* >>> >>> *GSS02a* >>> Tue Aug 19 11:35:24.263 2014: Node >>> (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled >>> because of an expired lease.* Pings sent: 60. Replies >>> received: 60. >>> >>> >>> >>> >>> In example 1 seems that an NSD was not repliyng to the client, but >>> the servers seems working fine.. how can i trace better ( to solve) >>> the problem? >>> >>> In example 2 it seems to me that for some reason the manager are not >>> renewing the lease in time. when this happens , its not a single >>> client. >>> Loads of them fail to get the lease renewed. Why this is happening? >>> how can i trace to the source of the problem? >>> >>> >>> >>> Thanks in advance for any tips. >>> >>> Regards, >>> Salvatore >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ------------------------------------------------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/png Size: 249179 bytes Desc: not available URL: From sdinardo at ebi.ac.uk Thu Aug 21 14:18:19 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Thu, 21 Aug 2014 14:18:19 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk> <9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com> Message-ID: <53F5F19B.1010603@ebi.ac.uk> This is an interesting point! We use ethernet ( 10g links on the clients) but we dont have a separate network for the admin network. Could you explain this a bit further, because the clients and the servers we have are on different subnet so the packet are routed.. I don't see a practical way to separate them. The clients are blades in a chassis so even if i create 2 interfaces, they will physically use the came "cable" to go to the first switch. even the clients ( 600 clients) have different subsets. I will forward this consideration to our network admin , so see if we can work on a dedicated network. thanks for your tip. Regards, Salvatore On 21/08/14 14:03, Vic Cornell wrote: > Hi Salvatore, > > Are you using ethernet or infiniband as the GPFS interconnect to your > clients? > > If 10/40GbE - do you have a separate admin network? > > I have seen behaviour similar to this where the storage traffic causes > congestion and the "admin" traffic gets lost or delayed causing expels. > > Vic > > > > On 21 Aug 2014, at 10:04, Salvatore Di Nardo > wrote: > >> Thanks for the feedback, but we managed to find a scenario that >> excludes network problems. >> >> we have a file called */input_file/* of nearly 100GB: >> >> if from *client A* we do: >> >> cat input_file >> output_file >> >> it start copying.. and we see waiter goeg a bit up,secs but then they >> flushes back to 0, so we xcan say that the copy proceed well... >> >> >> if now we do the same from another client ( or just another shell on >> the same client) *client B* : >> >> cat input_file >> output_file >> >> >> ( in other words we are trying to write to the same destination) all >> the waiters gets up until one node get expelled. >> >> >> Now, while its understandable that the destination file is locked for >> one of the "cat", so have to wait ( and since the file is BIG , have >> to wait for a while), its not understandable why it stop the renewal >> lease. >> Why its doen't return just a timeout error on the copy instead to >> expel the node? We can reproduce this every time, and since our users >> to operations like this on files over 100GB each you can imagine the >> result. >> >> >> >> As you can imagine even if its a bit silly to write at the same time >> to the same destination, its also quite common if we want to dump to >> a log file logs and for some reason one of the writers, write for a >> lot of time keeping the file locked. >> Our expels are not due to network congestion, but because a write >> attempts have to wait another one. What i really dont understand is >> why to take a so expreme mesure to expell jest because a process is >> waiteing "to too much time". >> >> >> I have ticket opened to IBM for this and the issue is under >> investigation, but no luck so far.. >> >> Regards, >> Salvatore >> >> >> >> On 21/08/14 09:20, Jez Tucker (Chair) wrote: >>> Hi there, >>> >>> I've seen the on several 'stock'? 'core'? 
GPFS system (we need a >>> better term now GSS is out) and seen ping 'working', but alongside >>> ejections from the cluster. >>> The GPFS internode 'ping' is somewhat more circumspect than unix >>> ping - and rightly so. >>> >>> In my experience this has _always_ been a network issue of one sort >>> of another. If the network is experiencing issues, nodes will be >>> ejected. >>> Of course it could be unresponsive mmfsd or high loadavg, but I've >>> seen that only twice in 10 years over many versions of GPFS. >>> >>> You need to follow the logs through from each machine in time order >>> to determine who could not see who and in what order. >>> Your best way forward is to log a SEV2 case with IBM support, >>> directly or via your OEM and collect and supply a snap and traces as >>> required by support. >>> >>> Without knowing your full setup, it's hard to help further. >>> >>> Jez >>> >>> On 20/08/14 08:57, Salvatore Di Nardo wrote: >>>> Still problems. Here some more detailed examples: >>>> >>>> *EXAMPLE 1:* >>>> >>>> *EBI5-220**( CLIENT)** >>>> *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a >>>> reply from node gss02b* >>>> Tue Aug 19 11:03:04.981 2014: Request sent to >>> IP> (gss02a in GSS.ebi.ac.uk ) to >>>> expel (gss02b in GSS.ebi.ac.uk >>>> ) from cluster GSS.ebi.ac.uk >>>> >>>> Tue Aug 19 11:03:04.982 2014: This node will be >>>> expelled from cluster GSS.ebi.ac.uk >>>> due to expel msg from >>> IP> (ebi5-220) >>>> Tue Aug 19 11:03:09.319 2014: Cluster Manager >>>> connection broke. Probing cluster GSS.ebi.ac.uk >>>> >>>> Tue Aug 19 11:03:10.321 2014: Unable to contact any >>>> quorum nodes during cluster probe. >>>> Tue Aug 19 11:03:10.322 2014: Lost membership in >>>> cluster GSS.ebi.ac.uk . >>>> Unmounting file systems. >>>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount >>>> invoked. File system: gpfs1 Reason: SGPanic >>>> Tue Aug 19 11:03:12.066 2014: Connecting to >>>> gss02a >>>> Tue Aug 19 11:03:12.070 2014: Connected to >>>> gss02a >>>> Tue Aug 19 11:03:17.071 2014: Connecting to >>>> gss02b >>>> Tue Aug 19 11:03:17.072 2014: Connecting to >>>> gss03b >>>> Tue Aug 19 11:03:17.079 2014: Connecting to >>>> gss03a >>>> Tue Aug 19 11:03:17.080 2014: Connecting to >>>> gss01b >>>> Tue Aug 19 11:03:17.079 2014: Connecting to >>>> gss01a >>>> Tue Aug 19 11:04:23.105 2014: Connected to >>>> gss02b >>>> Tue Aug 19 11:04:23.107 2014: Connected to >>>> gss03b >>>> Tue Aug 19 11:04:23.112 2014: Connected to >>>> gss03a >>>> Tue Aug 19 11:04:23.115 2014: Connected to >>>> gss01b >>>> Tue Aug 19 11:04:23.121 2014: Connected to >>>> gss01a >>>> Tue Aug 19 11:12:28.992 2014: Node (gss02a >>>> in GSS.ebi.ac.uk ) is now the >>>> Group Leader. >>>> >>>> *GSS02B ( NSD SERVER)* >>>> ... 
>>>> Tue Aug 19 11:03:17.070 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:25.016 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:28.080 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:36.019 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:39.083 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:47.023 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:50.088 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:52.218 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:58.030 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:01.092 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:04:03.220 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:09.034 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:12.096 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:04:14.224 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:20.037 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to >>>> ** ebi5-220 >>>> ... >>>> >>>> *GSS02a ( NSD SERVER)* >>>> Tue Aug 19 11:03:04.980 2014: Expel >>>> (gss02b) request from (ebi5-220 in >>>> ebi-cluster.ebi.ac.uk ). >>>> Expelling: (ebi5-220 in >>>> ebi-cluster.ebi.ac.uk ) >>>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to >>>> ebi5-220 >>>> >>>> >>>> =============================================== >>>> *EXAMPLE 2*: >>>> >>>> *EBI5-038* >>>> Tue Aug 19 11:32:34.227 2014: *Disk lease period >>>> expired in cluster GSS.ebi.ac.uk >>>> . Attempting to reacquire lease.* >>>> Tue Aug 19 11:33:34.258 2014: *Lease is overdue. >>>> Probing cluster GSS.ebi.ac.uk * >>>> Tue Aug 19 11:35:24.265 2014: Close connection to >>>> gss02a (Connection reset by peer). >>>> Attempting reconnect. >>>> Tue Aug 19 11:35:24.865 2014: Close connection to >>>> ebi5-014 (Connection reset by >>>> peer). Attempting reconnect. >>>> ... >>>> LOT MORE RESETS BY PEER >>>> ... >>>> Tue Aug 19 11:35:25.096 2014: Close connection to >>>> ebi5-167 (Connection reset by >>>> peer). Attempting reconnect. >>>> Tue Aug 19 11:35:25.267 2014: Connecting to >>>> gss02a >>>> Tue Aug 19 11:35:25.268 2014: Close connection to >>>> gss02a (Connection failed because >>>> destination is still processing previous node failure) >>>> Tue Aug 19 11:35:26.267 2014: Retry connection to >>>> gss02a >>>> Tue Aug 19 11:35:26.268 2014: Close connection to >>>> gss02a (Connection failed because >>>> destination is still processing previous node failure) >>>> Tue Aug 19 11:36:24.276 2014: Unable to contact any >>>> quorum nodes during cluster probe. 
>>>> Tue Aug 19 11:36:24.277 2014: *Lost membership in >>>> cluster GSS.ebi.ac.uk . >>>> Unmounting file systems.* >>>> >>>> *GSS02a* >>>> Tue Aug 19 11:35:24.263 2014: Node >>>> (ebi5-038 in ebi-cluster.ebi.ac.uk >>>> ) *is being expelled >>>> because of an expired lease.* Pings sent: 60. Replies >>>> received: 60. >>>> >>>> >>>> >>>> >>>> In example 1 seems that an NSD was not repliyng to the client, but >>>> the servers seems working fine.. how can i trace better ( to solve) >>>> the problem? >>>> >>>> In example 2 it seems to me that for some reason the manager are >>>> not renewing the lease in time. when this happens , its not a >>>> single client. >>>> Loads of them fail to get the lease renewed. Why this is happening? >>>> how can i trace to the source of the problem? >>>> >>>> >>>> >>>> Thanks in advance for any tips. >>>> >>>> Regards, >>>> Salvatore >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss atgpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss atgpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From service at metamodul.com Thu Aug 21 14:19:33 2014 From: service at metamodul.com (service at metamodul.com) Date: Thu, 21 Aug 2014 15:19:33 +0200 (CEST) Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5B62F.1060305@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk> Message-ID: <1481989063.92260.1408627173332.open-xchange@oxbaltgw09.schlund.de> > Now, while its understandable that the destination file is locked for one of > the "cat", so have to wait If GPFS is posix compatible i do not understand why a cat should block the other cat completly meanings on a standard FS you can "cat" from many source to the same target. Of course the result is not predictable. >From this point of view i would expect that both "cat" would start writing immediately thus i would expect a GPFS bug. All imho. Hajo Note: You might test which the input_file in a different directory and i would test the behaviour if the output_file is on a local FS like /tmp. -------------- next part -------------- An HTML attachment was scrubbed... URL: From viccornell at gmail.com Thu Aug 21 14:22:22 2014 From: viccornell at gmail.com (Vic Cornell) Date: Thu, 21 Aug 2014 14:22:22 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5F19B.1010603@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk> <9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com> <53F5F19B.1010603@ebi.ac.uk> Message-ID: <0F03996A-2008-4076-9A2B-B4B2BB89E959@gmail.com> For my system I always use a dedicated admin network - as described in the gpfs manuals - for a gpfs cluster on 10/40GbE where the system will be heavily loaded. The difference in the stability of the system is very noticeable. 
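In case it is useful, the knobs involved are roughly these - a sketch only, with invented interface and subnet names, and the exact options vary by GPFS level, so check the Administration guide for your release before changing anything:

    # see which interface each node currently uses for admin vs daemon traffic
    mmlscluster

    # move admin (command/ssh) traffic onto a separate interface per node
    # (node01-adm is a made-up hostname on the admin network)
    mmchnode --admin-interface=node01-adm -N node01

    # and/or have the daemons prefer a dedicated subnet for cluster traffic
    # (10.20.0.0 is a made-up subnet; only takes effect after a daemon restart)
    mmchconfig subnets="10.20.0.0"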
Not sure how/if this would work on GSS - IBM ought to know :-)

Vic

On 21 Aug 2014, at 14:18, Salvatore Di Nardo wrote:

> This is an interesting point!
>
> We use ethernet (10G links on the clients) but we don't have a separate network for the admin traffic.
>
> Could you explain this a bit further, because the clients and the servers we have are on different subnets, so the packets are routed. I don't see a practical way to separate them. The clients are blades in a chassis, so even if I create 2 interfaces they will physically use the same "cable" to go to the first switch. Even the clients (600 of them) are spread over different subnets.
>
> I will forward this consideration to our network admins to see if we can work on a dedicated network.
>
> thanks for your tip.
>
> Regards,
> Salvatore
>
> On 21/08/14 14:03, Vic Cornell wrote:
>> Hi Salvatore,
>>
>> Are you using ethernet or infiniband as the GPFS interconnect to your clients?
>>
>> If 10/40GbE - do you have a separate admin network?
>>
>> I have seen behaviour similar to this where the storage traffic causes congestion and the "admin" traffic gets lost or delayed, causing expels.
>>
>> Vic
>>
>> On 21 Aug 2014, at 10:04, Salvatore Di Nardo wrote:
>>
>>> Thanks for the feedback, but we managed to find a scenario that excludes network problems.
>>>
>>> We have a file called input_file of nearly 100GB.
>>>
>>> If from client A we do:
>>>
>>> cat input_file >> output_file
>>>
>>> it starts copying, and we see the waiters go up a bit for a few seconds but then flush back to 0, so we can say the copy proceeds well.
>>>
>>> If we now do the same from another client (or just another shell on the same client), client B:
>>>
>>> cat input_file >> output_file
>>>
>>> (in other words we are trying to write to the same destination), all the waiters go up until one node gets expelled.
>>>
>>> Now, while it's understandable that the destination file is locked for one of the "cat"s, which therefore has to wait (and since the file is BIG, it has to wait for a while), it's not understandable why it stops renewing the lease.
>>> Why doesn't it just return a timeout error on the copy instead of expelling the node? We can reproduce this every time, and since our users do operations like this on files over 100GB each, you can imagine the result.
>>>
>>> As you can imagine, even if it's a bit silly to write to the same destination at the same time, it's also quite common if we want to dump logs to a log file and for some reason one of the writers keeps writing for a long time, keeping the file locked.
>>> Our expels are not due to network congestion, but because one write attempt has to wait for another one. What I really don't understand is why such an extreme measure as an expel is taken just because a process is waiting "too much time".
>>>
>>> I have a ticket open with IBM for this and the issue is under investigation, but no luck so far.
>>>
>>> Regards,
>>> Salvatore
>>>
>>> On 21/08/14 09:20, Jez Tucker (Chair) wrote:
>>>> Hi there,
>>>>
>>>> I've seen this on several 'stock'? 'core'? GPFS systems (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster.
>>>> The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so.
>>>>
>>>> In my experience this has _always_ been a network issue of one sort or another. If the network is experiencing issues, nodes will be ejected.
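(To recap the reproduction described above in runnable form - the file names are the ones used in the report; the watch loop is only a suggested way to observe the effect, not part of the original test:

    # client A
    cat input_file >> output_file

    # client B, or a second shell on the same client, while A is still running
    cat input_file >> output_file

    # on an NSD server, watch the waiters climb while both appends run
    while true; do mmfsadm dump waiters | head -n 20; sleep 5; done
)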
>>>> Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS. >>>> >>>> You need to follow the logs through from each machine in time order to determine who could not see who and in what order. >>>> Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support. >>>> >>>> Without knowing your full setup, it's hard to help further. >>>> >>>> Jez >>>> >>>> On 20/08/14 08:57, Salvatore Di Nardo wrote: >>>>> Still problems. Here some more detailed examples: >>>>> >>>>> EXAMPLE 1: >>>>> EBI5-220 ( CLIENT) >>>>> Tue Aug 19 11:03:04.980 2014: Timed out waiting for a reply from node gss02b >>>>> Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk >>>>> Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) >>>>> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk >>>>> Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. >>>>> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. >>>>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic >>>>> Tue Aug 19 11:03:12.066 2014: Connecting to gss02a >>>>> Tue Aug 19 11:03:12.070 2014: Connected to gss02a >>>>> Tue Aug 19 11:03:17.071 2014: Connecting to gss02b >>>>> Tue Aug 19 11:03:17.072 2014: Connecting to gss03b >>>>> Tue Aug 19 11:03:17.079 2014: Connecting to gss03a >>>>> Tue Aug 19 11:03:17.080 2014: Connecting to gss01b >>>>> Tue Aug 19 11:03:17.079 2014: Connecting to gss01a >>>>> Tue Aug 19 11:04:23.105 2014: Connected to gss02b >>>>> Tue Aug 19 11:04:23.107 2014: Connected to gss03b >>>>> Tue Aug 19 11:04:23.112 2014: Connected to gss03a >>>>> Tue Aug 19 11:04:23.115 2014: Connected to gss01b >>>>> Tue Aug 19 11:04:23.121 2014: Connected to gss01a >>>>> Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. >>>>> >>>>> GSS02B ( NSD SERVER) >>>>> ... 
>>>>> Tue Aug 19 11:03:17.070 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:28.080 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:39.083 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:50.088 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:01.092 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:12.096 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to ebi5-220 >>>>> ... >>>>> >>>>> GSS02a ( NSD SERVER) >>>>> Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) >>>>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 >>>>> >>>>> >>>>> =============================================== >>>>> EXAMPLE 2: >>>>> >>>>> EBI5-038 >>>>> Tue Aug 19 11:32:34.227 2014: Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease. >>>>> Tue Aug 19 11:33:34.258 2014: Lease is overdue. Probing cluster GSS.ebi.ac.uk >>>>> Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. >>>>> Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. >>>>> ... >>>>> LOT MORE RESETS BY PEER >>>>> ... >>>>> Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. >>>>> Tue Aug 19 11:35:25.267 2014: Connecting to gss02a >>>>> Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) >>>>> Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a >>>>> Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) >>>>> Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. >>>>> Tue Aug 19 11:36:24.277 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. >>>>> >>>>> GSS02a >>>>> Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) is being expelled because of an expired lease. Pings sent: 60. 
Replies received: 60. >>>>> >>>>> >>>>> >>>>> In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? >>>>> >>>>> In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. >>>>> Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? >>>>> >>>>> >>>>> >>>>> Thanks in advance for any tips. >>>>> >>>>> Regards, >>>>> Salvatore >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Fri Aug 22 10:37:42 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Fri, 22 Aug 2014 10:37:42 +0100 Subject: [gpfsug-discuss] gpfs client expels, fs hangind and waiters In-Reply-To: <53EE0BB1.8000005@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> Message-ID: <53F70F66.2010405@ebi.ac.uk> Hello everyone, Just to let you know, we found the cause of our problems. We discovered that not all of the recommend kernel setting was configured on the clients ( on server was everything ok, but the clients had some setting missing ), and IBM support pointed to this document that describes perfectly our issues and the fix wich suggest to raise some parameters even higher than the standard "best practice" : http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=migr-5091222 Thanks to everyone for the replies. Regards, Salvatore From ewahl at osc.edu Mon Aug 25 19:55:08 2014 From: ewahl at osc.edu (Ed Wahl) Date: Mon, 25 Aug 2014 18:55:08 +0000 Subject: [gpfsug-discuss] CNFS using NFS over RDMA? Message-ID: Anyone out there doing CNFS with NFS over RDMA? Is this even possible? We currently have been delivering some CNFS services using TCP over IB, but that layer tends to have a large number of bugs all the time. Like to take a look at moving back down to verbs... Ed Wahl OSC -------------- next part -------------- An HTML attachment was scrubbed... URL: From zander at ebi.ac.uk Fri Aug 1 14:44:49 2014 From: zander at ebi.ac.uk (Zander Mears) Date: Fri, 01 Aug 2014 14:44:49 +0100 Subject: [gpfsug-discuss] Hello! In-Reply-To: <53D981EF.3020000@gpfsug.org> References: <53D8C897.9000902@ebi.ac.uk> <53D981EF.3020000@gpfsug.org> Message-ID: <53DB99D1.8050304@ebi.ac.uk> Hi Jez We're just monitoring the standard OS stuff, some interface errors, throughput, number of network and gpfs connections due to previous issues. We don't really know as yet what is good to monitor GPFS wise. 
cheers Zander On 31/07/2014 00:38, Jez Tucker (Chair) wrote: > Hi Zander, > > We have a git repository. Would you be interested in adding any > Zabbix custom metrics gathering to GPFS to it? > > https://github.com/gpfsug/gpfsug-tools > > Best, > > Jez From sfadden at us.ibm.com Tue Aug 5 18:55:20 2014 From: sfadden at us.ibm.com (Scott Fadden) Date: Tue, 5 Aug 2014 10:55:20 -0700 Subject: [gpfsug-discuss] GPFS and Lustre on same node Message-ID: Is anyone running GPFS and Lustre on the same nodes. I have seen it work, I have heard people are doing it, I am looking for some confirmation. Thanks Scott Fadden GPFS Technical Marketing Phone: (503) 880-5833 sfadden at us.ibm.com http://www.ibm.com/systems/gpfs -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Wed Aug 6 08:46:31 2014 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Wed, 06 Aug 2014 09:46:31 +0200 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: References: Message-ID: <53E1DD57.90103@science-computing.de> Am 05.08.2014 19:55, schrieb Scott Fadden: > Is anyone running GPFS and Lustre on the same nodes. I have seen it work, I have heard people are > doing it, I am looking for some confirmation. I have some nodes running lustre 2.1.6 or 2.5.58 and gpfs 3.5.0.17 on RHEL5.8 and RHEL6.5. None of them are servers. Kind regards, Ulrich Sibiller -- ______________________________________creating IT solutions Dipl.-Inf. Ulrich Sibiller science + computing ag System Administration Hagellocher Weg 73 mail nfz at science-computing.de 72070 Tuebingen, Germany hotline +49 7071 9457 674 http://www.science-computing.de -- Vorstandsvorsitzender/Chairman of the board of management: Gerd-Lothar Leonhart Vorstand/Board of Management: Dr. Bernd Finkbeiner, Michael Heinrichs, Dr. Arno Steitz Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From frederik.ferner at diamond.ac.uk Wed Aug 6 10:19:35 2014 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Wed, 6 Aug 2014 10:19:35 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: References: Message-ID: <53E1F327.1000605@diamond.ac.uk> On 05/08/14 18:55, Scott Fadden wrote: > Is anyone running GPFS and Lustre on the same nodes. I have seen it > work, I have heard people are doing it, I am looking for some confirmation. Most of our compute cluster nodes are clients for Lustre and GPFS at the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on servers (GPFS NSD server or Lustre OSS/MDS servers). HTH, Frederik -- Frederik Ferner Senior Computer Systems Administrator phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. 
cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From sdinardo at ebi.ac.uk Wed Aug 6 10:57:44 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 06 Aug 2014 10:57:44 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E1F327.1000605@diamond.ac.uk> References: <53E1F327.1000605@diamond.ac.uk> Message-ID: <53E1FC18.6080707@ebi.ac.uk> Sorry for this little ot, but recetly i'm looking to Lustre to understand how it is comparable to GPFS in terms of performance, reliability and easy to use. Could anyone share their experience ? My company just recently got a first GPFS system , based on IBM GSS, but while its good performance wise, there are few unresolved problems and the IBM support is almost unexistent, so I'm starting to wonder if its work to look somewhere else eventual future purchases. Salvatore On 06/08/14 10:19, Frederik Ferner wrote: > On 05/08/14 18:55, Scott Fadden wrote: >> Is anyone running GPFS and Lustre on the same nodes. I have seen it >> work, I have heard people are doing it, I am looking for some >> confirmation. > > Most of our compute cluster nodes are clients for Lustre and GPFS at > the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on > servers (GPFS NSD server or Lustre OSS/MDS servers). > > HTH, > Frederik > From chair at gpfsug.org Wed Aug 6 11:19:24 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Wed, 06 Aug 2014 11:19:24 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E1FC18.6080707@ebi.ac.uk> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> Message-ID: <53E2012C.9040402@gpfsug.org> "IBM support is almost unexistent" I don't find that at all. Do you log directly via ESC or via your OEM/integrator or are you only referring to GSS support rather than pure GPFS? If you are having response issues, your IBM rep (or a few folks on here) can accelerate issues for you. Jez On 06/08/14 10:57, Salvatore Di Nardo wrote: > Sorry for this little ot, but recetly i'm looking to Lustre to > understand how it is comparable to GPFS in terms of performance, > reliability and easy to use. > Could anyone share their experience ? > > My company just recently got a first GPFS system , based on IBM GSS, > but while its good performance wise, there are few unresolved problems > and the IBM support is almost unexistent, so I'm starting to wonder if > its work to look somewhere else eventual future purchases. > > > Salvatore > > On 06/08/14 10:19, Frederik Ferner wrote: >> On 05/08/14 18:55, Scott Fadden wrote: >>> Is anyone running GPFS and Lustre on the same nodes. I have seen it >>> work, I have heard people are doing it, I am looking for some >>> confirmation. >> >> Most of our compute cluster nodes are clients for Lustre and GPFS at >> the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on >> servers (GPFS NSD server or Lustre OSS/MDS servers). 
>> >> HTH, >> Frederik >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From service at metamodul.com Wed Aug 6 14:26:47 2014 From: service at metamodul.com (service at metamodul.com) Date: Wed, 6 Aug 2014 15:26:47 +0200 (CEST) Subject: [gpfsug-discuss] Hi , i am new to this list Message-ID: <1366482624.222989.1407331607965.open-xchange@oxbaltgw55.schlund.de> Hi @ALL i am Hajo Ehlers , an AIX and GPFS specialist ( Unix System Engineer ). You find me at the IBM GPFS Forum and sometimes at news:c.u.a and I am addicted to cluster filesystems My latest idee is an SAP-HANA light system ( DBMS on an in-memory cluster posix FS ) which could be extended to a "reinvented" Cluster based AS/400 ^_^ I wrote also a small script to do a sequential backup of GPFS filesystems since i got never used to mmbackup - i named it "pdsmc" for parallel dsmc". Cheers Hajo BTW: Please let me know - service (at) metamodul (dot) com - In case somebody is looking for a GPFS specialist. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Fri Aug 8 10:53:36 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Fri, 08 Aug 2014 10:53:36 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E2012C.9040402@gpfsug.org> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> Message-ID: <53E49E20.1090905@ebi.ac.uk> Well, i didn't wanted to start a rant against IBM, and I'm referring specifically to GSS. Since GSS its an appliance, we have to refer to GSS support for both hardware and software issues. Hardware support in total crap. It took 1 mounth of chasing and shouting to get a drawer replacement that was causing some issues. Meanwhile 10 disks in that drawer got faulty. Finally we got the drawer replace but the disks are still faulty. Now its 3 days i'm triing to get them fixed or replaced ( its not clear if they disks are broken of they was just marked to be replaced because of the drawer). Right now i dont have any answer about how to put them online ( mmchcarrier don't work because it recognize that the disk where not replaced) There are also few other cases ( gpfs related) open that are still not answered. I have no experience with direct GPFS support, but if i open a case to GSS for a GPFS problem, the cases seems never get an answer. The only reason that GSS is working its because _*I*_**installed it spending few months studying gpfs. So now I'm wondering if its worth at all rely in future on the whole appliance concept. I'm wondering if in future its better just purchase the hardware and install GPFS by our own, or in alternatively even try Lustre. Now, skipping all this GSS rant, which have nothing to do with the file system anyway and going back to my question: Could someone point the main differences between GPFS and Lustre? I found some documentation about Lustre and i'm going to have a look, but oddly enough have not found any practical comparison between them. On 06/08/14 11:19, Jez Tucker (Chair) wrote: > "IBM support is almost unexistent" > > I don't find that at all. > Do you log directly via ESC or via your OEM/integrator or are you only > referring to GSS support rather than pure GPFS? > > If you are having response issues, your IBM rep (or a few folks on > here) can accelerate issues for you. 
> > Jez > > > On 06/08/14 10:57, Salvatore Di Nardo wrote: >> Sorry for this little ot, but recetly i'm looking to Lustre to >> understand how it is comparable to GPFS in terms of performance, >> reliability and easy to use. >> Could anyone share their experience ? >> >> My company just recently got a first GPFS system , based on IBM GSS, >> but while its good performance wise, there are few unresolved >> problems and the IBM support is almost unexistent, so I'm starting to >> wonder if its work to look somewhere else eventual future purchases. >> >> >> Salvatore >> >> On 06/08/14 10:19, Frederik Ferner wrote: >>> On 05/08/14 18:55, Scott Fadden wrote: >>>> Is anyone running GPFS and Lustre on the same nodes. I have seen it >>>> work, I have heard people are doing it, I am looking for some >>>> confirmation. >>> >>> Most of our compute cluster nodes are clients for Lustre and GPFS at >>> the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on >>> servers (GPFS NSD server or Lustre OSS/MDS servers). >>> >>> HTH, >>> Frederik >>> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jpro at bas.ac.uk Fri Aug 8 12:40:00 2014 From: jpro at bas.ac.uk (Jeremy Robst) Date: Fri, 8 Aug 2014 12:40:00 +0100 (BST) Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E49E20.1090905@ebi.ac.uk> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk> Message-ID: On Fri, 8 Aug 2014, Salvatore Di Nardo wrote: > Now, skipping all this GSS rant, which have nothing to do with the file > system anyway? and? going back to my question: > > Could someone point the main differences between GPFS and Lustre? I'm looking at making the same decision here - to buy GPFS or to roll our own Lustre configuration. I'm in the process of setting up test systems, and so far the main difference seems to be in the that in GPFS each server sees the full filesystem, and so you can run other applications (e.g backup) on a GPFS server whereas the Luste OSS (object storage servers) see only a portion of the storage (the filesystem is striped across the OSSes), so you need a Lustre client to mount the full filesystem for things like backup. However I have very little practical experience of either and would also be interested in any comments. Thanks Jeremy -- jpro at bas.ac.uk | (work) 01223 221402 (fax) 01223 362616 Unix System Administrator - British Antarctic Survey #include From keith at ocf.co.uk Fri Aug 8 14:12:39 2014 From: keith at ocf.co.uk (Keith Vickers) Date: Fri, 8 Aug 2014 14:12:39 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node Message-ID: http://www.pdsw.org/pdsw10/resources/posters/parallelNASFSs.pdf Has a good direct apples to apples comparison between Lustre and GPFS. It's pretty much abstractable from the hardware used. 
Keith Vickers Business Development Manager OCF plc Mobile: 07974 397863 From sergi.more at bsc.es Fri Aug 8 14:14:33 2014 From: sergi.more at bsc.es (=?ISO-8859-1?Q?Sergi_Mor=E9_Codina?=) Date: Fri, 08 Aug 2014 15:14:33 +0200 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk> Message-ID: <53E4CD39.7080808@bsc.es> Hi all, About main differences between GPFS and Lustre, here you have some bits from our experience: -Reliability: GPFS its been proved to be more stable and reliable. Also offers more flexibility in terms of fail-over. It have no restriction in number of servers. As far as I know, an NSD can have as many secondary servers as you want (we are using 8). -Metadata: In Lustre each file system is restricted to two servers. No restriction in GPFS. -Updates: In GPFS you can update the whole storage cluster without stopping production, one server at a time. -Server/Client role: As Jeremy said, in GPFS every server act as a client as well. Useful for administrative tasks. -Troubleshooting: Problems with GPFS are easier to track down. Logs are more clear, and offers better tools than Lustre. -Support: No problems at all with GPFS support. It is true that it could take time to go up within all support levels, but we always got a good solution. Quite different in terms of hardware. IBM support quality has drop a lot since about last year an a half. Really slow and tedious process to get replacements. Moreover, we keep receiving bad "certified reutilitzed parts" hardware, which slow the whole process even more. These are the main differences I would stand out after some years of experience with both file systems, but do not take it as a fact. PD: Salvatore, I would suggest you to contact Jordi Valls. He joined EBI a couple of months ago, and has experience working with both file systems here at BSC. Best Regards, Sergi. On 08/08/2014 01:40 PM, Jeremy Robst wrote: > On Fri, 8 Aug 2014, Salvatore Di Nardo wrote: > >> Now, skipping all this GSS rant, which have nothing to do with the file >> system anyway and going back to my question: >> >> Could someone point the main differences between GPFS and Lustre? > > I'm looking at making the same decision here - to buy GPFS or to roll > our own Lustre configuration. I'm in the process of setting up test > systems, and so far the main difference seems to be in the that in GPFS > each server sees the full filesystem, and so you can run other > applications (e.g backup) on a GPFS server whereas the Luste OSS (object > storage servers) see only a portion of the storage (the filesystem is > striped across the OSSes), so you need a Lustre client to mount the full > filesystem for things like backup. > > However I have very little practical experience of either and would also > be interested in any comments. 
> > Thanks > > Jeremy > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- ------------------------------------------------------------------------ Sergi More Codina Barcelona Supercomputing Center Centro Nacional de Supercomputacion WWW: http://www.bsc.es Tel: +34-93-405 42 27 e-mail: sergi.more at bsc.es Fax: +34-93-413 77 21 ------------------------------------------------------------------------ WARNING / LEGAL TEXT: This message is intended only for the use of the individual or entity to which it is addressed and may contain information which is privileged, confidential, proprietary, or exempt from disclosure under applicable law. If you are not the intended recipient or the person responsible for delivering the message to the intended recipient, you are strictly prohibited from disclosing, distributing, copying, or in any way using this message. If you have received this communication in error, please notify the sender and destroy and delete any copies you may have received. http://www.bsc.es/disclaimer.htm -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3242 bytes Desc: S/MIME Cryptographic Signature URL: From viccornell at gmail.com Fri Aug 8 18:15:30 2014 From: viccornell at gmail.com (Vic Cornell) Date: Fri, 8 Aug 2014 18:15:30 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E4CD39.7080808@bsc.es> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk> <53E4CD39.7080808@bsc.es> Message-ID: <4001D2D9-5E74-4EF9-908F-5B0E3443EA5B@gmail.com> Disclaimers - I work for DDN - we sell lustre and GPFS. I know GPFS much better than I know Lustre. The biggest difference we find between GPFS and Lustre is that GPFS - can usually achieve 90% of the bandwidth available to a single client with a single thread. Lustre needs multiple parallel streams to saturate - say an Infiniband connection. Lustre is often faster than GPFS and often has superior metadata performance - particularly where lots of files are created in a single directory. GPFS can support Windows - Lustre cannot. I think GPFS is better integrated and easier to deploy than Lustre - some people disagree with me. Regards, Vic On 8 Aug 2014, at 14:14, Sergi Mor? Codina wrote: > Hi all, > > About main differences between GPFS and Lustre, here you have some bits from our experience: > > -Reliability: GPFS its been proved to be more stable and reliable. Also offers more flexibility in terms of fail-over. It have no restriction in number of servers. As far as I know, an NSD can have as many secondary servers as you want (we are using 8). > > -Metadata: In Lustre each file system is restricted to two servers. No restriction in GPFS. > > -Updates: In GPFS you can update the whole storage cluster without stopping production, one server at a time. > > -Server/Client role: As Jeremy said, in GPFS every server act as a client as well. Useful for administrative tasks. > > -Troubleshooting: Problems with GPFS are easier to track down. Logs are more clear, and offers better tools than Lustre. > > -Support: No problems at all with GPFS support. It is true that it could take time to go up within all support levels, but we always got a good solution. Quite different in terms of hardware. 
IBM support quality has drop a lot since about last year an a half. Really slow and tedious process to get replacements. Moreover, we keep receiving bad "certified reutilitzed parts" hardware, which slow the whole process even more. > > > These are the main differences I would stand out after some years of experience with both file systems, but do not take it as a fact. > > PD: Salvatore, I would suggest you to contact Jordi Valls. He joined EBI a couple of months ago, and has experience working with both file systems here at BSC. > > Best Regards, > Sergi. > > > On 08/08/2014 01:40 PM, Jeremy Robst wrote: >> On Fri, 8 Aug 2014, Salvatore Di Nardo wrote: >> >>> Now, skipping all this GSS rant, which have nothing to do with the file >>> system anyway and going back to my question: >>> >>> Could someone point the main differences between GPFS and Lustre? >> >> I'm looking at making the same decision here - to buy GPFS or to roll >> our own Lustre configuration. I'm in the process of setting up test >> systems, and so far the main difference seems to be in the that in GPFS >> each server sees the full filesystem, and so you can run other >> applications (e.g backup) on a GPFS server whereas the Luste OSS (object >> storage servers) see only a portion of the storage (the filesystem is >> striped across the OSSes), so you need a Lustre client to mount the full >> filesystem for things like backup. >> >> However I have very little practical experience of either and would also >> be interested in any comments. >> >> Thanks >> >> Jeremy >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > -- > > ------------------------------------------------------------------------ > > Sergi More Codina > Barcelona Supercomputing Center > Centro Nacional de Supercomputacion > WWW: http://www.bsc.es Tel: +34-93-405 42 27 > e-mail: sergi.more at bsc.es Fax: +34-93-413 77 21 > > ------------------------------------------------------------------------ > > WARNING / LEGAL TEXT: This message is intended only for the use of the > individual or entity to which it is addressed and may contain > information which is privileged, confidential, proprietary, or exempt > from disclosure under applicable law. If you are not the intended > recipient or the person responsible for delivering the message to the > intended recipient, you are strictly prohibited from disclosing, > distributing, copying, or in any way using this message. If you have > received this communication in error, please notify the sender and > destroy and delete any copies you may have received. > > http://www.bsc.es/disclaimer.htm > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at us.ibm.com Fri Aug 8 20:09:44 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Fri, 8 Aug 2014 12:09:44 -0700 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <4001D2D9-5E74-4EF9-908F-5B0E3443EA5B@gmail.com> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk> <53E4CD39.7080808@bsc.es> <4001D2D9-5E74-4EF9-908F-5B0E3443EA5B@gmail.com> Message-ID: Vic, Sergi, you can not compare Lustre and GPFS without providing a clear usecase as otherwise you compare apple with oranges. 
the reason for this is quite simple, Lustre plays well in pretty much one usecase - HPC, GPFS on the other hand is used in many forms of deployments from Storage for Virtual Machines, HPC, Scale-Out NAS, Solutions in digital media, to hosting some of the biggest, most business critical Transactional database installations in the world. you look at 2 products with completely different usability spectrum, functions and features unless as said above you narrow it down to a very specific usecase with a lot of details. even just HPC has a very large spectrum and not everybody is working in a single directory, which is the main scale point for Lustre compared to GPFS and the reason is obvious, if you have only 1 active metadata server (which is what 99% of all lustre systems run) some operations like single directory contention is simpler to make fast, but only up to the limit of your one node, but what happens when you need to go beyond that and only a real distributed architecture can support your workload ? for example look at most chip design workloads, which is a form of HPC, it is something thats extremely metadata and small file dominated, you talk about 100's of millions (in some cases even billions) of files, majority of them <4k, the rest larger files , majority of it with random access patterns that benefit from massive client side caching and distributed data coherency models supported by GPFS token manager infrastructure across 10's or 100's of metadata server and 1000's of compute nodes. you also need to look at the rich feature set GPFS provides, which not all may be important for some environments but are for others like Snapshot, Clones, Hierarchical Storage Management (ILM) , Local Cache acceleration (LROC), Global Namespace Wan Integration (AFM), Encryption, etc just to name a few. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Vic Cornell To: gpfsug main discussion list Date: 08/08/2014 10:16 AM Subject: Re: [gpfsug-discuss] GPFS and Lustre on same node Sent by: gpfsug-discuss-bounces at gpfsug.org Disclaimers - I work for DDN - we sell lustre and GPFS. I know GPFS much better than I know Lustre. The biggest difference we find between GPFS and Lustre is that GPFS - can usually achieve 90% of the bandwidth available to a single client with a single thread. Lustre needs multiple parallel streams to saturate - say an Infiniband connection. Lustre is often faster than GPFS and often has superior metadata performance - particularly where lots of files are created in a single directory. GPFS can support Windows - Lustre cannot. I think GPFS is better integrated and easier to deploy than Lustre - some people disagree with me. Regards, Vic On 8 Aug 2014, at 14:14, Sergi Mor? Codina wrote: > Hi all, > > About main differences between GPFS and Lustre, here you have some bits from our experience: > > -Reliability: GPFS its been proved to be more stable and reliable. Also offers more flexibility in terms of fail-over. It have no restriction in number of servers. As far as I know, an NSD can have as many secondary servers as you want (we are using 8). > > -Metadata: In Lustre each file system is restricted to two servers. No restriction in GPFS. > > -Updates: In GPFS you can update the whole storage cluster without stopping production, one server at a time. 
> > -Server/Client role: As Jeremy said, in GPFS every server act as a client as well. Useful for administrative tasks. > > -Troubleshooting: Problems with GPFS are easier to track down. Logs are more clear, and offers better tools than Lustre. > > -Support: No problems at all with GPFS support. It is true that it could take time to go up within all support levels, but we always got a good solution. Quite different in terms of hardware. IBM support quality has drop a lot since about last year an a half. Really slow and tedious process to get replacements. Moreover, we keep receiving bad "certified reutilitzed parts" hardware, which slow the whole process even more. > > > These are the main differences I would stand out after some years of experience with both file systems, but do not take it as a fact. > > PD: Salvatore, I would suggest you to contact Jordi Valls. He joined EBI a couple of months ago, and has experience working with both file systems here at BSC. > > Best Regards, > Sergi. > > > On 08/08/2014 01:40 PM, Jeremy Robst wrote: >> On Fri, 8 Aug 2014, Salvatore Di Nardo wrote: >> >>> Now, skipping all this GSS rant, which have nothing to do with the file >>> system anyway and going back to my question: >>> >>> Could someone point the main differences between GPFS and Lustre? >> >> I'm looking at making the same decision here - to buy GPFS or to roll >> our own Lustre configuration. I'm in the process of setting up test >> systems, and so far the main difference seems to be in the that in GPFS >> each server sees the full filesystem, and so you can run other >> applications (e.g backup) on a GPFS server whereas the Luste OSS (object >> storage servers) see only a portion of the storage (the filesystem is >> striped across the OSSes), so you need a Lustre client to mount the full >> filesystem for things like backup. >> >> However I have very little practical experience of either and would also >> be interested in any comments. >> >> Thanks >> >> Jeremy >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > -- > > ------------------------------------------------------------------------ > > Sergi More Codina > Barcelona Supercomputing Center > Centro Nacional de Supercomputacion > WWW: http://www.bsc.es Tel: +34-93-405 42 27 > e-mail: sergi.more at bsc.es Fax: +34-93-413 77 21 > > ------------------------------------------------------------------------ > > WARNING / LEGAL TEXT: This message is intended only for the use of the > individual or entity to which it is addressed and may contain > information which is privileged, confidential, proprietary, or exempt > from disclosure under applicable law. If you are not the intended > recipient or the person responsible for delivering the message to the > intended recipient, you are strictly prohibited from disclosing, > distributing, copying, or in any way using this message. If you have > received this communication in error, please notify the sender and > destroy and delete any copies you may have received. 
> > http://www.bsc.es/disclaimer.htm > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Sat Aug 9 15:03:02 2014 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Sat, 9 Aug 2014 16:03:02 +0200 Subject: [gpfsug-discuss] GPFS and Lustre In-Reply-To: References: Message-ID: Vic, Sergi, from my point of view for real High-End workloads the complete I/O stack needs to be fine tuned and well understood in order to provide a good system to the users. - Application(s) + I/O Lib(s) + MPI + Parallel Filesystem (e.g. GPFS) + Hardware (Networks, Servers, Disks, etc.) One of the best solutions to bring your application very efficently to work with a Parallel FS is Sionlib from FZ Juelich: Sionlib is a scalable I/O library for the parallel access to task-local files. The library not only supports writing and reading binary data to or from from several thousands of processors into a single or a small number of physical files but also provides for global open and close functions to access SIONlib file in parallel. SIONlib provides different interfaces: parallel access using MPI, OpenMp, or their combination and sequential access for post-processing utilities. http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/SIONlib/_node.html http://apps.fz-juelich.de/jsc/sionlib/html/sionlib_tutorial_2013.pdf -frank- P.S. Nice blog from Nils https://www.ibm.com/developerworks/community/blogs/storageneers/entry/scale_out_backup_with_tsm_and_gss_performance_test_results?lang=en Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From ewahl at osc.edu Mon Aug 11 14:55:48 2014 From: ewahl at osc.edu (Ed Wahl) Date: Mon, 11 Aug 2014 13:55:48 +0000 Subject: [gpfsug-discuss] GPFS and Lustre In-Reply-To: References: , Message-ID: In a similar vein, IBM has an application transparent "File Cache Library" as well. I believe it IS licensed and the only requirement is that it is for use on IBM hardware only. Saw some presentations that mention it in some BioSci talks @SC13 and the numbers for a couple of selected small read applications were awesome. I probably have the contact info for it around here somewhere. In addition to the pdf/user manual. Ed Wahl Ohio Supercomputer Center ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Frank Kraemer [kraemerf at de.ibm.com] Sent: Saturday, August 09, 2014 10:03 AM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] GPFS and Lustre Vic, Sergi, from my point of view for real High-End workloads the complete I/O stack needs to be fine tuned and well understood in order to provide a good system to the users. - Application(s) + I/O Lib(s) + MPI + Parallel Filesystem (e.g. GPFS) + Hardware (Networks, Servers, Disks, etc.) One of the best solutions to bring your application very efficently to work with a Parallel FS is Sionlib from FZ Juelich: Sionlib is a scalable I/O library for the parallel access to task-local files. 
The library not only supports writing and reading binary data to or from several thousands of processors into a single or a small number of physical files, but also provides global open and close functions to access a SIONlib file in parallel. SIONlib provides different interfaces: parallel access using MPI, OpenMP, or their combination, and sequential access for post-processing utilities.

http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/SIONlib/_node.html
http://apps.fz-juelich.de/jsc/sionlib/html/sionlib_tutorial_2013.pdf

-frank-

P.S. Nice blog from Nils
https://www.ibm.com/developerworks/community/blogs/storageneers/entry/scale_out_backup_with_tsm_and_gss_performance_test_results?lang=en

Frank Kraemer
IBM Consulting IT Specialist / Client Technical Architect
Hechtsheimer Str. 2, 55131 Mainz
mailto:kraemerf at de.ibm.com
voice: +49171-3043699
IBM Germany

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From sabujp at gmail.com Tue Aug 12 23:16:22 2014
From: sabujp at gmail.com (Sabuj Pattanayek)
Date: Tue, 12 Aug 2014 17:16:22 -0500
Subject: [gpfsug-discuss] reduce cnfs failover time to a few seconds
Message-ID: 

Hi all,

Is there any way to reduce CNFS failover time to just a few seconds? Currently it seems to be taking 5 - 10 minutes. We're using virtual IPs, i.e. interface bond1.1550:0 has one of the CNFS VIPs, so it should be fast, but it takes a long time and sometimes causes processes to crash due to NFS timeouts (some have 600 second soft mount timeouts). We've also noticed that it sometimes takes even longer unless the CNFS system on which we're calling mmshutdown is completely shut down and isn't returning pings. Even 1 min seems too long.

For comparison, I'm running ctdb + samba on the other NSDs and it's able to fail over in a few seconds after mmshutdown completes.

Thanks,
Sabuj
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sdinardo at ebi.ac.uk Fri Aug 15 14:31:29 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Fri, 15 Aug 2014 14:31:29 +0100
Subject: [gpfsug-discuss] gpfs client expels, fs hangind and waiters
Message-ID: <53EE0BB1.8000005@ebi.ac.uk>

Hello people,
I have been trying to solve a problem with our GPFS system for quite a while now, without much luck, so I think it's time to ask for some help.

First, a bit of introduction:
Our GPFS system is made of 3x GSS-26 units; in other words it is made of 6 servers (4x 10G links each) and several SAS-attached disk enclosures. The total amount of space is roughly 2PB, and the disks are SATA (except a few SSDs dedicated to the logtip). My metadata are on dedicated vdisks, but both data and metadata vdisks are in the same declustered arrays and recovery groups, so in the end they share the same spindles. The clients are an LSF farm configured as another cluster (standard multiclustering configuration) of roughly 600 nodes.

The issue:
Recently we became aware that when some massive I/O request comes in we experience a lot of client expels. Here's an example from our logs:

Fri Aug 15 12:40:24.680 2014: Expel 10.7.28.34 (gss03a) request from 172.16.4.138 (ebi3-138 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.138 (ebi3-138 in ebi-cluster.ebi.ac.uk)
Fri Aug 15 12:40:41.652 2014: Expel 10.7.28.66 (gss02b) request from 10.7.34.38 (ebi5-037 in ebi-cluster.ebi.ac.uk).
Expelling: 10.7.34.38 (ebi5-037 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:40:45.754 2014: Expel 10.7.28.3 (gss01b) request from 172.16.4.58 (ebi3-058 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.58 (ebi3-058 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:40:52.305 2014: Expel 10.7.28.66 (gss02b) request from 10.7.34.68 (ebi5-067 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.68 (ebi5-067 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:17.069 2014: Expel 10.7.28.35 (gss03b) request from 172.16.4.161 (ebi3-161 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.161 (ebi3-161 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:23.555 2014: Expel 10.7.28.67 (gss02a) request from 172.16.4.136 (ebi3-136 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.136 (ebi3-136 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:54.258 2014: Expel 10.7.28.34 (gss03a) request from 10.7.34.22 (ebi5-021 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.22 (ebi5-021 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:54.540 2014: Expel 10.7.28.66 (gss02b) request from 10.7.34.57 (ebi5-056 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.57 (ebi5-056 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:42:57.288 2014: Expel 10.7.35.5 (ebi5-132 in ebi-cluster.ebi.ac.uk) request from 10.7.28.34 (gss03a). Expelling: 10.7.35.5 (ebi5-132 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:43:24.327 2014: Expel 10.7.28.34 (gss03a) request from 10.7.37.99 (ebi5-226 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.37.99 (ebi5-226 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:44:54.202 2014: Expel 10.7.28.67 (gss02a) request from 172.16.4.165 (ebi3-165 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.165 (ebi3-165 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:15:54.450 2014: Expel 10.7.28.34 (gss03a) request from 10.7.37.89 (ebi5-216 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.37.89 (ebi5-216 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:20:16.524 2014: Expel 10.7.28.3 (gss01b) request from 172.16.4.55 (ebi3-055 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.55 (ebi3-055 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:26:54.177 2014: Expel 10.7.28.34 (gss03a) request from 10.7.34.64 (ebi5-063 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.64 (ebi5-063 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:27:53.900 2014: Expel 10.7.28.3 (gss01b) request from 10.7.35.15 (ebi5-142 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.35.15 (ebi5-142 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:28:24.297 2014: Expel 10.7.28.67 (gss02a) request from 172.16.4.50 (ebi3-050 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.50 (ebi3-050 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:29:23.913 2014: Expel 10.7.28.3 (gss01b) request from 172.16.4.156 (ebi3-156 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.156 (ebi3-156 in ebi-cluster.ebi.ac.uk) at the same time we experience also long waiters queue (1000+ lines). 
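(A note on how waiter lists like the ones below are typically gathered: the "mmfsadm dump waiters" command shown later in this message is the usual source of such output, and "mmdiag --waiters" is the supported equivalent on recent releases. A minimal sketch for one node:

    # snapshot the longest waiters on this node, sorted by wait time (3rd field)
    mmfsadm dump waiters | sort -k3 -rn | head -n 30

    # or, on GPFS 3.4 and later
    mmdiag --waiters

Running the same snapshot on every NSD server, e.g. via mmdsh or any parallel shell, shows whether the long waiters are cluster-wide or confined to one server.)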
An example in case of massive writes ( dd ) : 0x7F522E1EEF90 waiting 1.861233182 seconds, NSDThread: on ThCond 0x7F5158019B08 (0x7F5158019B08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.101 0x7F522E1EC9B0 waiting 1.490567470 seconds, NSDThread: on ThCond 0x7F50F4038BA8 (0x7F50F4038BA8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.45 0x7F522E1EB6C0 waiting 1.077098046 seconds, NSDThread: on ThCond 0x7F50B40011F8 (0x7F50B40011F8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.156 0x7F522E1EA3D0 waiting 7.714968554 seconds, NSDThread: on ThCond 0x7F50BC0078B8 (0x7F50BC0078B8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.107 0x7F522E1E90E0 waiting 4.774379417 seconds, NSDThread: on ThCond 0x7F506801B1F8 (0x7F506801B1F8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.23 0x7F522E1E7DF0 waiting 0.746172444 seconds, NSDThread: on ThCond 0x7F5094007D78 (0x7F5094007D78) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.84 0x7F522E1E6B00 waiting 1.553030487 seconds, NSDThread: on ThCond 0x7F51C0004C78 (0x7F51C0004C78) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.63 0x7F522E1E5810 waiting 2.165307633 seconds, NSDThread: on ThCond 0x7F5178016A08 (0x7F5178016A08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.29 0x7F522E1E4520 waiting 1.128089273 seconds, NSDThread: on ThCond 0x7F5074004D98 (0x7F5074004D98) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.61 0x7F522E1E3230 waiting 2.515214328 seconds, NSDThread: on ThCond 0x7F51F400EF08 (0x7F51F400EF08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.90 0x7F522E1E1F40 waiting*162.966840834* seconds, NSDThread: on ThCond 0x7F51840207A8 (0x7F51840207A8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.97 0x7F522E1E0C50 waiting 1.140787288 seconds, NSDThread: on ThCond 0x7F51AC005C08 (0x7F51AC005C08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.94 0x7F522E1DF960 waiting 41.907415248 seconds, NSDThread: on ThCond 0x7F5160019038 (0x7F5160019038) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.143 0x7F522E1DE670 waiting 0.466560418 seconds, NSDThread: on ThCond 0x7F513802B258 (0x7F513802B258) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.168 0x7F522E1DD380 waiting 3.102803621 seconds, NSDThread: on ThCond 0x7F516C0106C8 (0x7F516C0106C8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.91 0x7F522E1DC090 waiting 2.751614295 seconds, NSDThread: on ThCond 0x7F504C0011F8 (0x7F504C0011F8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.25 0x7F522E1DADA0 waiting 5.083691891 seconds, NSDThread: on ThCond 0x7F507401BE88 (0x7F507401BE88) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.61 0x7F522E1D9AB0 waiting 2.263374184 seconds, NSDThread: on ThCond 0x7F5080003B98 (0x7F5080003B98) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.36 0x7F522E1D87C0 waiting 0.206989639 seconds, NSDThread: on ThCond 0x7F505801F0D8 (0x7F505801F0D8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.55 0x7F522E1D74D0 waiting *41.841279897* seconds, NSDThread: on ThCond 0x7F5194008B88 (0x7F5194008B88) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.143 0x7F522E1D61E0 waiting 5.618652361 seconds, NSDThread: on ThCond 0x1BAB868 (0x1BAB868) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.59 0x7F522E1D4EF0 
waiting 6.185658427 seconds, NSDThread: on ThCond 0x7F513802AAE8 (0x7F513802AAE8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.6 0x7F522E1D3C00 waiting 2.652370892 seconds, NSDThread: on ThCond 0x7F5130004C78 (0x7F5130004C78) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.45 0x7F522E1D2910 waiting 11.396142225 seconds, NSDThread: on ThCond 0x7F51A401C0C8 (0x7F51A401C0C8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.169 0x7F522E1D1620 waiting 63.710723043 seconds, NSDThread: on ThCond 0x7F5038004D08 (0x7F5038004D08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.120 or for massive reads: 0x7FBCE69A8C20 waiting 29.262629530 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE699CEC0 waiting 29.260869141 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE698C5A0 waiting 29.124824888 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6984110 waiting 22.729479654 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE69512C0 waiting 29.272805926 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE69409A0 waiting 28.833650198 seconds, NSDThread: on ThCond 0x18033B74D48 (0xFFFFC90033B74D48) (LeaseWaitCondvar), reason 'Waiting to acquire disklease' 0x7FBCE6924320 waiting 29.237067128 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6921D40 waiting 29.237953228 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6915FE0 waiting 29.046721161 seconds, NSDThread: on ThCond 0x18033B74D48 (0xFFFFC90033B74D48) (LeaseWaitCondvar), reason 'Waiting to acquire disklease' 0x7FBCE6913A00 waiting 29.264534710 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6900B00 waiting 29.267691105 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68F7380 waiting 29.266402464 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68D2870 waiting 29.276298231 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68BADB0 waiting 28.665700576 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68B61F0 waiting 29.236878611 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6885980 waiting *144*.530487248 seconds, NSDThread: on ThMutex 0x1803396A670 (0xFFFFC9003396A670) (DiskSchedulingMutex) 0x7FBCE68833A0 waiting 29.231066610 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68820B0 waiting 29.269954514 seconds, NSDThread: on ThCond 
0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE686A5F0 waiting *140*.662994256 seconds, NSDThread: on ThMutex 0x180339A3140 (0xFFFFC900339A3140) (DiskSchedulingMutex) 0x7FBCE6864740 waiting 29.254180742 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE683FC30 waiting 29.271840565 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE682E020 waiting 29.200969209 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6825B90 waiting 19.136732919 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6805C40 waiting 29.236055550 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67FEAA0 waiting 29.283264161 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67FC4C0 waiting 29.268992663 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67DFE40 waiting 29.150900786 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67D2DF0 waiting 29.199058463 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67D1B00 waiting 29.203199738 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67768D0 waiting 29.208231742 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6768590 waiting 5.228192589 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67672A0 waiting 29.252839376 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6757C70 waiting 28.869359044 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6748640 waiting 29.289284179 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6734450 waiting 29.253591817 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6730B80 waiting 29.289987273 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6720260 waiting 26.597589551 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66F32C0 waiting 29.177692849 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66E3C90 waiting 29.160268518 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) 
(VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66CC1D0 waiting 5.334330188 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66B3420 waiting 34.274433161 seconds, NSDThread: on ThMutex 0x180339A3140 (0xFFFFC900339A3140) (DiskSchedulingMutex) 0x7FBCE668E910 waiting 27.699999488 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6689D50 waiting 34.279090465 seconds, NSDThread: on ThMutex 0x180339A3140 (0xFFFFC900339A3140) (DiskSchedulingMutex) 0x7FBCE66805D0 waiting 24.688626241 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6675B60 waiting 35.367745840 seconds, NSDThread: on ThCond 0x18033B74D48 (0xFFFFC90033B74D48) (LeaseWaitCondvar), reason 'Waiting to acquire disklease' 0x7FBCE665E0A0 waiting 29.235994598 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE663CE60 waiting 29.162911979 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Another example with mmfsadm in case of massive reads: [root at gss02b ~]# mmfsadm dump waiters 0x7F519000AEA0 waiting 28.915010347 seconds, replyCleanupThread: on ThCond 0x7F51101B27B8 (0x7F51101B27B8) (MsgRecordCondvar), reason 'RPC wait' 0x7F511C012A10 waiting 279.522206863 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F52000095F8 (0x7F52000095F8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F5120000B80 waiting 279.524782437 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F5214000EE8 (0x7F5214000EE8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F5154006310 waiting 138.164386224 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F5174003F08 (0x7F5174003F08) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1EB6C0 waiting 23.060703000 seconds, NSDThread: for poll on sock 85 0x7F522E1E6B00 waiting 0.068456104 seconds, NSDThread: on ThCond 0x7F50CC00E478 (0x7F50CC00E478) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1D0330 waiting 17.207907857 seconds, NSDThread: on ThCond 0x7F5078001688 (0x7F5078001688) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1BFA10 waiting 0.181011711 seconds, NSDThread: on ThCond 0x7F504000E558 (0x7F504000E558) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1B4FA0 waiting 0.021780338 seconds, NSDThread: on ThCond 0x7F522000E488 (0x7F522000E488) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1B3CB0 waiting 0.794718000 seconds, NSDThread: for poll on sock 799 0x7F522E186D10 waiting 0.191606803 seconds, NSDThread: on ThCond 0x7F5184015D58 (0x7F5184015D58) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E184730 waiting 0.025562000 seconds, NSDThread: for poll on sock 867 0x7F522E12CDD0 waiting 0.008921000 seconds, NSDThread: for poll on sock 543 0x7F522E126F20 waiting 1.459531000 seconds, NSDThread: for poll on sock 983 0x7F522E10F460 waiting 17.177936972 seconds, NSDThread: on ThCond 0x7F51EC002CE8 (0x7F51EC002CE8) 
(InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E101120 waiting 17.232580316 seconds, NSDThread: on ThCond 0x7F51BC005BB8 (0x7F51BC005BB8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E0F1AF0 waiting 438.556030000 seconds, NSDThread: for poll on sock 496 0x7F522E0E7080 waiting 393.702839774 seconds, NSDThread: on ThCond 0x7F5164013668 (0x7F5164013668) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E09DA60 waiting 52.746984660 seconds, NSDThread: on ThCond 0x7F506C008858 (0x7F506C008858) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E084CB0 waiting 23.096688206 seconds, NSDThread: on ThCond 0x7F521C008E18 (0x7F521C008E18) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E0839C0 waiting 0.093456000 seconds, NSDThread: for poll on sock 962 0x7F522E076970 waiting 2.236659731 seconds, NSDThread: on ThCond 0x7F51E0027538 (0x7F51E0027538) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E044E10 waiting 52.752497765 seconds, NSDThread: on ThCond 0x7F513802BDD8 (0x7F513802BDD8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E033200 waiting 16.157355796 seconds, NSDThread: on ThCond 0x7F5104240D58 (0x7F5104240D58) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E02AD70 waiting 436.025203220 seconds, NSDThread: on ThCond 0x7F50E0016C28 (0x7F50E0016C28) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E01A450 waiting 393.673252777 seconds, NSDThread: on ThCond 0x7F50A8009C18 (0x7F50A8009C18) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DFE0460 waiting 1.781358358 seconds, NSDThread: on ThCond 0x7F51E0027638 (0x7F51E0027638) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF99420 waiting 0.038405427 seconds, NSDThread: on ThCond 0x7F50F0172B18 (0x7F50F0172B18) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF7CDA0 waiting 438.204625355 seconds, NSDThread: on ThCond 0x7F50900023D8 (0x7F50900023D8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF76EF0 waiting 435.903645734 seconds, NSDThread: on ThCond 0x7F5084004BC8 (0x7F5084004BC8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF74910 waiting 21.749325022 seconds, NSDThread: on ThCond 0x7F507C011F48 (0x7F507C011F48) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF71040 waiting 1.027274000 seconds, NSDThread: for poll on sock 866 0x7F522DF536D0 waiting 52.953847324 seconds, NSDThread: on ThCond 0x7F5200006FF8 (0x7F5200006FF8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF510F0 waiting 0.039278000 seconds, NSDThread: for poll on sock 837 0x7F522DF4EB10 waiting 0.085745937 seconds, NSDThread: on ThCond 0x7F51F0006828 (0x7F51F0006828) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF4C530 waiting 21.850733000 seconds, NSDThread: for poll on sock 986 0x7F522DF4B240 waiting 0.054739884 seconds, NSDThread: on ThCond 0x7F51EC0168D8 (0x7F51EC0168D8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF48C60 waiting 0.186409714 seconds, 
NSDThread: on ThCond 0x7F51E4000908 (0x7F51E4000908) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF41AC0 waiting 438.942861290 seconds, NSDThread: on ThCond 0x7F51CC010168 (0x7F51CC010168) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF3F4E0 waiting 0.060235106 seconds, NSDThread: on ThCond 0x7F51C400A438 (0x7F51C400A438) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF22E60 waiting 0.361288000 seconds, NSDThread: for poll on sock 518 0x7F522DF21B70 waiting 0.060722464 seconds, NSDThread: on ThCond 0x7F51580162D8 (0x7F51580162D8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF12540 waiting 23.077564448 seconds, NSDThread: on ThCond 0x7F512C13E1E8 (0x7F512C13E1E8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEFD060 waiting 0.723370000 seconds, NSDThread: for poll on sock 503 0x7F522DEE09E0 waiting 1.565799175 seconds, NSDThread: on ThCond 0x7F5084004D58 (0x7F5084004D58) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEDF6F0 waiting 22.063017342 seconds, NSDThread: on ThCond 0x7F5078003E08 (0x7F5078003E08) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEDD110 waiting 0.049108780 seconds, NSDThread: on ThCond 0x7F5070001D78 (0x7F5070001D78) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEDAB30 waiting 229.603224376 seconds, NSDThread: on ThCond 0x7F50680221B8 (0x7F50680221B8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DED7260 waiting 0.071855457 seconds, NSDThread: on ThCond 0x7F506400A5A8 (0x7F506400A5A8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DED5F70 waiting 0.648324000 seconds, NSDThread: for poll on sock 766 0x7F522DEC3070 waiting 1.809205756 seconds, NSDThread: on ThCond 0x7F522000E518 (0x7F522000E518) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEB1460 waiting 436.017396645 seconds, NSDThread: on ThCond 0x7F51E4000978 (0x7F51E4000978) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEAC8A0 waiting 393.734102000 seconds, NSDThread: for poll on sock 609 0x7F522DEA3120 waiting 17.960778837 seconds, NSDThread: on ThCond 0x7F51B4001708 (0x7F51B4001708) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DE86AA0 waiting 23.112060045 seconds, NSDThread: on ThCond 0x7F5154096118 (0x7F5154096118) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DE64570 waiting 0.076167410 seconds, NSDThread: on ThCond 0x7F50D8005EF8 (0x7F50D8005EF8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DE1AF50 waiting 17.460836000 seconds, NSDThread: for poll on sock 737 0x7F522DE104E0 waiting 0.205037000 seconds, NSDThread: for poll on sock 865 0x7F522DDB8B80 waiting 0.106192000 seconds, NSDThread: for poll on sock 78 0x7F522DDA36A0 waiting 0.738921180 seconds, NSDThread: on ThCond 0x7F505400E048 (0x7F505400E048) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DD9C500 waiting 0.731118367 seconds, NSDThread: on ThCond 0x7F503C00B518 (0x7F503C00B518) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DD89600 waiting 
229.609363000 seconds, NSDThread: for poll on sock 515
0x7F522DD567B0 waiting 1.508489195 seconds, NSDThread: on ThCond 0x7F514C021F88 (0x7F514C021F88) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg'

Another thing worth mentioning is that the filesystem is totally unresponsive. Even a simple "cd" into a directory, or an ls of a directory, hangs for several minutes (literally). This happens even if I try from the NSD servers themselves.

*A few things I have looked into:*

* Network: it seems fine. There might be some bottleneck on parts of it, and that could explain the waiters, but it doesn't explain why at some point the clients ask to expel the NSD servers, and it also doesn't justify why the filesystem is slow even on the NSD servers themselves.

* Disk bottleneck? I don't think so. The NSD servers have very low CPU usage (and iowait), and mmdiag --iohist seems to confirm that the operations on the disks are reasonably fast:

=== mmdiag: iohist ===

I/O history:

 I/O start time RW    Buf type disk:sectorNum     nSec  time ms Type Device/NSD ID      NSD server
--------------- -- ----------- ----------------- ----- ------- ---- ------------------ ---------------
13:54:29.209276  W        data    34:5066338808   2056  88.307  lcl sdtu
13:54:29.209277  W        data    55:5095698936   2056  27.592  lcl sdaab
13:54:29.209278  W        data   171:5104087544   2056  22.801  lcl sdtg
13:54:29.209279  W        data   116:5011812856   2056  65.983  lcl sdqr
13:54:29.209280  W        data    98:4860817912   2056  17.892  lcl sddl
13:54:29.209281  W        data   159:4999229944   2056  21.324  lcl sdjg
13:54:29.209282  W        data    84:5049561592   2056  31.932  lcl sdqz
13:54:29.209283  W        data     8:5003424248   2056  30.912  lcl sdcw
13:54:29.209284  W        data    23:4965675512   2056  27.366  lcl sdpt
13:54:29.297715  W  vdiskMDLog     2:144008496       1   0.236  lcl sdkr
13:54:29.297717  W  vdiskMDLog     0:331703600       1   0.230  lcl sdcm
13:54:29.297718  W  vdiskMDLog     1:273769776       1   0.241  lcl sdbp
13:54:29.244902  W        data    51:3857589752   2056  35.566  lcl sdyi
13:54:29.244904  W        data    10:3773703672   2056  28.512  lcl sdma
13:54:29.244905  W        data    48:3639485944   2056  24.124  lcl sdel
13:54:29.244906  W        data    25:3777897976   2056  18.691  lcl sdgt
13:54:29.244908  W        data    91:3832423928   2056  20.699  lcl sdlc
13:54:29.244909  W        data   115:3723372024   2056  30.783  lcl sdho
13:54:29.244910  W        data   173:3882755576   2056  53.241  lcl sdti
13:54:29.244911  W        data    42:3782092280   2056  22.785  lcl sddz
13:54:29.244912  W        data    45:3647874552   2056  24.289  lcl sdei
13:54:29.244913  W        data    32:3652068856   2056  17.220  lcl sdbn
13:54:29.244914  W        data    39:3677234680   2056  26.017  lcl sddw
13:54:29.298273  W  vdiskMDLog     2:144008497       1   2.522  lcl sduf
13:54:29.298274  W  vdiskMDLog     0:331703601       1   1.025  lcl sdlo
13:54:29.298275  W  vdiskMDLog     1:273769777       1   2.586  lcl sdtt
13:54:29.288275  W        data    27:2249588200   2056  20.071  lcl sdhb
13:54:29.288279  W        data    33:2224422376   2056  19.682  lcl sdts
13:54:29.288281  W        data    47:2115370472   2056  21.667  lcl sdwo
13:54:29.288282  W        data    82:2316697064   2056  21.524  lcl sdxy
13:54:29.288283  W        data    85:2232810984   2056  17.467  lcl sdra
13:54:29.288285  W        data    30:2127953384   2056  18.475  lcl sdqg
13:54:29.288286  W        data    67:1876295144   2056  16.383  lcl sdmx
13:54:29.288287  W        data    64:2127953384   2056  21.908  lcl sduh
13:54:29.288288  W        data    38:2253782504   2056  19.775  lcl sddv
13:54:29.288290  W        data    15:2207645160   2056  20.599  lcl sdet
13:54:29.288291  W        data   157:2283142632   2056  21.198  lcl sdiy

* A bonding problem on the interfaces? The Mellanox (interface card producer) drivers and firmware have been updated, and we even tested the system with a single link (without bonding).

Could someone help me with this?
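For completeness, this is roughly the kind of collection loop I have in mind to gather more evidence (just a sketch: the host names are only examples and it assumes passwordless ssh to the GSS nodes), snapshotting waiters and I/O history so they can later be lined up against the expel messages in /var/adm/ras/mmfs.log.latest:

#!/bin/bash
# Sketch: every 30 seconds, record the GPFS waiters and the recent I/O
# history of each GSS server, with a timestamp, so that long waiters can
# be correlated with expel events afterwards.
SERVERS="gss01a gss01b gss02a gss02b gss03a gss03b"   # example host names
OUTDIR=/var/tmp/gpfs-snapshots
mkdir -p "$OUTDIR"
while true; do
    ts=$(date +%Y%m%d-%H%M%S)
    for s in $SERVERS; do
        (
            echo "=== $s $ts ==="
            ssh "$s" "/usr/lpp/mmfs/bin/mmdiag --waiters; /usr/lpp/mmfs/bin/mmdiag --iohist"
        ) >> "$OUTDIR/$s.log" 2>&1 &
    done
    wait
    sleep 30
done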
In particular:

* What exactly do clients look at to decide that another node is unresponsive? Ping? I don't think so, because both the NSD servers and the clients can be pinged, so what do they check? If someone can also specify which port they use, I can try to tcpdump what exactly is causing this expel.

* How can I monitor metadata operations to understand where EXACTLY the bottleneck is that causes this:

[sdinardo at ebi5-001 ~]$ time ls /gpfs/nobackup/sdinardo
1   ebi3-054.ebi.ac.uk  ebi3-154            ebi5-019.ebi.ac.uk  ebi5-052            ebi5-101            ebi5-156            ebi5-197            ebi5-228            ebi5-262.ebi.ac.uk
10  ebi3-055            ebi3-155            ebi5-021.ebi.ac.uk  ebi5-053            ebi5-104.ebi.ac.uk  ebi5-160.ebi.ac.uk  ebi5-198            ebi5-229            ebi5-263
2   ebi3-056.ebi.ac.uk  ebi3-156            ebi5-022            ebi5-054.ebi.ac.uk  ebi5-106            ebi5-161            ebi5-200            ebi5-230.ebi.ac.uk  ebi5-264
3   ebi3-057            ebi3-157            ebi5-023            ebi5-056            ebi5-109            ebi5-162.ebi.ac.uk  ebi5-201            ebi5-231.ebi.ac.uk  ebi5-265
4   ebi3-058            ebi3-158.ebi.ac.uk  ebi5-024.ebi.ac.uk  ebi5-057            ebi5-110.ebi.ac.uk  ebi5-163.ebi.ac.uk  ebi5-202.ebi.ac.uk  ebi5-232            ebi5-266.ebi.ac.uk
5   ebi3-059.ebi.ac.uk  ebi3-160            ebi5-025            ebi5-060            ebi5-111.ebi.ac.uk  ebi5-164            ebi5-204            ebi5-233            ebi5-267
6   ebi3-132            ebi3-161.ebi.ac.uk  ebi5-026            ebi5-061.ebi.ac.uk  ebi5-112.ebi.ac.uk  ebi5-165            ebi5-205            ebi5-234            ebi5-269.ebi.ac.uk
7   ebi3-133            ebi3-163.ebi.ac.uk  ebi5-028            ebi5-062.ebi.ac.uk  ebi5-129.ebi.ac.uk  ebi5-166            ebi5-206.ebi.ac.uk  ebi5-236            ebi5-270
8   ebi3-134            ebi3-165            ebi5-030            ebi5-064            ebi5-131.ebi.ac.uk  ebi5-169.ebi.ac.uk  ebi5-207            ebi5-237            ebi5-271
9   ebi3-135            ebi3-166.ebi.ac.uk  ebi5-031            ebi5-065            ebi5-132            ebi5-170.ebi.ac.uk  ebi5-209            ebi5-239.ebi.ac.uk  launcher.sh

_*real 21m14.948s*_ ( WTH ?!?!?!)
user 0m0.004s
sys 0m0.014s

I know these questions are not easy to answer, and I need to dig more, but it would be very helpful if someone could give me some hints about where to look. My GPFS skills are limited since this is our first system and it has been in production for just a few months, and things started to worsen only recently. In the past we could get over 200Gb/s (both read and write) without any issue. Now some clients get expelled even when the data throughput is at 4-5Gb/s.

Thanks in advance for any help.

Regards,
Salvatore

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mail at arif-ali.co.uk Tue Aug 19 11:18:10 2014
From: mail at arif-ali.co.uk (Arif Ali)
Date: Tue, 19 Aug 2014 11:18:10 +0100
Subject: [gpfsug-discuss] gpfsug Maintenance
Message-ID: 

Hi all,

You may be aware that the website has been down for about a week now. This is due to the amount of traffic to the website and the number of people on the mailing list; we had seen a few issues on the system.

In order to counter this, we are moving to a new system to avoid any future issues and for ease of management. We are hoping to do this tonight (between 20:00 - 23:00 BST). If this causes an issue for anyone, then please let me know.

As part of the move over, I will be sending a few test mails to make sure that the mailing list is working correctly.

Thanks for your patience

-- 
Arif Ali
gpfsug Admin

IRC: arif-ali at freenode
LinkedIn: http://uk.linkedin.com/in/arifali

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sdinardo at ebi.ac.uk Tue Aug 19 12:11:00 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Tue, 19 Aug 2014 12:11:00 +0100
Subject: [gpfsug-discuss] gpfs client expels
In-Reply-To: <53EE0BB1.8000005@ebi.ac.uk>
References: <53EE0BB1.8000005@ebi.ac.uk>
Message-ID: <53F330C4.808@ebi.ac.uk>

Still problems.
Here some more detailed examples: *EXAMPLE 1:* *EBI5-220**( CLIENT)** *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a reply from node gss02b* Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic Tue Aug 19 11:03:12.066 2014: Connecting to gss02a Tue Aug 19 11:03:12.070 2014: Connected to gss02a Tue Aug 19 11:03:17.071 2014: Connecting to gss02b Tue Aug 19 11:03:17.072 2014: Connecting to gss03b Tue Aug 19 11:03:17.079 2014: Connecting to gss03a Tue Aug 19 11:03:17.080 2014: Connecting to gss01b Tue Aug 19 11:03:17.079 2014: Connecting to gss01a Tue Aug 19 11:04:23.105 2014: Connected to gss02b Tue Aug 19 11:04:23.107 2014: Connected to gss03b Tue Aug 19 11:04:23.112 2014: Connected to gss03a Tue Aug 19 11:04:23.115 2014: Connected to gss01b Tue Aug 19 11:04:23.121 2014: Connected to gss01a Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. *GSS02B ( NSD SERVER)* ... Tue Aug 19 11:03:17.070 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:28.080 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:39.083 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:50.088 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:01.092 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:12.096 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:23.103 2014: Accepted and connected to ** ebi5-220 ... *GSS02a ( NSD SERVER)* Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). 
Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 =============================================== *EXAMPLE 2*: *EBI5-038* Tue Aug 19 11:32:34.227 2014: *Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing cluster GSS.ebi.ac.uk* Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. ... LOT MORE RESETS BY PEER ... Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:25.267 2014: Connecting to gss02a Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.* *GSS02a* Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled because of an expired lease.* Pings sent: 60. Replies received: 60. In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? Thanks in advance for any tips. Regards, Salvatore -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at arif-ali.co.uk Tue Aug 19 20:59:47 2014 From: mail at arif-ali.co.uk (Arif Ali) Date: Tue, 19 Aug 2014 20:59:47 +0100 Subject: [gpfsug-discuss] gpfsug Maintenance In-Reply-To: References: Message-ID: This is a test mail to the mailing list please do not reply -- Arif Ali IRC: arif-ali at freenode LinkedIn: http://uk.linkedin.com/in/arifali On 19 August 2014 11:18, Arif Ali wrote: > Hi all, > > You may be aware that the website has been down for about a week now. This > is due to the amount of traffic to the website and the amount of people on > the mailing list, we had seen a few issues on the system. > > In order to counter the issues, we are moving to a new system to counter > any future issues, and ease of management. We are hoping to do this tonight > ( between 20:00 - 23:00 BST). If this causes an issue for anyone, then > please let me know. > > I will, as part of the move over, will be sending a few test mails to make > sure that mailing list is working correctly. > > Thanks for your patience > > -- > Arif Ali > gpfsug Admin > > IRC: arif-ali at freenode > LinkedIn: http://uk.linkedin.com/in/arifali > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mail at arif-ali.co.uk Tue Aug 19 23:41:48 2014 From: mail at arif-ali.co.uk (Arif Ali) Date: Tue, 19 Aug 2014 23:41:48 +0100 Subject: [gpfsug-discuss] gpfsug Maintenance In-Reply-To: References: Message-ID: Thanks for all your patience, The service should all be back up again -- Arif Ali IRC: arif-ali at freenode LinkedIn: http://uk.linkedin.com/in/arifali On 19 August 2014 20:59, Arif Ali wrote: > This is a test mail to the mailing list > > please do not reply > > -- > Arif Ali > > IRC: arif-ali at freenode > LinkedIn: http://uk.linkedin.com/in/arifali > > > On 19 August 2014 11:18, Arif Ali wrote: > >> Hi all, >> >> You may be aware that the website has been down for about a week now. >> This is due to the amount of traffic to the website and the amount of >> people on the mailing list, we had seen a few issues on the system. >> >> In order to counter the issues, we are moving to a new system to counter >> any future issues, and ease of management. We are hoping to do this tonight >> ( between 20:00 - 23:00 BST). If this causes an issue for anyone, then >> please let me know. >> >> I will, as part of the move over, will be sending a few test mails to >> make sure that mailing list is working correctly. >> >> Thanks for your patience >> >> -- >> Arif Ali >> gpfsug Admin >> >> IRC: arif-ali at freenode >> LinkedIn: http://uk.linkedin.com/in/arifali >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Wed Aug 20 08:57:23 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 20 Aug 2014 08:57:23 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53EE0BB1.8000005@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> Message-ID: <53F454E3.40803@ebi.ac.uk> Still problems. Here some more detailed examples: *EXAMPLE 1:* *EBI5-220**( CLIENT)** *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a reply from node gss02b* Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic Tue Aug 19 11:03:12.066 2014: Connecting to gss02a Tue Aug 19 11:03:12.070 2014: Connected to gss02a Tue Aug 19 11:03:17.071 2014: Connecting to gss02b Tue Aug 19 11:03:17.072 2014: Connecting to gss03b Tue Aug 19 11:03:17.079 2014: Connecting to gss03a Tue Aug 19 11:03:17.080 2014: Connecting to gss01b Tue Aug 19 11:03:17.079 2014: Connecting to gss01a Tue Aug 19 11:04:23.105 2014: Connected to gss02b Tue Aug 19 11:04:23.107 2014: Connected to gss03b Tue Aug 19 11:04:23.112 2014: Connected to gss03a Tue Aug 19 11:04:23.115 2014: Connected to gss01b Tue Aug 19 11:04:23.121 2014: Connected to gss01a Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. *GSS02B ( NSD SERVER)* ... 
Tue Aug 19 11:03:17.070 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:28.080 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:39.083 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:50.088 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:01.092 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:12.096 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:23.103 2014: Accepted and connected to ** ebi5-220 ... *GSS02a ( NSD SERVER)* Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 =============================================== *EXAMPLE 2*: *EBI5-038* Tue Aug 19 11:32:34.227 2014: *Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing cluster GSS.ebi.ac.uk* Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. ... LOT MORE RESETS BY PEER ... Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:25.267 2014: Connecting to gss02a Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.* *GSS02a* Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled because of an expired lease.* Pings sent: 60. Replies received: 60. In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? 
In example 2 it seems to me that for some reason the manager is not renewing the leases in time. When this happens it is not a single client: loads of them fail to get their lease renewed. Why is this happening? How can I trace it back to the source of the problem?

Thanks in advance for any tips.

Regards,
Salvatore

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sdinardo at ebi.ac.uk Wed Aug 20 09:03:03 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Wed, 20 Aug 2014 09:03:03 +0100
Subject: [gpfsug-discuss] gpfs client expels
In-Reply-To: <53F454E3.40803@ebi.ac.uk>
References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk>
Message-ID: <53F45637.8080000@ebi.ac.uk>

Another interesting case about a specific waiter: I was looking at the waiters on GSS until I found these (I got this info by collecting it from all the servers with a script I wrote, so I was able to catch the hanging connections while they were happening):

gss03b.ebi.ac.uk: *235.373993397* (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.109
gss03b.ebi.ac.uk: *235.152271998* (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.109
gss02a.ebi.ac.uk: *214.079093620* (MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.7.34.109
gss02a.ebi.ac.uk: *213.580199240* (MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.7.37.109
gss03b.ebi.ac.uk: *132.375138082* (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.109
gss03b.ebi.ac.uk: *132.374973884* (MsgRecordCondvar), reason 'RPC wait' for commMsgCheckMessages on node 10.7.37.109

The numbers in bold are seconds. Looking at this page:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/Interpreting+GPFS+Waiter+Information

the page claims this is probably network congestion, but I managed to log in to the client quickly enough, and there the waiters were:

[root at ebi5-236 ~]# mmdiag --waiters
=== mmdiag: waiters ===
0x7F6690073460 waiting 147.973009173 seconds, RangeRevokeWorkerThread: on ThCond 0x1801E43F6A0 (0xFFFFC9001E43F6A0) (LkObjCondvar), reason 'waiting for LX lock'
0x7F65100036D0 waiting 140.458589856 seconds, WritebehindWorkerThread: on ThCond 0x7F6500000F98 (0x7F6500000F98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35
0x7F63A0001080 waiting 245.153055801 seconds, WritebehindWorkerThread: on ThCond 0x7F65D8002CF8 (0x7F65D8002CF8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35
0x7F674C03D3D0 waiting 245.750977203 seconds, CleanBufferThread: on ThCond 0x7F64880079E8 (0x7F64880079E8) (LogFileBufferDescriptorCondvar), reason 'force wait for buffer write to complete'
0x7F674802E360 waiting 244.159861966 seconds, WritebehindWorkerThread: on ThCond 0x7F65E0002358 (0x7F65E0002358) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35
0x7F674C038810 waiting 251.086748430 seconds, SGExceptionLogBufferFullThread: on ThCond 0x7F64EC001398 (0x7F64EC001398) (MsgRecordCondvar), reason 'RPC wait' for I/O completion on node 10.7.28.35
0x7F674C036230 waiting 139.556735095 seconds, CleanBufferThread: on ThCond 0x7F65CC004C78 (0x7F65CC004C78) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35
0x7F674C031670 waiting 144.327593052 seconds, WritebehindWorkerThread: on ThCond 0x7F672402D1A8 (0x7F672402D1A8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35
0x7F674C02A4D0 waiting 145.202712821 seconds,
WritebehindWorkerThread: on ThCond 0x7F65440018F8 (0x7F65440018F8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674C0291E0 waiting 247.131569232 seconds, PrefetchWorkerThread: on ThCond 0x7F65740016C8 (0x7F65740016C8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6748025BD0 waiting 11.631381523 seconds, replyCleanupThread: on ThCond 0x7F65E000A1F8 (0x7F65E000A1F8) (MsgRecordCondvar), reason 'RPC wait' 0x7F6748022300 waiting 245.616267612 seconds, WritebehindWorkerThread: on ThCond 0x7F6470001468 (0x7F6470001468) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6748021010 waiting 230.769670930 seconds, InodeAllocRevokeWorkerThread: on ThCond 0x7F64880079E8 (0x7F64880079E8) (LogFileBufferDescriptorCondvar), reason 'force wait for buffer write to complete' 0x7F674801B160 waiting 245.830554594 seconds, UnusedInodePrefetchThread: on ThCond 0x7F65B8004438 (0x7F65B8004438) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674800A820 waiting 252.332932000 seconds, Msg handler getData: for poll on sock 109 0x7F63F4023090 waiting 253.073535042 seconds, WritebehindWorkerThread: on ThCond 0x7F65C4000CC8 (0x7F65C4000CC8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F64A4000CE0 waiting 145.049659249 seconds, WritebehindWorkerThread: on ThCond 0x7F6560000A98 (0x7F6560000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6778006D00 waiting 142.124664264 seconds, WritebehindWorkerThread: on ThCond 0x7F63DC000C08 (0x7F63DC000C08) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67780046D0 waiting 251.751439453 seconds, WritebehindWorkerThread: on ThCond 0x7F6454000A98 (0x7F6454000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67780E4B70 waiting 142.431051232 seconds, WritebehindWorkerThread: on ThCond 0x7F63C80010D8 (0x7F63C80010D8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67780E50D0 waiting 244.339624817 seconds, WritebehindWorkerThread: on ThCond 0x7F65BC001B98 (0x7F65BC001B98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6434000B40 waiting 145.343700410 seconds, WritebehindWorkerThread: on ThCond 0x7F63B00036E8 (0x7F63B00036E8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F670C0187A0 waiting 244.903963969 seconds, WritebehindWorkerThread: on ThCond 0x7F65F0000FB8 (0x7F65F0000FB8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C04E2F0 waiting 245.837137631 seconds, PrefetchWorkerThread: on ThCond 0x7F65A4000A98 (0x7F65A4000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C04AA20 waiting 139.713993908 seconds, WritebehindWorkerThread: on ThCond 0x7F6454002478 (0x7F6454002478) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C049730 waiting 252.434187472 seconds, WritebehindWorkerThread: on ThCond 0x7F65F4003708 (0x7F65F4003708) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C044B70 waiting 131.515829048 seconds, Msg handler ccMsgPing: on ThCond 0x7F64DC1D4888 (0x7F64DC1D4888) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F6758008DE0 waiting 149.548547226 seconds, Msg handler getData: on ThCond 
0x7F645C002458 (0x7F645C002458) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F67580071D0 waiting 149.548543118 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F6450001C48 (0x7F6450001C48) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F65A40052B0 waiting 11.498507001 seconds, Msg handler ccMsgPing: on ThCond 0x7F644C103F88 (0x7F644C103F88) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F6448001620 waiting 139.844870446 seconds, WritebehindWorkerThread: on ThCond 0x7F65F0003098 (0x7F65F0003098) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F63F4000F80 waiting 245.044791905 seconds, WritebehindWorkerThread: on ThCond 0x7F6450001188 (0x7F6450001188) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F659C0033A0 waiting 243.464399305 seconds, PrefetchWorkerThread: on ThCond 0x7F6554002598 (0x7F6554002598) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6514001690 waiting 245.826160463 seconds, PrefetchWorkerThread: on ThCond 0x7F65A4004558 (0x7F65A4004558) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F64800012B0 waiting 253.174835511 seconds, WritebehindWorkerThread: on ThCond 0x7F65E0000FB8 (0x7F65E0000FB8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6510000EE0 waiting 140.746696039 seconds, WritebehindWorkerThread: on ThCond 0x7F647C000CC8 (0x7F647C000CC8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6754001BB0 waiting 246.336055629 seconds, PrefetchWorkerThread: on ThCond 0x7F6594002498 (0x7F6594002498) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6420000930 waiting 140.606777450 seconds, WritebehindWorkerThread: on ThCond 0x7F6578002498 (0x7F6578002498) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6744009110 waiting 137.466372831 seconds, FileBlockReadFetchHandlerThread: on ThCond 0x7F65F4007158 (0x7F65F4007158) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67280119F0 waiting 144.173427360 seconds, WritebehindWorkerThread: on ThCond 0x7F6504000AE8 (0x7F6504000AE8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F672800BB40 waiting 145.804301887 seconds, WritebehindWorkerThread: on ThCond 0x7F6550001038 (0x7F6550001038) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6728000910 waiting 252.601993452 seconds, WritebehindWorkerThread: on ThCond 0x7F6450000A98 (0x7F6450000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6744007E20 waiting 251.603329204 seconds, WritebehindWorkerThread: on ThCond 0x7F6570004C18 (0x7F6570004C18) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F64AC002EF0 waiting 139.205774422 seconds, FileBlockWriteFetchHandlerThread: on ThCond 0x18020AF0260 (0xFFFFC90020AF0260) (FetchFlowControlCondvar), reason 'wait for buffer for fetch' 0x7F6724013050 waiting 71.501580932 seconds, Msg handler ccMsgPing: on ThCond 0x7F6580006608 (0x7F6580006608) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F661C000DA0 waiting 245.654985276 seconds, PrefetchWorkerThread: on ThCond 0x7F6570005288 (0x7F6570005288) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O 
completion on node 10.7.28.35
0x7F671C00F440 waiting 251.096002003 seconds, FileBlockReadFetchHandlerThread: on ThCond 0x7F65BC002878 (0x7F65BC002878) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35
0x7F671C00E150 waiting 144.034006970 seconds, WritebehindWorkerThread: on ThCond 0x7F6528001548 (0x7F6528001548) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35
0x7F67A02FCD20 waiting 142.324070945 seconds, WritebehindWorkerThread: on ThCond 0x7F6580002A98 (0x7F6580002A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35
0x7F67A02FA330 waiting 200.670114385 seconds, EEWatchDogThread: on ThCond 0x7F65B0000A98 (0x7F65B0000A98) (MsgRecordCondvar), reason 'RPC wait'
0x7F67A02BF050 waiting 252.276161189 seconds, WritebehindWorkerThread: on ThCond 0x7F6584003998 (0x7F6584003998) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35
0x7F67A0004160 waiting 251.173651822 seconds, SyncHandlerThread: on ThCond 0x7F64880079E8 (0x7F64880079E8) (LogFileBufferDescriptorCondvar), reason 'force wait on force active buffer write'

So from the client side it is the client that is waiting for the server. I also managed to ping, ssh and tcpdump between the two before the node got expelled and found that ping works fine and ssh works fine, but apart from my own tests there are 0 packets passing between them, LITERALLY. So there is no congestion and no network issue, yet the server waits for the client and the client waits for the server. This goes on until we reach 350 secs (10 times the lease time), and then the client gets expelled. There are no local I/O waiters indicating that GSS is struggling, and there is plenty of bandwidth and CPU, with no network congestion. It looks like some sort of deadlock to me, but how can this be explained and hopefully fixed?

Regards,
Salvatore

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chair at gpfsug.org Thu Aug 21 09:20:39 2014
From: chair at gpfsug.org (Jez Tucker (Chair))
Date: Thu, 21 Aug 2014 09:20:39 +0100
Subject: [gpfsug-discuss] gpfs client expels
In-Reply-To: <53F454E3.40803@ebi.ac.uk>
References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk>
Message-ID: <53F5ABD7.80107@gpfsug.org>

Hi there,

I've seen this on several 'stock'? 'core'? GPFS systems (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so.

In my experience this has _always_ been a network issue of one sort or another. If the network is experiencing issues, nodes will be ejected. Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS.

You need to follow the logs through from each machine in time order to determine who could not see who and in what order. Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM, and collect and supply a snap and traces as required by support.

Without knowing your full setup, it's hard to help further.

Jez

On 20/08/14 08:57, Salvatore Di Nardo wrote:
> Still problems.
Here some more detailed examples: > > *EXAMPLE 1:* > > *EBI5-220**( CLIENT)** > *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a > reply from node gss02b* > Tue Aug 19 11:03:04.981 2014: Request sent to > (gss02a in GSS.ebi.ac.uk) to expel (gss02b in > GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk > Tue Aug 19 11:03:04.982 2014: This node will be expelled > from cluster GSS.ebi.ac.uk due to expel msg from IP> (ebi5-220) > Tue Aug 19 11:03:09.319 2014: Cluster Manager connection > broke. Probing cluster GSS.ebi.ac.uk > Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum > nodes during cluster probe. > Tue Aug 19 11:03:10.322 2014: Lost membership in cluster > GSS.ebi.ac.uk. Unmounting file systems. > Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount > invoked. File system: gpfs1 Reason: SGPanic > Tue Aug 19 11:03:12.066 2014: Connecting to > gss02a > Tue Aug 19 11:03:12.070 2014: Connected to > gss02a > Tue Aug 19 11:03:17.071 2014: Connecting to > gss02b > Tue Aug 19 11:03:17.072 2014: Connecting to > gss03b > Tue Aug 19 11:03:17.079 2014: Connecting to > gss03a > Tue Aug 19 11:03:17.080 2014: Connecting to > gss01b > Tue Aug 19 11:03:17.079 2014: Connecting to > gss01a > Tue Aug 19 11:04:23.105 2014: Connected to > gss02b > Tue Aug 19 11:04:23.107 2014: Connected to > gss03b > Tue Aug 19 11:04:23.112 2014: Connected to > gss03a > Tue Aug 19 11:04:23.115 2014: Connected to > gss01b > Tue Aug 19 11:04:23.121 2014: Connected to > gss01a > Tue Aug 19 11:12:28.992 2014: Node (gss02a in > GSS.ebi.ac.uk) is now the Group Leader. > > *GSS02B ( NSD SERVER)* > ... > Tue Aug 19 11:03:17.070 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:25.016 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:28.080 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:36.019 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:39.083 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:47.023 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:50.088 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:52.218 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:58.030 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:01.092 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:03.220 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:09.034 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:12.096 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:14.224 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:20.037 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:23.103 2014: Accepted and connected to > ** ebi5-220 > ... 
> > *GSS02a ( NSD SERVER)* > Tue Aug 19 11:03:04.980 2014: Expel (gss02b) > request from (ebi5-220 in > ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 > in ebi-cluster.ebi.ac.uk) > Tue Aug 19 11:03:12.069 2014: Accepted and connected to > ebi5-220 > > > =============================================== > *EXAMPLE 2*: > > *EBI5-038* > Tue Aug 19 11:32:34.227 2014: *Disk lease period expired > in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* > Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing > cluster GSS.ebi.ac.uk* > Tue Aug 19 11:35:24.265 2014: Close connection to IP> gss02a (Connection reset by peer). Attempting > reconnect. > Tue Aug 19 11:35:24.865 2014: Close connection to > ebi5-014 (Connection reset by > peer). Attempting reconnect. > ... > LOT MORE RESETS BY PEER > ... > Tue Aug 19 11:35:25.096 2014: Close connection to > ebi5-167 (Connection reset by > peer). Attempting reconnect. > Tue Aug 19 11:35:25.267 2014: Connecting to > gss02a > Tue Aug 19 11:35:25.268 2014: Close connection to IP> gss02a (Connection failed because destination > is still processing previous node failure) > Tue Aug 19 11:35:26.267 2014: Retry connection to IP> gss02a > Tue Aug 19 11:35:26.268 2014: Close connection to IP> gss02a (Connection failed because destination > is still processing previous node failure) > Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum > nodes during cluster probe. > Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster > GSS.ebi.ac.uk. Unmounting file systems.* > > *GSS02a* > Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 > in ebi-cluster.ebi.ac.uk) *is being expelled because of an > expired lease.* Pings sent: 60. Replies received: 60. > > > > > In example 1 seems that an NSD was not repliyng to the client, but the > servers seems working fine.. how can i trace better ( to solve) the > problem? > > In example 2 it seems to me that for some reason the manager are not > renewing the lease in time. when this happens , its not a single client. > Loads of them fail to get the lease renewed. Why this is happening? > how can i trace to the source of the problem? > > > > Thanks in advance for any tips. > > Regards, > Salvatore > > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Thu Aug 21 10:04:47 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Thu, 21 Aug 2014 10:04:47 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5ABD7.80107@gpfsug.org> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> Message-ID: <53F5B62F.1060305@ebi.ac.uk> Thanks for the feedback, but we managed to find a scenario that excludes network problems. we have a file called */input_file/* of nearly 100GB: if from *client A* we do: cat input_file >> output_file it start copying.. and we see waiter goeg a bit up,secs but then they flushes back to 0, so we xcan say that the copy proceed well... if now we do the same from another client ( or just another shell on the same client) *client B* : cat input_file >> output_file ( in other words we are trying to write to the same destination) all the waiters gets up until one node get expelled. 
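Spelled out, the reproduction looks like this (a sketch: the paths are only examples, and input_file is the ~100GB file mentioned above):

# On client A, start the first append:
cat /gpfs/nobackup/sdinardo/input_file >> /gpfs/nobackup/sdinardo/output_file

# While the first copy is still running, on client B (or a second shell
# on the same client), append to the same destination file:
cat /gpfs/nobackup/sdinardo/input_file >> /gpfs/nobackup/sdinardo/output_file

# Meanwhile, watch the waiters on the NSD servers:
watch -n 5 /usr/lpp/mmfs/bin/mmdiag --waiters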
Now, while it is understandable that the destination file is locked for one of the "cat" processes, which therefore has to wait (and since the file is BIG, it has to wait for a while), it is not understandable why it stops renewing its lease. Why doesn't it just return a timeout error on the copy instead of expelling the node? We can reproduce this every time, and since our users do operations like this on files over 100GB each, you can imagine the result. Even if writing to the same destination at the same time is a bit silly, it is also quite common, for example when we want to dump logs to a log file and for some reason one of the writers keeps writing for a long time, keeping the file locked. Our expels are not due to network congestion, but to one write attempt having to wait for another one. What I really don't understand is why such an extreme measure as an expel is taken just because a process is waiting "too much time". I have a ticket opened with IBM for this and the issue is under investigation, but no luck so far.

Regards,
Salvatore

On 21/08/14 09:20, Jez Tucker (Chair) wrote:
> Hi there,
>
> I've seen this on several 'stock'? 'core'? GPFS systems (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster.
> The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so.
>
> In my experience this has _always_ been a network issue of one sort or another. If the network is experiencing issues, nodes will be ejected.
> Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS.
>
> You need to follow the logs through from each machine in time order to determine who could not see who and in what order.
> Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support.
>
> Without knowing your full setup, it's hard to help further.
>
> Jez
>
> On 20/08/14 08:57, Salvatore Di Nardo wrote:
>> Still problems. Here some more detailed examples:
>>
>> *EXAMPLE 1:*
>>
>> *EBI5-220* *( CLIENT)*
>> Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a reply from node gss02b*
>> Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk
>> Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220)
>> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk
>> Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe.
>> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.
>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked.
File system: gpfs1 Reason: SGPanic >> Tue Aug 19 11:03:12.066 2014: Connecting to >> gss02a >> Tue Aug 19 11:03:12.070 2014: Connected to >> gss02a >> Tue Aug 19 11:03:17.071 2014: Connecting to >> gss02b >> Tue Aug 19 11:03:17.072 2014: Connecting to >> gss03b >> Tue Aug 19 11:03:17.079 2014: Connecting to >> gss03a >> Tue Aug 19 11:03:17.080 2014: Connecting to >> gss01b >> Tue Aug 19 11:03:17.079 2014: Connecting to >> gss01a >> Tue Aug 19 11:04:23.105 2014: Connected to >> gss02b >> Tue Aug 19 11:04:23.107 2014: Connected to >> gss03b >> Tue Aug 19 11:04:23.112 2014: Connected to >> gss03a >> Tue Aug 19 11:04:23.115 2014: Connected to >> gss01b >> Tue Aug 19 11:04:23.121 2014: Connected to >> gss01a >> Tue Aug 19 11:12:28.992 2014: Node (gss02a in >> GSS.ebi.ac.uk) is now the Group Leader. >> >> *GSS02B ( NSD SERVER)* >> ... >> Tue Aug 19 11:03:17.070 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:25.016 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:28.080 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:36.019 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:39.083 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:47.023 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:50.088 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:52.218 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:03:58.030 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:01.092 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:03.220 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:09.034 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:12.096 2014: Killing connection from >> ** because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:14.224 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:20.037 2014: Killing connection from >> because the group is not ready for it to >> rejoin, err 46 >> Tue Aug 19 11:04:23.103 2014: Accepted and connected to >> ** ebi5-220 >> ... >> >> *GSS02a ( NSD SERVER)* >> Tue Aug 19 11:03:04.980 2014: Expel (gss02b) >> request from (ebi5-220 in >> ebi-cluster.ebi.ac.uk). Expelling: >> (ebi5-220 in ebi-cluster.ebi.ac.uk) >> Tue Aug 19 11:03:12.069 2014: Accepted and connected to >> ebi5-220 >> >> >> =============================================== >> *EXAMPLE 2*: >> >> *EBI5-038* >> Tue Aug 19 11:32:34.227 2014: *Disk lease period expired >> in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* >> Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing >> cluster GSS.ebi.ac.uk* >> Tue Aug 19 11:35:24.265 2014: Close connection to > IP> gss02a (Connection reset by peer). Attempting >> reconnect. >> Tue Aug 19 11:35:24.865 2014: Close connection to >> ebi5-014 (Connection reset by >> peer). Attempting reconnect. >> ... >> LOT MORE RESETS BY PEER >> ... 
>> Tue Aug 19 11:35:25.096 2014: Close connection to >> ebi5-167 (Connection reset by >> peer). Attempting reconnect. >> Tue Aug 19 11:35:25.267 2014: Connecting to >> gss02a >> Tue Aug 19 11:35:25.268 2014: Close connection to > IP> gss02a (Connection failed because destination >> is still processing previous node failure) >> Tue Aug 19 11:35:26.267 2014: Retry connection to > IP> gss02a >> Tue Aug 19 11:35:26.268 2014: Close connection to > IP> gss02a (Connection failed because destination >> is still processing previous node failure) >> Tue Aug 19 11:36:24.276 2014: Unable to contact any >> quorum nodes during cluster probe. >> Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster >> GSS.ebi.ac.uk. Unmounting file systems.* >> >> *GSS02a* >> Tue Aug 19 11:35:24.263 2014: Node >> (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled >> because of an expired lease.* Pings sent: 60. Replies >> received: 60. >> >> >> >> >> In example 1 seems that an NSD was not repliyng to the client, but >> the servers seems working fine.. how can i trace better ( to solve) >> the problem? >> >> In example 2 it seems to me that for some reason the manager are not >> renewing the lease in time. when this happens , its not a single client. >> Loads of them fail to get the lease renewed. Why this is happening? >> how can i trace to the source of the problem? >> >> >> >> Thanks in advance for any tips. >> >> Regards, >> Salvatore >> >> >> >> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Aug 21 13:48:38 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 21 Aug 2014 12:48:38 +0000 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5B62F.1060305@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org>,<53F5B62F.1060305@ebi.ac.uk> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com> As I understand GPFS distributed locking semantics, GPFS will not allow one node to hold a write lock for a file indefinitely. Once Client B opens the file for writing it would have contacted the File System Manager to obtain the lock. The FS manager would have told Client B that Client A has the lock and that Client B would have to contact Client A and revoke the write lock token. If Client A does not respond to Client B's request to revoke the write token, then Client B will ask that Client A be expelled from the cluster for NOT adhering to the proper protocol for write lock contention. [cid:2fb2253c-3ffb-4ac6-88a8-d019b1a24f66] Have you checked the communication path between the two clients at this point? I could not follow the logs that you provided. You should definitely look at the exact sequence of log events on the two clients and the file system manager (as reported by mmlsmgr). 
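If it helps, this is roughly what I would capture on both clients and on
the manager node while you reproduce the two concurrent writes (commands
from memory, so double-check the options against the docs for your GPFS
level):

    # which node is the cluster / file system manager
    mmlsmgr

    # long-running waiters on each client while both cats are running
    mmdiag --waiters

    # then line up the GPFS logs from both clients and the manager in
    # time order
    grep "Tue Aug 19" /var/adm/ras/mmfs.log.latest

That should show whether the token revoke ever reaches Client A and how
long it sits in the waiters before the expel is requested.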
Hope that helps, -Bryan ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Thursday, August 21, 2014 4:04 AM To: chair at gpfsug.org; gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfs client expels Thanks for the feedback, but we managed to find a scenario that excludes network problems. we have a file called input_file of nearly 100GB: if from client A we do: cat input_file >> output_file it start copying.. and we see waiter goeg a bit up,secs but then they flushes back to 0, so we xcan say that the copy proceed well... if now we do the same from another client ( or just another shell on the same client) client B : cat input_file >> output_file ( in other words we are trying to write to the same destination) all the waiters gets up until one node get expelled. Now, while its understandable that the destination file is locked for one of the "cat", so have to wait ( and since the file is BIG , have to wait for a while), its not understandable why it stop the renewal lease. Why its doen't return just a timeout error on the copy instead to expel the node? We can reproduce this every time, and since our users to operations like this on files over 100GB each you can imagine the result. As you can imagine even if its a bit silly to write at the same time to the same destination, its also quite common if we want to dump to a log file logs and for some reason one of the writers, write for a lot of time keeping the file locked. Our expels are not due to network congestion, but because a write attempts have to wait another one. What i really dont understand is why to take a so expreme mesure to expell jest because a process is waiteing "to too much time". I have ticket opened to IBM for this and the issue is under investigation, but no luck so far.. Regards, Salvatore On 21/08/14 09:20, Jez Tucker (Chair) wrote: Hi there, I've seen the on several 'stock'? 'core'? GPFS system (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so. In my experience this has _always_ been a network issue of one sort of another. If the network is experiencing issues, nodes will be ejected. Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS. You need to follow the logs through from each machine in time order to determine who could not see who and in what order. Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support. Without knowing your full setup, it's hard to help further. Jez On 20/08/14 08:57, Salvatore Di Nardo wrote: Still problems. Here some more detailed examples: EXAMPLE 1: EBI5-220 ( CLIENT) Tue Aug 19 11:03:04.980 2014: Timed out waiting for a reply from node gss02b Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. 
Unmounting file systems. Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic Tue Aug 19 11:03:12.066 2014: Connecting to gss02a Tue Aug 19 11:03:12.070 2014: Connected to gss02a Tue Aug 19 11:03:17.071 2014: Connecting to gss02b Tue Aug 19 11:03:17.072 2014: Connecting to gss03b Tue Aug 19 11:03:17.079 2014: Connecting to gss03a Tue Aug 19 11:03:17.080 2014: Connecting to gss01b Tue Aug 19 11:03:17.079 2014: Connecting to gss01a Tue Aug 19 11:04:23.105 2014: Connected to gss02b Tue Aug 19 11:04:23.107 2014: Connected to gss03b Tue Aug 19 11:04:23.112 2014: Connected to gss03a Tue Aug 19 11:04:23.115 2014: Connected to gss01b Tue Aug 19 11:04:23.121 2014: Connected to gss01a Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. GSS02B ( NSD SERVER) ... Tue Aug 19 11:03:17.070 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:28.080 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:39.083 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:50.088 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:01.092 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:12.096 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:23.103 2014: Accepted and connected to ebi5-220 ... GSS02a ( NSD SERVER) Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 =============================================== EXAMPLE 2: EBI5-038 Tue Aug 19 11:32:34.227 2014: Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease. Tue Aug 19 11:33:34.258 2014: Lease is overdue. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. ... LOT MORE RESETS BY PEER ... Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. 
Tue Aug 19 11:35:25.267 2014: Connecting to gss02a Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:36:24.277 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. GSS02a Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) is being expelled because of an expired lease. Pings sent: 60. Replies received: 60. In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? Thanks in advance for any tips. Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: GPFS_Token_Protocol.png Type: image/png Size: 249179 bytes Desc: GPFS_Token_Protocol.png URL: From jbernard at jumptrading.com Thu Aug 21 13:52:05 2014 From: jbernard at jumptrading.com (Jon Bernard) Date: Thu, 21 Aug 2014 12:52:05 +0000 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org>, <53F5B62F.1060305@ebi.ac.uk>, <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Where is that from? On Aug 21, 2014, at 7:49, "Bryan Banister" > wrote: As I understand GPFS distributed locking semantics, GPFS will not allow one node to hold a write lock for a file indefinitely. Once Client B opens the file for writing it would have contacted the File System Manager to obtain the lock. The FS manager would have told Client B that Client A has the lock and that Client B would have to contact Client A and revoke the write lock token. 
If Client A does not respond to Client B's request to revoke the write token, then Client B will ask that Client A be expelled from the cluster for NOT adhering to the proper protocol for write lock contention. Have you checked the communication path between the two clients at this point? I could not follow the logs that you provided. You should definitely look at the exact sequence of log events on the two clients and the file system manager (as reported by mmlsmgr). Hope that helps, -Bryan ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk] Sent: Thursday, August 21, 2014 4:04 AM To: chair at gpfsug.org; gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfs client expels Thanks for the feedback, but we managed to find a scenario that excludes network problems. we have a file called input_file of nearly 100GB: if from client A we do: cat input_file >> output_file it start copying.. and we see waiter goeg a bit up,secs but then they flushes back to 0, so we xcan say that the copy proceed well... if now we do the same from another client ( or just another shell on the same client) client B : cat input_file >> output_file ( in other words we are trying to write to the same destination) all the waiters gets up until one node get expelled. Now, while its understandable that the destination file is locked for one of the "cat", so have to wait ( and since the file is BIG , have to wait for a while), its not understandable why it stop the renewal lease. Why its doen't return just a timeout error on the copy instead to expel the node? We can reproduce this every time, and since our users to operations like this on files over 100GB each you can imagine the result. As you can imagine even if its a bit silly to write at the same time to the same destination, its also quite common if we want to dump to a log file logs and for some reason one of the writers, write for a lot of time keeping the file locked. Our expels are not due to network congestion, but because a write attempts have to wait another one. What i really dont understand is why to take a so expreme mesure to expell jest because a process is waiteing "to too much time". I have ticket opened to IBM for this and the issue is under investigation, but no luck so far.. Regards, Salvatore On 21/08/14 09:20, Jez Tucker (Chair) wrote: Hi there, I've seen the on several 'stock'? 'core'? GPFS system (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so. In my experience this has _always_ been a network issue of one sort of another. If the network is experiencing issues, nodes will be ejected. Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS. You need to follow the logs through from each machine in time order to determine who could not see who and in what order. Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support. Without knowing your full setup, it's hard to help further. Jez On 20/08/14 08:57, Salvatore Di Nardo wrote: Still problems. 
Here some more detailed examples: EXAMPLE 1: EBI5-220 ( CLIENT) Tue Aug 19 11:03:04.980 2014: Timed out waiting for a reply from node gss02b Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic Tue Aug 19 11:03:12.066 2014: Connecting to gss02a Tue Aug 19 11:03:12.070 2014: Connected to gss02a Tue Aug 19 11:03:17.071 2014: Connecting to gss02b Tue Aug 19 11:03:17.072 2014: Connecting to gss03b Tue Aug 19 11:03:17.079 2014: Connecting to gss03a Tue Aug 19 11:03:17.080 2014: Connecting to gss01b Tue Aug 19 11:03:17.079 2014: Connecting to gss01a Tue Aug 19 11:04:23.105 2014: Connected to gss02b Tue Aug 19 11:04:23.107 2014: Connected to gss03b Tue Aug 19 11:04:23.112 2014: Connected to gss03a Tue Aug 19 11:04:23.115 2014: Connected to gss01b Tue Aug 19 11:04:23.121 2014: Connected to gss01a Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. GSS02B ( NSD SERVER) ... Tue Aug 19 11:03:17.070 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:28.080 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:39.083 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:50.088 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:01.092 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:12.096 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:23.103 2014: Accepted and connected to ebi5-220 ... GSS02a ( NSD SERVER) Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). 
Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 =============================================== EXAMPLE 2: EBI5-038 Tue Aug 19 11:32:34.227 2014: Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease. Tue Aug 19 11:33:34.258 2014: Lease is overdue. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. ... LOT MORE RESETS BY PEER ... Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:25.267 2014: Connecting to gss02a Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:36:24.277 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. GSS02a Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) is being expelled because of an expired lease. Pings sent: 60. Replies received: 60. In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? Thanks in advance for any tips. Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: GPFS_Token_Protocol.png Type: image/png Size: 249179 bytes Desc: GPFS_Token_Protocol.png URL: From viccornell at gmail.com Thu Aug 21 14:03:14 2014 From: viccornell at gmail.com (Vic Cornell) Date: Thu, 21 Aug 2014 14:03:14 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5B62F.1060305@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk> Message-ID: <9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com> Hi Salvatore, Are you using ethernet or infiniband as the GPFS interconnect to your clients? If 10/40GbE - do you have a separate admin network? I have seen behaviour similar to this where the storage traffic causes congestion and the "admin" traffic gets lost or delayed causing expels. Vic On 21 Aug 2014, at 10:04, Salvatore Di Nardo wrote: > Thanks for the feedback, but we managed to find a scenario that excludes network problems. > > we have a file called input_file of nearly 100GB: > > if from client A we do: > > cat input_file >> output_file > > it start copying.. and we see waiter goeg a bit up,secs but then they flushes back to 0, so we xcan say that the copy proceed well... > > > if now we do the same from another client ( or just another shell on the same client) client B : > > cat input_file >> output_file > > > ( in other words we are trying to write to the same destination) all the waiters gets up until one node get expelled. > > > Now, while its understandable that the destination file is locked for one of the "cat", so have to wait ( and since the file is BIG , have to wait for a while), its not understandable why it stop the renewal lease. > Why its doen't return just a timeout error on the copy instead to expel the node? We can reproduce this every time, and since our users to operations like this on files over 100GB each you can imagine the result. > > > > As you can imagine even if its a bit silly to write at the same time to the same destination, its also quite common if we want to dump to a log file logs and for some reason one of the writers, write for a lot of time keeping the file locked. > Our expels are not due to network congestion, but because a write attempts have to wait another one. What i really dont understand is why to take a so expreme mesure to expell jest because a process is waiteing "to too much time". > > > I have ticket opened to IBM for this and the issue is under investigation, but no luck so far.. > > Regards, > Salvatore > > > > On 21/08/14 09:20, Jez Tucker (Chair) wrote: >> Hi there, >> >> I've seen the on several 'stock'? 'core'? GPFS system (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. 
>> The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so. >> >> In my experience this has _always_ been a network issue of one sort of another. If the network is experiencing issues, nodes will be ejected. >> Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS. >> >> You need to follow the logs through from each machine in time order to determine who could not see who and in what order. >> Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support. >> >> Without knowing your full setup, it's hard to help further. >> >> Jez >> >> On 20/08/14 08:57, Salvatore Di Nardo wrote: >>> Still problems. Here some more detailed examples: >>> >>> EXAMPLE 1: >>> EBI5-220 ( CLIENT) >>> Tue Aug 19 11:03:04.980 2014: Timed out waiting for a reply from node gss02b >>> Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) >>> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. >>> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. >>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic >>> Tue Aug 19 11:03:12.066 2014: Connecting to gss02a >>> Tue Aug 19 11:03:12.070 2014: Connected to gss02a >>> Tue Aug 19 11:03:17.071 2014: Connecting to gss02b >>> Tue Aug 19 11:03:17.072 2014: Connecting to gss03b >>> Tue Aug 19 11:03:17.079 2014: Connecting to gss03a >>> Tue Aug 19 11:03:17.080 2014: Connecting to gss01b >>> Tue Aug 19 11:03:17.079 2014: Connecting to gss01a >>> Tue Aug 19 11:04:23.105 2014: Connected to gss02b >>> Tue Aug 19 11:04:23.107 2014: Connected to gss03b >>> Tue Aug 19 11:04:23.112 2014: Connected to gss03a >>> Tue Aug 19 11:04:23.115 2014: Connected to gss01b >>> Tue Aug 19 11:04:23.121 2014: Connected to gss01a >>> Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. >>> >>> GSS02B ( NSD SERVER) >>> ... 
>>> Tue Aug 19 11:03:17.070 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:28.080 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:39.083 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:50.088 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:01.092 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:12.096 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to ebi5-220 >>> ... >>> >>> GSS02a ( NSD SERVER) >>> Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) >>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 >>> >>> >>> =============================================== >>> EXAMPLE 2: >>> >>> EBI5-038 >>> Tue Aug 19 11:32:34.227 2014: Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease. >>> Tue Aug 19 11:33:34.258 2014: Lease is overdue. Probing cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. >>> Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. >>> ... >>> LOT MORE RESETS BY PEER >>> ... >>> Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. >>> Tue Aug 19 11:35:25.267 2014: Connecting to gss02a >>> Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) >>> Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a >>> Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) >>> Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. >>> Tue Aug 19 11:36:24.277 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. >>> >>> GSS02a >>> Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) is being expelled because of an expired lease. Pings sent: 60. Replies received: 60. 
>>>
>>> In example 1 seems that an NSD was not repliyng to the client, but
>>> the servers seems working fine.. how can i trace better ( to solve)
>>> the problem?
>>>
>>> In example 2 it seems to me that for some reason the manager are not
>>> renewing the lease in time. when this happens , its not a single client.
>>> Loads of them fail to get the lease renewed. Why this is happening?
>>> how can i trace to the source of the problem?
>>>
>>> Thanks in advance for any tips.
>>>
>>> Regards,
>>> Salvatore
>>>
>>> _______________________________________________
>>> gpfsug-discuss mailing list
>>> gpfsug-discuss at gpfsug.org
>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>>
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at gpfsug.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From sdinardo at ebi.ac.uk  Thu Aug 21 14:04:59 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Thu, 21 Aug 2014 14:04:59 +0100
Subject: [gpfsug-discuss] gpfs client expels
In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com>
References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk>
	<53F5ABD7.80107@gpfsug.org>, <53F5B62F.1060305@ebi.ac.uk>
	<21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com>
Message-ID: <53F5EE7B.2080306@ebi.ac.uk>

Thanks for the info... it helps a bit in understanding what's going on,
but I think you missed that Node A and Node B could also be the same
machine. If, for instance, I run the two copies on the same machine,
then Client B cannot have problems contacting Client A, since they are
the same machine. BTW, I did the same test using two separate clients
and the result is the same. Nonetheless, your description made me
understand a bit better what's going on.

Regards,
Salvatore

On 21/08/14 13:48, Bryan Banister wrote:
> As I understand GPFS distributed locking semantics, GPFS will not
> allow one node to hold a write lock for a file indefinitely. Once
> Client B opens the file for writing it would have contacted the File
> System Manager to obtain the lock. The FS manager would have told
> Client B that Client A has the lock and that Client B would have to
> contact Client A and revoke the write lock token. If Client A does
> not respond to Client B's request to revoke the write token, then
> Client B will ask that Client A be expelled from the cluster for NOT
> adhering to the proper protocol for write lock contention.
>
>
> Have you checked the communication path between the two clients at
> this point?
>
> I could not follow the logs that you provided. You should definitely
> look at the exact sequence of log events on the two clients and the
> file system manager (as reported by mmlsmgr).
>
> Hope that helps,
> -Bryan
>
> ------------------------------------------------------------------------
> *From:* gpfsug-discuss-bounces at gpfsug.org
> [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo
> [sdinardo at ebi.ac.uk]
> *Sent:* Thursday, August 21, 2014 4:04 AM
> *To:* chair at gpfsug.org; gpfsug main discussion list
> *Subject:* Re: [gpfsug-discuss] gpfs client expels
>
> Thanks for the feedback, but we managed to find a scenario that
> excludes network problems.
> > we have a file called */input_file/* of nearly 100GB: > > if from *client A* we do: > > cat input_file >> output_file > > it start copying.. and we see waiter goeg a bit up,secs but then they > flushes back to 0, so we xcan say that the copy proceed well... > > > if now we do the same from another client ( or just another shell on > the same client) *client B* : > > cat input_file >> output_file > > > ( in other words we are trying to write to the same destination) all > the waiters gets up until one node get expelled. > > > Now, while its understandable that the destination file is locked for > one of the "cat", so have to wait ( and since the file is BIG , have > to wait for a while), its not understandable why it stop the renewal > lease. > Why its doen't return just a timeout error on the copy instead to > expel the node? We can reproduce this every time, and since our users > to operations like this on files over 100GB each you can imagine the > result. > > > > As you can imagine even if its a bit silly to write at the same time > to the same destination, its also quite common if we want to dump to a > log file logs and for some reason one of the writers, write for a lot > of time keeping the file locked. > Our expels are not due to network congestion, but because a write > attempts have to wait another one. What i really dont understand is > why to take a so expreme mesure to expell jest because a process is > waiteing "to too much time". > > > I have ticket opened to IBM for this and the issue is under > investigation, but no luck so far.. > > Regards, > Salvatore > > > > On 21/08/14 09:20, Jez Tucker (Chair) wrote: >> Hi there, >> >> I've seen the on several 'stock'? 'core'? GPFS system (we need a >> better term now GSS is out) and seen ping 'working', but alongside >> ejections from the cluster. >> The GPFS internode 'ping' is somewhat more circumspect than unix ping >> - and rightly so. >> >> In my experience this has _always_ been a network issue of one sort >> of another. If the network is experiencing issues, nodes will be >> ejected. >> Of course it could be unresponsive mmfsd or high loadavg, but I've >> seen that only twice in 10 years over many versions of GPFS. >> >> You need to follow the logs through from each machine in time order >> to determine who could not see who and in what order. >> Your best way forward is to log a SEV2 case with IBM support, >> directly or via your OEM and collect and supply a snap and traces as >> required by support. >> >> Without knowing your full setup, it's hard to help further. >> >> Jez >> >> On 20/08/14 08:57, Salvatore Di Nardo wrote: >>> Still problems. Here some more detailed examples: >>> >>> *EXAMPLE 1:* >>> >>> *EBI5-220**( CLIENT)** >>> *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a >>> reply from node gss02b* >>> Tue Aug 19 11:03:04.981 2014: Request sent to >> IP> (gss02a in GSS.ebi.ac.uk) to expel >>> (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:03:04.982 2014: This node will be expelled >>> from cluster GSS.ebi.ac.uk due to expel msg from >>> (ebi5-220) >>> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection >>> broke. Probing cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:03:10.321 2014: Unable to contact any >>> quorum nodes during cluster probe. >>> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster >>> GSS.ebi.ac.uk. Unmounting file systems. >>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount >>> invoked. 
File system: gpfs1 Reason: SGPanic >>> Tue Aug 19 11:03:12.066 2014: Connecting to >>> gss02a >>> Tue Aug 19 11:03:12.070 2014: Connected to >>> gss02a >>> Tue Aug 19 11:03:17.071 2014: Connecting to >>> gss02b >>> Tue Aug 19 11:03:17.072 2014: Connecting to >>> gss03b >>> Tue Aug 19 11:03:17.079 2014: Connecting to >>> gss03a >>> Tue Aug 19 11:03:17.080 2014: Connecting to >>> gss01b >>> Tue Aug 19 11:03:17.079 2014: Connecting to >>> gss01a >>> Tue Aug 19 11:04:23.105 2014: Connected to >>> gss02b >>> Tue Aug 19 11:04:23.107 2014: Connected to >>> gss03b >>> Tue Aug 19 11:04:23.112 2014: Connected to >>> gss03a >>> Tue Aug 19 11:04:23.115 2014: Connected to >>> gss01b >>> Tue Aug 19 11:04:23.121 2014: Connected to >>> gss01a >>> Tue Aug 19 11:12:28.992 2014: Node (gss02a >>> in GSS.ebi.ac.uk) is now the Group Leader. >>> >>> *GSS02B ( NSD SERVER)* >>> ... >>> Tue Aug 19 11:03:17.070 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:25.016 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:28.080 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:36.019 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:39.083 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:47.023 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:50.088 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:52.218 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:58.030 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:01.092 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:03.220 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:09.034 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:12.096 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:14.224 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:20.037 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to >>> ** ebi5-220 >>> ... >>> >>> *GSS02a ( NSD SERVER)* >>> Tue Aug 19 11:03:04.980 2014: Expel (gss02b) >>> request from (ebi5-220 in >>> ebi-cluster.ebi.ac.uk). Expelling: >>> (ebi5-220 in ebi-cluster.ebi.ac.uk) >>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to >>> ebi5-220 >>> >>> >>> =============================================== >>> *EXAMPLE 2*: >>> >>> *EBI5-038* >>> Tue Aug 19 11:32:34.227 2014: *Disk lease period expired >>> in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* >>> Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing >>> cluster GSS.ebi.ac.uk* >>> Tue Aug 19 11:35:24.265 2014: Close connection to >>> gss02a (Connection reset by peer). >>> Attempting reconnect. 
>>> Tue Aug 19 11:35:24.865 2014: Close connection to >>> ebi5-014 (Connection reset by >>> peer). Attempting reconnect. >>> ... >>> LOT MORE RESETS BY PEER >>> ... >>> Tue Aug 19 11:35:25.096 2014: Close connection to >>> ebi5-167 (Connection reset by >>> peer). Attempting reconnect. >>> Tue Aug 19 11:35:25.267 2014: Connecting to >>> gss02a >>> Tue Aug 19 11:35:25.268 2014: Close connection to >>> gss02a (Connection failed because >>> destination is still processing previous node failure) >>> Tue Aug 19 11:35:26.267 2014: Retry connection to >>> gss02a >>> Tue Aug 19 11:35:26.268 2014: Close connection to >>> gss02a (Connection failed because >>> destination is still processing previous node failure) >>> Tue Aug 19 11:36:24.276 2014: Unable to contact any >>> quorum nodes during cluster probe. >>> Tue Aug 19 11:36:24.277 2014: *Lost membership in >>> cluster GSS.ebi.ac.uk. Unmounting file systems.* >>> >>> *GSS02a* >>> Tue Aug 19 11:35:24.263 2014: Node >>> (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled >>> because of an expired lease.* Pings sent: 60. Replies >>> received: 60. >>> >>> >>> >>> >>> In example 1 seems that an NSD was not repliyng to the client, but >>> the servers seems working fine.. how can i trace better ( to solve) >>> the problem? >>> >>> In example 2 it seems to me that for some reason the manager are not >>> renewing the lease in time. when this happens , its not a single >>> client. >>> Loads of them fail to get the lease renewed. Why this is happening? >>> how can i trace to the source of the problem? >>> >>> >>> >>> Thanks in advance for any tips. >>> >>> Regards, >>> Salvatore >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ------------------------------------------------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: image/png
Size: 249179 bytes
Desc: not available
URL: 
From sdinardo at ebi.ac.uk  Thu Aug 21 14:18:19 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Thu, 21 Aug 2014 14:18:19 +0100
Subject: [gpfsug-discuss] gpfs client expels
In-Reply-To: <9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com>
References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk>
	<53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk>
	<9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com>
Message-ID: <53F5F19B.1010603@ebi.ac.uk>

This is an interesting point!

We use ethernet (10G links on the clients), but we do not have a separate
network for the admin traffic.

Could you explain this a bit further? The clients and the servers are on
different subnets, so the packets are routed, and I don't see a practical
way to separate the traffic. The clients are blades in a chassis, so even
if I create two interfaces they will physically use the same cable to
reach the first switch. The clients (around 600 of them) are also spread
over different subnets.

I will forward this consideration to our network admins to see if we can
work on a dedicated network.

Thanks for your tip.

Regards,
Salvatore

On 21/08/14 14:03, Vic Cornell wrote:
> Hi Salvatore,
>
> Are you using ethernet or infiniband as the GPFS interconnect to your
> clients?
>
> If 10/40GbE - do you have a separate admin network?
>
> I have seen behaviour similar to this where the storage traffic causes
> congestion and the "admin" traffic gets lost or delayed causing expels.
>
> Vic
>
>
> On 21 Aug 2014, at 10:04, Salvatore Di Nardo wrote:
>
>> Thanks for the feedback, but we managed to find a scenario that
>> excludes network problems.
>>
>> we have a file called */input_file/* of nearly 100GB:
>>
>> if from *client A* we do:
>>
>> cat input_file >> output_file
>>
>> it start copying.. and we see waiter goeg a bit up,secs but then they
>> flushes back to 0, so we xcan say that the copy proceed well...
>>
>> if now we do the same from another client ( or just another shell on
>> the same client) *client B* :
>>
>> cat input_file >> output_file
>>
>> ( in other words we are trying to write to the same destination) all
>> the waiters gets up until one node get expelled.
>>
>> Now, while its understandable that the destination file is locked for
>> one of the "cat", so have to wait ( and since the file is BIG , have
>> to wait for a while), its not understandable why it stop the renewal
>> lease.
>> Why its doen't return just a timeout error on the copy instead to
>> expel the node? We can reproduce this every time, and since our users
>> to operations like this on files over 100GB each you can imagine the
>> result.
>>
>> As you can imagine even if its a bit silly to write at the same time
>> to the same destination, its also quite common if we want to dump to
>> a log file logs and for some reason one of the writers, write for a
>> lot of time keeping the file locked.
>> Our expels are not due to network congestion, but because a write
>> attempts have to wait another one. What i really dont understand is
>> why to take a so expreme mesure to expell jest because a process is
>> waiteing "to too much time".
>>
>> I have ticket opened to IBM for this and the issue is under
>> investigation, but no luck so far..
>>
>> Regards,
>> Salvatore
>>
>>
>> On 21/08/14 09:20, Jez Tucker (Chair) wrote:
>>> Hi there,
>>>
>>> I've seen the on several 'stock'? 'core'?
GPFS system (we need a >>> better term now GSS is out) and seen ping 'working', but alongside >>> ejections from the cluster. >>> The GPFS internode 'ping' is somewhat more circumspect than unix >>> ping - and rightly so. >>> >>> In my experience this has _always_ been a network issue of one sort >>> of another. If the network is experiencing issues, nodes will be >>> ejected. >>> Of course it could be unresponsive mmfsd or high loadavg, but I've >>> seen that only twice in 10 years over many versions of GPFS. >>> >>> You need to follow the logs through from each machine in time order >>> to determine who could not see who and in what order. >>> Your best way forward is to log a SEV2 case with IBM support, >>> directly or via your OEM and collect and supply a snap and traces as >>> required by support. >>> >>> Without knowing your full setup, it's hard to help further. >>> >>> Jez >>> >>> On 20/08/14 08:57, Salvatore Di Nardo wrote: >>>> Still problems. Here some more detailed examples: >>>> >>>> *EXAMPLE 1:* >>>> >>>> *EBI5-220**( CLIENT)** >>>> *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a >>>> reply from node gss02b* >>>> Tue Aug 19 11:03:04.981 2014: Request sent to >>> IP> (gss02a in GSS.ebi.ac.uk ) to >>>> expel (gss02b in GSS.ebi.ac.uk >>>> ) from cluster GSS.ebi.ac.uk >>>> >>>> Tue Aug 19 11:03:04.982 2014: This node will be >>>> expelled from cluster GSS.ebi.ac.uk >>>> due to expel msg from >>> IP> (ebi5-220) >>>> Tue Aug 19 11:03:09.319 2014: Cluster Manager >>>> connection broke. Probing cluster GSS.ebi.ac.uk >>>> >>>> Tue Aug 19 11:03:10.321 2014: Unable to contact any >>>> quorum nodes during cluster probe. >>>> Tue Aug 19 11:03:10.322 2014: Lost membership in >>>> cluster GSS.ebi.ac.uk . >>>> Unmounting file systems. >>>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount >>>> invoked. File system: gpfs1 Reason: SGPanic >>>> Tue Aug 19 11:03:12.066 2014: Connecting to >>>> gss02a >>>> Tue Aug 19 11:03:12.070 2014: Connected to >>>> gss02a >>>> Tue Aug 19 11:03:17.071 2014: Connecting to >>>> gss02b >>>> Tue Aug 19 11:03:17.072 2014: Connecting to >>>> gss03b >>>> Tue Aug 19 11:03:17.079 2014: Connecting to >>>> gss03a >>>> Tue Aug 19 11:03:17.080 2014: Connecting to >>>> gss01b >>>> Tue Aug 19 11:03:17.079 2014: Connecting to >>>> gss01a >>>> Tue Aug 19 11:04:23.105 2014: Connected to >>>> gss02b >>>> Tue Aug 19 11:04:23.107 2014: Connected to >>>> gss03b >>>> Tue Aug 19 11:04:23.112 2014: Connected to >>>> gss03a >>>> Tue Aug 19 11:04:23.115 2014: Connected to >>>> gss01b >>>> Tue Aug 19 11:04:23.121 2014: Connected to >>>> gss01a >>>> Tue Aug 19 11:12:28.992 2014: Node (gss02a >>>> in GSS.ebi.ac.uk ) is now the >>>> Group Leader. >>>> >>>> *GSS02B ( NSD SERVER)* >>>> ... 
>>>> Tue Aug 19 11:03:17.070 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:25.016 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:28.080 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:36.019 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:39.083 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:47.023 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:50.088 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:52.218 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:58.030 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:01.092 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:04:03.220 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:09.034 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:12.096 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:04:14.224 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:20.037 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to >>>> ** ebi5-220 >>>> ... >>>> >>>> *GSS02a ( NSD SERVER)* >>>> Tue Aug 19 11:03:04.980 2014: Expel >>>> (gss02b) request from (ebi5-220 in >>>> ebi-cluster.ebi.ac.uk ). >>>> Expelling: (ebi5-220 in >>>> ebi-cluster.ebi.ac.uk ) >>>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to >>>> ebi5-220 >>>> >>>> >>>> =============================================== >>>> *EXAMPLE 2*: >>>> >>>> *EBI5-038* >>>> Tue Aug 19 11:32:34.227 2014: *Disk lease period >>>> expired in cluster GSS.ebi.ac.uk >>>> . Attempting to reacquire lease.* >>>> Tue Aug 19 11:33:34.258 2014: *Lease is overdue. >>>> Probing cluster GSS.ebi.ac.uk * >>>> Tue Aug 19 11:35:24.265 2014: Close connection to >>>> gss02a (Connection reset by peer). >>>> Attempting reconnect. >>>> Tue Aug 19 11:35:24.865 2014: Close connection to >>>> ebi5-014 (Connection reset by >>>> peer). Attempting reconnect. >>>> ... >>>> LOT MORE RESETS BY PEER >>>> ... >>>> Tue Aug 19 11:35:25.096 2014: Close connection to >>>> ebi5-167 (Connection reset by >>>> peer). Attempting reconnect. >>>> Tue Aug 19 11:35:25.267 2014: Connecting to >>>> gss02a >>>> Tue Aug 19 11:35:25.268 2014: Close connection to >>>> gss02a (Connection failed because >>>> destination is still processing previous node failure) >>>> Tue Aug 19 11:35:26.267 2014: Retry connection to >>>> gss02a >>>> Tue Aug 19 11:35:26.268 2014: Close connection to >>>> gss02a (Connection failed because >>>> destination is still processing previous node failure) >>>> Tue Aug 19 11:36:24.276 2014: Unable to contact any >>>> quorum nodes during cluster probe. 
>>>> Tue Aug 19 11:36:24.277 2014: *Lost membership in >>>> cluster GSS.ebi.ac.uk . >>>> Unmounting file systems.* >>>> >>>> *GSS02a* >>>> Tue Aug 19 11:35:24.263 2014: Node >>>> (ebi5-038 in ebi-cluster.ebi.ac.uk >>>> ) *is being expelled >>>> because of an expired lease.* Pings sent: 60. Replies >>>> received: 60. >>>> >>>> >>>> >>>> >>>> In example 1 seems that an NSD was not repliyng to the client, but >>>> the servers seems working fine.. how can i trace better ( to solve) >>>> the problem? >>>> >>>> In example 2 it seems to me that for some reason the manager are >>>> not renewing the lease in time. when this happens , its not a >>>> single client. >>>> Loads of them fail to get the lease renewed. Why this is happening? >>>> how can i trace to the source of the problem? >>>> >>>> >>>> >>>> Thanks in advance for any tips. >>>> >>>> Regards, >>>> Salvatore >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss atgpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss atgpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From service at metamodul.com Thu Aug 21 14:19:33 2014 From: service at metamodul.com (service at metamodul.com) Date: Thu, 21 Aug 2014 15:19:33 +0200 (CEST) Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5B62F.1060305@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk> Message-ID: <1481989063.92260.1408627173332.open-xchange@oxbaltgw09.schlund.de> > Now, while its understandable that the destination file is locked for one of > the "cat", so have to wait If GPFS is posix compatible i do not understand why a cat should block the other cat completly meanings on a standard FS you can "cat" from many source to the same target. Of course the result is not predictable. >From this point of view i would expect that both "cat" would start writing immediately thus i would expect a GPFS bug. All imho. Hajo Note: You might test which the input_file in a different directory and i would test the behaviour if the output_file is on a local FS like /tmp. -------------- next part -------------- An HTML attachment was scrubbed... URL: From viccornell at gmail.com Thu Aug 21 14:22:22 2014 From: viccornell at gmail.com (Vic Cornell) Date: Thu, 21 Aug 2014 14:22:22 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5F19B.1010603@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk> <9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com> <53F5F19B.1010603@ebi.ac.uk> Message-ID: <0F03996A-2008-4076-9A2B-B4B2BB89E959@gmail.com> For my system I always use a dedicated admin network - as described in the gpfs manuals - for a gpfs cluster on 10/40GbE where the system will be heavily loaded. The difference in the stability of the system is very noticeable. 
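For what it's worth, GPFS does let you express that kind of split in configuration. A rough sketch of the two relevant knobs, with made-up host and subnet names (check the Administration Guide for your release before applying anything):

    # Give a node a second hostname that resolves on the management network and
    # have GPFS use it for administrative (ssh/scp) command traffic only:
    mmchnode --admin-interface=gss01a-mgmt.example.com -N gss01a

    # Tell the daemons which subnet(s) to prefer for node-to-node traffic
    # (the subnet below is only an example value):
    mmchconfig subnets="10.30.0.0"

    # mmlscluster then shows the admin node name alongside the daemon node name:
    mmlscluster

Whether either knob helps depends on the cabling described in Salvatore's reply quoted further down: a separate interface only buys you something if it really is a separate physical path.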
Not sure how/if this would work on GSS - IBM ought to know :-) Vic On 21 Aug 2014, at 14:18, Salvatore Di Nardo wrote: > This is an interesting point! > > We use ethernet ( 10g links on the clients) but we dont have a separate network for the admin network. > > Could you explain this a bit further, because the clients and the servers we have are on different subnet so the packet are routed.. I don't see a practical way to separate them. The clients are blades in a chassis so even if i create 2 interfaces, they will physically use the came "cable" to go to the first switch. even the clients ( 600 clients) have different subsets. > > I will forward this consideration to our network admin , so see if we can work on a dedicated network. > > thanks for your tip. > > Regards, > Salvatore > > > > > On 21/08/14 14:03, Vic Cornell wrote: >> Hi Salvatore, >> >> Are you using ethernet or infiniband as the GPFS interconnect to your clients? >> >> If 10/40GbE - do you have a separate admin network? >> >> I have seen behaviour similar to this where the storage traffic causes congestion and the "admin" traffic gets lost or delayed causing expels. >> >> Vic >> >> >> >> On 21 Aug 2014, at 10:04, Salvatore Di Nardo wrote: >> >>> Thanks for the feedback, but we managed to find a scenario that excludes network problems. >>> >>> we have a file called input_file of nearly 100GB: >>> >>> if from client A we do: >>> >>> cat input_file >> output_file >>> >>> it start copying.. and we see waiter goeg a bit up,secs but then they flushes back to 0, so we xcan say that the copy proceed well... >>> >>> >>> if now we do the same from another client ( or just another shell on the same client) client B : >>> >>> cat input_file >> output_file >>> >>> >>> ( in other words we are trying to write to the same destination) all the waiters gets up until one node get expelled. >>> >>> >>> Now, while its understandable that the destination file is locked for one of the "cat", so have to wait ( and since the file is BIG , have to wait for a while), its not understandable why it stop the renewal lease. >>> Why its doen't return just a timeout error on the copy instead to expel the node? We can reproduce this every time, and since our users to operations like this on files over 100GB each you can imagine the result. >>> >>> >>> >>> As you can imagine even if its a bit silly to write at the same time to the same destination, its also quite common if we want to dump to a log file logs and for some reason one of the writers, write for a lot of time keeping the file locked. >>> Our expels are not due to network congestion, but because a write attempts have to wait another one. What i really dont understand is why to take a so expreme mesure to expell jest because a process is waiteing "to too much time". >>> >>> >>> I have ticket opened to IBM for this and the issue is under investigation, but no luck so far.. >>> >>> Regards, >>> Salvatore >>> >>> >>> >>> On 21/08/14 09:20, Jez Tucker (Chair) wrote: >>>> Hi there, >>>> >>>> I've seen the on several 'stock'? 'core'? GPFS system (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. >>>> The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so. >>>> >>>> In my experience this has _always_ been a network issue of one sort of another. If the network is experiencing issues, nodes will be ejected. 
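Going back to the double-append scenario quoted above (two cat ... >> output_file against the same GPFS file), it is easy to wrap in a small script so the waiters can be watched while it runs. Paths below are placeholders, not the real EBI file names, and Hajo's local-filesystem comparison is noted at the end:

    #!/bin/bash
    SRC=/gpfs1/scratch/input_file     # the ~100GB source file (example path)
    DST=/gpfs1/scratch/output_file    # shared destination on GPFS (example path)

    cat "$SRC" >> "$DST" &  PID1=$!
    cat "$SRC" >> "$DST" &  PID2=$!   # or start this one from a second client

    # Watch the waiters while both appenders run (needs root on a cluster node):
    while kill -0 "$PID1" 2>/dev/null; do
        /usr/lpp/mmfs/bin/mmdiag --waiters | head -20
        sleep 10
    done
    wait "$PID1" "$PID2"

    # Re-run with DST on a local filesystem (e.g. /tmp) to confirm, per Hajo's
    # note, that the stall is specific to the shared destination.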
>>>> Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS. >>>> >>>> You need to follow the logs through from each machine in time order to determine who could not see who and in what order. >>>> Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support. >>>> >>>> Without knowing your full setup, it's hard to help further. >>>> >>>> Jez >>>> >>>> On 20/08/14 08:57, Salvatore Di Nardo wrote: >>>>> Still problems. Here some more detailed examples: >>>>> >>>>> EXAMPLE 1: >>>>> EBI5-220 ( CLIENT) >>>>> Tue Aug 19 11:03:04.980 2014: Timed out waiting for a reply from node gss02b >>>>> Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk >>>>> Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) >>>>> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk >>>>> Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. >>>>> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. >>>>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic >>>>> Tue Aug 19 11:03:12.066 2014: Connecting to gss02a >>>>> Tue Aug 19 11:03:12.070 2014: Connected to gss02a >>>>> Tue Aug 19 11:03:17.071 2014: Connecting to gss02b >>>>> Tue Aug 19 11:03:17.072 2014: Connecting to gss03b >>>>> Tue Aug 19 11:03:17.079 2014: Connecting to gss03a >>>>> Tue Aug 19 11:03:17.080 2014: Connecting to gss01b >>>>> Tue Aug 19 11:03:17.079 2014: Connecting to gss01a >>>>> Tue Aug 19 11:04:23.105 2014: Connected to gss02b >>>>> Tue Aug 19 11:04:23.107 2014: Connected to gss03b >>>>> Tue Aug 19 11:04:23.112 2014: Connected to gss03a >>>>> Tue Aug 19 11:04:23.115 2014: Connected to gss01b >>>>> Tue Aug 19 11:04:23.121 2014: Connected to gss01a >>>>> Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. >>>>> >>>>> GSS02B ( NSD SERVER) >>>>> ... 
>>>>> Tue Aug 19 11:03:17.070 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:28.080 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:39.083 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:50.088 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:01.092 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:12.096 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to ebi5-220 >>>>> ... >>>>> >>>>> GSS02a ( NSD SERVER) >>>>> Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) >>>>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 >>>>> >>>>> >>>>> =============================================== >>>>> EXAMPLE 2: >>>>> >>>>> EBI5-038 >>>>> Tue Aug 19 11:32:34.227 2014: Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease. >>>>> Tue Aug 19 11:33:34.258 2014: Lease is overdue. Probing cluster GSS.ebi.ac.uk >>>>> Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. >>>>> Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. >>>>> ... >>>>> LOT MORE RESETS BY PEER >>>>> ... >>>>> Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. >>>>> Tue Aug 19 11:35:25.267 2014: Connecting to gss02a >>>>> Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) >>>>> Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a >>>>> Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) >>>>> Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. >>>>> Tue Aug 19 11:36:24.277 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. >>>>> >>>>> GSS02a >>>>> Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) is being expelled because of an expired lease. Pings sent: 60. 
Replies received: 60. >>>>> >>>>> >>>>> >>>>> In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? >>>>> >>>>> In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. >>>>> Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? >>>>> >>>>> >>>>> >>>>> Thanks in advance for any tips. >>>>> >>>>> Regards, >>>>> Salvatore >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Fri Aug 22 10:37:42 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Fri, 22 Aug 2014 10:37:42 +0100 Subject: [gpfsug-discuss] gpfs client expels, fs hangind and waiters In-Reply-To: <53EE0BB1.8000005@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> Message-ID: <53F70F66.2010405@ebi.ac.uk> Hello everyone, Just to let you know, we found the cause of our problems. We discovered that not all of the recommend kernel setting was configured on the clients ( on server was everything ok, but the clients had some setting missing ), and IBM support pointed to this document that describes perfectly our issues and the fix wich suggest to raise some parameters even higher than the standard "best practice" : http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=migr-5091222 Thanks to everyone for the replies. Regards, Salvatore From ewahl at osc.edu Mon Aug 25 19:55:08 2014 From: ewahl at osc.edu (Ed Wahl) Date: Mon, 25 Aug 2014 18:55:08 +0000 Subject: [gpfsug-discuss] CNFS using NFS over RDMA? Message-ID: Anyone out there doing CNFS with NFS over RDMA? Is this even possible? We currently have been delivering some CNFS services using TCP over IB, but that layer tends to have a large number of bugs all the time. Like to take a look at moving back down to verbs... Ed Wahl OSC -------------- next part -------------- An HTML attachment was scrubbed... URL: From zander at ebi.ac.uk Fri Aug 1 14:44:49 2014 From: zander at ebi.ac.uk (Zander Mears) Date: Fri, 01 Aug 2014 14:44:49 +0100 Subject: [gpfsug-discuss] Hello! In-Reply-To: <53D981EF.3020000@gpfsug.org> References: <53D8C897.9000902@ebi.ac.uk> <53D981EF.3020000@gpfsug.org> Message-ID: <53DB99D1.8050304@ebi.ac.uk> Hi Jez We're just monitoring the standard OS stuff, some interface errors, throughput, number of network and gpfs connections due to previous issues. We don't really know as yet what is good to monitor GPFS wise. 
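On the "what is good to monitor GPFS wise" question, two cheap per-node numbers that map well onto the expel discussions earlier in the thread are the outstanding waiter count and the per-filesystem I/O counters. A sketch that could sit behind a Zabbix UserParameter (the key names are invented, it must run as root on a cluster node, and the mmpmon field layout is worth double-checking on your release):

    #!/bin/bash
    # 1) Outstanding waiters right now -- expels are usually preceded by a pile-up:
    WAITERS=$(/usr/lpp/mmfs/bin/mmdiag --waiters 2>/dev/null | grep -c ' seconds')
    echo "gpfs.waiters:${WAITERS}"

    # 2) Per-filesystem byte counters from mmpmon's parseable output
    #    (_fs_ = filesystem name, _br_ = bytes read, _bw_ = bytes written):
    echo fs_io_s | /usr/lpp/mmfs/bin/mmpmon -p -s 2>/dev/null | awk '
        $1 == "_fs_io_s_" {
            for (i = 1; i <= NF; i++) {
                if ($i == "_fs_") fs = $(i+1)
                if ($i == "_br_") br = $(i+1)
                if ($i == "_bw_") bw = $(i+1)
            }
            print "gpfs.bytes_read[" fs "]:" br
            print "gpfs.bytes_written[" fs "]:" bw
        }'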
cheers Zander On 31/07/2014 00:38, Jez Tucker (Chair) wrote: > Hi Zander, > > We have a git repository. Would you be interested in adding any > Zabbix custom metrics gathering to GPFS to it? > > https://github.com/gpfsug/gpfsug-tools > > Best, > > Jez From sfadden at us.ibm.com Tue Aug 5 18:55:20 2014 From: sfadden at us.ibm.com (Scott Fadden) Date: Tue, 5 Aug 2014 10:55:20 -0700 Subject: [gpfsug-discuss] GPFS and Lustre on same node Message-ID: Is anyone running GPFS and Lustre on the same nodes. I have seen it work, I have heard people are doing it, I am looking for some confirmation. Thanks Scott Fadden GPFS Technical Marketing Phone: (503) 880-5833 sfadden at us.ibm.com http://www.ibm.com/systems/gpfs -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Wed Aug 6 08:46:31 2014 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Wed, 06 Aug 2014 09:46:31 +0200 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: References: Message-ID: <53E1DD57.90103@science-computing.de> Am 05.08.2014 19:55, schrieb Scott Fadden: > Is anyone running GPFS and Lustre on the same nodes. I have seen it work, I have heard people are > doing it, I am looking for some confirmation. I have some nodes running lustre 2.1.6 or 2.5.58 and gpfs 3.5.0.17 on RHEL5.8 and RHEL6.5. None of them are servers. Kind regards, Ulrich Sibiller -- ______________________________________creating IT solutions Dipl.-Inf. Ulrich Sibiller science + computing ag System Administration Hagellocher Weg 73 mail nfz at science-computing.de 72070 Tuebingen, Germany hotline +49 7071 9457 674 http://www.science-computing.de -- Vorstandsvorsitzender/Chairman of the board of management: Gerd-Lothar Leonhart Vorstand/Board of Management: Dr. Bernd Finkbeiner, Michael Heinrichs, Dr. Arno Steitz Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From frederik.ferner at diamond.ac.uk Wed Aug 6 10:19:35 2014 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Wed, 6 Aug 2014 10:19:35 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: References: Message-ID: <53E1F327.1000605@diamond.ac.uk> On 05/08/14 18:55, Scott Fadden wrote: > Is anyone running GPFS and Lustre on the same nodes. I have seen it > work, I have heard people are doing it, I am looking for some confirmation. Most of our compute cluster nodes are clients for Lustre and GPFS at the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on servers (GPFS NSD server or Lustre OSS/MDS servers). HTH, Frederik -- Frederik Ferner Senior Computer Systems Administrator phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. 
cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From sdinardo at ebi.ac.uk Wed Aug 6 10:57:44 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 06 Aug 2014 10:57:44 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E1F327.1000605@diamond.ac.uk> References: <53E1F327.1000605@diamond.ac.uk> Message-ID: <53E1FC18.6080707@ebi.ac.uk> Sorry for this little ot, but recetly i'm looking to Lustre to understand how it is comparable to GPFS in terms of performance, reliability and easy to use. Could anyone share their experience ? My company just recently got a first GPFS system , based on IBM GSS, but while its good performance wise, there are few unresolved problems and the IBM support is almost unexistent, so I'm starting to wonder if its work to look somewhere else eventual future purchases. Salvatore On 06/08/14 10:19, Frederik Ferner wrote: > On 05/08/14 18:55, Scott Fadden wrote: >> Is anyone running GPFS and Lustre on the same nodes. I have seen it >> work, I have heard people are doing it, I am looking for some >> confirmation. > > Most of our compute cluster nodes are clients for Lustre and GPFS at > the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on > servers (GPFS NSD server or Lustre OSS/MDS servers). > > HTH, > Frederik > From chair at gpfsug.org Wed Aug 6 11:19:24 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Wed, 06 Aug 2014 11:19:24 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E1FC18.6080707@ebi.ac.uk> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> Message-ID: <53E2012C.9040402@gpfsug.org> "IBM support is almost unexistent" I don't find that at all. Do you log directly via ESC or via your OEM/integrator or are you only referring to GSS support rather than pure GPFS? If you are having response issues, your IBM rep (or a few folks on here) can accelerate issues for you. Jez On 06/08/14 10:57, Salvatore Di Nardo wrote: > Sorry for this little ot, but recetly i'm looking to Lustre to > understand how it is comparable to GPFS in terms of performance, > reliability and easy to use. > Could anyone share their experience ? > > My company just recently got a first GPFS system , based on IBM GSS, > but while its good performance wise, there are few unresolved problems > and the IBM support is almost unexistent, so I'm starting to wonder if > its work to look somewhere else eventual future purchases. > > > Salvatore > > On 06/08/14 10:19, Frederik Ferner wrote: >> On 05/08/14 18:55, Scott Fadden wrote: >>> Is anyone running GPFS and Lustre on the same nodes. I have seen it >>> work, I have heard people are doing it, I am looking for some >>> confirmation. >> >> Most of our compute cluster nodes are clients for Lustre and GPFS at >> the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on >> servers (GPFS NSD server or Lustre OSS/MDS servers). 
>> >> HTH, >> Frederik >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From service at metamodul.com Wed Aug 6 14:26:47 2014 From: service at metamodul.com (service at metamodul.com) Date: Wed, 6 Aug 2014 15:26:47 +0200 (CEST) Subject: [gpfsug-discuss] Hi , i am new to this list Message-ID: <1366482624.222989.1407331607965.open-xchange@oxbaltgw55.schlund.de> Hi @ALL i am Hajo Ehlers , an AIX and GPFS specialist ( Unix System Engineer ). You find me at the IBM GPFS Forum and sometimes at news:c.u.a and I am addicted to cluster filesystems My latest idee is an SAP-HANA light system ( DBMS on an in-memory cluster posix FS ) which could be extended to a "reinvented" Cluster based AS/400 ^_^ I wrote also a small script to do a sequential backup of GPFS filesystems since i got never used to mmbackup - i named it "pdsmc" for parallel dsmc". Cheers Hajo BTW: Please let me know - service (at) metamodul (dot) com - In case somebody is looking for a GPFS specialist. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Fri Aug 8 10:53:36 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Fri, 08 Aug 2014 10:53:36 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E2012C.9040402@gpfsug.org> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> Message-ID: <53E49E20.1090905@ebi.ac.uk> Well, i didn't wanted to start a rant against IBM, and I'm referring specifically to GSS. Since GSS its an appliance, we have to refer to GSS support for both hardware and software issues. Hardware support in total crap. It took 1 mounth of chasing and shouting to get a drawer replacement that was causing some issues. Meanwhile 10 disks in that drawer got faulty. Finally we got the drawer replace but the disks are still faulty. Now its 3 days i'm triing to get them fixed or replaced ( its not clear if they disks are broken of they was just marked to be replaced because of the drawer). Right now i dont have any answer about how to put them online ( mmchcarrier don't work because it recognize that the disk where not replaced) There are also few other cases ( gpfs related) open that are still not answered. I have no experience with direct GPFS support, but if i open a case to GSS for a GPFS problem, the cases seems never get an answer. The only reason that GSS is working its because _*I*_**installed it spending few months studying gpfs. So now I'm wondering if its worth at all rely in future on the whole appliance concept. I'm wondering if in future its better just purchase the hardware and install GPFS by our own, or in alternatively even try Lustre. Now, skipping all this GSS rant, which have nothing to do with the file system anyway and going back to my question: Could someone point the main differences between GPFS and Lustre? I found some documentation about Lustre and i'm going to have a look, but oddly enough have not found any practical comparison between them. On 06/08/14 11:19, Jez Tucker (Chair) wrote: > "IBM support is almost unexistent" > > I don't find that at all. > Do you log directly via ESC or via your OEM/integrator or are you only > referring to GSS support rather than pure GPFS? > > If you are having response issues, your IBM rep (or a few folks on > here) can accelerate issues for you. 
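Hajo's "pdsmc" mentioned earlier in the thread is not posted here, but the general idea of fanning the TSM client out across the top of a GPFS filesystem (instead of using mmbackup) can be sketched roughly as below. The filesystem path and degree of parallelism are placeholders, and this is only a guess at the approach, not his script:

    #!/bin/bash
    FS=/gpfs1        # filesystem to back up (example path)
    PARALLEL=4       # how many dsmc processes to run at once

    # One incremental backup per top-level directory, $PARALLEL at a time:
    find "$FS" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -P "$PARALLEL" -I{} dsmc incremental {}/ -subdir=yes -quiet

    # Pick up any files sitting directly in the filesystem root:
    dsmc incremental "$FS"/ -quiet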
> > Jez > > > On 06/08/14 10:57, Salvatore Di Nardo wrote: >> Sorry for this little ot, but recetly i'm looking to Lustre to >> understand how it is comparable to GPFS in terms of performance, >> reliability and easy to use. >> Could anyone share their experience ? >> >> My company just recently got a first GPFS system , based on IBM GSS, >> but while its good performance wise, there are few unresolved >> problems and the IBM support is almost unexistent, so I'm starting to >> wonder if its work to look somewhere else eventual future purchases. >> >> >> Salvatore >> >> On 06/08/14 10:19, Frederik Ferner wrote: >>> On 05/08/14 18:55, Scott Fadden wrote: >>>> Is anyone running GPFS and Lustre on the same nodes. I have seen it >>>> work, I have heard people are doing it, I am looking for some >>>> confirmation. >>> >>> Most of our compute cluster nodes are clients for Lustre and GPFS at >>> the same time. Lustre 1.8.9-wc1 and GPFS 3.5.0.11. Nothing shared on >>> servers (GPFS NSD server or Lustre OSS/MDS servers). >>> >>> HTH, >>> Frederik >>> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jpro at bas.ac.uk Fri Aug 8 12:40:00 2014 From: jpro at bas.ac.uk (Jeremy Robst) Date: Fri, 8 Aug 2014 12:40:00 +0100 (BST) Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E49E20.1090905@ebi.ac.uk> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk> Message-ID: On Fri, 8 Aug 2014, Salvatore Di Nardo wrote: > Now, skipping all this GSS rant, which have nothing to do with the file > system anyway? and? going back to my question: > > Could someone point the main differences between GPFS and Lustre? I'm looking at making the same decision here - to buy GPFS or to roll our own Lustre configuration. I'm in the process of setting up test systems, and so far the main difference seems to be in the that in GPFS each server sees the full filesystem, and so you can run other applications (e.g backup) on a GPFS server whereas the Luste OSS (object storage servers) see only a portion of the storage (the filesystem is striped across the OSSes), so you need a Lustre client to mount the full filesystem for things like backup. However I have very little practical experience of either and would also be interested in any comments. Thanks Jeremy -- jpro at bas.ac.uk | (work) 01223 221402 (fax) 01223 362616 Unix System Administrator - British Antarctic Survey #include From keith at ocf.co.uk Fri Aug 8 14:12:39 2014 From: keith at ocf.co.uk (Keith Vickers) Date: Fri, 8 Aug 2014 14:12:39 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node Message-ID: http://www.pdsw.org/pdsw10/resources/posters/parallelNASFSs.pdf Has a good direct apples to apples comparison between Lustre and GPFS. It's pretty much abstractable from the hardware used. 
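To make Jeremy's point concrete: on a GPFS NSD server the filesystem is just a POSIX mount, so backup-style work such as a policy-driven file scan can run on the server itself, which a Lustre OSS cannot offer. A minimal sketch of such a scan, assuming a filesystem mounted at /gpfs1 (the generated list-file naming should be checked against the ILM documentation for your release):

    # Policy that simply lists every file, with no external program attached:
    cat > /tmp/listall.pol <<'EOF'
    RULE EXTERNAL LIST 'allfiles' EXEC ''
    RULE 'list-everything' LIST 'allfiles'
    EOF

    # -I defer keeps the generated candidate list(s) on disk, typically as
    # /tmp/scan.list.allfiles, ready to feed a backup or migration tool:
    mmapplypolicy /gpfs1 -P /tmp/listall.pol -I defer -f /tmp/scan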
Keith Vickers Business Development Manager OCF plc Mobile: 07974 397863 From sergi.more at bsc.es Fri Aug 8 14:14:33 2014 From: sergi.more at bsc.es (=?ISO-8859-1?Q?Sergi_Mor=E9_Codina?=) Date: Fri, 08 Aug 2014 15:14:33 +0200 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk> Message-ID: <53E4CD39.7080808@bsc.es> Hi all, About main differences between GPFS and Lustre, here you have some bits from our experience: -Reliability: GPFS its been proved to be more stable and reliable. Also offers more flexibility in terms of fail-over. It have no restriction in number of servers. As far as I know, an NSD can have as many secondary servers as you want (we are using 8). -Metadata: In Lustre each file system is restricted to two servers. No restriction in GPFS. -Updates: In GPFS you can update the whole storage cluster without stopping production, one server at a time. -Server/Client role: As Jeremy said, in GPFS every server act as a client as well. Useful for administrative tasks. -Troubleshooting: Problems with GPFS are easier to track down. Logs are more clear, and offers better tools than Lustre. -Support: No problems at all with GPFS support. It is true that it could take time to go up within all support levels, but we always got a good solution. Quite different in terms of hardware. IBM support quality has drop a lot since about last year an a half. Really slow and tedious process to get replacements. Moreover, we keep receiving bad "certified reutilitzed parts" hardware, which slow the whole process even more. These are the main differences I would stand out after some years of experience with both file systems, but do not take it as a fact. PD: Salvatore, I would suggest you to contact Jordi Valls. He joined EBI a couple of months ago, and has experience working with both file systems here at BSC. Best Regards, Sergi. On 08/08/2014 01:40 PM, Jeremy Robst wrote: > On Fri, 8 Aug 2014, Salvatore Di Nardo wrote: > >> Now, skipping all this GSS rant, which have nothing to do with the file >> system anyway and going back to my question: >> >> Could someone point the main differences between GPFS and Lustre? > > I'm looking at making the same decision here - to buy GPFS or to roll > our own Lustre configuration. I'm in the process of setting up test > systems, and so far the main difference seems to be in the that in GPFS > each server sees the full filesystem, and so you can run other > applications (e.g backup) on a GPFS server whereas the Luste OSS (object > storage servers) see only a portion of the storage (the filesystem is > striped across the OSSes), so you need a Lustre client to mount the full > filesystem for things like backup. > > However I have very little practical experience of either and would also > be interested in any comments. 
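Sergi's "one server at a time" update amounts to draining and restarting GPFS on each server in turn, roughly as sketched below. The node names just follow the GSS examples from earlier in the thread, and the package/portability-layer step is deliberately left vague because it depends on release and distro:

    #!/bin/bash
    for NODE in gss01a gss01b gss02a gss02b gss03a gss03b; do
        # Stop GPFS on this node only; clients keep going via the other NSD servers.
        mmshutdown -N "$NODE"

        # Update the GPFS packages and rebuild the portability layer on $NODE here
        # (e.g. over ssh with yum/rpm), then bring the daemon back:
        mmstartup -N "$NODE"

        # Wait for the node to report 'active' before touching the next one:
        until mmgetstate -N "$NODE" | grep -q active; do
            sleep 10
        done
    done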
> > Thanks > > Jeremy > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- ------------------------------------------------------------------------ Sergi More Codina Barcelona Supercomputing Center Centro Nacional de Supercomputacion WWW: http://www.bsc.es Tel: +34-93-405 42 27 e-mail: sergi.more at bsc.es Fax: +34-93-413 77 21 ------------------------------------------------------------------------ WARNING / LEGAL TEXT: This message is intended only for the use of the individual or entity to which it is addressed and may contain information which is privileged, confidential, proprietary, or exempt from disclosure under applicable law. If you are not the intended recipient or the person responsible for delivering the message to the intended recipient, you are strictly prohibited from disclosing, distributing, copying, or in any way using this message. If you have received this communication in error, please notify the sender and destroy and delete any copies you may have received. http://www.bsc.es/disclaimer.htm -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3242 bytes Desc: S/MIME Cryptographic Signature URL: From viccornell at gmail.com Fri Aug 8 18:15:30 2014 From: viccornell at gmail.com (Vic Cornell) Date: Fri, 8 Aug 2014 18:15:30 +0100 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <53E4CD39.7080808@bsc.es> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk> <53E4CD39.7080808@bsc.es> Message-ID: <4001D2D9-5E74-4EF9-908F-5B0E3443EA5B@gmail.com> Disclaimers - I work for DDN - we sell lustre and GPFS. I know GPFS much better than I know Lustre. The biggest difference we find between GPFS and Lustre is that GPFS - can usually achieve 90% of the bandwidth available to a single client with a single thread. Lustre needs multiple parallel streams to saturate - say an Infiniband connection. Lustre is often faster than GPFS and often has superior metadata performance - particularly where lots of files are created in a single directory. GPFS can support Windows - Lustre cannot. I think GPFS is better integrated and easier to deploy than Lustre - some people disagree with me. Regards, Vic On 8 Aug 2014, at 14:14, Sergi Mor? Codina wrote: > Hi all, > > About main differences between GPFS and Lustre, here you have some bits from our experience: > > -Reliability: GPFS its been proved to be more stable and reliable. Also offers more flexibility in terms of fail-over. It have no restriction in number of servers. As far as I know, an NSD can have as many secondary servers as you want (we are using 8). > > -Metadata: In Lustre each file system is restricted to two servers. No restriction in GPFS. > > -Updates: In GPFS you can update the whole storage cluster without stopping production, one server at a time. > > -Server/Client role: As Jeremy said, in GPFS every server act as a client as well. Useful for administrative tasks. > > -Troubleshooting: Problems with GPFS are easier to track down. Logs are more clear, and offers better tools than Lustre. > > -Support: No problems at all with GPFS support. It is true that it could take time to go up within all support levels, but we always got a good solution. Quite different in terms of hardware. 
IBM support quality has drop a lot since about last year an a half. Really slow and tedious process to get replacements. Moreover, we keep receiving bad "certified reutilitzed parts" hardware, which slow the whole process even more. > > > These are the main differences I would stand out after some years of experience with both file systems, but do not take it as a fact. > > PD: Salvatore, I would suggest you to contact Jordi Valls. He joined EBI a couple of months ago, and has experience working with both file systems here at BSC. > > Best Regards, > Sergi. > > > On 08/08/2014 01:40 PM, Jeremy Robst wrote: >> On Fri, 8 Aug 2014, Salvatore Di Nardo wrote: >> >>> Now, skipping all this GSS rant, which have nothing to do with the file >>> system anyway and going back to my question: >>> >>> Could someone point the main differences between GPFS and Lustre? >> >> I'm looking at making the same decision here - to buy GPFS or to roll >> our own Lustre configuration. I'm in the process of setting up test >> systems, and so far the main difference seems to be in the that in GPFS >> each server sees the full filesystem, and so you can run other >> applications (e.g backup) on a GPFS server whereas the Luste OSS (object >> storage servers) see only a portion of the storage (the filesystem is >> striped across the OSSes), so you need a Lustre client to mount the full >> filesystem for things like backup. >> >> However I have very little practical experience of either and would also >> be interested in any comments. >> >> Thanks >> >> Jeremy >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > -- > > ------------------------------------------------------------------------ > > Sergi More Codina > Barcelona Supercomputing Center > Centro Nacional de Supercomputacion > WWW: http://www.bsc.es Tel: +34-93-405 42 27 > e-mail: sergi.more at bsc.es Fax: +34-93-413 77 21 > > ------------------------------------------------------------------------ > > WARNING / LEGAL TEXT: This message is intended only for the use of the > individual or entity to which it is addressed and may contain > information which is privileged, confidential, proprietary, or exempt > from disclosure under applicable law. If you are not the intended > recipient or the person responsible for delivering the message to the > intended recipient, you are strictly prohibited from disclosing, > distributing, copying, or in any way using this message. If you have > received this communication in error, please notify the sender and > destroy and delete any copies you may have received. > > http://www.bsc.es/disclaimer.htm > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at us.ibm.com Fri Aug 8 20:09:44 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Fri, 8 Aug 2014 12:09:44 -0700 Subject: [gpfsug-discuss] GPFS and Lustre on same node In-Reply-To: <4001D2D9-5E74-4EF9-908F-5B0E3443EA5B@gmail.com> References: <53E1F327.1000605@diamond.ac.uk> <53E1FC18.6080707@ebi.ac.uk> <53E2012C.9040402@gpfsug.org> <53E49E20.1090905@ebi.ac.uk> <53E4CD39.7080808@bsc.es> <4001D2D9-5E74-4EF9-908F-5B0E3443EA5B@gmail.com> Message-ID: Vic, Sergi, you can not compare Lustre and GPFS without providing a clear usecase as otherwise you compare apple with oranges. 
the reason for this is quite simple, Lustre plays well in pretty much one usecase - HPC, GPFS on the other hand is used in many forms of deployments from Storage for Virtual Machines, HPC, Scale-Out NAS, Solutions in digital media, to hosting some of the biggest, most business critical Transactional database installations in the world. you look at 2 products with completely different usability spectrum, functions and features unless as said above you narrow it down to a very specific usecase with a lot of details. even just HPC has a very large spectrum and not everybody is working in a single directory, which is the main scale point for Lustre compared to GPFS and the reason is obvious, if you have only 1 active metadata server (which is what 99% of all lustre systems run) some operations like single directory contention is simpler to make fast, but only up to the limit of your one node, but what happens when you need to go beyond that and only a real distributed architecture can support your workload ? for example look at most chip design workloads, which is a form of HPC, it is something thats extremely metadata and small file dominated, you talk about 100's of millions (in some cases even billions) of files, majority of them <4k, the rest larger files , majority of it with random access patterns that benefit from massive client side caching and distributed data coherency models supported by GPFS token manager infrastructure across 10's or 100's of metadata server and 1000's of compute nodes. you also need to look at the rich feature set GPFS provides, which not all may be important for some environments but are for others like Snapshot, Clones, Hierarchical Storage Management (ILM) , Local Cache acceleration (LROC), Global Namespace Wan Integration (AFM), Encryption, etc just to name a few. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Vic Cornell To: gpfsug main discussion list Date: 08/08/2014 10:16 AM Subject: Re: [gpfsug-discuss] GPFS and Lustre on same node Sent by: gpfsug-discuss-bounces at gpfsug.org Disclaimers - I work for DDN - we sell lustre and GPFS. I know GPFS much better than I know Lustre. The biggest difference we find between GPFS and Lustre is that GPFS - can usually achieve 90% of the bandwidth available to a single client with a single thread. Lustre needs multiple parallel streams to saturate - say an Infiniband connection. Lustre is often faster than GPFS and often has superior metadata performance - particularly where lots of files are created in a single directory. GPFS can support Windows - Lustre cannot. I think GPFS is better integrated and easier to deploy than Lustre - some people disagree with me. Regards, Vic On 8 Aug 2014, at 14:14, Sergi Mor? Codina wrote: > Hi all, > > About main differences between GPFS and Lustre, here you have some bits from our experience: > > -Reliability: GPFS its been proved to be more stable and reliable. Also offers more flexibility in terms of fail-over. It have no restriction in number of servers. As far as I know, an NSD can have as many secondary servers as you want (we are using 8). > > -Metadata: In Lustre each file system is restricted to two servers. No restriction in GPFS. > > -Updates: In GPFS you can update the whole storage cluster without stopping production, one server at a time. 
> > -Server/Client role: As Jeremy said, in GPFS every server act as a client as well. Useful for administrative tasks. > > -Troubleshooting: Problems with GPFS are easier to track down. Logs are more clear, and offers better tools than Lustre. > > -Support: No problems at all with GPFS support. It is true that it could take time to go up within all support levels, but we always got a good solution. Quite different in terms of hardware. IBM support quality has drop a lot since about last year an a half. Really slow and tedious process to get replacements. Moreover, we keep receiving bad "certified reutilitzed parts" hardware, which slow the whole process even more. > > > These are the main differences I would stand out after some years of experience with both file systems, but do not take it as a fact. > > PD: Salvatore, I would suggest you to contact Jordi Valls. He joined EBI a couple of months ago, and has experience working with both file systems here at BSC. > > Best Regards, > Sergi. > > > On 08/08/2014 01:40 PM, Jeremy Robst wrote: >> On Fri, 8 Aug 2014, Salvatore Di Nardo wrote: >> >>> Now, skipping all this GSS rant, which have nothing to do with the file >>> system anyway and going back to my question: >>> >>> Could someone point the main differences between GPFS and Lustre? >> >> I'm looking at making the same decision here - to buy GPFS or to roll >> our own Lustre configuration. I'm in the process of setting up test >> systems, and so far the main difference seems to be in the that in GPFS >> each server sees the full filesystem, and so you can run other >> applications (e.g backup) on a GPFS server whereas the Luste OSS (object >> storage servers) see only a portion of the storage (the filesystem is >> striped across the OSSes), so you need a Lustre client to mount the full >> filesystem for things like backup. >> >> However I have very little practical experience of either and would also >> be interested in any comments. >> >> Thanks >> >> Jeremy >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > -- > > ------------------------------------------------------------------------ > > Sergi More Codina > Barcelona Supercomputing Center > Centro Nacional de Supercomputacion > WWW: http://www.bsc.es Tel: +34-93-405 42 27 > e-mail: sergi.more at bsc.es Fax: +34-93-413 77 21 > > ------------------------------------------------------------------------ > > WARNING / LEGAL TEXT: This message is intended only for the use of the > individual or entity to which it is addressed and may contain > information which is privileged, confidential, proprietary, or exempt > from disclosure under applicable law. If you are not the intended > recipient or the person responsible for delivering the message to the > intended recipient, you are strictly prohibited from disclosing, > distributing, copying, or in any way using this message. If you have > received this communication in error, please notify the sender and > destroy and delete any copies you may have received. 
> > http://www.bsc.es/disclaimer.htm > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Sat Aug 9 15:03:02 2014 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Sat, 9 Aug 2014 16:03:02 +0200 Subject: [gpfsug-discuss] GPFS and Lustre In-Reply-To: References: Message-ID: Vic, Sergi, from my point of view for real High-End workloads the complete I/O stack needs to be fine tuned and well understood in order to provide a good system to the users. - Application(s) + I/O Lib(s) + MPI + Parallel Filesystem (e.g. GPFS) + Hardware (Networks, Servers, Disks, etc.) One of the best solutions to bring your application very efficently to work with a Parallel FS is Sionlib from FZ Juelich: Sionlib is a scalable I/O library for the parallel access to task-local files. The library not only supports writing and reading binary data to or from from several thousands of processors into a single or a small number of physical files but also provides for global open and close functions to access SIONlib file in parallel. SIONlib provides different interfaces: parallel access using MPI, OpenMp, or their combination and sequential access for post-processing utilities. http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/SIONlib/_node.html http://apps.fz-juelich.de/jsc/sionlib/html/sionlib_tutorial_2013.pdf -frank- P.S. Nice blog from Nils https://www.ibm.com/developerworks/community/blogs/storageneers/entry/scale_out_backup_with_tsm_and_gss_performance_test_results?lang=en Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From ewahl at osc.edu Mon Aug 11 14:55:48 2014 From: ewahl at osc.edu (Ed Wahl) Date: Mon, 11 Aug 2014 13:55:48 +0000 Subject: [gpfsug-discuss] GPFS and Lustre In-Reply-To: References: , Message-ID: In a similar vein, IBM has an application transparent "File Cache Library" as well. I believe it IS licensed and the only requirement is that it is for use on IBM hardware only. Saw some presentations that mention it in some BioSci talks @SC13 and the numbers for a couple of selected small read applications were awesome. I probably have the contact info for it around here somewhere. In addition to the pdf/user manual. Ed Wahl Ohio Supercomputer Center ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Frank Kraemer [kraemerf at de.ibm.com] Sent: Saturday, August 09, 2014 10:03 AM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] GPFS and Lustre Vic, Sergi, from my point of view for real High-End workloads the complete I/O stack needs to be fine tuned and well understood in order to provide a good system to the users. - Application(s) + I/O Lib(s) + MPI + Parallel Filesystem (e.g. GPFS) + Hardware (Networks, Servers, Disks, etc.) One of the best solutions to bring your application very efficently to work with a Parallel FS is Sionlib from FZ Juelich: Sionlib is a scalable I/O library for the parallel access to task-local files. 
The library not only supports writing and reading binary data to or from from several thousands of processors into a single or a small number of physical files but also provides for global open and close functions to access SIONlib file in parallel. SIONlib provides different interfaces: parallel access using MPI, OpenMp, or their combination and sequential access for post-processing utilities. http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/SIONlib/_node.html http://apps.fz-juelich.de/jsc/sionlib/html/sionlib_tutorial_2013.pdf -frank- P.S. Nice blog from Nils https://www.ibm.com/developerworks/community/blogs/storageneers/entry/scale_out_backup_with_tsm_and_gss_performance_test_results?lang=en Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From sabujp at gmail.com Tue Aug 12 23:16:22 2014 From: sabujp at gmail.com (Sabuj Pattanayek) Date: Tue, 12 Aug 2014 17:16:22 -0500 Subject: [gpfsug-discuss] reduce cnfs failover time to a few seconds Message-ID: Hi all, Is there anyway to reduce CNFS failover time to just a few seconds? Currently it seems like it's taking 5 - 10 minutes. We're using virtual ip's, i.e. interface bond1.1550:0 has one of the cnfs vips, so it should be fast, but it takes a long time and sometimes causes processes to crash due to NFS timeouts (some have 600 second soft mount timeouts). We've also noticed that it sometimes takes even longer unless the cnfs system on which we're calling mmshutdown is completely shutdown and isn't returning pings. Even 1 min seems too long. For comparison, I'm running ctdb + samba on the other NSDs and it's able to failover in a few seconds after mmshutdown completes. Thanks, Sabuj -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Fri Aug 15 14:31:29 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Fri, 15 Aug 2014 14:31:29 +0100 Subject: [gpfsug-discuss] gpfs client expels, fs hangind and waiters Message-ID: <53EE0BB1.8000005@ebi.ac.uk> Hello people, Its quite a bit of time that i'm triing to solve a problem to our GPFS system, without much luck so i think its time to ask some help. *First of a bit of introduction:** * Our GPFS system is made by 3xgss-26, In other words its made with 6x servers ( 4x10g links each) and several disk enclosures SAS attacked. The todal amount of spare its roughly 2PB, and the disks are SATA ( except few SSD dedicated to logtip ). My metadata and on dedicated vdisks, but both data and metadata vdiosks are in the same declustered arrays and recovery groups, so in the end they share the same spindles. The clients its a LSF farm configured as another cluster ( standard multiclustering configuration) of roughly 600 nodes . *The issue:** * Recently we became aware that when some massive io request has been done we experience a lot of client expells. Heres an example of our logs: Fri Aug 15 12:40:24.680 2014: Expel 10.7.28.34 (gss03a) request from 172.16.4.138 (ebi3-138 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.138 (ebi3-138 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:40:41.652 2014: Expel 10.7.28.66 (gss02b) request from 10.7.34.38 (ebi5-037 in ebi-cluster.ebi.ac.uk). 
Expelling: 10.7.34.38 (ebi5-037 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:40:45.754 2014: Expel 10.7.28.3 (gss01b) request from 172.16.4.58 (ebi3-058 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.58 (ebi3-058 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:40:52.305 2014: Expel 10.7.28.66 (gss02b) request from 10.7.34.68 (ebi5-067 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.68 (ebi5-067 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:17.069 2014: Expel 10.7.28.35 (gss03b) request from 172.16.4.161 (ebi3-161 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.161 (ebi3-161 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:23.555 2014: Expel 10.7.28.67 (gss02a) request from 172.16.4.136 (ebi3-136 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.136 (ebi3-136 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:54.258 2014: Expel 10.7.28.34 (gss03a) request from 10.7.34.22 (ebi5-021 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.22 (ebi5-021 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:41:54.540 2014: Expel 10.7.28.66 (gss02b) request from 10.7.34.57 (ebi5-056 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.57 (ebi5-056 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:42:57.288 2014: Expel 10.7.35.5 (ebi5-132 in ebi-cluster.ebi.ac.uk) request from 10.7.28.34 (gss03a). Expelling: 10.7.35.5 (ebi5-132 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:43:24.327 2014: Expel 10.7.28.34 (gss03a) request from 10.7.37.99 (ebi5-226 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.37.99 (ebi5-226 in ebi-cluster.ebi.ac.uk) Fri Aug 15 12:44:54.202 2014: Expel 10.7.28.67 (gss02a) request from 172.16.4.165 (ebi3-165 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.165 (ebi3-165 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:15:54.450 2014: Expel 10.7.28.34 (gss03a) request from 10.7.37.89 (ebi5-216 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.37.89 (ebi5-216 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:20:16.524 2014: Expel 10.7.28.3 (gss01b) request from 172.16.4.55 (ebi3-055 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.55 (ebi3-055 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:26:54.177 2014: Expel 10.7.28.34 (gss03a) request from 10.7.34.64 (ebi5-063 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.34.64 (ebi5-063 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:27:53.900 2014: Expel 10.7.28.3 (gss01b) request from 10.7.35.15 (ebi5-142 in ebi-cluster.ebi.ac.uk). Expelling: 10.7.35.15 (ebi5-142 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:28:24.297 2014: Expel 10.7.28.67 (gss02a) request from 172.16.4.50 (ebi3-050 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.50 (ebi3-050 in ebi-cluster.ebi.ac.uk) Fri Aug 15 13:29:23.913 2014: Expel 10.7.28.3 (gss01b) request from 172.16.4.156 (ebi3-156 in ebi-cluster.ebi.ac.uk). Expelling: 172.16.4.156 (ebi3-156 in ebi-cluster.ebi.ac.uk) at the same time we experience also long waiters queue (1000+ lines). 
An example in case of massive writes ( dd ) : 0x7F522E1EEF90 waiting 1.861233182 seconds, NSDThread: on ThCond 0x7F5158019B08 (0x7F5158019B08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.101 0x7F522E1EC9B0 waiting 1.490567470 seconds, NSDThread: on ThCond 0x7F50F4038BA8 (0x7F50F4038BA8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.45 0x7F522E1EB6C0 waiting 1.077098046 seconds, NSDThread: on ThCond 0x7F50B40011F8 (0x7F50B40011F8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.156 0x7F522E1EA3D0 waiting 7.714968554 seconds, NSDThread: on ThCond 0x7F50BC0078B8 (0x7F50BC0078B8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.107 0x7F522E1E90E0 waiting 4.774379417 seconds, NSDThread: on ThCond 0x7F506801B1F8 (0x7F506801B1F8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.23 0x7F522E1E7DF0 waiting 0.746172444 seconds, NSDThread: on ThCond 0x7F5094007D78 (0x7F5094007D78) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.84 0x7F522E1E6B00 waiting 1.553030487 seconds, NSDThread: on ThCond 0x7F51C0004C78 (0x7F51C0004C78) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.63 0x7F522E1E5810 waiting 2.165307633 seconds, NSDThread: on ThCond 0x7F5178016A08 (0x7F5178016A08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.29 0x7F522E1E4520 waiting 1.128089273 seconds, NSDThread: on ThCond 0x7F5074004D98 (0x7F5074004D98) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.61 0x7F522E1E3230 waiting 2.515214328 seconds, NSDThread: on ThCond 0x7F51F400EF08 (0x7F51F400EF08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.90 0x7F522E1E1F40 waiting*162.966840834* seconds, NSDThread: on ThCond 0x7F51840207A8 (0x7F51840207A8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.97 0x7F522E1E0C50 waiting 1.140787288 seconds, NSDThread: on ThCond 0x7F51AC005C08 (0x7F51AC005C08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.94 0x7F522E1DF960 waiting 41.907415248 seconds, NSDThread: on ThCond 0x7F5160019038 (0x7F5160019038) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.143 0x7F522E1DE670 waiting 0.466560418 seconds, NSDThread: on ThCond 0x7F513802B258 (0x7F513802B258) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.168 0x7F522E1DD380 waiting 3.102803621 seconds, NSDThread: on ThCond 0x7F516C0106C8 (0x7F516C0106C8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.91 0x7F522E1DC090 waiting 2.751614295 seconds, NSDThread: on ThCond 0x7F504C0011F8 (0x7F504C0011F8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.25 0x7F522E1DADA0 waiting 5.083691891 seconds, NSDThread: on ThCond 0x7F507401BE88 (0x7F507401BE88) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.61 0x7F522E1D9AB0 waiting 2.263374184 seconds, NSDThread: on ThCond 0x7F5080003B98 (0x7F5080003B98) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.36 0x7F522E1D87C0 waiting 0.206989639 seconds, NSDThread: on ThCond 0x7F505801F0D8 (0x7F505801F0D8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.55 0x7F522E1D74D0 waiting *41.841279897* seconds, NSDThread: on ThCond 0x7F5194008B88 (0x7F5194008B88) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.143 0x7F522E1D61E0 waiting 5.618652361 seconds, NSDThread: on ThCond 0x1BAB868 (0x1BAB868) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.59 0x7F522E1D4EF0 
waiting 6.185658427 seconds, NSDThread: on ThCond 0x7F513802AAE8 (0x7F513802AAE8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.35.6 0x7F522E1D3C00 waiting 2.652370892 seconds, NSDThread: on ThCond 0x7F5130004C78 (0x7F5130004C78) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.34.45 0x7F522E1D2910 waiting 11.396142225 seconds, NSDThread: on ThCond 0x7F51A401C0C8 (0x7F51A401C0C8) (MsgRecordCondvar), reason 'RPC wait' for getData on node 172.16.4.169 0x7F522E1D1620 waiting 63.710723043 seconds, NSDThread: on ThCond 0x7F5038004D08 (0x7F5038004D08) (MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.120 or for massive reads: 0x7FBCE69A8C20 waiting 29.262629530 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE699CEC0 waiting 29.260869141 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE698C5A0 waiting 29.124824888 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6984110 waiting 22.729479654 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE69512C0 waiting 29.272805926 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE69409A0 waiting 28.833650198 seconds, NSDThread: on ThCond 0x18033B74D48 (0xFFFFC90033B74D48) (LeaseWaitCondvar), reason 'Waiting to acquire disklease' 0x7FBCE6924320 waiting 29.237067128 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6921D40 waiting 29.237953228 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6915FE0 waiting 29.046721161 seconds, NSDThread: on ThCond 0x18033B74D48 (0xFFFFC90033B74D48) (LeaseWaitCondvar), reason 'Waiting to acquire disklease' 0x7FBCE6913A00 waiting 29.264534710 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6900B00 waiting 29.267691105 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68F7380 waiting 29.266402464 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68D2870 waiting 29.276298231 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68BADB0 waiting 28.665700576 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68B61F0 waiting 29.236878611 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6885980 waiting *144*.530487248 seconds, NSDThread: on ThMutex 0x1803396A670 (0xFFFFC9003396A670) (DiskSchedulingMutex) 0x7FBCE68833A0 waiting 29.231066610 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE68820B0 waiting 29.269954514 seconds, NSDThread: on ThCond 
0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE686A5F0 waiting *140*.662994256 seconds, NSDThread: on ThMutex 0x180339A3140 (0xFFFFC900339A3140) (DiskSchedulingMutex) 0x7FBCE6864740 waiting 29.254180742 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE683FC30 waiting 29.271840565 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE682E020 waiting 29.200969209 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6825B90 waiting 19.136732919 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6805C40 waiting 29.236055550 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67FEAA0 waiting 29.283264161 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67FC4C0 waiting 29.268992663 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67DFE40 waiting 29.150900786 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67D2DF0 waiting 29.199058463 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67D1B00 waiting 29.203199738 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67768D0 waiting 29.208231742 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6768590 waiting 5.228192589 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE67672A0 waiting 29.252839376 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6757C70 waiting 28.869359044 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6748640 waiting 29.289284179 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6734450 waiting 29.253591817 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6730B80 waiting 29.289987273 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6720260 waiting 26.597589551 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66F32C0 waiting 29.177692849 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66E3C90 waiting 29.160268518 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) 
(VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66CC1D0 waiting 5.334330188 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE66B3420 waiting 34.274433161 seconds, NSDThread: on ThMutex 0x180339A3140 (0xFFFFC900339A3140) (DiskSchedulingMutex) 0x7FBCE668E910 waiting 27.699999488 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6689D50 waiting 34.279090465 seconds, NSDThread: on ThMutex 0x180339A3140 (0xFFFFC900339A3140) (DiskSchedulingMutex) 0x7FBCE66805D0 waiting 24.688626241 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE6675B60 waiting 35.367745840 seconds, NSDThread: on ThCond 0x18033B74D48 (0xFFFFC90033B74D48) (LeaseWaitCondvar), reason 'Waiting to acquire disklease' 0x7FBCE665E0A0 waiting 29.235994598 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' 0x7FBCE663CE60 waiting 29.162911979 seconds, NSDThread: on ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Another example with mmfsadm in case of massive reads: [root at gss02b ~]# mmfsadm dump waiters 0x7F519000AEA0 waiting 28.915010347 seconds, replyCleanupThread: on ThCond 0x7F51101B27B8 (0x7F51101B27B8) (MsgRecordCondvar), reason 'RPC wait' 0x7F511C012A10 waiting 279.522206863 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F52000095F8 (0x7F52000095F8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F5120000B80 waiting 279.524782437 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F5214000EE8 (0x7F5214000EE8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F5154006310 waiting 138.164386224 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F5174003F08 (0x7F5174003F08) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1EB6C0 waiting 23.060703000 seconds, NSDThread: for poll on sock 85 0x7F522E1E6B00 waiting 0.068456104 seconds, NSDThread: on ThCond 0x7F50CC00E478 (0x7F50CC00E478) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1D0330 waiting 17.207907857 seconds, NSDThread: on ThCond 0x7F5078001688 (0x7F5078001688) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1BFA10 waiting 0.181011711 seconds, NSDThread: on ThCond 0x7F504000E558 (0x7F504000E558) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1B4FA0 waiting 0.021780338 seconds, NSDThread: on ThCond 0x7F522000E488 (0x7F522000E488) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E1B3CB0 waiting 0.794718000 seconds, NSDThread: for poll on sock 799 0x7F522E186D10 waiting 0.191606803 seconds, NSDThread: on ThCond 0x7F5184015D58 (0x7F5184015D58) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E184730 waiting 0.025562000 seconds, NSDThread: for poll on sock 867 0x7F522E12CDD0 waiting 0.008921000 seconds, NSDThread: for poll on sock 543 0x7F522E126F20 waiting 1.459531000 seconds, NSDThread: for poll on sock 983 0x7F522E10F460 waiting 17.177936972 seconds, NSDThread: on ThCond 0x7F51EC002CE8 (0x7F51EC002CE8) 
(InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E101120 waiting 17.232580316 seconds, NSDThread: on ThCond 0x7F51BC005BB8 (0x7F51BC005BB8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E0F1AF0 waiting 438.556030000 seconds, NSDThread: for poll on sock 496 0x7F522E0E7080 waiting 393.702839774 seconds, NSDThread: on ThCond 0x7F5164013668 (0x7F5164013668) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E09DA60 waiting 52.746984660 seconds, NSDThread: on ThCond 0x7F506C008858 (0x7F506C008858) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E084CB0 waiting 23.096688206 seconds, NSDThread: on ThCond 0x7F521C008E18 (0x7F521C008E18) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E0839C0 waiting 0.093456000 seconds, NSDThread: for poll on sock 962 0x7F522E076970 waiting 2.236659731 seconds, NSDThread: on ThCond 0x7F51E0027538 (0x7F51E0027538) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E044E10 waiting 52.752497765 seconds, NSDThread: on ThCond 0x7F513802BDD8 (0x7F513802BDD8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E033200 waiting 16.157355796 seconds, NSDThread: on ThCond 0x7F5104240D58 (0x7F5104240D58) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E02AD70 waiting 436.025203220 seconds, NSDThread: on ThCond 0x7F50E0016C28 (0x7F50E0016C28) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522E01A450 waiting 393.673252777 seconds, NSDThread: on ThCond 0x7F50A8009C18 (0x7F50A8009C18) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DFE0460 waiting 1.781358358 seconds, NSDThread: on ThCond 0x7F51E0027638 (0x7F51E0027638) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF99420 waiting 0.038405427 seconds, NSDThread: on ThCond 0x7F50F0172B18 (0x7F50F0172B18) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF7CDA0 waiting 438.204625355 seconds, NSDThread: on ThCond 0x7F50900023D8 (0x7F50900023D8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF76EF0 waiting 435.903645734 seconds, NSDThread: on ThCond 0x7F5084004BC8 (0x7F5084004BC8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF74910 waiting 21.749325022 seconds, NSDThread: on ThCond 0x7F507C011F48 (0x7F507C011F48) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF71040 waiting 1.027274000 seconds, NSDThread: for poll on sock 866 0x7F522DF536D0 waiting 52.953847324 seconds, NSDThread: on ThCond 0x7F5200006FF8 (0x7F5200006FF8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF510F0 waiting 0.039278000 seconds, NSDThread: for poll on sock 837 0x7F522DF4EB10 waiting 0.085745937 seconds, NSDThread: on ThCond 0x7F51F0006828 (0x7F51F0006828) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF4C530 waiting 21.850733000 seconds, NSDThread: for poll on sock 986 0x7F522DF4B240 waiting 0.054739884 seconds, NSDThread: on ThCond 0x7F51EC0168D8 (0x7F51EC0168D8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF48C60 waiting 0.186409714 seconds, 
NSDThread: on ThCond 0x7F51E4000908 (0x7F51E4000908) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF41AC0 waiting 438.942861290 seconds, NSDThread: on ThCond 0x7F51CC010168 (0x7F51CC010168) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF3F4E0 waiting 0.060235106 seconds, NSDThread: on ThCond 0x7F51C400A438 (0x7F51C400A438) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF22E60 waiting 0.361288000 seconds, NSDThread: for poll on sock 518 0x7F522DF21B70 waiting 0.060722464 seconds, NSDThread: on ThCond 0x7F51580162D8 (0x7F51580162D8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DF12540 waiting 23.077564448 seconds, NSDThread: on ThCond 0x7F512C13E1E8 (0x7F512C13E1E8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEFD060 waiting 0.723370000 seconds, NSDThread: for poll on sock 503 0x7F522DEE09E0 waiting 1.565799175 seconds, NSDThread: on ThCond 0x7F5084004D58 (0x7F5084004D58) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEDF6F0 waiting 22.063017342 seconds, NSDThread: on ThCond 0x7F5078003E08 (0x7F5078003E08) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEDD110 waiting 0.049108780 seconds, NSDThread: on ThCond 0x7F5070001D78 (0x7F5070001D78) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEDAB30 waiting 229.603224376 seconds, NSDThread: on ThCond 0x7F50680221B8 (0x7F50680221B8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DED7260 waiting 0.071855457 seconds, NSDThread: on ThCond 0x7F506400A5A8 (0x7F506400A5A8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DED5F70 waiting 0.648324000 seconds, NSDThread: for poll on sock 766 0x7F522DEC3070 waiting 1.809205756 seconds, NSDThread: on ThCond 0x7F522000E518 (0x7F522000E518) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEB1460 waiting 436.017396645 seconds, NSDThread: on ThCond 0x7F51E4000978 (0x7F51E4000978) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DEAC8A0 waiting 393.734102000 seconds, NSDThread: for poll on sock 609 0x7F522DEA3120 waiting 17.960778837 seconds, NSDThread: on ThCond 0x7F51B4001708 (0x7F51B4001708) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DE86AA0 waiting 23.112060045 seconds, NSDThread: on ThCond 0x7F5154096118 (0x7F5154096118) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DE64570 waiting 0.076167410 seconds, NSDThread: on ThCond 0x7F50D8005EF8 (0x7F50D8005EF8) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DE1AF50 waiting 17.460836000 seconds, NSDThread: for poll on sock 737 0x7F522DE104E0 waiting 0.205037000 seconds, NSDThread: for poll on sock 865 0x7F522DDB8B80 waiting 0.106192000 seconds, NSDThread: for poll on sock 78 0x7F522DDA36A0 waiting 0.738921180 seconds, NSDThread: on ThCond 0x7F505400E048 (0x7F505400E048) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DD9C500 waiting 0.731118367 seconds, NSDThread: on ThCond 0x7F503C00B518 (0x7F503C00B518) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F522DD89600 waiting 
229.609363000 seconds, NSDThread: for poll on sock 515
0x7F522DD567B0 waiting 1.508489195 seconds, NSDThread: on ThCond 0x7F514C021F88 (0x7F514C021F88) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg'

Another thing worth mentioning is that the filesystem is totally unresponsive. Even a simple "cd" into a directory or an ls of a directory just hangs for several minutes (literally). This also happens if I try from the NSD servers.

*A few things I have looked into:*

* Our network seems fine; there might be some bottleneck in parts of it, which could explain the waiters, but it doesn't explain why at some point the clients ask to expel the NSD servers. It also doesn't explain why the FS is slow even on the NSD servers themselves.

* Disk bottleneck? I don't think so. CPU usage (and I/O wait) on the NSD servers is very low, and mmdiag --iohist seems to confirm that the operations on the disks are reasonably fast:

=== mmdiag: iohist ===

I/O history:

I/O start time RW Buf type disk:sectorNum nSec time ms Type Device/NSD ID NSD server
--------------- -- ----------- ----------------- ----- ------- ---- ------------------ ---------------
13:54:29.209276 W data 34:5066338808 2056 88.307 lcl sdtu
13:54:29.209277 W data 55:5095698936 2056 27.592 lcl sdaab
13:54:29.209278 W data 171:5104087544 2056 22.801 lcl sdtg
13:54:29.209279 W data 116:5011812856 2056 65.983 lcl sdqr
13:54:29.209280 W data 98:4860817912 2056 17.892 lcl sddl
13:54:29.209281 W data 159:4999229944 2056 21.324 lcl sdjg
13:54:29.209282 W data 84:5049561592 2056 31.932 lcl sdqz
13:54:29.209283 W data 8:5003424248 2056 30.912 lcl sdcw
13:54:29.209284 W data 23:4965675512 2056 27.366 lcl sdpt
13:54:29.297715 W vdiskMDLog 2:144008496 1 0.236 lcl sdkr
13:54:29.297717 W vdiskMDLog 0:331703600 1 0.230 lcl sdcm
13:54:29.297718 W vdiskMDLog 1:273769776 1 0.241 lcl sdbp
13:54:29.244902 W data 51:3857589752 2056 35.566 lcl sdyi
13:54:29.244904 W data 10:3773703672 2056 28.512 lcl sdma
13:54:29.244905 W data 48:3639485944 2056 24.124 lcl sdel
13:54:29.244906 W data 25:3777897976 2056 18.691 lcl sdgt
13:54:29.244908 W data 91:3832423928 2056 20.699 lcl sdlc
13:54:29.244909 W data 115:3723372024 2056 30.783 lcl sdho
13:54:29.244910 W data 173:3882755576 2056 53.241 lcl sdti
13:54:29.244911 W data 42:3782092280 2056 22.785 lcl sddz
13:54:29.244912 W data 45:3647874552 2056 24.289 lcl sdei
13:54:29.244913 W data 32:3652068856 2056 17.220 lcl sdbn
13:54:29.244914 W data 39:3677234680 2056 26.017 lcl sddw
13:54:29.298273 W vdiskMDLog 2:144008497 1 2.522 lcl sduf
13:54:29.298274 W vdiskMDLog 0:331703601 1 1.025 lcl sdlo
13:54:29.298275 W vdiskMDLog 1:273769777 1 2.586 lcl sdtt
13:54:29.288275 W data 27:2249588200 2056 20.071 lcl sdhb
13:54:29.288279 W data 33:2224422376 2056 19.682 lcl sdts
13:54:29.288281 W data 47:2115370472 2056 21.667 lcl sdwo
13:54:29.288282 W data 82:2316697064 2056 21.524 lcl sdxy
13:54:29.288283 W data 85:2232810984 2056 17.467 lcl sdra
13:54:29.288285 W data 30:2127953384 2056 18.475 lcl sdqg
13:54:29.288286 W data 67:1876295144 2056 16.383 lcl sdmx
13:54:29.288287 W data 64:2127953384 2056 21.908 lcl sduh
13:54:29.288288 W data 38:2253782504 2056 19.775 lcl sddv
13:54:29.288290 W data 15:2207645160 2056 20.599 lcl sdet
13:54:29.288291 W data 157:2283142632 2056 21.198 lcl sdiy

* Bonding problem on the interfaces? The Mellanox (interface card producer) drivers and firmware have been updated, and we even tested the system with a single link (without bonding).

Could someone help me with this? (A simplified sketch of how this information is being collected from the servers is below.)
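A rough sketch of that collection loop (simplified, not the exact script; the hostnames are our GSS NSD servers):

  for h in gss01a gss01b gss02a gss02b gss03a gss03b; do
      echo "==== $h ===="
      ssh $h 'mmdiag --waiters; mmdiag --iohist | tail -40'
  done

This is run every few seconds while the problem is happening, so the long waiters can be caught while they grow.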
In particular:

* What exactly do the clients look at to decide that another node is unresponsive? Ping? I don't think so, because both the NSD servers and the clients can be pinged, so what do they look at? If someone can also specify which port they use, I can try to tcpdump what exactly is causing this expel.

* How can I monitor metadata operations to understand EXACTLY where the bottleneck causing this is:

[sdinardo at ebi5-001 ~]$ time ls /gpfs/nobackup/sdinardo
1 ebi3-054.ebi.ac.uk ebi3-154 ebi5-019.ebi.ac.uk ebi5-052 ebi5-101 ebi5-156 ebi5-197 ebi5-228 ebi5-262.ebi.ac.uk
10 ebi3-055 ebi3-155 ebi5-021.ebi.ac.uk ebi5-053 ebi5-104.ebi.ac.uk ebi5-160.ebi.ac.uk ebi5-198 ebi5-229 ebi5-263
2 ebi3-056.ebi.ac.uk ebi3-156 ebi5-022 ebi5-054.ebi.ac.uk ebi5-106 ebi5-161 ebi5-200 ebi5-230.ebi.ac.uk ebi5-264
3 ebi3-057 ebi3-157 ebi5-023 ebi5-056 ebi5-109 ebi5-162.ebi.ac.uk ebi5-201 ebi5-231.ebi.ac.uk ebi5-265
4 ebi3-058 ebi3-158.ebi.ac.uk ebi5-024.ebi.ac.uk ebi5-057 ebi5-110.ebi.ac.uk ebi5-163.ebi.ac.uk ebi5-202.ebi.ac.uk ebi5-232 ebi5-266.ebi.ac.uk
5 ebi3-059.ebi.ac.uk ebi3-160 ebi5-025 ebi5-060 ebi5-111.ebi.ac.uk ebi5-164 ebi5-204 ebi5-233 ebi5-267
6 ebi3-132 ebi3-161.ebi.ac.uk ebi5-026 ebi5-061.ebi.ac.uk ebi5-112.ebi.ac.uk ebi5-165 ebi5-205 ebi5-234 ebi5-269.ebi.ac.uk
7 ebi3-133 ebi3-163.ebi.ac.uk ebi5-028 ebi5-062.ebi.ac.uk ebi5-129.ebi.ac.uk ebi5-166 ebi5-206.ebi.ac.uk ebi5-236 ebi5-270
8 ebi3-134 ebi3-165 ebi5-030 ebi5-064 ebi5-131.ebi.ac.uk ebi5-169.ebi.ac.uk ebi5-207 ebi5-237 ebi5-271
9 ebi3-135 ebi3-166.ebi.ac.uk ebi5-031 ebi5-065 ebi5-132 ebi5-170.ebi.ac.uk ebi5-209 ebi5-239.ebi.ac.uk launcher.sh

_*real 21m14.948s*_ ( WTH ?!?!?!)
user 0m0.004s
sys 0m0.014s

I know these questions are not easy to answer and I need to dig more, but it would be very helpful if someone could give me some hints about where to look. My GPFS skills are limited since this is our first system and it has been in production for just a few months, and things started to worsen only recently. In the past we could get over 200Gb/s (both read and write) without any issue. Now some clients get expelled even when the data throughput is at 4-5Gb/s.

Thanks in advance for any help.

Regards,
Salvatore
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From mail at arif-ali.co.uk Tue Aug 19 11:18:10 2014 From: mail at arif-ali.co.uk (Arif Ali) Date: Tue, 19 Aug 2014 11:18:10 +0100 Subject: [gpfsug-discuss] gpfsug Maintenance Message-ID:
Hi all,

You may be aware that the website has been down for about a week now. This is due to the amount of traffic to the website and the number of people on the mailing list; we had seen a few issues on the system.

In order to counter the issues, we are moving to a new system, both to avoid future problems and for ease of management. We are hoping to do this tonight (between 20:00 - 23:00 BST). If this causes an issue for anyone, then please let me know.

I will, as part of the move over, be sending a few test mails to make sure that the mailing list is working correctly.

Thanks for your patience

--
Arif Ali
gpfsug Admin

IRC: arif-ali at freenode
LinkedIn: http://uk.linkedin.com/in/arifali
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From sdinardo at ebi.ac.uk Tue Aug 19 12:11:00 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 19 Aug 2014 12:11:00 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53EE0BB1.8000005@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> Message-ID: <53F330C4.808@ebi.ac.uk>
Still problems.
Here some more detailed examples: *EXAMPLE 1:* *EBI5-220**( CLIENT)** *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a reply from node gss02b* Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic Tue Aug 19 11:03:12.066 2014: Connecting to gss02a Tue Aug 19 11:03:12.070 2014: Connected to gss02a Tue Aug 19 11:03:17.071 2014: Connecting to gss02b Tue Aug 19 11:03:17.072 2014: Connecting to gss03b Tue Aug 19 11:03:17.079 2014: Connecting to gss03a Tue Aug 19 11:03:17.080 2014: Connecting to gss01b Tue Aug 19 11:03:17.079 2014: Connecting to gss01a Tue Aug 19 11:04:23.105 2014: Connected to gss02b Tue Aug 19 11:04:23.107 2014: Connected to gss03b Tue Aug 19 11:04:23.112 2014: Connected to gss03a Tue Aug 19 11:04:23.115 2014: Connected to gss01b Tue Aug 19 11:04:23.121 2014: Connected to gss01a Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. *GSS02B ( NSD SERVER)* ... Tue Aug 19 11:03:17.070 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:28.080 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:39.083 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:50.088 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:01.092 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:12.096 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:23.103 2014: Accepted and connected to ** ebi5-220 ... *GSS02a ( NSD SERVER)* Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). 
Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 =============================================== *EXAMPLE 2*: *EBI5-038* Tue Aug 19 11:32:34.227 2014: *Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing cluster GSS.ebi.ac.uk* Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. ... LOT MORE RESETS BY PEER ... Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:25.267 2014: Connecting to gss02a Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.* *GSS02a* Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled because of an expired lease.* Pings sent: 60. Replies received: 60. In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? Thanks in advance for any tips. Regards, Salvatore -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at arif-ali.co.uk Tue Aug 19 20:59:47 2014 From: mail at arif-ali.co.uk (Arif Ali) Date: Tue, 19 Aug 2014 20:59:47 +0100 Subject: [gpfsug-discuss] gpfsug Maintenance In-Reply-To: References: Message-ID: This is a test mail to the mailing list please do not reply -- Arif Ali IRC: arif-ali at freenode LinkedIn: http://uk.linkedin.com/in/arifali On 19 August 2014 11:18, Arif Ali wrote: > Hi all, > > You may be aware that the website has been down for about a week now. This > is due to the amount of traffic to the website and the amount of people on > the mailing list, we had seen a few issues on the system. > > In order to counter the issues, we are moving to a new system to counter > any future issues, and ease of management. We are hoping to do this tonight > ( between 20:00 - 23:00 BST). If this causes an issue for anyone, then > please let me know. > > I will, as part of the move over, will be sending a few test mails to make > sure that mailing list is working correctly. > > Thanks for your patience > > -- > Arif Ali > gpfsug Admin > > IRC: arif-ali at freenode > LinkedIn: http://uk.linkedin.com/in/arifali > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mail at arif-ali.co.uk Tue Aug 19 23:41:48 2014 From: mail at arif-ali.co.uk (Arif Ali) Date: Tue, 19 Aug 2014 23:41:48 +0100 Subject: [gpfsug-discuss] gpfsug Maintenance In-Reply-To: References: Message-ID: Thanks for all your patience, The service should all be back up again -- Arif Ali IRC: arif-ali at freenode LinkedIn: http://uk.linkedin.com/in/arifali On 19 August 2014 20:59, Arif Ali wrote: > This is a test mail to the mailing list > > please do not reply > > -- > Arif Ali > > IRC: arif-ali at freenode > LinkedIn: http://uk.linkedin.com/in/arifali > > > On 19 August 2014 11:18, Arif Ali wrote: > >> Hi all, >> >> You may be aware that the website has been down for about a week now. >> This is due to the amount of traffic to the website and the amount of >> people on the mailing list, we had seen a few issues on the system. >> >> In order to counter the issues, we are moving to a new system to counter >> any future issues, and ease of management. We are hoping to do this tonight >> ( between 20:00 - 23:00 BST). If this causes an issue for anyone, then >> please let me know. >> >> I will, as part of the move over, will be sending a few test mails to >> make sure that mailing list is working correctly. >> >> Thanks for your patience >> >> -- >> Arif Ali >> gpfsug Admin >> >> IRC: arif-ali at freenode >> LinkedIn: http://uk.linkedin.com/in/arifali >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Wed Aug 20 08:57:23 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 20 Aug 2014 08:57:23 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53EE0BB1.8000005@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> Message-ID: <53F454E3.40803@ebi.ac.uk> Still problems. Here some more detailed examples: *EXAMPLE 1:* *EBI5-220**( CLIENT)** *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a reply from node gss02b* Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic Tue Aug 19 11:03:12.066 2014: Connecting to gss02a Tue Aug 19 11:03:12.070 2014: Connected to gss02a Tue Aug 19 11:03:17.071 2014: Connecting to gss02b Tue Aug 19 11:03:17.072 2014: Connecting to gss03b Tue Aug 19 11:03:17.079 2014: Connecting to gss03a Tue Aug 19 11:03:17.080 2014: Connecting to gss01b Tue Aug 19 11:03:17.079 2014: Connecting to gss01a Tue Aug 19 11:04:23.105 2014: Connected to gss02b Tue Aug 19 11:04:23.107 2014: Connected to gss03b Tue Aug 19 11:04:23.112 2014: Connected to gss03a Tue Aug 19 11:04:23.115 2014: Connected to gss01b Tue Aug 19 11:04:23.121 2014: Connected to gss01a Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. *GSS02B ( NSD SERVER)* ... 
Tue Aug 19 11:03:17.070 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:28.080 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:39.083 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:50.088 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:01.092 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:12.096 2014: Killing connection from ** because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 Tue Aug 19 11:04:23.103 2014: Accepted and connected to ** ebi5-220 ... *GSS02a ( NSD SERVER)* Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 =============================================== *EXAMPLE 2*: *EBI5-038* Tue Aug 19 11:32:34.227 2014: *Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing cluster GSS.ebi.ac.uk* Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. ... LOT MORE RESETS BY PEER ... Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. Tue Aug 19 11:35:25.267 2014: Connecting to gss02a Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.* *GSS02a* Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled because of an expired lease.* Pings sent: 60. Replies received: 60. In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? 
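One thing I could try, to trace example 1 further, is to capture the GPFS traffic between the client and gss02b while the problem builds up. Assuming the daemons are on the default GPFS TCP port 1191 (tscTcpPort, if it has not been changed here) and that the traffic goes over the bonded interface (bond0 is an assumption), something like:

  tcpdump -i bond0 -s 0 -w /tmp/ebi5-220_gss02b.pcap 'host gss02b and tcp port 1191'

run on the client before the expel should at least show whether the reply from gss02b ever makes it onto the wire.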
In example 2, it seems to me that for some reason the manager is not renewing the lease in time. When this happens, it is not a single client: loads of them fail to get the lease renewed. Why is this happening? How can I trace it back to the source of the problem?

Thanks in advance for any tips.

Regards,
Salvatore
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From sdinardo at ebi.ac.uk Wed Aug 20 09:03:03 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Wed, 20 Aug 2014 09:03:03 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F454E3.40803@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> Message-ID: <53F45637.8080000@ebi.ac.uk>
Another interesting case about a specific waiter: I was looking at the waiters on GSS until I found these (I got this info by collecting it from all the servers with a script I wrote, so I was able to trace the hanging connections while they were happening):

gss03b.ebi.ac.uk:*235.373993397*(MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.109
gss03b.ebi.ac.uk:*235.152271998*(MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.109
gss02a.ebi.ac.uk:*214.079093620 *(MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.7.34.109
gss02a.ebi.ac.uk:*213.580199240 *(MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.7.37.109
gss03b.ebi.ac.uk:*132.375138082*(MsgRecordCondvar), reason 'RPC wait' for getData on node 10.7.37.109
gss03b.ebi.ac.uk:*132.374973884 *(MsgRecordCondvar), reason 'RPC wait' for commMsgCheckMessages on node 10.7.37.109

The bolded numbers are seconds. I was looking at this page:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/Interpreting+GPFS+Waiter+Information

The web page claims this is probably network congestion, but I managed to log in to the client quickly enough, and there the waiters were:

[root at ebi5-236 ~]# mmdiag --waiters
=== mmdiag: waiters ===
0x7F6690073460 waiting 147.973009173 seconds, RangeRevokeWorkerThread: on ThCond 0x1801E43F6A0 (0xFFFFC9001E43F6A0) (LkObjCondvar), reason 'waiting for LX lock'
0x7F65100036D0 waiting 140.458589856 seconds, WritebehindWorkerThread: on ThCond 0x7F6500000F98 (0x7F6500000F98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35
0x7F63A0001080 waiting 245.153055801 seconds,
WritebehindWorkerThread: on ThCond 0x7F65440018F8 (0x7F65440018F8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674C0291E0 waiting 247.131569232 seconds, PrefetchWorkerThread: on ThCond 0x7F65740016C8 (0x7F65740016C8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6748025BD0 waiting 11.631381523 seconds, replyCleanupThread: on ThCond 0x7F65E000A1F8 (0x7F65E000A1F8) (MsgRecordCondvar), reason 'RPC wait' 0x7F6748022300 waiting 245.616267612 seconds, WritebehindWorkerThread: on ThCond 0x7F6470001468 (0x7F6470001468) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6748021010 waiting 230.769670930 seconds, InodeAllocRevokeWorkerThread: on ThCond 0x7F64880079E8 (0x7F64880079E8) (LogFileBufferDescriptorCondvar), reason 'force wait for buffer write to complete' 0x7F674801B160 waiting 245.830554594 seconds, UnusedInodePrefetchThread: on ThCond 0x7F65B8004438 (0x7F65B8004438) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F674800A820 waiting 252.332932000 seconds, Msg handler getData: for poll on sock 109 0x7F63F4023090 waiting 253.073535042 seconds, WritebehindWorkerThread: on ThCond 0x7F65C4000CC8 (0x7F65C4000CC8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F64A4000CE0 waiting 145.049659249 seconds, WritebehindWorkerThread: on ThCond 0x7F6560000A98 (0x7F6560000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6778006D00 waiting 142.124664264 seconds, WritebehindWorkerThread: on ThCond 0x7F63DC000C08 (0x7F63DC000C08) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67780046D0 waiting 251.751439453 seconds, WritebehindWorkerThread: on ThCond 0x7F6454000A98 (0x7F6454000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67780E4B70 waiting 142.431051232 seconds, WritebehindWorkerThread: on ThCond 0x7F63C80010D8 (0x7F63C80010D8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67780E50D0 waiting 244.339624817 seconds, WritebehindWorkerThread: on ThCond 0x7F65BC001B98 (0x7F65BC001B98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6434000B40 waiting 145.343700410 seconds, WritebehindWorkerThread: on ThCond 0x7F63B00036E8 (0x7F63B00036E8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F670C0187A0 waiting 244.903963969 seconds, WritebehindWorkerThread: on ThCond 0x7F65F0000FB8 (0x7F65F0000FB8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C04E2F0 waiting 245.837137631 seconds, PrefetchWorkerThread: on ThCond 0x7F65A4000A98 (0x7F65A4000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C04AA20 waiting 139.713993908 seconds, WritebehindWorkerThread: on ThCond 0x7F6454002478 (0x7F6454002478) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C049730 waiting 252.434187472 seconds, WritebehindWorkerThread: on ThCond 0x7F65F4003708 (0x7F65F4003708) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F671C044B70 waiting 131.515829048 seconds, Msg handler ccMsgPing: on ThCond 0x7F64DC1D4888 (0x7F64DC1D4888) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F6758008DE0 waiting 149.548547226 seconds, Msg handler getData: on ThCond 
0x7F645C002458 (0x7F645C002458) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F67580071D0 waiting 149.548543118 seconds, Msg handler commMsgCheckMessages: on ThCond 0x7F6450001C48 (0x7F6450001C48) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F65A40052B0 waiting 11.498507001 seconds, Msg handler ccMsgPing: on ThCond 0x7F644C103F88 (0x7F644C103F88) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F6448001620 waiting 139.844870446 seconds, WritebehindWorkerThread: on ThCond 0x7F65F0003098 (0x7F65F0003098) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F63F4000F80 waiting 245.044791905 seconds, WritebehindWorkerThread: on ThCond 0x7F6450001188 (0x7F6450001188) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F659C0033A0 waiting 243.464399305 seconds, PrefetchWorkerThread: on ThCond 0x7F6554002598 (0x7F6554002598) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6514001690 waiting 245.826160463 seconds, PrefetchWorkerThread: on ThCond 0x7F65A4004558 (0x7F65A4004558) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F64800012B0 waiting 253.174835511 seconds, WritebehindWorkerThread: on ThCond 0x7F65E0000FB8 (0x7F65E0000FB8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6510000EE0 waiting 140.746696039 seconds, WritebehindWorkerThread: on ThCond 0x7F647C000CC8 (0x7F647C000CC8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6754001BB0 waiting 246.336055629 seconds, PrefetchWorkerThread: on ThCond 0x7F6594002498 (0x7F6594002498) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6420000930 waiting 140.606777450 seconds, WritebehindWorkerThread: on ThCond 0x7F6578002498 (0x7F6578002498) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6744009110 waiting 137.466372831 seconds, FileBlockReadFetchHandlerThread: on ThCond 0x7F65F4007158 (0x7F65F4007158) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F67280119F0 waiting 144.173427360 seconds, WritebehindWorkerThread: on ThCond 0x7F6504000AE8 (0x7F6504000AE8) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F672800BB40 waiting 145.804301887 seconds, WritebehindWorkerThread: on ThCond 0x7F6550001038 (0x7F6550001038) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6728000910 waiting 252.601993452 seconds, WritebehindWorkerThread: on ThCond 0x7F6450000A98 (0x7F6450000A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F6744007E20 waiting 251.603329204 seconds, WritebehindWorkerThread: on ThCond 0x7F6570004C18 (0x7F6570004C18) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35 0x7F64AC002EF0 waiting 139.205774422 seconds, FileBlockWriteFetchHandlerThread: on ThCond 0x18020AF0260 (0xFFFFC90020AF0260) (FetchFlowControlCondvar), reason 'wait for buffer for fetch' 0x7F6724013050 waiting 71.501580932 seconds, Msg handler ccMsgPing: on ThCond 0x7F6580006608 (0x7F6580006608) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg' 0x7F661C000DA0 waiting 245.654985276 seconds, PrefetchWorkerThread: on ThCond 0x7F6570005288 (0x7F6570005288) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O 
completion on node 10.7.28.35
0x7F671C00F440 waiting 251.096002003 seconds, FileBlockReadFetchHandlerThread: on ThCond 0x7F65BC002878 (0x7F65BC002878) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35
0x7F671C00E150 waiting 144.034006970 seconds, WritebehindWorkerThread: on ThCond 0x7F6528001548 (0x7F6528001548) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35
0x7F67A02FCD20 waiting 142.324070945 seconds, WritebehindWorkerThread: on ThCond 0x7F6580002A98 (0x7F6580002A98) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35
0x7F67A02FA330 waiting 200.670114385 seconds, EEWatchDogThread: on ThCond 0x7F65B0000A98 (0x7F65B0000A98) (MsgRecordCondvar), reason 'RPC wait'
0x7F67A02BF050 waiting 252.276161189 seconds, WritebehindWorkerThread: on ThCond 0x7F6584003998 (0x7F6584003998) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O completion on node 10.7.28.35
0x7F67A0004160 waiting 251.173651822 seconds, SyncHandlerThread: on ThCond 0x7F64880079E8 (0x7F64880079E8) (LogFileBufferDescriptorCondvar), reason 'force wait on force active buffer write'

So from the client side, it is the client that is waiting for the server. I also managed to ping, ssh, and tcpdump between the two before the node got expelled, and found that ping works fine, ssh works fine, and besides my own test traffic there are zero packets passing between them, LITERALLY. So there is no congestion and no network issue, but the server waits for the client and the client waits for the server. This goes on until we reach 350 seconds (10 times the lease time), and then the client gets expelled. There are no local I/O waiters that would indicate GSS is struggling, there is plenty of bandwidth and CPU available, and no network congestion.

It seems like some sort of deadlock to me, but how can this be explained and hopefully fixed?

Regards,
Salvatore
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From chair at gpfsug.org Thu Aug 21 09:20:39 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Thu, 21 Aug 2014 09:20:39 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F454E3.40803@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> Message-ID: <53F5ABD7.80107@gpfsug.org>
Hi there,

I've seen this on several 'stock'? 'core'? GPFS systems (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so.

In my experience this has _always_ been a network issue of one sort or another. If the network is experiencing issues, nodes will be ejected. Of course it could be an unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS.

You need to follow the logs through from each machine in time order to determine who could not see whom and in what order. Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM, and collect and supply a snap and traces as required by support.

Without knowing your full setup, it's hard to help further.

Jez

On 20/08/14 08:57, Salvatore Di Nardo wrote:
> Still problems.
Here some more detailed examples: > > *EXAMPLE 1:* > > *EBI5-220**( CLIENT)** > *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a > reply from node gss02b* > Tue Aug 19 11:03:04.981 2014: Request sent to > (gss02a in GSS.ebi.ac.uk) to expel (gss02b in > GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk > Tue Aug 19 11:03:04.982 2014: This node will be expelled > from cluster GSS.ebi.ac.uk due to expel msg from IP> (ebi5-220) > Tue Aug 19 11:03:09.319 2014: Cluster Manager connection > broke. Probing cluster GSS.ebi.ac.uk > Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum > nodes during cluster probe. > Tue Aug 19 11:03:10.322 2014: Lost membership in cluster > GSS.ebi.ac.uk. Unmounting file systems. > Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount > invoked. File system: gpfs1 Reason: SGPanic > Tue Aug 19 11:03:12.066 2014: Connecting to > gss02a > Tue Aug 19 11:03:12.070 2014: Connected to > gss02a > Tue Aug 19 11:03:17.071 2014: Connecting to > gss02b > Tue Aug 19 11:03:17.072 2014: Connecting to > gss03b > Tue Aug 19 11:03:17.079 2014: Connecting to > gss03a > Tue Aug 19 11:03:17.080 2014: Connecting to > gss01b > Tue Aug 19 11:03:17.079 2014: Connecting to > gss01a > Tue Aug 19 11:04:23.105 2014: Connected to > gss02b > Tue Aug 19 11:04:23.107 2014: Connected to > gss03b > Tue Aug 19 11:04:23.112 2014: Connected to > gss03a > Tue Aug 19 11:04:23.115 2014: Connected to > gss01b > Tue Aug 19 11:04:23.121 2014: Connected to > gss01a > Tue Aug 19 11:12:28.992 2014: Node (gss02a in > GSS.ebi.ac.uk) is now the Group Leader. > > *GSS02B ( NSD SERVER)* > ... > Tue Aug 19 11:03:17.070 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:25.016 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:28.080 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:36.019 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:39.083 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:47.023 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:50.088 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:52.218 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:03:58.030 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:01.092 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:03.220 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:09.034 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:12.096 2014: Killing connection from > ** because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:14.224 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:20.037 2014: Killing connection from > because the group is not ready for it to > rejoin, err 46 > Tue Aug 19 11:04:23.103 2014: Accepted and connected to > ** ebi5-220 > ... 
>
> *GSS02a ( NSD SERVER)*
> Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk)
> Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220
>
> ===============================================
> *EXAMPLE 2*:
>
> *EBI5-038*
> Tue Aug 19 11:32:34.227 2014: *Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.*
> Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing cluster GSS.ebi.ac.uk*
> Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect.
> Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect.
> ...
> LOT MORE RESETS BY PEER
> ...
> Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect.
> Tue Aug 19 11:35:25.267 2014: Connecting to gss02a
> Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure)
> Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a
> Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure)
> Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe.
> Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.*
>
> *GSS02a*
> Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled because of an expired lease.* Pings sent: 60. Replies received: 60.
>
> In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem?
>
> In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem?
>
> Thanks in advance for any tips.
>
> Regards,
> Salvatore
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From sdinardo at ebi.ac.uk Thu Aug 21 10:04:47 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Thu, 21 Aug 2014 10:04:47 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5ABD7.80107@gpfsug.org> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> Message-ID: <53F5B62F.1060305@ebi.ac.uk>
Thanks for the feedback, but we managed to find a scenario that excludes network problems. We have a file called */input_file/* of nearly 100GB. If from *client A* we do:

cat input_file >> output_file

it starts copying, and we see the waiters go up a bit (a few seconds), but then they flush back to 0, so we can say the copy proceeds well. If we now do the same from another client (or just another shell on the same client), *client B*:

cat input_file >> output_file

(in other words, we are trying to write to the same destination), all the waiters go up until one node gets expelled.
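Condensed, the reproduction and the way we watch it look roughly like this (paths simplified; the monitoring loop is just an example, not an exact transcript):

  # client A
  cat input_file >> output_file &
  # client B (or a second shell), started while the first copy is still running
  cat input_file >> output_file &
  # meanwhile, on the clients and on the GSS servers
  while true; do date; mmdiag --waiters | head -20; sleep 10; done

The waiters belonging to the second writer stop flushing back to 0 and keep growing until the expel happens, as described above.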
Now, while it is understandable that the destination file is locked by one of the "cat" processes, so the other has to wait (and since the file is BIG, it has to wait for a while), it is not understandable why this stops the lease renewal. Why doesn't it just return a timeout error on the copy instead of expelling the node? We can reproduce this every time, and since our users do operations like this on files over 100GB each, you can imagine the result.

As you can imagine, even if it is a bit silly to write to the same destination at the same time, it is also quite common: for example several writers dumping to the same log file while one of them writes for a long time, keeping the file locked. Our expels are not due to network congestion, but to a write attempt having to wait for another one. What I really don't understand is why such an extreme measure as an expel is taken just because a process is waiting "too much time".

I have a ticket opened with IBM for this and the issue is under investigation, but no luck so far.

Regards,
Salvatore

On 21/08/14 09:20, Jez Tucker (Chair) wrote:
> Hi there,
>
> I've seen the on several 'stock'? 'core'? GPFS system (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster.
> The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so.
>
> In my experience this has _always_ been a network issue of one sort of another. If the network is experiencing issues, nodes will be ejected.
> Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS.
>
> You need to follow the logs through from each machine in time order to determine who could not see who and in what order.
> Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support.
>
> Without knowing your full setup, it's hard to help further.
>
> Jez
>
> On 20/08/14 08:57, Salvatore Di Nardo wrote:
>> Still problems. Here some more detailed examples:
>> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bbanister at jumptrading.com Thu Aug 21 13:48:38 2014
From: bbanister at jumptrading.com (Bryan Banister)
Date: Thu, 21 Aug 2014 12:48:38 +0000
Subject: [gpfsug-discuss] gpfs client expels
In-Reply-To: <53F5B62F.1060305@ebi.ac.uk>
References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org>,<53F5B62F.1060305@ebi.ac.uk>
Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com>

As I understand GPFS distributed locking semantics, GPFS will not allow one node to hold a write lock for a file indefinitely. Once Client B opens the file for writing it would have contacted the File System Manager to obtain the lock. The FS manager would have told Client B that Client A has the lock and that Client B would have to contact Client A and revoke the write lock token. If Client A does not respond to Client B's request to revoke the write token, then Client B will ask that Client A be expelled from the cluster for NOT adhering to the proper protocol for write lock contention. (See the attached GPFS_Token_Protocol.png diagram.)

Have you checked the communication path between the two clients at this point?

I could not follow the logs that you provided. You should definitely look at the exact sequence of log events on the two clients and the file system manager (as reported by mmlsmgr).
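Something along these lines, run on both clients and on the manager node around the time of the test, is usually enough to line the events up (a rough sketch only -- the commands and log path assume a default GPFS 3.5 install, adjust to your environment):

  # Which node is the file system manager for the affected file system?
  /usr/lpp/mmfs/bin/mmlsmgr gpfs1

  # Snapshot the long waiters and the tail of the GPFS log on each node
  /usr/lpp/mmfs/bin/mmdiag --waiters > /tmp/waiters.$(hostname).$(date +%s)
  tail -n 200 /var/adm/ras/mmfs.log.latest > /tmp/mmfslog.$(hostname).$(date +%s)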
Hope that helps,
-Bryan

________________________________
From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo [sdinardo at ebi.ac.uk]
Sent: Thursday, August 21, 2014 4:04 AM
To: chair at gpfsug.org; gpfsug main discussion list
Subject: Re: [gpfsug-discuss] gpfs client expels

[...]
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

________________________________

Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GPFS_Token_Protocol.png
Type: image/png
Size: 249179 bytes
Desc: GPFS_Token_Protocol.png
URL: 

From jbernard at jumptrading.com Thu Aug 21 13:52:05 2014
From: jbernard at jumptrading.com (Jon Bernard)
Date: Thu, 21 Aug 2014 12:52:05 +0000
Subject: [gpfsug-discuss] gpfs client expels
In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com>
References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org>, <53F5B62F.1060305@ebi.ac.uk>, <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com>
Message-ID: 

Where is that from?

On Aug 21, 2014, at 7:49, "Bryan Banister" > wrote:

[...]
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GPFS_Token_Protocol.png
Type: image/png
Size: 249179 bytes
Desc: GPFS_Token_Protocol.png
URL: 

From viccornell at gmail.com Thu Aug 21 14:03:14 2014
From: viccornell at gmail.com (Vic Cornell)
Date: Thu, 21 Aug 2014 14:03:14 +0100
Subject: [gpfsug-discuss] gpfs client expels
In-Reply-To: <53F5B62F.1060305@ebi.ac.uk>
References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk>
Message-ID: <9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com>

Hi Salvatore,

Are you using ethernet or infiniband as the GPFS interconnect to your clients?

If 10/40GbE - do you have a separate admin network?

I have seen behaviour similar to this where the storage traffic causes congestion and the "admin" traffic gets lost or delayed causing expels.

Vic

On 21 Aug 2014, at 10:04, Salvatore Di Nardo wrote:

> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sdinardo at ebi.ac.uk Thu Aug 21 14:04:59 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Thu, 21 Aug 2014 14:04:59 +0100
Subject: [gpfsug-discuss] gpfs client expels
In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com>
References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org>, <53F5B62F.1060305@ebi.ac.uk> <21BC488F0AEA2245B2C3E83FC0B33DBB8263D9@CHI-EXCHANGEW2.w2k.jumptrading.com>
Message-ID: <53F5EE7B.2080306@ebi.ac.uk>

Thanks for the info. It helps a bit in understanding what is going on, but I think you missed the point that Node A and Node B can also be the same machine. If, for instance, I run the two copies on the same machine, Client B cannot have problems contacting Client A, since they are the same machine. BTW, I did the same test using two separate clients and the result is the same. Nonetheless, your description has made me understand a bit better what is going on.

Regards,
Salvatore

On 21/08/14 13:48, Bryan Banister wrote:
> As I understand GPFS distributed locking semantics, GPFS will not allow one node to hold a write lock for a file indefinitely. Once Client B opens the file for writing it would have contacted the File System Manager to obtain the lock. The FS manager would have told Client B that Client A has the lock and that Client B would have to contact Client A and revoke the write lock token. If Client A does not respond to Client B's request to revoke the write token, then Client B will ask that Client A be expelled from the cluster for NOT adhering to the proper protocol for write lock contention.
>
> Have you checked the communication path between the two clients at this point?
>
> I could not follow the logs that you provided. You should definitely look at the exact sequence of log events on the two clients and the file system manager (as reported by mmlsmgr).
> > we have a file called */input_file/* of nearly 100GB: > > if from *client A* we do: > > cat input_file >> output_file > > it start copying.. and we see waiter goeg a bit up,secs but then they > flushes back to 0, so we xcan say that the copy proceed well... > > > if now we do the same from another client ( or just another shell on > the same client) *client B* : > > cat input_file >> output_file > > > ( in other words we are trying to write to the same destination) all > the waiters gets up until one node get expelled. > > > Now, while its understandable that the destination file is locked for > one of the "cat", so have to wait ( and since the file is BIG , have > to wait for a while), its not understandable why it stop the renewal > lease. > Why its doen't return just a timeout error on the copy instead to > expel the node? We can reproduce this every time, and since our users > to operations like this on files over 100GB each you can imagine the > result. > > > > As you can imagine even if its a bit silly to write at the same time > to the same destination, its also quite common if we want to dump to a > log file logs and for some reason one of the writers, write for a lot > of time keeping the file locked. > Our expels are not due to network congestion, but because a write > attempts have to wait another one. What i really dont understand is > why to take a so expreme mesure to expell jest because a process is > waiteing "to too much time". > > > I have ticket opened to IBM for this and the issue is under > investigation, but no luck so far.. > > Regards, > Salvatore > > > > On 21/08/14 09:20, Jez Tucker (Chair) wrote: >> Hi there, >> >> I've seen the on several 'stock'? 'core'? GPFS system (we need a >> better term now GSS is out) and seen ping 'working', but alongside >> ejections from the cluster. >> The GPFS internode 'ping' is somewhat more circumspect than unix ping >> - and rightly so. >> >> In my experience this has _always_ been a network issue of one sort >> of another. If the network is experiencing issues, nodes will be >> ejected. >> Of course it could be unresponsive mmfsd or high loadavg, but I've >> seen that only twice in 10 years over many versions of GPFS. >> >> You need to follow the logs through from each machine in time order >> to determine who could not see who and in what order. >> Your best way forward is to log a SEV2 case with IBM support, >> directly or via your OEM and collect and supply a snap and traces as >> required by support. >> >> Without knowing your full setup, it's hard to help further. >> >> Jez >> >> On 20/08/14 08:57, Salvatore Di Nardo wrote: >>> Still problems. Here some more detailed examples: >>> >>> *EXAMPLE 1:* >>> >>> *EBI5-220**( CLIENT)** >>> *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a >>> reply from node gss02b* >>> Tue Aug 19 11:03:04.981 2014: Request sent to >> IP> (gss02a in GSS.ebi.ac.uk) to expel >>> (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:03:04.982 2014: This node will be expelled >>> from cluster GSS.ebi.ac.uk due to expel msg from >>> (ebi5-220) >>> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection >>> broke. Probing cluster GSS.ebi.ac.uk >>> Tue Aug 19 11:03:10.321 2014: Unable to contact any >>> quorum nodes during cluster probe. >>> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster >>> GSS.ebi.ac.uk. Unmounting file systems. >>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount >>> invoked. 
File system: gpfs1 Reason: SGPanic >>> Tue Aug 19 11:03:12.066 2014: Connecting to >>> gss02a >>> Tue Aug 19 11:03:12.070 2014: Connected to >>> gss02a >>> Tue Aug 19 11:03:17.071 2014: Connecting to >>> gss02b >>> Tue Aug 19 11:03:17.072 2014: Connecting to >>> gss03b >>> Tue Aug 19 11:03:17.079 2014: Connecting to >>> gss03a >>> Tue Aug 19 11:03:17.080 2014: Connecting to >>> gss01b >>> Tue Aug 19 11:03:17.079 2014: Connecting to >>> gss01a >>> Tue Aug 19 11:04:23.105 2014: Connected to >>> gss02b >>> Tue Aug 19 11:04:23.107 2014: Connected to >>> gss03b >>> Tue Aug 19 11:04:23.112 2014: Connected to >>> gss03a >>> Tue Aug 19 11:04:23.115 2014: Connected to >>> gss01b >>> Tue Aug 19 11:04:23.121 2014: Connected to >>> gss01a >>> Tue Aug 19 11:12:28.992 2014: Node (gss02a >>> in GSS.ebi.ac.uk) is now the Group Leader. >>> >>> *GSS02B ( NSD SERVER)* >>> ... >>> Tue Aug 19 11:03:17.070 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:25.016 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:28.080 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:36.019 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:39.083 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:47.023 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:50.088 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:52.218 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:03:58.030 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:01.092 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:03.220 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:09.034 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:12.096 2014: Killing connection from >>> ** because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:14.224 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:20.037 2014: Killing connection from >>> because the group is not ready for it to >>> rejoin, err 46 >>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to >>> ** ebi5-220 >>> ... >>> >>> *GSS02a ( NSD SERVER)* >>> Tue Aug 19 11:03:04.980 2014: Expel (gss02b) >>> request from (ebi5-220 in >>> ebi-cluster.ebi.ac.uk). Expelling: >>> (ebi5-220 in ebi-cluster.ebi.ac.uk) >>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to >>> ebi5-220 >>> >>> >>> =============================================== >>> *EXAMPLE 2*: >>> >>> *EBI5-038* >>> Tue Aug 19 11:32:34.227 2014: *Disk lease period expired >>> in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.* >>> Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing >>> cluster GSS.ebi.ac.uk* >>> Tue Aug 19 11:35:24.265 2014: Close connection to >>> gss02a (Connection reset by peer). >>> Attempting reconnect. 
>>> Tue Aug 19 11:35:24.865 2014: Close connection to >>> ebi5-014 (Connection reset by >>> peer). Attempting reconnect. >>> ... >>> LOT MORE RESETS BY PEER >>> ... >>> Tue Aug 19 11:35:25.096 2014: Close connection to >>> ebi5-167 (Connection reset by >>> peer). Attempting reconnect. >>> Tue Aug 19 11:35:25.267 2014: Connecting to >>> gss02a >>> Tue Aug 19 11:35:25.268 2014: Close connection to >>> gss02a (Connection failed because >>> destination is still processing previous node failure) >>> Tue Aug 19 11:35:26.267 2014: Retry connection to >>> gss02a >>> Tue Aug 19 11:35:26.268 2014: Close connection to >>> gss02a (Connection failed because >>> destination is still processing previous node failure) >>> Tue Aug 19 11:36:24.276 2014: Unable to contact any >>> quorum nodes during cluster probe. >>> Tue Aug 19 11:36:24.277 2014: *Lost membership in >>> cluster GSS.ebi.ac.uk. Unmounting file systems.* >>> >>> *GSS02a* >>> Tue Aug 19 11:35:24.263 2014: Node >>> (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled >>> because of an expired lease.* Pings sent: 60. Replies >>> received: 60. >>> >>> >>> >>> >>> In example 1 seems that an NSD was not repliyng to the client, but >>> the servers seems working fine.. how can i trace better ( to solve) >>> the problem? >>> >>> In example 2 it seems to me that for some reason the manager are not >>> renewing the lease in time. when this happens , its not a single >>> client. >>> Loads of them fail to get the lease renewed. Why this is happening? >>> how can i trace to the source of the problem? >>> >>> >>> >>> Thanks in advance for any tips. >>> >>> Regards, >>> Salvatore >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ------------------------------------------------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: image/png
Size: 249179 bytes
Desc: not available
URL: 

From sdinardo at ebi.ac.uk Thu Aug 21 14:18:19 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Thu, 21 Aug 2014 14:18:19 +0100
Subject: [gpfsug-discuss] gpfs client expels
In-Reply-To: <9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com>
References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk> <9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com>
Message-ID: <53F5F19B.1010603@ebi.ac.uk>

This is an interesting point! We use ethernet (10G links on the clients), but we do not have a separate admin network. Could you explain this a bit further? The clients and the servers are on different subnets, so the packets are routed, and I do not see a practical way to separate them. The clients are blades in a chassis, so even if I create two interfaces they will physically use the same "cable" to reach the first switch. The clients (around 600 of them) are also spread across several subnets. I will forward this consideration to our network admins to see if we can work on a dedicated network.

Thanks for your tip.

Regards,
Salvatore

On 21/08/14 14:03, Vic Cornell wrote:
> Hi Salvatore,
>
> Are you using ethernet or infiniband as the GPFS interconnect to your clients?
>
> If 10/40GbE - do you have a separate admin network?
>
> I have seen behaviour similar to this where the storage traffic causes congestion and the "admin" traffic gets lost or delayed causing expels.
>
> Vic
>
> On 21 Aug 2014, at 10:04, Salvatore Di Nardo wrote:
>
>> [...]
>>
>> On 21/08/14 09:20, Jez Tucker (Chair) wrote:
>>> Hi there,
>>>
>>> I've seen the on several 'stock'? 'core'?
GPFS system (we need a >>> better term now GSS is out) and seen ping 'working', but alongside >>> ejections from the cluster. >>> The GPFS internode 'ping' is somewhat more circumspect than unix >>> ping - and rightly so. >>> >>> In my experience this has _always_ been a network issue of one sort >>> of another. If the network is experiencing issues, nodes will be >>> ejected. >>> Of course it could be unresponsive mmfsd or high loadavg, but I've >>> seen that only twice in 10 years over many versions of GPFS. >>> >>> You need to follow the logs through from each machine in time order >>> to determine who could not see who and in what order. >>> Your best way forward is to log a SEV2 case with IBM support, >>> directly or via your OEM and collect and supply a snap and traces as >>> required by support. >>> >>> Without knowing your full setup, it's hard to help further. >>> >>> Jez >>> >>> On 20/08/14 08:57, Salvatore Di Nardo wrote: >>>> Still problems. Here some more detailed examples: >>>> >>>> *EXAMPLE 1:* >>>> >>>> *EBI5-220**( CLIENT)** >>>> *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a >>>> reply from node gss02b* >>>> Tue Aug 19 11:03:04.981 2014: Request sent to >>> IP> (gss02a in GSS.ebi.ac.uk ) to >>>> expel (gss02b in GSS.ebi.ac.uk >>>> ) from cluster GSS.ebi.ac.uk >>>> >>>> Tue Aug 19 11:03:04.982 2014: This node will be >>>> expelled from cluster GSS.ebi.ac.uk >>>> due to expel msg from >>> IP> (ebi5-220) >>>> Tue Aug 19 11:03:09.319 2014: Cluster Manager >>>> connection broke. Probing cluster GSS.ebi.ac.uk >>>> >>>> Tue Aug 19 11:03:10.321 2014: Unable to contact any >>>> quorum nodes during cluster probe. >>>> Tue Aug 19 11:03:10.322 2014: Lost membership in >>>> cluster GSS.ebi.ac.uk . >>>> Unmounting file systems. >>>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount >>>> invoked. File system: gpfs1 Reason: SGPanic >>>> Tue Aug 19 11:03:12.066 2014: Connecting to >>>> gss02a >>>> Tue Aug 19 11:03:12.070 2014: Connected to >>>> gss02a >>>> Tue Aug 19 11:03:17.071 2014: Connecting to >>>> gss02b >>>> Tue Aug 19 11:03:17.072 2014: Connecting to >>>> gss03b >>>> Tue Aug 19 11:03:17.079 2014: Connecting to >>>> gss03a >>>> Tue Aug 19 11:03:17.080 2014: Connecting to >>>> gss01b >>>> Tue Aug 19 11:03:17.079 2014: Connecting to >>>> gss01a >>>> Tue Aug 19 11:04:23.105 2014: Connected to >>>> gss02b >>>> Tue Aug 19 11:04:23.107 2014: Connected to >>>> gss03b >>>> Tue Aug 19 11:04:23.112 2014: Connected to >>>> gss03a >>>> Tue Aug 19 11:04:23.115 2014: Connected to >>>> gss01b >>>> Tue Aug 19 11:04:23.121 2014: Connected to >>>> gss01a >>>> Tue Aug 19 11:12:28.992 2014: Node (gss02a >>>> in GSS.ebi.ac.uk ) is now the >>>> Group Leader. >>>> >>>> *GSS02B ( NSD SERVER)* >>>> ... 
>>>> Tue Aug 19 11:03:17.070 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:25.016 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:28.080 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:36.019 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:39.083 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:47.023 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:50.088 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:03:52.218 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:03:58.030 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:01.092 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:04:03.220 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:09.034 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:12.096 2014: Killing connection from >>>> ** because the group is not ready for it >>>> to rejoin, err 46 >>>> Tue Aug 19 11:04:14.224 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:20.037 2014: Killing connection from >>>> because the group is not ready for it to >>>> rejoin, err 46 >>>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to >>>> ** ebi5-220 >>>> ... >>>> >>>> *GSS02a ( NSD SERVER)* >>>> Tue Aug 19 11:03:04.980 2014: Expel >>>> (gss02b) request from (ebi5-220 in >>>> ebi-cluster.ebi.ac.uk ). >>>> Expelling: (ebi5-220 in >>>> ebi-cluster.ebi.ac.uk ) >>>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to >>>> ebi5-220 >>>> >>>> >>>> =============================================== >>>> *EXAMPLE 2*: >>>> >>>> *EBI5-038* >>>> Tue Aug 19 11:32:34.227 2014: *Disk lease period >>>> expired in cluster GSS.ebi.ac.uk >>>> . Attempting to reacquire lease.* >>>> Tue Aug 19 11:33:34.258 2014: *Lease is overdue. >>>> Probing cluster GSS.ebi.ac.uk * >>>> Tue Aug 19 11:35:24.265 2014: Close connection to >>>> gss02a (Connection reset by peer). >>>> Attempting reconnect. >>>> Tue Aug 19 11:35:24.865 2014: Close connection to >>>> ebi5-014 (Connection reset by >>>> peer). Attempting reconnect. >>>> ... >>>> LOT MORE RESETS BY PEER >>>> ... >>>> Tue Aug 19 11:35:25.096 2014: Close connection to >>>> ebi5-167 (Connection reset by >>>> peer). Attempting reconnect. >>>> Tue Aug 19 11:35:25.267 2014: Connecting to >>>> gss02a >>>> Tue Aug 19 11:35:25.268 2014: Close connection to >>>> gss02a (Connection failed because >>>> destination is still processing previous node failure) >>>> Tue Aug 19 11:35:26.267 2014: Retry connection to >>>> gss02a >>>> Tue Aug 19 11:35:26.268 2014: Close connection to >>>> gss02a (Connection failed because >>>> destination is still processing previous node failure) >>>> Tue Aug 19 11:36:24.276 2014: Unable to contact any >>>> quorum nodes during cluster probe. 
>>>> Tue Aug 19 11:36:24.277 2014: *Lost membership in >>>> cluster GSS.ebi.ac.uk . >>>> Unmounting file systems.* >>>> >>>> *GSS02a* >>>> Tue Aug 19 11:35:24.263 2014: Node >>>> (ebi5-038 in ebi-cluster.ebi.ac.uk >>>> ) *is being expelled >>>> because of an expired lease.* Pings sent: 60. Replies >>>> received: 60. >>>> >>>> >>>> >>>> In example 1 seems that an NSD was not repliyng to the client, but >>>> the servers seems working fine.. how can i trace better ( to solve) >>>> the problem? >>>> >>>> In example 2 it seems to me that for some reason the manager are >>>> not renewing the lease in time. when this happens , its not a >>>> single client. >>>> Loads of them fail to get the lease renewed. Why this is happening? >>>> how can i trace to the source of the problem? >>>> >>>> >>>> >>>> Thanks in advance for any tips. >>>> >>>> Regards, >>>> Salvatore >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss atgpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss atgpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From service at metamodul.com Thu Aug 21 14:19:33 2014 From: service at metamodul.com (service at metamodul.com) Date: Thu, 21 Aug 2014 15:19:33 +0200 (CEST) Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5B62F.1060305@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk> Message-ID: <1481989063.92260.1408627173332.open-xchange@oxbaltgw09.schlund.de>
> Now, while its understandable that the destination file is locked for one of > the "cat", so have to wait
If GPFS is POSIX compatible, I do not understand why one cat should completely block the other; on a standard FS you can "cat" from many sources to the same target. Of course the result is not predictable. From this point of view I would expect that both "cat" processes would start writing immediately, thus I would expect a GPFS bug. All IMHO.
Hajo
Note: You might test with the input_file in a different directory, and I would also test the behaviour when the output_file is on a local FS like /tmp.
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From viccornell at gmail.com Thu Aug 21 14:22:22 2014 From: viccornell at gmail.com (Vic Cornell) Date: Thu, 21 Aug 2014 14:22:22 +0100 Subject: [gpfsug-discuss] gpfs client expels In-Reply-To: <53F5F19B.1010603@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> <53F454E3.40803@ebi.ac.uk> <53F5ABD7.80107@gpfsug.org> <53F5B62F.1060305@ebi.ac.uk> <9B247872-CD75-4F86-A10E-33AAB6BD414A@gmail.com> <53F5F19B.1010603@ebi.ac.uk> Message-ID: <0F03996A-2008-4076-9A2B-B4B2BB89E959@gmail.com>
For my system I always use a dedicated admin network - as described in the gpfs manuals - for a gpfs cluster on 10/40GbE where the system will be heavily loaded. The difference in the stability of the system is very noticeable.
Not sure how/if this would work on GSS - IBM ought to know :-) Vic On 21 Aug 2014, at 14:18, Salvatore Di Nardo wrote: > This is an interesting point! > > We use ethernet ( 10g links on the clients) but we dont have a separate network for the admin network. > > Could you explain this a bit further, because the clients and the servers we have are on different subnet so the packet are routed.. I don't see a practical way to separate them. The clients are blades in a chassis so even if i create 2 interfaces, they will physically use the came "cable" to go to the first switch. even the clients ( 600 clients) have different subsets. > > I will forward this consideration to our network admin , so see if we can work on a dedicated network. > > thanks for your tip. > > Regards, > Salvatore > > > > > On 21/08/14 14:03, Vic Cornell wrote: >> Hi Salvatore, >> >> Are you using ethernet or infiniband as the GPFS interconnect to your clients? >> >> If 10/40GbE - do you have a separate admin network? >> >> I have seen behaviour similar to this where the storage traffic causes congestion and the "admin" traffic gets lost or delayed causing expels. >> >> Vic >> >> >> >> On 21 Aug 2014, at 10:04, Salvatore Di Nardo wrote: >> >>> Thanks for the feedback, but we managed to find a scenario that excludes network problems. >>> >>> we have a file called input_file of nearly 100GB: >>> >>> if from client A we do: >>> >>> cat input_file >> output_file >>> >>> it start copying.. and we see waiter goeg a bit up,secs but then they flushes back to 0, so we xcan say that the copy proceed well... >>> >>> >>> if now we do the same from another client ( or just another shell on the same client) client B : >>> >>> cat input_file >> output_file >>> >>> >>> ( in other words we are trying to write to the same destination) all the waiters gets up until one node get expelled. >>> >>> >>> Now, while its understandable that the destination file is locked for one of the "cat", so have to wait ( and since the file is BIG , have to wait for a while), its not understandable why it stop the renewal lease. >>> Why its doen't return just a timeout error on the copy instead to expel the node? We can reproduce this every time, and since our users to operations like this on files over 100GB each you can imagine the result. >>> >>> >>> >>> As you can imagine even if its a bit silly to write at the same time to the same destination, its also quite common if we want to dump to a log file logs and for some reason one of the writers, write for a lot of time keeping the file locked. >>> Our expels are not due to network congestion, but because a write attempts have to wait another one. What i really dont understand is why to take a so expreme mesure to expell jest because a process is waiteing "to too much time". >>> >>> >>> I have ticket opened to IBM for this and the issue is under investigation, but no luck so far.. >>> >>> Regards, >>> Salvatore >>> >>> >>> >>> On 21/08/14 09:20, Jez Tucker (Chair) wrote: >>>> Hi there, >>>> >>>> I've seen the on several 'stock'? 'core'? GPFS system (we need a better term now GSS is out) and seen ping 'working', but alongside ejections from the cluster. >>>> The GPFS internode 'ping' is somewhat more circumspect than unix ping - and rightly so. >>>> >>>> In my experience this has _always_ been a network issue of one sort of another. If the network is experiencing issues, nodes will be ejected. 
>>>> Of course it could be unresponsive mmfsd or high loadavg, but I've seen that only twice in 10 years over many versions of GPFS. >>>> >>>> You need to follow the logs through from each machine in time order to determine who could not see who and in what order. >>>> Your best way forward is to log a SEV2 case with IBM support, directly or via your OEM and collect and supply a snap and traces as required by support. >>>> >>>> Without knowing your full setup, it's hard to help further. >>>> >>>> Jez >>>> >>>> On 20/08/14 08:57, Salvatore Di Nardo wrote: >>>>> Still problems. Here some more detailed examples: >>>>> >>>>> EXAMPLE 1: >>>>> EBI5-220 ( CLIENT) >>>>> Tue Aug 19 11:03:04.980 2014: Timed out waiting for a reply from node gss02b >>>>> Tue Aug 19 11:03:04.981 2014: Request sent to (gss02a in GSS.ebi.ac.uk) to expel (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk >>>>> Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from (ebi5-220) >>>>> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk >>>>> Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe. >>>>> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. >>>>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic >>>>> Tue Aug 19 11:03:12.066 2014: Connecting to gss02a >>>>> Tue Aug 19 11:03:12.070 2014: Connected to gss02a >>>>> Tue Aug 19 11:03:17.071 2014: Connecting to gss02b >>>>> Tue Aug 19 11:03:17.072 2014: Connecting to gss03b >>>>> Tue Aug 19 11:03:17.079 2014: Connecting to gss03a >>>>> Tue Aug 19 11:03:17.080 2014: Connecting to gss01b >>>>> Tue Aug 19 11:03:17.079 2014: Connecting to gss01a >>>>> Tue Aug 19 11:04:23.105 2014: Connected to gss02b >>>>> Tue Aug 19 11:04:23.107 2014: Connected to gss03b >>>>> Tue Aug 19 11:04:23.112 2014: Connected to gss03a >>>>> Tue Aug 19 11:04:23.115 2014: Connected to gss01b >>>>> Tue Aug 19 11:04:23.121 2014: Connected to gss01a >>>>> Tue Aug 19 11:12:28.992 2014: Node (gss02a in GSS.ebi.ac.uk) is now the Group Leader. >>>>> >>>>> GSS02B ( NSD SERVER) >>>>> ... 
>>>>> Tue Aug 19 11:03:17.070 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:25.016 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:28.080 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:36.019 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:39.083 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:47.023 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:50.088 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:52.218 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:03:58.030 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:01.092 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:03.220 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:09.034 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:12.096 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:14.224 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:20.037 2014: Killing connection from because the group is not ready for it to rejoin, err 46 >>>>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to ebi5-220 >>>>> ... >>>>> >>>>> GSS02a ( NSD SERVER) >>>>> Tue Aug 19 11:03:04.980 2014: Expel (gss02b) request from (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: (ebi5-220 in ebi-cluster.ebi.ac.uk) >>>>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to ebi5-220 >>>>> >>>>> >>>>> =============================================== >>>>> EXAMPLE 2: >>>>> >>>>> EBI5-038 >>>>> Tue Aug 19 11:32:34.227 2014: Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease. >>>>> Tue Aug 19 11:33:34.258 2014: Lease is overdue. Probing cluster GSS.ebi.ac.uk >>>>> Tue Aug 19 11:35:24.265 2014: Close connection to gss02a (Connection reset by peer). Attempting reconnect. >>>>> Tue Aug 19 11:35:24.865 2014: Close connection to ebi5-014 (Connection reset by peer). Attempting reconnect. >>>>> ... >>>>> LOT MORE RESETS BY PEER >>>>> ... >>>>> Tue Aug 19 11:35:25.096 2014: Close connection to ebi5-167 (Connection reset by peer). Attempting reconnect. >>>>> Tue Aug 19 11:35:25.267 2014: Connecting to gss02a >>>>> Tue Aug 19 11:35:25.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) >>>>> Tue Aug 19 11:35:26.267 2014: Retry connection to gss02a >>>>> Tue Aug 19 11:35:26.268 2014: Close connection to gss02a (Connection failed because destination is still processing previous node failure) >>>>> Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe. >>>>> Tue Aug 19 11:36:24.277 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems. >>>>> >>>>> GSS02a >>>>> Tue Aug 19 11:35:24.263 2014: Node (ebi5-038 in ebi-cluster.ebi.ac.uk) is being expelled because of an expired lease. Pings sent: 60. 
Replies received: 60. >>>>> >>>>> >>>>> >>>>> In example 1 seems that an NSD was not repliyng to the client, but the servers seems working fine.. how can i trace better ( to solve) the problem? >>>>> >>>>> In example 2 it seems to me that for some reason the manager are not renewing the lease in time. when this happens , its not a single client. >>>>> Loads of them fail to get the lease renewed. Why this is happening? how can i trace to the source of the problem? >>>>> >>>>> >>>>> >>>>> Thanks in advance for any tips. >>>>> >>>>> Regards, >>>>> Salvatore >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at gpfsug.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at gpfsug.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From sdinardo at ebi.ac.uk Fri Aug 22 10:37:42 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Fri, 22 Aug 2014 10:37:42 +0100 Subject: [gpfsug-discuss] gpfs client expels, fs hangind and waiters In-Reply-To: <53EE0BB1.8000005@ebi.ac.uk> References: <53EE0BB1.8000005@ebi.ac.uk> Message-ID: <53F70F66.2010405@ebi.ac.uk>
Hello everyone,
Just to let you know, we found the cause of our problems. We discovered that not all of the recommended kernel settings were configured on the clients ( on the servers everything was ok, but the clients had some settings missing ). IBM support pointed us to this document, which describes our issues perfectly; the fix it suggests raises some parameters even higher than the standard "best practice": http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=migr-5091222
Thanks to everyone for the replies.
Regards, Salvatore
From ewahl at osc.edu Mon Aug 25 19:55:08 2014 From: ewahl at osc.edu (Ed Wahl) Date: Mon, 25 Aug 2014 18:55:08 +0000 Subject: [gpfsug-discuss] CNFS using NFS over RDMA? Message-ID:
Anyone out there doing CNFS with NFS over RDMA? Is this even possible? We have been delivering some CNFS services using TCP over IB, but that layer tends to have a large number of bugs all the time. I'd like to take a look at moving back down to verbs...
Ed Wahl OSC
-------------- next part -------------- An HTML attachment was scrubbed... URL:
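On the NFS-over-RDMA question: the kernel NFS server that CNFS drives can, in principle, be given an RDMA listener alongside its TCP one. The lines below are only a hedged sketch of the stock Linux interface, not a tested CNFS recipe; the module name (svcrdma), the port (20049) and, above all, whether the CNFS monitoring scripts keep such a listener when they restart nfsd are assumptions to verify against your own kernel and GPFS level.

  # Hedged sketch, untested with CNFS: add an RDMA listener to the in-kernel NFS server.
  # nfsd must already be running; svcrdma is the server-side transport (xprtrdma is the client side).
  modprobe svcrdma
  echo "rdma 20049" > /proc/fs/nfsd/portlist    # standard NFS/RDMA port, added alongside TCP
  cat /proc/fs/nfsd/portlist                    # confirm the rdma listener registered

  # A client mount against that listener would look something like this (cnfs-ip is a placeholder):
  # mount -o rdma,port=20049 cnfs-ip:/gpfs /mnt/gpfs

Whether CNFS failover moves that listener together with the rest of its IP takeover is exactly the open question above, so treat this as something to try on a test cluster rather than as production guidance.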