From Jez.Tucker at rushes.co.uk  Mon Jul  2 14:03:21 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 13:03:21 +0000
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just
	has
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>

Just had a lovely one.

As I'm sure all of you are aware by now, there's been much fun with some of the NTP Stratum 1 servers not correctly accounting for the leap second last night.

http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html

You may wish to turn off ntp on your servers and point your NTP configuration at trusted servers.
A clock skew from pool.ntp.org just took out one of our servers, and the node was expelled from the cluster.
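
For anyone wanting to see what a given time source is actually advertising before trusting it, here is a minimal Python sketch. It is not taken from any GPFS or ntpd tooling: it builds a bare SNTP request, then reads the leap indicator and transmit timestamp out of the reply. The function names and the server name `ntp.example.internal` are made up for illustration, and the offset ignores network delay, so treat the result as a rough sanity check only.

```python
import socket
import struct
from typing import Tuple

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_sntp_request() -> bytes:
    """Return a 48-byte SNTP client request (LI=0, VN=3, Mode=3)."""
    return b"\x1b" + 47 * b"\x00"


def parse_sntp_reply(packet: bytes) -> Tuple[int, float]:
    """Return (leap_indicator, server transmit time in Unix seconds)."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    # Top two bits of byte 0: 1 or 2 announce a pending leap second,
    # 3 means the server itself is unsynchronised.
    leap_indicator = packet[0] >> 6
    # Transmit timestamp: 32-bit seconds + 32-bit fraction at bytes 40..47.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return leap_indicator, seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(packet: bytes, local_unix_time: float) -> float:
    """Server time minus local time, ignoring network delay."""
    _, server_time = parse_sntp_reply(packet)
    return server_time - local_unix_time


def query(server: str, timeout: float = 2.0) -> bytes:
    """Send one SNTP request to `server` (UDP port 123) and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
    return packet
```

Usage would be along the lines of `clock_offset(query("ntp.example.internal"), time.time())`, run against your own internal server rather than a public pool.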

Jez
---
Jez Tucker
Senior Sysadmin
Rushes
DDI: +44 (0) 207 851 6276
http://www.rushes.co.uk<http://www.rushes.co.uk/>

From orlando.richards at ed.ac.uk  Mon Jul  2 14:12:46 2012
From: orlando.richards at ed.ac.uk (Orlando Richards)
Date: Mon, 02 Jul 2012 14:12:46 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
 just has
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <4FF19E4E.3040802@ed.ac.uk>

Hi Jez,

We've had a few issues with the leap second - but so far it has been 
isolated to redhat 6.2 systems.

What OS are you running on your affected server?

Cheers,
Orlando.




-- 
             --
    Dr Orlando Richards
   Information Services
IT Infrastructure Division
        Unix Section
     Tel: 0131 650 4994

The University of Edinburgh is a charitable body, registered in 
Scotland, with registration number SC005336.


From bevans at canditmedia.co.uk  Mon Jul  2 14:27:52 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 14:27:52 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
	just has
In-Reply-To: <4FF19E4E.3040802@ed.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
	<4FF19E4E.3040802@ed.ac.uk>
Message-ID: <C74ABA98-34B5-4A32-856B-2CAF51D01EEF@canditmedia.co.uk>

I've had a number of customers (a number that is growing as the day goes on) with IBM Storage Manager eating loads of CPU cycles and causing slowdowns as a result (SLES 10 SP1, mostly).

The common factor, at least at this end, seems to be ntp syncing against public ntp pool servers and Java, but Jez's report...... scary stuff...

Cheers,
Barry



From Jez.Tucker at rushes.co.uk  Mon Jul  2 14:47:21 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 13:47:21 +0000
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
 just has
In-Reply-To: <4FF19E4E.3040802@ed.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
	<4FF19E4E.3040802@ed.ac.uk>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9010@WARVWEXC1.uk.deluxe-eu.com>

Funnily enough RH Ent 6.2



From j.buzzard at dundee.ac.uk  Mon Jul  2 14:59:33 2012
From: j.buzzard at dundee.ac.uk (Jonathan Buzzard)
Date: Mon, 2 Jul 2012 14:59:33 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
 just has
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <4FF1A945.5000100@dundee.ac.uk>

On 02/07/12 14:03, Jez Tucker wrote:
> Just had a lovely one.
>
> As I'm sure all of you are aware by now, there's been much fun with
> some of the NTP Stratum 1 servers not correctly accounting for the
> leap second last night.
>
> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>
> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>
> You may wish to turn off ntp on your servers and correct your NTP to
> trusted servers.
>
> A clock skew from pool.ntp.org just took out one of our servers and the
> node was expelled from the cluster.

Hum, not sure I would run my production servers directly off something
from pool.ntp.org; I would at least put a local server in between.

Not noticed any problems here, but then we are running the latest RHEL 5.8
and the latest IBM Storage Manager (10.83) :-)

JAB.

--
Jonathan A. Buzzard             Tel: +441382-386998
Storage Administrator, College of Life Sciences
University of Dundee, DD1 5EH

The University of Dundee is a registered Scottish Charity, No: SC015096


From Jez.Tucker at rushes.co.uk  Mon Jul  2 14:59:34 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 13:59:34 +0000
Subject: [gpfsug-discuss] Samba mapping of "special" SID entries
In-Reply-To: <F68FAAD16AEC9744BD921F91014D2F0003DFB8@MBX06.ad.oak.ox.ac.uk>
References: <4FE486B2.1050501@ed.ac.uk>
	<F68FAAD16AEC9744BD921F91014D2F0003DFB8@MBX06.ad.oak.ox.ac.uk>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9047@WARVWEXC1.uk.deluxe-eu.com>

Now I've located my GPFSUG from within Outlook...

I'm presuming you're creating an ACL with the equivalent of 2775 permissions and the owner file system being 'nfsv4', rather than 'all'?
Your nfsv3 clients have nfsv4 acl support installed?

Jez


> -----Original Message-----
> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-
> bounces at gpfsug.org] On Behalf Of Luke Raimbach
> Sent: 22 June 2012 17:33
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] Samba mapping of "special" SID entries
> 
> Hi Orlando,
> 
> I've been having success using Centrify to manage UID/GID mappings for our
> very small mixed cluster (7 x Linux, 1 x Windows 2008R2).
> 
> I've created a map for "CREATOR / OWNER", "SYSTEM", "Domain Admins",
> etc. group SIDs and use the Windows node to manage ACLs. When the
> windows node applies the ACLs, these seem to translate successfully in to
> GPFS ACLs and work nicely for the mixed environment allowing users on
> both Linux and Windows systems to manipulate each other's files.
> 
> People are mounting the FS via NFS (exported via the NSD Linux servers)
> and CIFS (shared from Win2k8R2). The permissions don't look friendly when
> you run ls -l on a Linux system over NFS but the ACLs do their job in
> preserving inheritable permissions, etc. If people want to see the 'real' ACL,
> they need to use mmgetacl on a GPFS attached node (or windows users
> simply click on the security tab under properties of a file).
> 
> Drop me a line off-list if you want to take a look at what we've got remotely.
> I can run a webex session from the Windows node if you want to have a
> good poke around.
> 
> Luke.
> 
> --
> 
> Luke Raimbach
> IT Manager
> Oxford e-Research Centre
> 7 Keble Road,
> Oxford,
> OX1 3QG
> 
> +44(0)1865 610639
> 
> 
> 
> 
> > -----Original Message-----
> > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-
> > bounces at gpfsug.org] On Behalf Of Orlando Richards
> > Sent: 22 June 2012 15:53
> > To: gpfsug-discuss at gpfsug.org
> > Subject: [gpfsug-discuss] Samba mapping of "special" SID entries
> >
> > Hi all,
> >
> > Has anyone bumped up against the "nfs4: special" option in GPFS/Samba
> > deployments which manipulates how the "owner" and "group owner" (and
> > "everybody") behaviour is mapped to ACLs when accessed via the samba
> > stack?
> >
> > In particular, with the "default" setting (if one blindly follows the
> > worked examples on this) of nfs4: special, if a user adds themselves
> > specifically to an ACL, this creates an entry:
> >
> > special:@owner
> >
> > rather than:
> >
> > user:username
> >
> > which has the knock-on effect that if a file/folder is created under
> > this ACL by a different owner (or if ownership changes), the person
> > who put said ACL on to the file/folder no longer has access. Most
> > people find this confusing (which is putting it politely).
> >
> > To further complicate matters, the "special" Windows SIDs*[1] - such
> > as "CREATOR/OWNER" - don't seem to work properly in the
> > ctdb/samba/gpfs stack (I don't know if they do in "normal" samba
> > though). IBM don't support CREATOR/OWNER in SONAS*[2] - so it's not
> > just me!
> >
> > So my question is - has anyone else been looking into this at all, and
> > if so, do you have any sage words of wisdom to offer?
> >
> > Cheers,
> > Orlando.
> >
> >
> > *[1] http://support.microsoft.com/kb/163846
> > *[2] http://pic.dhe.ibm.com/infocenter/sonasic/sonas1ic/index.jsp?topic=%2Fcom.ibm.sonas.doc%2Fadm_authorization_limitations.html
> >
> >
> > --
> >              --
> >     Dr Orlando Richards
> >    Information Services
> > IT Infrastructure Division
> >         Unix Section
> >      Tel: 0131 650 4994
> >
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at gpfsug.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
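
The behaviour Orlando describes in the quoted message (a `special:@owner` entry versus a `user:username` entry) can be illustrated with a toy model. The Python sketch below is not GPFS or Samba code and uses no real API; `AclEntry` and `has_access` are hypothetical names, and the matching rule is a deliberate simplification that assumes an entry grants access as soon as it matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclEntry:
    kind: str  # "special" or "user"
    who: str   # "@owner" for special entries, otherwise a username


def has_access(user: str, file_owner: str, acl: list) -> bool:
    """Return True if any entry in the (toy) ACL matches the user."""
    for entry in acl:
        # special:@owner follows whoever owns the file right now.
        if entry.kind == "special" and entry.who == "@owner" and user == file_owner:
            return True
        # user:<name> follows the named user, whoever owns the file.
        if entry.kind == "user" and entry.who == user:
            return True
    return False


special_acl = [AclEntry("special", "@owner")]
named_acl = [AclEntry("user", "alice")]

# alice added herself while she owned the folder: both forms work.
print(has_access("alice", "alice", special_acl))  # True
# bob creates a file underneath and inherits the same ACL.
print(has_access("alice", "bob", special_acl))    # False: @owner is now bob
print(has_access("alice", "bob", named_acl))      # True
```

The middle case is the confusing one: the entry alice thought she had added for herself silently refers to bob once he owns the file.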


From Jez.Tucker at rushes.co.uk  Mon Jul  2 15:05:25 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 14:05:25 +0000
Subject: [gpfsug-discuss] HPC people - interconnects
In-Reply-To: <4FE8720C.7040007@gmail.com>
References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com>
	<4FE8720C.7040007@gmail.com>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>

Theoretically, using the OFED stack with a supporting card (e.g. Myrinet 10GbE), you should be able to leverage RDMA.
That said, I've not tried it yet.  It's on my list of things to R&D.

OFED/ROCE/iWARP:

http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html


From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali
Sent: 25 June 2012 15:14
To: gpfsug-discuss at gpfsug.org
Subject: Re: [gpfsug-discuss] HPC people - interconnects

On 25/06/12 15:08, Jez Tucker wrote:
> Do you all use IB?
>
> Has anyone tried RDMA over 10G via the OFED stack?

For most of our customers we use RDMA over verbs.

Is this the same thing you mentioned a few weeks ago with respect to RoCE? Does GPFS even support this?


--

regards,


Arif

From bevans at canditmedia.co.uk  Mon Jul  2 22:16:15 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 22:16:15 +0100
Subject: [gpfsug-discuss] HPC people - interconnects
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com>
	<4FE8720C.7040007@gmail.com>
	<39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk>

It's verbs support you need - not sure where OFED is up to with verbs over 10G, if that's even on the cards

Cheers,
B



From bevans at canditmedia.co.uk  Mon Jul  2 22:25:02 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 22:25:02 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
	just has
In-Reply-To: <4FF1A945.5000100@dundee.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
	<4FF1A945.5000100@dundee.ac.uk>
Message-ID: <EA951E70-058E-4FCC-92C7-367102CB56F2@canditmedia.co.uk>

This has so far hit almost all of the places I work with (not so much GPFS crashing, but certainly storage manager going berserk) - the majority of them do not use public NTP servers. In most cases no one actually noticed until it was pointed out, so it's well worth a quick 'top' on your storage servers if you're using Engenio/LSI/NetApp based units (i.e. DS3/4/5000).

The fix is here: http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/

But of course I wouldn't mess with the time in production unless you've got GPFS shut down first.

Cheers,
B

 

From Jez.Tucker at rushes.co.uk  Tue Jul  3 11:38:10 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Tue, 3 Jul 2012 10:38:10 +0000
Subject: [gpfsug-discuss] HPC people - interconnects
In-Reply-To: <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com>
	<4FE8720C.7040007@gmail.com>
	<39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>,
	<697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A93A7@WARVWEXC1.uk.deluxe-eu.com>

Here's the stack:

https://www.openfabrics.org/resources/ofed-for-linux-ofed-for-windows/ofed-overview.html

VERBS is supported over 10GbE.  It should work if OFED VERBS == IBM VERBS.

---
Jez Tucker
Senior SysAdmin
Rushes
www.rushes.co.uk<http://www.rushes.co.uk>

From j.buzzard at dundee.ac.uk  Tue Jul  3 12:01:33 2012
From: j.buzzard at dundee.ac.uk (Jonathan Buzzard)
Date: Tue, 3 Jul 2012 12:01:33 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
 just has
In-Reply-To: <EA951E70-058E-4FCC-92C7-367102CB56F2@canditmedia.co.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
	<4FF1A945.5000100@dundee.ac.uk>
	<EA951E70-058E-4FCC-92C7-367102CB56F2@canditmedia.co.uk>
Message-ID: <4FF2D10D.2030701@dundee.ac.uk>

On 02/07/12 22:25, Barry Evans wrote:
> This has so far hit almost all of the places I work with (not so
> much GPFS crashing, but certainly storage manager going berserk) - the
> majority of them do not use public NTP servers. In most cases no one
> actually noticed until it was pointed out, well worth a quick 'top' of
> your storage servers if you're using Engenio/LSI/NetApp based units (i.e.
> DS3/4/5000).
>
> The fix is here:
> http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/
>

Or you can upgrade to the latest version of storage manager. We are
running 10.83 and it sailed through without issue. Now admittedly most
people have probably not upgraded as it has only been out for a couple
of weeks. I was very prompt on the upgrade as it allows the one Storage
Manager install to manage both IBM DS3/4/5000 and Dell MD3xxx from the
same program.

JAB.

--
Jonathan A. Buzzard             Tel: +441382-386998
Storage Administrator, College of Life Sciences
University of Dundee, DD1 5EH

The University of Dundee is a registered Scottish Charity, No: SC015096


From sfadden at us.ibm.com  Thu Jul  5 17:25:22 2012
From: sfadden at us.ibm.com (Scott Fadden)
Date: Thu, 5 Jul 2012 09:25:22 -0700
Subject: [gpfsug-discuss] GPFS Service Bulletin - July 5 2012
Message-ID: <OF566CBF8E.739C89D9-ON88257A32.005A1671-88257A32.005A3696@us.ibm.com>

Date Added: July 5, 2012
Issue:
IBM has identified an issue with GPFS file systems at versions 3.4 or 3.5 
which were migrated from file systems created with GPFS versions earlier 
than 3.4. This issue can occur only after using the mmmigratefs command 
with the [--fastea] option.
The issue can result in a loss of data, requiring the restoration of data 
from a backup source.
GPFS file systems created with versions earlier than 3.4 should not be 
migrated using the mmmigratefs command with the [--fastea] option until a 
fix is provided by IBM. IBM plans to make the fix available in GPFS 
versions 3.5.0.3 (APAR IV24151) and 3.4.0.15 (APAR IV24150). An ifix will 
also be available from IBM service.
If customers have already migrated file systems from GPFS versions earlier 
than 3.4, IBM service has a fix. Please follow the steps below to 
determine if your system may be affected.
To determine if your system may be affected:
1. Ensure your GPFS file systems are mounted.
2. As a user with GPFS administrator privileges on a machine where your 
GPFS file systems are mounted, issue the command:
        /usr/lpp/mmfs/bin/mmfsadm dump stripe | grep "inode 0"
The command will produce output that identifies locations for the "inode 
0" file for all currently mounted GPFS file systems. Example output for a 
file system configured with two-way metadata replication would be in the 
form:
        inode 0: 3:4098 1:4098
For a file system with no meta-data replication the output would be in the 
form:
        inode 0: 3:4098
The relevant information to look for to see if you may experience a 
problem are the fields denoting <disk>:<sector> for each inode 0 replica 
(e.g. 3:4098 and 1:4098 in these examples).
If each <disk>:<sector> replica only denotes 4098 for the sector field 
then you are not experiencing this problem. If however there is a number 
other than 4098 in the sector output then you are requested to immediately 
call IBM service and reference this problem. The IBM service person will 
walk you through a fix for correcting the issue.
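
As a convenience for clusters with many file systems, the check described above can be scripted. The following Python sketch is not an IBM tool; it assumes the saved command output contains lines exactly in the `inode 0: <disk>:<sector> ...` form shown in the bulletin, and the function name `suspect_replicas` and the sector value 8194 in the second sample are invented for illustration.

```python
import re

# Sector value the bulletin says every "inode 0" replica should show.
EXPECTED_SECTOR = 4098


def suspect_replicas(dump_output: str) -> list:
    """Return (disk, sector) pairs whose sector differs from EXPECTED_SECTOR."""
    suspects = []
    for line in dump_output.splitlines():
        line = line.strip()
        if not line.startswith("inode 0:"):
            continue
        # Everything after the "inode 0:" label is a list of <disk>:<sector> pairs.
        for disk, sector in re.findall(r"(\d+):(\d+)", line[len("inode 0:"):]):
            if int(sector) != EXPECTED_SECTOR:
                suspects.append((int(disk), int(sector)))
    return suspects


healthy = "        inode 0: 3:4098 1:4098\n"
affected = "        inode 0: 3:4098 1:8194\n"  # 8194 is a made-up value

print(suspect_replicas(healthy))   # []
print(suspect_replicas(affected))  # [(1, 8194)]
```

An empty list matches the "not experiencing this problem" case; anything else is the cue to call IBM service as the bulletin asks, not to attempt a repair yourself.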


Scott Fadden
GPFS Technical Marketing 
Desk: (503) 578-5630 
Cell: (503) 880-5833 
sfadden at us.ibm.com
http://www.ibm.com/systems/gpfs

From Jez.Tucker at rushes.co.uk  Fri Jul  6 11:44:49 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 6 Jul 2012 10:44:49 +0000
Subject: [gpfsug-discuss] Almaden Labs pNFS attending September 20th User
	Group
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AB3F7@WARVWEXC1.uk.deluxe-eu.com>

Hello all

  I've received confirmation from Dean Hildebrand from the Almaden Research Labs that he will attend the September User Group to present pNFS.

Dean is available on the 21st to meet post user group for relevant discussion.
He will be based in London throughout his visit, owing to the proximity to AWE.

If you would like to meet Dean to discuss pNFS further, please arrange this with me via email.

Cheers

Jez
---
Jez Tucker
Senior Sysadmin
Rushes

GPFSUG Chairman (chair at gpfsug.org)

From Jez.Tucker at rushes.co.uk  Fri Jul 20 09:28:19 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 20 Jul 2012 08:28:19 +0000
Subject: [gpfsug-discuss] Your NSD server loadavg?
Message-ID: <39571EA9316BE44899D59C7A640C13F5305ADDFF@WARVWEXC1.uk.deluxe-eu.com>

Hello

  Just curious to see what your NSD servers' loadavg is under a normal job processing load, i.e. SGE running tasks over NFS.
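
To make replies comparable across servers with different core counts, one option is to report load per core. This is a small Python sketch using only the standard library (`os.getloadavg` is available on Unix-like systems); the function name `load_per_core` is made up for illustration. Bear in mind that on Linux the load average also counts tasks in uninterruptible sleep, so a server waiting on I/O can show a high figure with idle CPUs.

```python
import os


def load_per_core():
    """Return the 1-, 5- and 15-minute load averages divided by the CPU count."""
    cores = os.cpu_count() or 1
    return tuple(round(value / cores, 2) for value in os.getloadavg())


one, five, fifteen = load_per_core()
print(f"load per core: 1m={one} 5m={five} 15m={fifteen}")
```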


---
Jez Tucker
Senior Sysadmin
Rushes
DDI: +44 (0) 207 851 6276
http://www.rushes.co.uk<http://www.rushes.co.uk/>

From Jez.Tucker at rushes.co.uk  Fri Jul 20 18:02:44 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 20 Jul 2012 17:02:44 +0000
Subject: [gpfsug-discuss] Great perf tool
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>


http://collectl.sourceforge.net/index.html

---
Jez Tucker
Senior Sysadmin
Rushes

GPFSUG Chairman (chair at gpfsug.org)

From janfrode at tanso.net  Fri Jul 20 23:23:54 2012
From: janfrode at tanso.net (Jan-Frode Myklebust)
Date: Sat, 21 Jul 2012 00:23:54 +0200
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <20120720222354.GB12126@dibs.tanso.net>

On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
> 
> http://collectl.sourceforge.net/index.html

Another great framework for collecting performance data is Performance
CoPilot:

	http://oss.sgi.com/projects/pcp/

It can collect and play live or re-play archived data from several nodes
in the same gui (or tui) player.

PCP is finally scheduled for inclusion in RHEL, so it's hopefully no
longer only us old SGI IRIX admins that will be using it anymore :-)

	http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf

It's already available in EPEL.


  -jf


From viccornell at gmail.com  Sat Jul 21 10:43:33 2012
From: viccornell at gmail.com (Vic Cornell)
Date: Sat, 21 Jul 2012 10:43:33 +0100
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <20120720222354.GB12126@dibs.tanso.net>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
	<20120720222354.GB12126@dibs.tanso.net>
Message-ID: <6259820223170424573@unknownmsgid>

I second that. The great thing about PCP is that it will monitor down
to 1/10 second, which is really useful when you want to see what is
really going on.

Good news about its inclusion in RHEL. It also has Mac and Windows
versions so that you can instrument an entire setup and monitor it on
the box of your choice.

Runs best under IRIX though. . . .

Kind Regards,

Vic

Vic Cornell
Application Support Engineer
DataDirect Networks
Davidson House
Forbury Square
Reading
RG1 3EU
United Kingdom

Mobile 07900 660 266
Skype viccornell


www.ddn.com
This email may contain confidential and privileged material for the
sole use of the intended recipient.  Any review or distribution by
others is strictly prohibited.  If you are not the intended recipient
please contact the sender and delete all copies




From Jez.Tucker at rushes.co.uk  Tue Jul 24 09:22:13 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Tue, 24 Jul 2012 08:22:13 +0000
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <20120720222354.GB12126@dibs.tanso.net>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
	<20120720222354.GB12126@dibs.tanso.net>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE76B@WARVWEXC1.uk.deluxe-eu.com>

Perhaps you or Vic could give a quick run-through of PCP at the next UG meeting?

> -----Original Message-----
> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-
> bounces at gpfsug.org] On Behalf Of Jan-Frode Myklebust
> Sent: 20 July 2012 23:24
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] Great perf tool
> 
> On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
> >
> > http://collectl.sourceforge.net/index.html
> 
> Another great framework for collecting performance data is Performance
> CoPilot:
> 
> 	http://oss.sgi.com/projects/pcp/
> 
> It can collect and play live or re-play archived data from several nodes in the
> same gui (or tui) player.
> 
> PCP is finally scheduled for inclusion in RHEL, so it's hopefully no longer
> only us old SGI IRIX admins that will be using it anymore :-)
> 
> 	http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf
> 
> It's already available in EPEL.
> 
> 
>   -jf
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss


From bevans at canditmedia.co.uk  Mon Jul  2 14:27:52 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 14:27:52 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
	just has
In-Reply-To: <4FF19E4E.3040802@ed.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
	<4FF19E4E.3040802@ed.ac.uk>
Message-ID: <C74ABA98-34B5-4A32-856B-2CAF51D01EEF@canditmedia.co.uk>

I've had a few customers (a number that is growing as the day goes on) with IBM Storage Manager eating loads of CPU cycles and causing slowdowns as a result (SLES 10 SP1, mostly).

The common factor, at least at this end, seems to be NTP syncing against public NTP pool servers, and Java, but Jez's report... scary stuff...

Cheers,
Barry


On 2 Jul 2012, at 14:12, Orlando Richards wrote:

> Hi Jez,
> 
> We've had a few issues with the leap second - but so far it has been isolated to redhat 6.2 systems.
> 
> What OS are you running on your affected server?
> 
> Cheers,
> Orlando.
> 
> 
> On 02/07/12 14:03, Jez Tucker wrote:
>> Just had a lovely one.
>> 
>> As I'm sure all of you are aware by now, there's been much fun with
>> some of the NTP Stratum 1 servers not correctly accounting for the
>> leap-second last night.
>> 
>> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>> 
>> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>> 
>> You may wish to turn off ntp on your servers and correct your NTP to
>> trusted servers.
>> 
>> A clock skew from ntp.pool.org just took out one of our servers and the
>> node was expelled from the cluster.
>> 
>> Jez
>> 
>> ---
>> 
>> Jez Tucker
>> 
>> Senior Sysadmin
>> 
>> Rushes
>> 
>> DDI: +44 (0) 207 851 6276
>> 
>> http://www.rushes.co.uk <http://www.rushes.co.uk/>
>> 
>> 
>> 
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at gpfsug.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
> 
> -- 
>            --
>   Dr Orlando Richards
>  Information Services
> IT Infrastructure Division
>       Unix Section
>    Tel: 0131 650 4994
> 
> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20120702/25691e28/attachment-0001.htm>

From Jez.Tucker at rushes.co.uk  Mon Jul  2 14:47:21 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 13:47:21 +0000
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
 just has
In-Reply-To: <4FF19E4E.3040802@ed.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
	<4FF19E4E.3040802@ed.ac.uk>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9010@WARVWEXC1.uk.deluxe-eu.com>

Funnily enough, RHEL 6.2.

> -----Original Message-----
> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-
> bounces at gpfsug.org] On Behalf Of Orlando Richards
> Sent: 02 July 2012 14:13
> To: gpfsug-discuss at gpfsug.org
> Subject: Re: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
> just has
> 
> Hi Jez,
> 
> We've had a few issues with the leap second - but so far it has been isolated
> to redhat 6.2 systems.
> 
> What OS are you running on your affected server?
> 
> Cheers,
> Orlando.
> 
> 
> On 02/07/12 14:03, Jez Tucker wrote:
> > Just had a lovely one.
> >
> > As I'm sure all of you are aware by now, there's been much fun with
> > some of the NTP Stratum 1 servers not correctly accounting for the
> > leap-second last night.
> >
> > http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
> >
> > http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
> >
> > You may wish to turn off ntp on your servers and correct your NTP to
> > trusted servers.
> >
> > A clock skew from ntp.pool.org just took out one of our servers and
> > the node was expelled from the cluster.
> >
> > Jez
> >
> > ---
> >
> > Jez Tucker
> >
> > Senior Sysadmin
> >
> > Rushes
> >
> > DDI: +44 (0) 207 851 6276
> >
> > http://www.rushes.co.uk <http://www.rushes.co.uk/>
> >
> >
> >
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at gpfsug.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
> 
> --
>              --
>     Dr Orlando Richards
>    Information Services
> IT Infrastructure Division
>         Unix Section
>      Tel: 0131 650 4994
> 
> The University of Edinburgh is a charitable body, registered in Scotland, with
> registration number SC005336.
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss


From j.buzzard at dundee.ac.uk  Mon Jul  2 14:59:33 2012
From: j.buzzard at dundee.ac.uk (Jonathan Buzzard)
Date: Mon, 2 Jul 2012 14:59:33 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
 just has
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <4FF1A945.5000100@dundee.ac.uk>

On 02/07/12 14:03, Jez Tucker wrote:
> Just had a lovely one.
>
> As I'm sure all of you are aware by now, there's been much fun with
> some of the NTP Stratum 1 servers not correctly accounting for the
> leap-second last night.
>
> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>
> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>
> You may wish to turn off ntp on your servers and correct your NTP to
> trusted servers.
>
> A clock skew from ntp.pool.org just took out one of our servers and the
> node was expelled from the cluster.

Hmm, I'm not sure I would run my production servers directly off something
from ntp.pool.org; I would at least put a local server in between.

We have not noticed any problems here, but then we are running the latest
RHEL 5.8 and the latest IBM Storage Manager (10.83) :-)

JAB.

--
Jonathan A. Buzzard             Tel: +441382-386998
Storage Administrator, College of Life Sciences
University of Dundee, DD1 5EH

The University of Dundee is a registered Scottish Charity, No: SC015096


From Jez.Tucker at rushes.co.uk  Mon Jul  2 14:59:34 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 13:59:34 +0000
Subject: [gpfsug-discuss] Samba mapping of "special" SID entries
In-Reply-To: <F68FAAD16AEC9744BD921F91014D2F0003DFB8@MBX06.ad.oak.ox.ac.uk>
References: <4FE486B2.1050501@ed.ac.uk>
	<F68FAAD16AEC9744BD921F91014D2F0003DFB8@MBX06.ad.oak.ox.ac.uk>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9047@WARVWEXC1.uk.deluxe-eu.com>

Now I've located my GPFSUG from within Outlook...

I'm presuming you're creating an ACL with the equivalent of 2775 permissions and the owner file system being 'nfsv4', rather than 'all'?
Do your NFSv3 clients have NFSv4 ACL support installed?

Jez


> -----Original Message-----
> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-
> bounces at gpfsug.org] On Behalf Of Luke Raimbach
> Sent: 22 June 2012 17:33
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] Samba mapping of "special" SID entries
> 
> Hi Orlando,
> 
> I've been having success using Centrify to manage UID/GID mappings for our
> very small mixed cluster (7 x Linux, 1 x Windows 2008R2).
> 
> I've created a map for "CREATOR / OWNER", "SYSTEM", "Domain Admins",
> etc. group SIDs and use the Windows node to manage ACLs. When the
> windows node applies the ACLs, these seem to translate successfully in to
> GPFS ACLs and work nicely for the mixed environment allowing users on
> both Linux and Windows systems to manipulate each other's files.
> 
> People are mounting the FS via NFS (exported via the NSD Linux servers)
> and CIFS (shared from Win2k8R2). The permissions don't look friendly when
> you run ls -l on a Linux system over NFS but the ACLs do their job in
> preserving inheritable permissions, etc. If people want to see the 'real' ACL,
> they need to use mmgetacl on a GPFS attached node (or windows users
> simply click on the security tab under properties of a file).
> 
> Drop me a line off-list if you want to take a look at what we've got remotely.
> I can run a webex session from the Windows node if you want to have a
> good poke around.
> 
> Luke.
> 
> --
> 
> Luke Raimbach
> IT Manager
> Oxford e-Research Centre
> 7 Keble Road,
> Oxford,
> OX1 3QG
> 
> +44(0)1865 610639
> 
> 
> 
> 
> > -----Original Message-----
> > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-
> > bounces at gpfsug.org] On Behalf Of Orlando Richards
> > Sent: 22 June 2012 15:53
> > To: gpfsug-discuss at gpfsug.org
> > Subject: [gpfsug-discuss] Samba mapping of "special" SID entries
> >
> > Hi all,
> >
> > Has anyone bumped up against the "nfs4: special" option in GPFS/Samba
> > deployments which manipulates how the "owner" and "group owner" (and
> > "everybody") behaviour is mapped to ACLs when accessed via the samba
> > stack?
> >
> > In particular, with the "default" setting (if one blindly follows the
> > worked examples on this) of nfs4: special, if a user adds themselves
> > specifically to an ACL, this creates an entry:
> >
> > special:@owner
> >
> > rather than:
> >
> > user:username
> >
> > which has the knock-on effect that if a file/folder is created under
> > this ACL by a different owner (or if ownership changes), the person
> > who put said ACL on to the file/folder no longer has access. Most
> > people find this confusing (which is putting it politely).
> >
> > To further complicate matters, the "special" windows SID's*[1] - such
> > as "CREATOR/OWNER" -  don't seem to work properly in the
> > ctdb/samba/gpfs stack (I don't know if they do in "normal" samba
> > though). IBM don't support CREATOR/OWNER in SONAS*[2] - so it's not
> > just me!
> >
> > So my question is - has anyone else been looking into this at all, and
> > if so, do you have any sage words of wisdom to offer?
> >
> > Cheers,
> > Orlando.
> >
> >
> > *[1] http://support.microsoft.com/kb/163846
> > *[2] http://pic.dhe.ibm.com/infocenter/sonasic/sonas1ic/index.jsp?topic=%2Fcom.ibm.sonas.doc%2Fadm_authorization_limitations.html
> >
> >
> > --
> >              --
> >     Dr Orlando Richards
> >    Information Services
> > IT Infrastructure Division
> >         Unix Section
> >      Tel: 0131 650 4994
> >
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at gpfsug.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss


From Jez.Tucker at rushes.co.uk  Mon Jul  2 15:05:25 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 14:05:25 +0000
Subject: [gpfsug-discuss] HPC people - interconnects
In-Reply-To: <4FE8720C.7040007@gmail.com>
References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com>
	<4FE8720C.7040007@gmail.com>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>

Theoretically, using the OFED stack with a supporting card (e.g. Myrinet 10GbE), you should be able to leverage RDMA.
That said, I've not tried it yet.  It's on my list of things to R&D.

OFED/ROCE/iWARP:

http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html


From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali
Sent: 25 June 2012 15:14
To: gpfsug-discuss at gpfsug.org
Subject: Re: [gpfsug-discuss] HPC people - interconnects

On 25/06/12 15:08, Jez Tucker wrote:
> Do you all use IB?
>
> Has anyone tried RDMA over 10G via the OFED stack?

For most of our customers we use RDMA over verbs.

Is this the same thing you mentioned a few weeks ago with respect to RoCE? Does GPFS even support this?


--

regards,


Arif
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20120702/0c5cd468/attachment-0001.htm>

From bevans at canditmedia.co.uk  Mon Jul  2 22:16:15 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 22:16:15 +0100
Subject: [gpfsug-discuss] HPC people - interconnects
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com>
	<4FE8720C.7040007@gmail.com>
	<39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk>

It's verbs support you need - not sure where OFED is up to with verbs over 10G, if that's even on the cards

Cheers,
B


On 2 Jul 2012, at 15:05, Jez Tucker wrote:

> Theoretically using the OFED stack with a supporting card (E.G. Myrinet 10GbE) then you should be able to leverage RDMA.
> That said, I've not tried it yet.  It's on my list of things to R&D.
>  
> OFED/ROCE/iWARP:
>  
> http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html
>  
>  
> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali
> Sent: 25 June 2012 15:14
> To: gpfsug-discuss at gpfsug.org
> Subject: Re: [gpfsug-discuss] HPC people - interconnects
>  
> On 25/06/12 15:08, Jez Tucker wrote:
> Do you all use IB?
>  
> Has anyone tried RDMA over 10G via the OFED stack?
>  
>  
> Most of our customers we use RDMA over verbs
> 
> Is this the same thing you mentioned  a few weeks ago with respect to ROCE. Does gpfs even support this?
> 
> -- 
> regards,
>  
> Arif
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20120702/85045865/attachment-0001.htm>

From bevans at canditmedia.co.uk  Mon Jul  2 22:25:02 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 22:25:02 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
	just has
In-Reply-To: <4FF1A945.5000100@dundee.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
	<4FF1A945.5000100@dundee.ac.uk>
Message-ID: <EA951E70-058E-4FCC-92C7-367102CB56F2@canditmedia.co.uk>

This has so far hit almost all of the places I work with (not so much GPFS crashing, but certainly Storage Manager going berserk) - the majority of them do not use public NTP servers. In most cases no one actually noticed until it was pointed out, so it is well worth a quick 'top' on your storage servers if you're using Engenio/LSI/NetApp based units (i.e. DS3/4/5000).

The fix is here: http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/

But of course I wouldn't mess with the time in production unless you've got GPFS shut down first.

Cheers,
B

 
On 2 Jul 2012, at 14:59, Jonathan Buzzard wrote:

> On 02/07/12 14:03, Jez Tucker wrote:
>> Just had a lovely one.
>> 
>> As I'm sure all of you are aware by now, there's been much fun with
>> some of the NTP Stratum 1 servers not correctly accounting for the
>> leap-second last night.
>> 
>> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>> 
>> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>> 
>> You may wish to turn off ntp on your servers and correct your NTP to
>> trusted servers.
>> 
>> A clock skew from ntp.pool.org just took out one of our servers and the
>> node was expelled from the cluster.
> 
> Hum, not sure I would run my production servers directly off something
> from ntp.pool.org, I would at least put a local server in between.
> 
> Not notice any problems here, but then we are running latest RHEL 5.8
> and latest IBM Storage Manager (10.83) :-)
> 
> JAB.
> 
> --
> Jonathan A. Buzzard             Tel: +441382-386998
> Storage Administrator, College of Life Sciences
> University of Dundee, DD1 5EH
> 
> The University of Dundee is a registered Scottish Charity, No: SC015096
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20120702/6b478a8d/attachment-0001.htm>

From Jez.Tucker at rushes.co.uk  Tue Jul  3 11:38:10 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Tue, 3 Jul 2012 10:38:10 +0000
Subject: [gpfsug-discuss] HPC people - interconnects
In-Reply-To: <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com>
	<4FE8720C.7040007@gmail.com>
	<39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>,
	<697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A93A7@WARVWEXC1.uk.deluxe-eu.com>

Here's the stack:

https://www.openfabrics.org/resources/ofed-for-linux-ofed-for-windows/ofed-overview.html

VERBS is supported over 10GbE.  It should work if OFED VERBS == IBM VERBS.

---
Jez Tucker
Senior SysAdmin
Rushes
www.rushes.co.uk<http://www.rushes.co.uk>
________________________________
From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Barry Evans [bevans at canditmedia.co.uk]
Sent: 02 July 2012 22:16
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] HPC people - interconnects

It's verbs support you need - not sure where OFED is up to with verbs over 10G, if that's even on the cards

Cheers,
B


On 2 Jul 2012, at 15:05, Jez Tucker wrote:

Theoretically using the OFED stack with a supporting card (E.G. Myrinet 10GbE) then you should be able to leverage RDMA.
That said, I've not tried it yet.  It's on my list of things to R&D.

OFED/ROCE/iWARP:

http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html


From: gpfsug-discuss-bounces at gpfsug.org<mailto:gpfsug-discuss-bounces at gpfsug.org> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali
Sent: 25 June 2012 15:14
To: gpfsug-discuss at gpfsug.org<mailto:gpfsug-discuss at gpfsug.org>
Subject: Re: [gpfsug-discuss] HPC people - interconnects

On 25/06/12 15:08, Jez Tucker wrote:
Do you all use IB?

Has anyone tried RDMA over 10G via the OFED stack?


Most of our customers we use RDMA over verbs

Is this the same thing you mentioned  a few weeks ago with respect to ROCE. Does gpfs even support this?


--

regards,


Arif

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org<mailto:gpfsug-discuss at gpfsug.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20120703/2c9679e9/attachment-0001.htm>

From j.buzzard at dundee.ac.uk  Tue Jul  3 12:01:33 2012
From: j.buzzard at dundee.ac.uk (Jonathan Buzzard)
Date: Tue, 3 Jul 2012 12:01:33 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
 just has
In-Reply-To: <EA951E70-058E-4FCC-92C7-367102CB56F2@canditmedia.co.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
	<4FF1A945.5000100@dundee.ac.uk>
	<EA951E70-058E-4FCC-92C7-367102CB56F2@canditmedia.co.uk>
Message-ID: <4FF2D10D.2030701@dundee.ac.uk>

On 02/07/12 22:25, Barry Evans wrote:
> This has so far hit all almost all of the places I work with (not so
> much GPFS crashing, but certainly storage manager going bezerk) - the
> majority of them do not use public NTP servers. In most cases no one
> actually noticed until it was pointed out, well worth a quick 'top' of
> your storage servers if you're using Engenio/LSI/NetApp based units (ie,
> DS3/4/5000).
>
> The fix is here:
> http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/
>

Or you can upgrade to the latest version of storage manager. We are
running 10.83 and it sailed through without issue. Now admittedly most
people have probably not upgraded as it has only been out for a couple
of weeks. I was very prompt on the upgrade as it allows the one Storage
Manager install to manage both IBM DS3/4/5000 and Dell MD3xxx from the
same program.

JAB.

--
Jonathan A. Buzzard             Tel: +441382-386998
Storage Administrator, College of Life Sciences
University of Dundee, DD1 5EH

The University of Dundee is a registered Scottish Charity, No: SC015096


From sfadden at us.ibm.com  Thu Jul  5 17:25:22 2012
From: sfadden at us.ibm.com (Scott Fadden)
Date: Thu, 5 Jul 2012 09:25:22 -0700
Subject: [gpfsug-discuss] GPFS Service Bulletin - July 5 2012
Message-ID: <OF566CBF8E.739C89D9-ON88257A32.005A1671-88257A32.005A3696@us.ibm.com>

Date Added: July 5, 2012

Issue:
IBM has identified an issue with GPFS file systems at versions 3.4 or 3.5
which were migrated from file systems created with GPFS versions earlier
than 3.4. This issue can occur only after using the mmmigratefs command
with the [--fastea] option.

The issue can result in a loss of data, requiring the restoration of data
from a backup source.

GPFS file systems created with versions earlier than 3.4 should not be
migrated using the mmmigratefs command with the [--fastea] option until a
fix is provided from IBM. IBM plans to make the fix available in GPFS
versions 3.5.0.3 (APAR IV24151) and 3.4.0.15 (APAR IV24150). An ifix will
also be available from IBM service.

If customers have already migrated file systems from GPFS versions earlier
than 3.4, IBM service has a fix. Please follow the steps below to
determine if your system may be affected.

To determine if your system may be affected:

1. Ensure your GPFS file systems are mounted.
2. As a user with GPFS administrator privileges on a machine where your
   GPFS file systems are mounted, issue the command:

        /usr/lpp/mmfs/bin/mmfsadm dump stripe | grep "inode 0"

The command will produce output that identifies locations for the "inode
0" file for all currently mounted GPFS file systems. Example output for a
file system configured with two-way metadata replication would be in the
form:

        inode 0: 3:4098 1:4098

For a file system with no metadata replication the output would be in the
form:

        inode 0: 3:4098

The relevant information to look for to see if you may experience a
problem is the fields denoting <disk>:<sector> for each inode 0 replica
(e.g. 3:4098 and 1:4098 in these examples).

If each <disk>:<sector> replica denotes only 4098 for the sector field
then you are not experiencing this problem. If, however, there is a number
other than 4098 in the sector output then you are requested to immediately
call IBM service and reference this problem. The IBM service person will
walk you through a fix for correcting the issue.
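
The check described above can be scripted. What follows is only a rough
sketch, not an IBM-supplied tool: it assumes the "inode 0:" lines look
exactly like the examples in this bulletin (a list of <disk>:<sector>
pairs after the label), and the function name suspicious_replicas and the
sector value 5122 in the second sample are made up for illustration.

```python
import re

# Sector value the bulletin describes as the unaffected case.
EXPECTED_SECTOR = 4098


def suspicious_replicas(dump_output):
    """Return (disk, sector) pairs from "inode 0:" lines whose sector is not 4098.

    dump_output is the text produced by:
        /usr/lpp/mmfs/bin/mmfsadm dump stripe | grep "inode 0"
    """
    flagged = []
    for line in dump_output.splitlines():
        if "inode 0:" not in line:
            continue
        # Assumption: everything after the label is <disk>:<sector> pairs.
        replicas = line.split("inode 0:", 1)[1]
        for disk, sector in re.findall(r"(\d+):(\d+)", replicas):
            if int(sector) != EXPECTED_SECTOR:
                flagged.append((int(disk), int(sector)))
    return flagged


# Sample taken from the bulletin: two-way metadata replication, both at 4098.
healthy = "        inode 0: 3:4098 1:4098\n"

# Hypothetical sample: the second replica reports a different sector.
affected = "        inode 0: 3:4098 1:5122\n"

for name, sample in (("healthy", healthy), ("affected", affected)):
    bad = suspicious_replicas(sample)
    if bad:
        print(name, "-> call IBM service, unexpected replicas:", bad)
    else:
        print(name, "-> all inode 0 replicas at sector", EXPECTED_SECTOR)
```

An empty result only means that every replica matched the bulletin's
example of an unaffected system. If the real output differs in layout from
the examples, inspect it by hand rather than trusting the script.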


Scott Fadden
GPFS Technical Marketing 
Desk: (503) 578-5630 
Cell: (503) 880-5833 
sfadden at us.ibm.com
http://www.ibm.com/systems/gpfs
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20120705/193c1254/attachment-0001.htm>

From Jez.Tucker at rushes.co.uk  Fri Jul  6 11:44:49 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 6 Jul 2012 10:44:49 +0000
Subject: [gpfsug-discuss] Almaden Labs pNFS attending September 20th User
	Group
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AB3F7@WARVWEXC1.uk.deluxe-eu.com>

Hello all

  I've received confirmation from Dean Hildebrand of the Almaden Research Labs that he will attend the September User Group to present pNFS.

Dean is available on the 21st to meet post user group for relevant discussion.
He will be based in London throughout his visit, owing to the proximity to AWE.

If you would like to meet Dean to discuss pNFS further, please arrange this with me via email.

Cheers

Jez
---
Jez Tucker
Senior Sysadmin
Rushes

GPFSUG Chairman (chair at gpfsug.org)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20120706/1e11d594/attachment-0001.htm>

From Jez.Tucker at rushes.co.uk  Fri Jul 20 09:28:19 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 20 Jul 2012 08:28:19 +0000
Subject: [gpfsug-discuss] Your NSD server loadavg?
Message-ID: <39571EA9316BE44899D59C7A640C13F5305ADDFF@WARVWEXC1.uk.deluxe-eu.com>

Hello

  Just curious to see what your NSD server's loadavg is when under a normal job processing load, i.e. SGE running tasks over NFS.


---
Jez Tucker
Senior Sysadmin
Rushes
DDI: +44 (0) 207 851 6276
http://www.rushes.co.uk<http://www.rushes.co.uk/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20120720/895ab83e/attachment-0001.htm>

From Jez.Tucker at rushes.co.uk  Fri Jul 20 18:02:44 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 20 Jul 2012 17:02:44 +0000
Subject: [gpfsug-discuss] Great perf tool
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>


http://collectl.sourceforge.net/index.html

---
Jez Tucker
Senior Sysadmin
Rushes

GPFSUG Chairman (chair at gpfsug.org)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20120720/dbf4e765/attachment-0001.htm>

From viccornell at gmail.com  Sat Jul 21 10:43:33 2012
From: viccornell at gmail.com (Vic Cornell)
Date: Sat, 21 Jul 2012 10:43:33 +0100
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <20120720222354.GB12126@dibs.tanso.net>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
	<20120720222354.GB12126@dibs.tanso.net>
Message-ID: <6259820223170424573@unknownmsgid>

I second that. The great thing about PCP is that it will monitor down
to 1/10 second, which is really useful when you want to see what is
really going on.

Good news about its inclusion in RHEL. It also has Mac and Windows
versions, so you can instrument an entire setup and monitor it on
the box of your choice.

Runs best under IRIX though. . . .

Kind Regards,

Vic

Vic Cornell
Application Support Engineer
DataDirect Networks
Davidson House
Forbury Square
Reading
RG1 3EU
United Kingdom

Mobile 07900 660 266
Skype viccornell


www.ddn.com


On 20 Jul 2012, at 23:24, Jan-Frode Myklebust <janfrode at tanso.net> wrote:

> On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
>>
>> http://collectl.sourceforge.net/index.html
>
> Another great framework for collecting performance data is Performance
> CoPilot:
>
>    http://oss.sgi.com/projects/pcp/
>
> It can collect and play live or re-play archived data from several nodes
> in the same gui (or tui) player.
>
> PCP is finally scheduled for inclusion in RHEL, so it's hopefully no
> longer only us old SGI IRIX admins that will be using it anymore :-)
>
>    http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf
>
> It's already available in EPEL.
>
>
>  -jf


From Jez.Tucker at rushes.co.uk  Tue Jul 24 09:22:13 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Tue, 24 Jul 2012 08:22:13 +0000
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <20120720222354.GB12126@dibs.tanso.net>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
	<20120720222354.GB12126@dibs.tanso.net>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE76B@WARVWEXC1.uk.deluxe-eu.com>

Perhaps you or Vic could give a quick run-through of PCP at the next UG meeting?

> -----Original Message-----
> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-
> bounces at gpfsug.org] On Behalf Of Jan-Frode Myklebust
> Sent: 20 July 2012 23:24
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] Great perf tool
> 
> On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
> >
> > http://collectl.sourceforge.net/index.html
> 
> Another great framework for collecting performance data is Performance
> CoPilot:
> 
> 	http://oss.sgi.com/projects/pcp/
> 
> It can collect and play live or re-play archived data from several nodes in the
> same gui (or tui) player.
> 
> PCP is finally scheduled for inclusion in RHEL, so it's hopefully no longer
> only us old SGI IRIX admins that will be using it anymore :-)
> 
> 	http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf
> 
> It's already available in EPEL.
> 
> 
>   -jf


From Jez.Tucker at rushes.co.uk  Fri Jul  6 11:44:49 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 6 Jul 2012 10:44:49 +0000
Subject: [gpfsug-discuss] Almaden Labs pNFS attending September 20th User
	Group
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AB3F7@WARVWEXC1.uk.deluxe-eu.com>

Hello all

  I've received confirmation from Dean Hildebrand from the Almaden Research Labs that he will attend the September User Group to present pNFS.

Dean is available on the 21st to meet after the user group for relevant discussion.
He will be based in London throughout his visit, owing to the proximity to AWE.

If you would like to meet Dean to discuss pNFS further, please arrange this with me via email.

Cheers

Jez
---
Jez Tucker
Senior Sysadmin
Rushes

GPFSUG Chairman (chair at gpfsug.org)

From Jez.Tucker at rushes.co.uk  Fri Jul 20 09:28:19 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 20 Jul 2012 08:28:19 +0000
Subject: [gpfsug-discuss] Your NSD server loadavg?
Message-ID: <39571EA9316BE44899D59C7A640C13F5305ADDFF@WARVWEXC1.uk.deluxe-eu.com>

Hello

  Just curious to see what your NSD servers' loadavg is under a normal job-processing load, i.e. SGE running tasks over NFS.


---
Jez Tucker
Senior Sysadmin
Rushes
DDI: +44 (0) 207 851 6276
http://www.rushes.co.uk

From Jez.Tucker at rushes.co.uk  Fri Jul 20 18:02:44 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 20 Jul 2012 17:02:44 +0000
Subject: [gpfsug-discuss] Great perf tool
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>


http://collectl.sourceforge.net/index.html

---
Jez Tucker
Senior Sysadmin
Rushes

GPFSUG Chairman (chair at gpfsug.org)

From janfrode at tanso.net  Fri Jul 20 23:23:54 2012
From: janfrode at tanso.net (Jan-Frode Myklebust)
Date: Sat, 21 Jul 2012 00:23:54 +0200
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <20120720222354.GB12126@dibs.tanso.net>

On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
> 
> http://collectl.sourceforge.net/index.html

Another great framework for collecting performance data is Performance
CoPilot:

	http://oss.sgi.com/projects/pcp/

It can collect and play live or re-play archived data from several nodes
in the same gui (or tui) player.

PCP is finally scheduled for inclusion in RHEL, so it's hopefully no
longer only us old SGI IRIX admins that will be using it anymore :-)

	http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf

It's already available in EPEL.


  -jf


From viccornell at gmail.com  Sat Jul 21 10:43:33 2012
From: viccornell at gmail.com (Vic Cornell)
Date: Sat, 21 Jul 2012 10:43:33 +0100
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <20120720222354.GB12126@dibs.tanso.net>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
	<20120720222354.GB12126@dibs.tanso.net>
Message-ID: <6259820223170424573@unknownmsgid>

I second that. The great thing about PCP is that it will monitor down
to 1/10 second, which is really useful when you want to see what is
really going on.

Good news about its inclusion in RHEL. It also has Mac and Windows
versions, so you can instrument an entire setup and monitor it on
the box of your choice.

Runs best under IRIX though. . . .

Kind Regards,

Vic

Vic Cornell
Application Support Engineer
DataDirect Networks
Davidson House
Forbury Square
Reading
RG1 3EU
United Kingdom

Mobile 07900 660 266
Skype viccornell


www.ddn.com
This email may contain confidential and privileged material for the
sole use of the intended recipient.  Any review or distribution by
others is strictly prohibited.  If you are not the intended recipient
please contact the sender and delete all copies


On 20 Jul 2012, at 23:24, Jan-Frode Myklebust <janfrode at tanso.net> wrote:

> On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
>>
>> http://collectl.sourceforge.net/index.html
>
> Another great framework for collecting performance data is Performance
> CoPilot:
>
>    http://oss.sgi.com/projects/pcp/
>
> It can collect and play live or re-play archived data from several nodes
> in the same gui (or tui) player.
>
> PCP is finally scheduled for inclusion in RHEL, so it's hopefully no
> longer only us old SGI IRIX admins that will be using it anymore :-)
>
>    http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf
>
> It's already available in EPEL.
>
>
>  -jf
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss


From Jez.Tucker at rushes.co.uk  Tue Jul 24 09:22:13 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Tue, 24 Jul 2012 08:22:13 +0000
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <20120720222354.GB12126@dibs.tanso.net>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
	<20120720222354.GB12126@dibs.tanso.net>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE76B@WARVWEXC1.uk.deluxe-eu.com>

Perhaps you or Vic could give a quick run-through of PCP at the next UG meeting?

> -----Original Message-----
> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-
> bounces at gpfsug.org] On Behalf Of Jan-Frode Myklebust
> Sent: 20 July 2012 23:24
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] Great perf tool
> 
> On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
> >
> > http://collectl.sourceforge.net/index.html
> 
> Another great framework for collecting performance data is Performance
> CoPilot:
> 
> 	http://oss.sgi.com/projects/pcp/
> 
> It can collect and play live or re-play archived data from several nodes in the
> same gui (or tui) player.
> 
> PCP is finally scheduled for inclusion in RHEL, so it's hopefully no longer
> only us old SGI IRIX admins that will be using it anymore :-)
> 
> 	http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf
> 
> It's already available in EPEL.
> 
> 
>   -jf


From Jez.Tucker at rushes.co.uk  Mon Jul  2 14:03:21 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 13:03:21 +0000
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just
	has
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>

Just had a lovely one.

As I'm sure all of you are aware by now, there's been much fun with some of the NTP Stratum 1 servers not correctly accounting for the leap second last night.

http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html

You may wish to turn off ntp on your servers and point your NTP at trusted servers.
A clock skew from pool.ntp.org just took out one of our servers and the node was expelled from the cluster.

Jez
---
Jez Tucker
Senior Sysadmin
Rushes
DDI: +44 (0) 207 851 6276
http://www.rushes.co.uk

From orlando.richards at ed.ac.uk  Mon Jul  2 14:12:46 2012
From: orlando.richards at ed.ac.uk (Orlando Richards)
Date: Mon, 02 Jul 2012 14:12:46 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
 just has
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <4FF19E4E.3040802@ed.ac.uk>

Hi Jez,

We've had a few issues with the leap second - but so far it has been 
isolated to redhat 6.2 systems.

What OS are you running on your affected server?

Cheers,
Orlando.


On 02/07/12 14:03, Jez Tucker wrote:
> Just had a lovely one.
>
> As I'm sure all of you are aware by now, there's been much fun with
> some of the NTP Stratum 1 servers not correctly accounting for the
> leap second last night.
>
> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>
> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>
> You may wish to turn off ntp on your servers and point your NTP at
> trusted servers.
>
> A clock skew from pool.ntp.org just took out one of our servers and the
> node was expelled from the cluster.
>
> Jez
>
> ---
>
> Jez Tucker
>
> Senior Sysadmin
>
> Rushes
>
> DDI: +44 (0) 207 851 6276
>
> http://www.rushes.co.uk <http://www.rushes.co.uk/>
>
>
>


-- 
             --
    Dr Orlando Richards
   Information Services
IT Infrastructure Division
        Unix Section
     Tel: 0131 650 4994

The University of Edinburgh is a charitable body, registered in 
Scotland, with registration number SC005336.


From bevans at canditmedia.co.uk  Mon Jul  2 14:27:52 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 14:27:52 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
	just has
In-Reply-To: <4FF19E4E.3040802@ed.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
	<4FF19E4E.3040802@ed.ac.uk>
Message-ID: <C74ABA98-34B5-4A32-856B-2CAF51D01EEF@canditmedia.co.uk>

I've had a few customers (a growing number as the day goes on) with IBM Storage Manager eating loads of CPU cycles and causing slowdowns as a result (SLES 10 SP1, mostly).

The common factor, at least at this end, seems to be ntp syncing against public ntp pool servers and Java, but Jez's report...... scary stuff...

Cheers,
Barry


On 2 Jul 2012, at 14:12, Orlando Richards wrote:

> Hi Jez,
> 
> We've had a few issues with the leap second - but so far it has been isolated to redhat 6.2 systems.
> 
> What OS are you running on your affected server?
> 
> Cheers,
> Orlando.
> 
> 
> On 02/07/12 14:03, Jez Tucker wrote:
>> Just had a lovely one.
>> 
>> As I'm sure all of you are aware by now, there's been much fun with
>> some of the NTP Stratum 1 servers not correctly accounting for the
>> leap second last night.
>>
>> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>>
>> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>>
>> You may wish to turn off ntp on your servers and point your NTP at
>> trusted servers.
>>
>> A clock skew from pool.ntp.org just took out one of our servers and the
>> node was expelled from the cluster.
>> 
>> Jez
>> 
>> ---
>> 
>> Jez Tucker
>> 
>> Senior Sysadmin
>> 
>> Rushes
>> 
>> DDI: +44 (0) 207 851 6276
>> 
>> http://www.rushes.co.uk <http://www.rushes.co.uk/>
>> 
>> 
>> 
> 
> 
> -- 
>            --
>   Dr Orlando Richards
>  Information Services
> IT Infrastructure Division
>       Unix Section
>    Tel: 0131 650 4994
> 
> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

From Jez.Tucker at rushes.co.uk  Mon Jul  2 14:47:21 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 13:47:21 +0000
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
 just has
In-Reply-To: <4FF19E4E.3040802@ed.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
	<4FF19E4E.3040802@ed.ac.uk>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9010@WARVWEXC1.uk.deluxe-eu.com>

Funnily enough RH Ent 6.2

> -----Original Message-----
> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-
> bounces at gpfsug.org] On Behalf Of Orlando Richards
> Sent: 02 July 2012 14:13
> To: gpfsug-discuss at gpfsug.org
> Subject: Re: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
> just has
> 
> Hi Jez,
> 
> We've had a few issues with the leap second - but so far it has been isolated
> to redhat 6.2 systems.
> 
> What OS are you running on your affected server?
> 
> Cheers,
> Orlando.
> 
> 
> On 02/07/12 14:03, Jez Tucker wrote:
> > Just had a lovely one.
> >
> > As I'm sure all of you are aware by now, there's been much fun with
> > some of the NTP Stratum 1 servers not correctly accounting for the
> > leap second last night.
> >
> > http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
> >
> > http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
> >
> > You may wish to turn off ntp on your servers and point your NTP at
> > trusted servers.
> >
> > A clock skew from pool.ntp.org just took out one of our servers and
> > the node was expelled from the cluster.
> >
> > Jez
> >
> > ---
> >
> > Jez Tucker
> >
> > Senior Sysadmin
> >
> > Rushes
> >
> > DDI: +44 (0) 207 851 6276
> >
> > http://www.rushes.co.uk <http://www.rushes.co.uk/>
> >
> >
> >
> 
> 
> --
>              --
>     Dr Orlando Richards
>    Information Services
> IT Infrastructure Division
>         Unix Section
>      Tel: 0131 650 4994
> 
> The University of Edinburgh is a charitable body, registered in Scotland, with
> registration number SC005336.


From j.buzzard at dundee.ac.uk  Mon Jul  2 14:59:33 2012
From: j.buzzard at dundee.ac.uk (Jonathan Buzzard)
Date: Mon, 2 Jul 2012 14:59:33 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
 just has
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <4FF1A945.5000100@dundee.ac.uk>

On 02/07/12 14:03, Jez Tucker wrote:
> Just had a lovely one.
>
> As I'm sure all of you are aware by now, there's been much fun with
> some of the NTP Stratum 1 servers not correctly accounting for the
> leap second last night.
>
> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>
> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>
> You may wish to turn off ntp on your servers and point your NTP at
> trusted servers.
>
> A clock skew from pool.ntp.org just took out one of our servers and the
> node was expelled from the cluster.

Hum, not sure I would run my production servers directly off something
from pool.ntp.org; I would at least put a local server in between.

Not noticed any problems here, but then we are running the latest RHEL 5.8
and the latest IBM Storage Manager (10.83) :-)
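
If you want to keep an eye on how far a node has drifted from its chosen
time source, something like the following Python sketch could be a
starting point. It parses the peer table printed by "ntpq -pn", picks the
line marked with "*" (the peer ntpd is currently synchronised to) and
reads the offset column, which ntpq reports in milliseconds. The sample
text, the function names and the 50 ms limit are all made up for
illustration; nothing in this thread says what skew GPFS will tolerate.

```python
from typing import Optional, Tuple

# Invented sample of "ntpq -pn" output (documentation-range addresses).
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      .GPS.            1 u   33   64  377    0.512    0.084   0.021
+192.0.2.11      192.0.2.10       2 u   12   64  377    0.630   -0.130   0.040
"""


def selected_peer(ntpq_output: str) -> Optional[Tuple[str, float]]:
    """Return (remote, offset_ms) for the peer flagged '*', or None."""
    for line in ntpq_output.splitlines():
        if not line.startswith("*"):
            continue
        fields = line[1:].split()
        # Columns: remote refid st t when poll reach delay offset jitter
        if len(fields) < 10:
            continue
        return fields[0], float(fields[8])
    return None


def offset_too_large(ntpq_output: str, limit_ms: float) -> bool:
    """True if there is no selected peer or its |offset| exceeds limit_ms."""
    peer = selected_peer(ntpq_output)
    if peer is None:
        return True
    return abs(peer[1]) > limit_ms


if __name__ == "__main__":
    # 50 ms is a hypothetical threshold, chosen only for the example.
    print(selected_peer(SAMPLE), offset_too_large(SAMPLE, 50.0))
```

Treating "no selected peer" as a failure is a deliberate choice here: a
node that has lost its time source is at least as worrying as one with a
large offset.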

JAB.

--
Jonathan A. Buzzard             Tel: +441382-386998
Storage Administrator, College of Life Sciences
University of Dundee, DD1 5EH

The University of Dundee is a registered Scottish Charity, No: SC015096


From Jez.Tucker at rushes.co.uk  Mon Jul  2 14:59:34 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 13:59:34 +0000
Subject: [gpfsug-discuss] Samba mapping of "special" SID entries
In-Reply-To: <F68FAAD16AEC9744BD921F91014D2F0003DFB8@MBX06.ad.oak.ox.ac.uk>
References: <4FE486B2.1050501@ed.ac.uk>
	<F68FAAD16AEC9744BD921F91014D2F0003DFB8@MBX06.ad.oak.ox.ac.uk>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9047@WARVWEXC1.uk.deluxe-eu.com>

Now I've located my GPFSUG from within Outlook...

I'm presuming you're creating an ACL with the equivalent of 2775 permissions and the owner file system being 'nfsv4', rather than 'all'?
Your nfsv3 clients have nfsv4 acl support installed?

Jez


> -----Original Message-----
> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-
> bounces at gpfsug.org] On Behalf Of Luke Raimbach
> Sent: 22 June 2012 17:33
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] Samba mapping of "special" SID entries
> 
> Hi Orlando,
> 
> I've been having success using Centrify to manage UID/GID mappings for our
> very small mixed cluster (7 x Linux, 1 x Windows 2008R2).
> 
> I've created a map for "CREATOR / OWNER", "SYSTEM", "Domain Admins",
> etc. group SIDs and use the Windows node to manage ACLs. When the
> windows node applies the ACLs, these seem to translate successfully into
> GPFS ACLs and work nicely for the mixed environment allowing users on
> both Linux and Windows systems to manipulate each other's files.
> 
> People are mounting the FS via NFS (exported via the NSD Linux servers)
> and CIFS (shared from Win2k8R2). The permissions don't look friendly when
> you run ls -l on a Linux system over NFS but the ACLs do their job in
> preserving inheritable permissions, etc. If people want to see the 'real' ACL,
> they need to use mmgetacl on a GPFS attached node (or windows users
> simply click on the security tab under properties of a file).
> 
> Drop me a line off-list if you want to take a look at what we've got remotely.
> I can run a webex session from the Windows node if you want to have a
> good poke around.
> 
> Luke.
> 
> --
> 
> Luke Raimbach
> IT Manager
> Oxford e-Research Centre
> 7 Keble Road,
> Oxford,
> OX1 3QG
> 
> +44(0)1865 610639
> 
> 
> 
> 
> > -----Original Message-----
> > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-
> > bounces at gpfsug.org] On Behalf Of Orlando Richards
> > Sent: 22 June 2012 15:53
> > To: gpfsug-discuss at gpfsug.org
> > Subject: [gpfsug-discuss] Samba mapping of "special" SID entries
> >
> > Hi all,
> >
> > Has anyone bumped up against the "nfs4: special" option in GPFS/Samba
> > deployments which manipulates how the "owner" and "group owner" (and
> > "everybody") behaviour is mapped to ACLs when accessed via the samba
> > stack?
> >
> > In particular, with the "default" setting (if one blindly follows the
> > worked examples on this) of nfs4: special, if a user adds themselves
> > specifically to an ACL, this creates an entry:
> >
> > special:@owner
> >
> > rather than:
> >
> > user:username
> >
> > which has the knock-on effect that if a file/folder is created under
> > this ACL by a different owner (or if ownership changes), the person
> > who put said ACL on to the file/folder no longer has access. Most
> > people find this confusing (which is putting it politely).
> >
> > To further complicate matters, the "special" windows SID's*[1] - such
> > as "CREATOR/OWNER" -  don't seem to work properly in the
> > ctdb/samba/gpfs stack (I don't know if they do in "normal" samba
> > though). IBM don't support CREATOR/OWNER in SONAS*[2] - so it's not
> > just me!
> >
> > So my question is - has anyone else been looking into this at all, and
> > if so, do you have any sage words of wisdom to offer?
> >
> > Cheers,
> > Orlando.
> >
> >
> > *[1] http://support.microsoft.com/kb/163846
> > *[2]
> > http://pic.dhe.ibm.com/infocenter/sonasic/sonas1ic/index.jsp?topic=%2Fcom.ibm.sonas.doc%2Fadm_authorization_limitations.html
> >
> >
> > --
> >              --
> >     Dr Orlando Richards
> >    Information Services
> > IT Infrastructure Division
> >         Unix Section
> >      Tel: 0131 650 4994
> >
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
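
To make the special:@owner problem Orlando describes in the quoted
message concrete, here is a toy model in Python. It is only a sketch of
the reasoning, not Samba or GPFS code: the Ace class and has_access
function are invented for illustration, and real NFSv4 ACL evaluation
(groups, deny entries, permission bits, inheritance flags) is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Ace:
    kind: str  # "special" or "user"
    who: str   # "@owner" for a special entry, otherwise a username


def has_access(user: str, file_owner: str, acl: List[Ace]) -> bool:
    """Toy check: does any allow entry in the ACL match this user?"""
    for ace in acl:
        if ace.kind == "user" and ace.who == user:
            return True
        # A special:@owner entry follows whoever owns the file right now.
        if ace.kind == "special" and ace.who == "@owner" and user == file_owner:
            return True
    return False


# alice adds herself to a folder she owns; two ways the entry can be stored.
stored_as_special = [Ace("special", "@owner")]
stored_as_user = [Ace("user", "alice")]

if __name__ == "__main__":
    # bob later creates a file under the same ACL, so bob is the owner.
    print(has_access("alice", "bob", stored_as_special))  # alice is locked out
    print(has_access("alice", "bob", stored_as_user))     # alice keeps access
```

In this model the entry stored as special:@owner grants access to
whoever owns the object at the time of the check, which is why the
person who set the ACL loses access once someone else becomes owner,
whereas an explicit user:username entry keeps naming the same person.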


From Jez.Tucker at rushes.co.uk  Mon Jul  2 15:05:25 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 14:05:25 +0000
Subject: [gpfsug-discuss] HPC people - interconnects
In-Reply-To: <4FE8720C.7040007@gmail.com>
References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com>
	<4FE8720C.7040007@gmail.com>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>

Theoretically, using the OFED stack with a supporting card (e.g. Myrinet 10GbE), you should be able to leverage RDMA.
That said, I've not tried it yet.  It's on my list of things to R&D.

OFED/ROCE/iWARP:

http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html


From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali
Sent: 25 June 2012 15:14
To: gpfsug-discuss at gpfsug.org
Subject: Re: [gpfsug-discuss] HPC people - interconnects

On 25/06/12 15:08, Jez Tucker wrote:
Do you all use IB?

Has anyone tried RDMA over 10G via the OFED stack?


For most of our customers we use RDMA over verbs

Is this the same thing you mentioned a few weeks ago with respect to RoCE? Does GPFS even support this?


--

regards,


Arif

From bevans at canditmedia.co.uk  Mon Jul  2 22:16:15 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 22:16:15 +0100
Subject: [gpfsug-discuss] HPC people - interconnects
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com>
	<4FE8720C.7040007@gmail.com>
	<39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk>

It's verbs support you need - not sure where OFED is up to with verbs over 10G, if that's even on the cards

Cheers,
B


On 2 Jul 2012, at 15:05, Jez Tucker wrote:

> Theoretically, using the OFED stack with a supporting card (e.g. Myrinet 10GbE), you should be able to leverage RDMA.
> That said, I've not tried it yet.  It's on my list of things to R&D.
>  
> OFED/ROCE/iWARP:
>  
> http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html
>  
>  
> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali
> Sent: 25 June 2012 15:14
> To: gpfsug-discuss at gpfsug.org
> Subject: Re: [gpfsug-discuss] HPC people - interconnects
>  
> On 25/06/12 15:08, Jez Tucker wrote:
> Do you all use IB?
>  
> Has anyone tried RDMA over 10G via the OFED stack?
>  
>  
> For most of our customers we use RDMA over verbs
> 
> Is this the same thing you mentioned a few weeks ago with respect to RoCE? Does GPFS even support this?
> 
> -- 
> regards,
>  
> Arif

From bevans at canditmedia.co.uk  Mon Jul  2 22:25:02 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 22:25:02 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
	just has
In-Reply-To: <4FF1A945.5000100@dundee.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
	<4FF1A945.5000100@dundee.ac.uk>
Message-ID: <EA951E70-058E-4FCC-92C7-367102CB56F2@canditmedia.co.uk>

This has so far hit almost all of the places I work with (not so much GPFS crashing, but certainly Storage Manager going berserk) - the majority of them do not use public NTP servers. In most cases no one actually noticed until it was pointed out; it's well worth a quick 'top' on your storage servers if you're using Engenio/LSI/NetApp based units (i.e. DS3/4/5000).

The fix is here: http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/

But of course I wouldn't mess with the time in production unless you've got GPFS shut down first.

Cheers,
B

 
On 2 Jul 2012, at 14:59, Jonathan Buzzard wrote:

> On 02/07/12 14:03, Jez Tucker wrote:
>> Just had a lovely one.
>> 
>> As I'm sure all of you are aware by now, there's been much fun with
>> some of the NTP Stratum 1 servers not correctly accounting for the
>> leap second last night.
>>
>> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>>
>> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>>
>> You may wish to turn off ntp on your servers and point your NTP at
>> trusted servers.
>>
>> A clock skew from pool.ntp.org just took out one of our servers and the
>> node was expelled from the cluster.
> 
> Hum, not sure I would run my production servers directly off something
> from pool.ntp.org; I would at least put a local server in between.
> 
> Not noticed any problems here, but then we are running the latest RHEL 5.8
> and the latest IBM Storage Manager (10.83) :-)
> 
> JAB.
> 
> --
> Jonathan A. Buzzard             Tel: +441382-386998
> Storage Administrator, College of Life Sciences
> University of Dundee, DD1 5EH
> 
> The University of Dundee is a registered Scottish Charity, No: SC015096
> 

From Jez.Tucker at rushes.co.uk  Tue Jul  3 11:38:10 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Tue, 3 Jul 2012 10:38:10 +0000
Subject: [gpfsug-discuss] HPC people - interconnects
In-Reply-To: <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com>
	<4FE8720C.7040007@gmail.com>
	<39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>,
	<697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A93A7@WARVWEXC1.uk.deluxe-eu.com>

Here's the stack:

https://www.openfabrics.org/resources/ofed-for-linux-ofed-for-windows/ofed-overview.html

VERBS is supported over 10GbE.  It should work if OFED VERBS == IBM VERBS.

---
Jez Tucker
Senior SysAdmin
Rushes
www.rushes.co.uk
________________________________
From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Barry Evans [bevans at canditmedia.co.uk]
Sent: 02 July 2012 22:16
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] HPC people - interconnects

It's verbs support you need - not sure where OFED is up to with verbs over 10G, if that's even on the cards

Cheers,
B


On 2 Jul 2012, at 15:05, Jez Tucker wrote:

Theoretically, using the OFED stack with a supporting card (e.g. Myrinet 10GbE), you should be able to leverage RDMA.
That said, I've not tried it yet.  It's on my list of things to R&D.

OFED/ROCE/iWARP:

http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html


From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali
Sent: 25 June 2012 15:14
To: gpfsug-discuss at gpfsug.org
Subject: Re: [gpfsug-discuss] HPC people - interconnects

On 25/06/12 15:08, Jez Tucker wrote:
Do you all use IB?

Has anyone tried RDMA over 10G via the OFED stack?


For most of our customers we use RDMA over verbs

Is this the same thing you mentioned a few weeks ago with respect to RoCE? Does GPFS even support this?


--

regards,


Arif


From j.buzzard at dundee.ac.uk  Tue Jul  3 12:01:33 2012
From: j.buzzard at dundee.ac.uk (Jonathan Buzzard)
Date: Tue, 3 Jul 2012 12:01:33 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it
 just has
In-Reply-To: <EA951E70-058E-4FCC-92C7-367102CB56F2@canditmedia.co.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
	<4FF1A945.5000100@dundee.ac.uk>
	<EA951E70-058E-4FCC-92C7-367102CB56F2@canditmedia.co.uk>
Message-ID: <4FF2D10D.2030701@dundee.ac.uk>

On 02/07/12 22:25, Barry Evans wrote:
> This has so far hit almost all of the places I work with (not so
> much GPFS crashing, but certainly storage manager going berserk) - the
> majority of them do not use public NTP servers. In most cases no one
> actually noticed until it was pointed out, well worth a quick 'top' of
> your storage servers if you're using Engenio/LSI/NetApp based units (i.e.
> DS3/4/5000).
>
> The fix is here:
> http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/
>

Or you can upgrade to the latest version of storage manager. We are
running 10.83 and it sailed through without issue. Now admittedly most
people have probably not upgraded as it has only been out for a couple
of weeks. I was very prompt on the upgrade as it allows the one Storage
Manager install to manage both IBM DS3/4/5000 and Dell MD3xxx from the
same program.

JAB.

--
Jonathan A. Buzzard             Tel: +441382-386998
Storage Administrator, College of Life Sciences
University of Dundee, DD1 5EH

The University of Dundee is a registered Scottish Charity, No: SC015096


From sfadden at us.ibm.com  Thu Jul  5 17:25:22 2012
From: sfadden at us.ibm.com (Scott Fadden)
Date: Thu, 5 Jul 2012 09:25:22 -0700
Subject: [gpfsug-discuss] GPFS Service Bulletin - July 5 2012
Message-ID: <OF566CBF8E.739C89D9-ON88257A32.005A1671-88257A32.005A3696@us.ibm.com>

Date Added: July 5, 2012
Issue:
IBM has identified an issue with GPFS file systems at versions 3.4 or 3.5 
which were migrated from file systems created with GPFS versions earlier 
than 3.4. This issue can occur only after using the mmmigratefs command 
with the [--fastea] option.
The issue can result in a loss of data, requiring the restoration of data 
from a backup source.
GPFS file systems created with versions earlier than 3.4 should not be
migrated using the mmmigratefs command with the [--fastea] option until a
fix is provided by IBM. IBM plans to make the fix available in GPFS
versions 3.5.0.3 (APAR IV24151) and 3.4.0.15 (APAR IV24150). An ifix will
also be available from IBM service.
If customers have already migrated file systems from GPFS versions earlier 
than 3.4, IBM service has a fix. Please follow the steps below to 
determine if your system may be affected.
To determine if your system may be affected:
1. Ensure your GPFS file systems are mounted.
2. As a user with GPFS administrator privileges on a machine where your 
GPFS file systems are mounted, issue the command:
        /usr/lpp/mmfs/bin/mmfsadm dump stripe | grep "inode 0"
The command will produce output that identifies locations for the "inode
0" file for all currently mounted GPFS file systems. Example output for a
file system configured with two way meta-data replication would be in the
form:
        inode 0: 3:4098 1:4098
For a file system with no meta-data replication the output would be in the 
form:
        inode 0: 3:4098
The relevant information for determining whether you may experience a
problem is the set of fields denoting <disk>:<sector> for each inode 0
replica (e.g. 3:4098 and 1:4098 in these examples).
If each <disk>:<sector> replica shows only 4098 in the sector field
then you are not experiencing this problem. If, however, there is a number
other than 4098 in the sector output, then you are requested to immediately
call IBM service and reference this problem. The IBM service person will
walk you through a fix for correcting the issue.
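
For sites with many file systems, the check above could be scripted. The
following is only a minimal sketch, not an IBM-supplied tool: it assumes
the grep output really does consist of lines in the "inode 0: <disk>:<sector>
..." form shown in the examples, and the function name
find_suspect_replicas is hypothetical. It reports every replica whose
sector field is not 4098.

```python
import re
from typing import List, Tuple

# Sector value that, per the bulletin, indicates an unaffected file system.
EXPECTED_SECTOR = 4098


def find_suspect_replicas(dump_output: str) -> List[Tuple[int, int]]:
    """Return (disk, sector) pairs whose sector field is not 4098.

    dump_output is assumed to be the text produced by:
        /usr/lpp/mmfs/bin/mmfsadm dump stripe | grep "inode 0"
    with one "inode 0: <disk>:<sector> ..." line per mounted file system.
    """
    suspects = []
    for line in dump_output.splitlines():
        line = line.strip()
        # Ignore anything that is not an "inode 0:" line.
        if not line.startswith("inode 0:"):
            continue
        replicas = line[len("inode 0:"):]
        # Each replica is a <disk>:<sector> pair of decimal numbers.
        for disk, sector in re.findall(r"(\d+):(\d+)", replicas):
            if int(sector) != EXPECTED_SECTOR:
                suspects.append((int(disk), int(sector)))
    return suspects
```

An empty list means every replica shows sector 4098. Any entries in the
list are the <disk>:<sector> pairs to quote when calling IBM service; the
script does not attempt any repair.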


Scott Fadden
GPFS Technical Marketing 
Desk: (503) 578-5630 
Cell: (503) 880-5833 
sfadden at us.ibm.com
http://www.ibm.com/systems/gpfs

From Jez.Tucker at rushes.co.uk  Fri Jul  6 11:44:49 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 6 Jul 2012 10:44:49 +0000
Subject: [gpfsug-discuss] Almaden Labs pNFS attending September 20th User
	Group
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AB3F7@WARVWEXC1.uk.deluxe-eu.com>

Hello all

  I've received confirmation from Dean Hildebrand from the Almaden Research Labs that he will attend the September User Group to present pNFS.

Dean is available on the 21st to meet post user group for relevant discussion.
He will be based in London throughout his visit, owing to the proximity to AWE.

If you would like to meet Dean to discuss pNFS further, please arrange this with me via email.

Cheers

Jez
---
Jez Tucker
Senior Sysadmin
Rushes

GPFSUG Chairman (chair at gpfsug.org)

From Jez.Tucker at rushes.co.uk  Fri Jul 20 09:28:19 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 20 Jul 2012 08:28:19 +0000
Subject: [gpfsug-discuss] Your NSD server loadavg?
Message-ID: <39571EA9316BE44899D59C7A640C13F5305ADDFF@WARVWEXC1.uk.deluxe-eu.com>

Hello

  Just curious to see what your NSD server's loadavg is when under a normal job processing load, i.e. SGE running tasks over NFS.


---
Jez Tucker
Senior Sysadmin
Rushes
DDI: +44 (0) 207 851 6276
http://www.rushes.co.uk

From Jez.Tucker at rushes.co.uk  Fri Jul 20 18:02:44 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 20 Jul 2012 17:02:44 +0000
Subject: [gpfsug-discuss] Great perf tool
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>


http://collectl.sourceforge.net/index.html

---
Jez Tucker
Senior Sysadmin
Rushes

GPFSUG Chairman (chair at gpfsug.org)

From janfrode at tanso.net  Fri Jul 20 23:23:54 2012
From: janfrode at tanso.net (Jan-Frode Myklebust)
Date: Sat, 21 Jul 2012 00:23:54 +0200
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <20120720222354.GB12126@dibs.tanso.net>

On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
> 
> http://collectl.sourceforge.net/index.html

Another great framework for collecting performance data is Performance
CoPilot:

	http://oss.sgi.com/projects/pcp/

It can collect and play live or re-play archived data from several nodes
in the same gui (or tui) player.

PCP is finally scheduled for inclusion in RHEL, so it's hopefully no
longer only us old SGI IRIX admins that will be using it anymore :-)

	http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf

It's already available in EPEL.


  -jf


From viccornell at gmail.com  Sat Jul 21 10:43:33 2012
From: viccornell at gmail.com (Vic Cornell)
Date: Sat, 21 Jul 2012 10:43:33 +0100
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <20120720222354.GB12126@dibs.tanso.net>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
	<20120720222354.GB12126@dibs.tanso.net>
Message-ID: <6259820223170424573@unknownmsgid>

I second that. The great thing about PCP is that it will monitor down
to 1/10 second, which is really useful when you want to see what is
really going on.

Good news about its inclusion in RHEL. It also has a mac and windows
version so that you can instrument an entire setup and monitor it on
the box of your choice.

Runs best under IRIX though. . . .

Kind Regards,

Vic

Vic Cornell
Application Support Engineer
DataDirect Networks
Davidson House
Forbury Square
Reading
RG1 3EU
United Kingdom

Mobile 07900 660 266
Skype viccornell


www.ddn.com
This email may contain confidential and privileged material for the
sole use of the intended recipient.  Any review or distribution by
others is strictly prohibited.  If you are not the intended recipient
please contact the sender and delete all copies


On 20 Jul 2012, at 23:24, Jan-Frode Myklebust <janfrode at tanso.net> wrote:

> On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
>>
>> http://collectl.sourceforge.net/index.html
>
> Another great framework for collecting performance data is Performance
> CoPilot:
>
>    http://oss.sgi.com/projects/pcp/
>
> It can collect and play live or re-play archived data from several nodes
> in the same gui (or tui) player.
>
> PCP is finally scheduled for inclusion in RHEL, so it's hopefully no
> longer only us old SGI IRIX admins that will be using it anymore :-)
>
>    http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf
>
> It's already available in EPEL.
>
>
>  -jf


From Jez.Tucker at rushes.co.uk  Tue Jul 24 09:22:13 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Tue, 24 Jul 2012 08:22:13 +0000
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <20120720222354.GB12126@dibs.tanso.net>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
	<20120720222354.GB12126@dibs.tanso.net>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE76B@WARVWEXC1.uk.deluxe-eu.com>

Perhaps you or Vic could give a quick run-through of PCP at the next UG meeting?

> -----Original Message-----
> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-
> bounces at gpfsug.org] On Behalf Of Jan-Frode Myklebust
> Sent: 20 July 2012 23:24
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] Great perf tool
> 
> On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
> >
> > http://collectl.sourceforge.net/index.html
> 
> Another great framework for collecting performance data is Performance
> CoPilot:
> 
> 	http://oss.sgi.com/projects/pcp/
> 
> It can collect and play live or re-play archived data from several nodes in the
> same gui (or tui) player.
> 
> PCP is finally scheduled for inclusion in RHEL, so it's hopefully no longer
> only us old SGI IRIX admins that will be using it anymore :-)
> 
> 	http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf
> 
> It's already available in EPEL.
> 
> 
>   -jf