From Jez.Tucker at rushes.co.uk Mon Jul 2 14:03:21 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 13:03:21 +0000
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>

Just had a lovely one.

As I'm sure all of you are aware by now, there's been much fun with some of the NTP Stratum 1 servers not correctly accounting for the leap-second last night.

http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml

http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html

You may wish to turn off ntp on your servers and correct your NTP to trusted servers.

A clock skew from ntp.pool.org just took out one of our servers and the node was expelled from the cluster.

Jez
---
Jez Tucker
Senior Sysadmin
Rushes
DDI: +44 (0) 207 851 6276
http://www.rushes.co.uk
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From orlando.richards at ed.ac.uk Mon Jul 2 14:12:46 2012
From: orlando.richards at ed.ac.uk (Orlando Richards)
Date: Mon, 02 Jul 2012 14:12:46 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <4FF19E4E.3040802@ed.ac.uk>

Hi Jez,

We've had a few issues with the leap second - but so far it has been isolated to Red Hat 6.2 systems.

What OS are you running on your affected server?

Cheers,
Orlando.

On 02/07/12 14:03, Jez Tucker wrote:
> Just had a lovely one.
>
> As I'm sure all of you are aware by now, there's been much fun with
> some of the NTP Stratum 1 servers not correctly accounting for the
> leap-second last night.
>
> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>
> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>
> You may wish to turn off ntp on your servers and correct your NTP to
> trusted servers.
>
> A clock skew from ntp.pool.org just took out one of our servers and the
> node was expelled from the cluster.
>
> Jez
>
> ---
>
> Jez Tucker
>
> Senior Sysadmin
>
> Rushes
>
> DDI: +44 (0) 207 851 6276
>
> http://www.rushes.co.uk
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
--
Dr Orlando Richards
Information Services
IT Infrastructure Division
Unix Section
Tel: 0131 650 4994

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

From bevans at canditmedia.co.uk Mon Jul 2 14:27:52 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 14:27:52 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has
In-Reply-To: <4FF19E4E.3040802@ed.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF19E4E.3040802@ed.ac.uk>
Message-ID:

I've had a few customers (a number that is growing as the day goes on) with IBM Storage Manager eating loads of CPU cycles and causing slowdowns as a result (SLES 10 SP1, mostly). The common factor, at least at this end, seems to be ntp syncing against public ntp pool servers and Java, but Jez's report... scary stuff...

Cheers,
Barry

On 2 Jul 2012, at 14:12, Orlando Richards wrote:

> Hi Jez,
>
> We've had a few issues with the leap second - but so far it has been isolated to redhat 6.2 systems.
>
> What OS are you running on your affected server?
>
> Cheers,
> Orlando.
>
>
> On 02/07/12 14:03, Jez Tucker wrote:
>> Just had a lovely one.
>> >> As I?m, sure all of you are aware by now, there?s been much fun with >> some of the NTP Stratum 1 servers not correctly accounting for the >> leap-seocnd last night. >> >> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml >> >> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html >> >> You may wish to turn off ntp on your servers and correct your NTP to >> trusted servers. >> >> A clock skew from ntp.pool.org just took out one of our servers and the >> node was expelled from the cluster. >> >> Jez >> >> --- >> >> Jez Tucker >> >> Senior Sysadmin >> >> Rushes >> >> DDI: +44 (0) 207 851 6276 >> >> http://www.rushes.co.uk >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > -- > Dr Orlando Richards > Information Services > IT Infrastructure Division > Unix Section > Tel: 0131 650 4994 > > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Jez.Tucker at rushes.co.uk Mon Jul 2 14:47:21 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Mon, 2 Jul 2012 13:47:21 +0000 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has In-Reply-To: <4FF19E4E.3040802@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF19E4E.3040802@ed.ac.uk> Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9010@WARVWEXC1.uk.deluxe-eu.com> Funnily enough RH Ent 6.2 > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > bounces at gpfsug.org] On Behalf Of Orlando Richards > Sent: 02 July 2012 14:13 > To: gpfsug-discuss at gpfsug.org > Subject: Re: [gpfsug-discuss] NTP leap-second can take out GPFS server - it > just has > > Hi Jez, > > We've had a few issues with the leap second - but so far it has been isolated > to redhat 6.2 systems. > > What OS are you running on your affected server? > > Cheers, > Orlando. > > > On 02/07/12 14:03, Jez Tucker wrote: > > Just had a lovely one. > > > > As I'm, sure all of you are aware by now, there's been much fun with > > some of the NTP Stratum 1 servers not correctly accounting for the > > leap-seocnd last night. > > > > http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012- > Websi > > tes-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml > > > > http://www.telegraph.co.uk/technology/news/9369671/Leap-second- > brings- > > down-top-websites.html > > > > You may wish to turn off ntp on your servers and correct your NTP to > > trusted servers. > > > > A clock skew from ntp.pool.org just took out one of our servers and > > the node was expelled from the cluster. 
> > > > Jez > > > > --- > > > > Jez Tucker > > > > Senior Sysadmin > > > > Rushes > > > > DDI: +44 (0) 207 851 6276 > > > > http://www.rushes.co.uk > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at gpfsug.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > -- > Dr Orlando Richards > Information Services > IT Infrastructure Division > Unix Section > Tel: 0131 650 4994 > > The University of Edinburgh is a charitable body, registered in Scotland, with > registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From j.buzzard at dundee.ac.uk Mon Jul 2 14:59:33 2012 From: j.buzzard at dundee.ac.uk (Jonathan Buzzard) Date: Mon, 2 Jul 2012 14:59:33 +0100 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> Message-ID: <4FF1A945.5000100@dundee.ac.uk> On 02/07/12 14:03, Jez Tucker wrote: > Just had a lovely one. > > As I?m, sure all of you are aware by now, there?s been much fun with > some of the NTP Stratum 1 servers not correctly accounting for the > leap-seocnd last night. > > http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml > > http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html > > You may wish to turn off ntp on your servers and correct your NTP to > trusted servers. > > A clock skew from ntp.pool.org just took out one of our servers and the > node was expelled from the cluster. Hum, not sure I would run my production servers directly off something from ntp.pool.org, I would at least put a local server in between. 
Not notice any problems here, but then we are running latest RHEL 5.8 and latest IBM Storage Manager (10.83) :-) JAB. -- Jonathan A. Buzzard Tel: +441382-386998 Storage Administrator, College of Life Sciences University of Dundee, DD1 5EH The University of Dundee is a registered Scottish Charity, No: SC015096 From Jez.Tucker at rushes.co.uk Mon Jul 2 14:59:34 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Mon, 2 Jul 2012 13:59:34 +0000 Subject: [gpfsug-discuss] Samba mapping of "special" SID entries In-Reply-To: References: <4FE486B2.1050501@ed.ac.uk> Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9047@WARVWEXC1.uk.deluxe-eu.com> Now I've located my GPFSUG from within Outlook... I'm presuming you're creating an ACL with the equivalent of 2775 permissions and the owner file system being 'nfsv4', rather than 'all'? Your nfsv3 clients have nfsv4 acl support installed? Jez > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > bounces at gpfsug.org] On Behalf Of Luke Raimbach > Sent: 22 June 2012 17:33 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Samba mapping of "special" SID entries > > Hi Orlando, > > I've been having success using Centrify to manage UID/GID mappings for our > very small mixed cluster (7 x Linux, 1 x Windows 2008R2). > > I've created a map for "CREATOR / OWNER", "SYSTEM", "Domain Admins", > etc. group SIDs and use the Windows node to manage ACLs. When the > windows node applies the ACLs, these seem to translate successfully in to > GPFS ACLs and work nicely for the mixed environment allowing users on > both Linux and Windows systems to manipulate each other's files. > > People are mounting the FS via NFS (exported via the NSD Linux servers) > and CIFS (shared from Win2k8R2). The permissions don't look friendly when > you run ls -l on a Linux system over NFS but the ACLs do their job in > preserving inheritable permissions, etc. 
If people want to see the 'real' ACL, > they need to use mmgetacl on a GPFS attached node (or windows users > simply click on the security tab under properties of a file). > > Drop me a line off-list if you want to take a look at what we've got remotely. > I can run a webex session from the Windows node if you want to have a > good poke around. > > Luke. > > -- > > Luke Raimbach > IT Manager > Oxford e-Research Centre > 7 Keble Road, > Oxford, > OX1 3QG > > +44(0)1865 610639 > > > > > > -----Original Message----- > > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > > bounces at gpfsug.org] On Behalf Of Orlando Richards > > Sent: 22 June 2012 15:53 > > To: gpfsug-discuss at gpfsug.org > > Subject: [gpfsug-discuss] Samba mapping of "special" SID entries > > > > Hi all, > > > > Has anyone bumped up against the "nfs4: special" option in GPFS/Samba > > deployments which manipulates how the "owner" and "group owner" > (and > > "everybody") behaviour is mapped to ACLs when accessed via the samba > > stack? > > > > In particular, with the "default" setting (if one blindly follows the > > worked examples on this) of nfs4: special, if a user adds themselves > > specifically to an ACL, this creates an entry: > > > > special:@owner > > > > rather than: > > > > user:username > > > > which has the knock-on effect that if a file/folder is created under > > this ACL by a different owner (or if ownership changes), the person > > who put said ACL on to the file/folder no longer has access. Most > > people find this confusing (which is putting it politely). > > > > To further complicate matters, the "special" windows SID's*[1] - such > > as "CREATOR/OWNER" - don't seem to work properly in the > > ctdb/samba/gpfs stack (I don't know if they do in "normal" samba > > though). IBM don't support CREATOR/OWNER in SONAS*[2] - so it's not > just me! 
> > > > So my question is - has anyone else been looking into this at all, and > > if so, do you have any sage words of wisdom to offer? > > > > Cheers, > > Orlando. > > > > > > *[1] http://support.microsoft.com/kb/163846 > > *[2] > > http://pic.dhe.ibm.com/infocenter/sonasic/sonas1ic/index.jsp?topic=%2F > > c om.ibm.sonas.doc%2Fadm_authorization_limitations.html > > > > > > -- > > -- > > Dr Orlando Richards > > Information Services > > IT Infrastructure Division > > Unix Section > > Tel: 0131 650 4994 > > > > The University of Edinburgh is a charitable body, registered in > > Scotland, with registration number SC005336. > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at gpfsug.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Jez.Tucker at rushes.co.uk Mon Jul 2 15:05:25 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Mon, 2 Jul 2012 14:05:25 +0000 Subject: [gpfsug-discuss] HPC people - interconnects In-Reply-To: <4FE8720C.7040007@gmail.com> References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com> <4FE8720C.7040007@gmail.com> Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com> Theoretically using the OFED stack with a supporting card (E.G. Myrinet 10GbE) then you should be able to leverage RDMA. That said, I've not tried it yet. It's on my list of things to R&D. OFED/ROCE/iWARP: http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali Sent: 25 June 2012 15:14 To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] HPC people - interconnects On 25/06/12 15:08, Jez Tucker wrote: Do you all use IB? 
Has anyone tried RDMA over 10G via the OFED stack? Most of our customers we use RDMA over verbs Is this the same thing you mentioned a few weeks ago with respect to ROCE. Does gpfs even support this? -- regards, Arif -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at canditmedia.co.uk Mon Jul 2 22:16:15 2012 From: bevans at canditmedia.co.uk (Barry Evans) Date: Mon, 2 Jul 2012 22:16:15 +0100 Subject: [gpfsug-discuss] HPC people - interconnects In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com> References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com> <4FE8720C.7040007@gmail.com> <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com> Message-ID: <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk> It's verbs support you need - not sure where OFED is up to with verbs over 10G, if that's even on the cards Cheers, B On 2 Jul 2012, at 15:05, Jez Tucker wrote: > Theoretically using the OFED stack with a supporting card (E.G. Myrinet 10GbE) then you should be able to leverage RDMA. > That said, I?ve not tried it yet. It?s on my list of things to R&D. > > OFED/ROCE/iWARP: > > http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html > > > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali > Sent: 25 June 2012 15:14 > To: gpfsug-discuss at gpfsug.org > Subject: Re: [gpfsug-discuss] HPC people - interconnects > > On 25/06/12 15:08, Jez Tucker wrote: > Do you all use IB? > > Has anyone tried RDMA over 10G via the OFED stack? > > > Most of our customers we use RDMA over verbs > > Is this the same thing you mentioned a few weeks ago with respect to ROCE. Does gpfs even support this? 
>
> --
> regards,
>
> Arif
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bevans at canditmedia.co.uk Mon Jul 2 22:25:02 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 22:25:02 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has
In-Reply-To: <4FF1A945.5000100@dundee.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF1A945.5000100@dundee.ac.uk>
Message-ID:

This has so far hit almost all of the places I work with (not so much GPFS crashing, but certainly Storage Manager going berserk) - the majority of them do not use public NTP servers. In most cases no one actually noticed until it was pointed out, so it is well worth a quick 'top' on your storage servers if you're using Engenio/LSI/NetApp based units (i.e. DS3/4/5000).

The fix is here:
http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/

But of course I wouldn't mess with the time in production unless you've got GPFS shut down first.

Cheers,
B

On 2 Jul 2012, at 14:59, Jonathan Buzzard wrote:

> On 02/07/12 14:03, Jez Tucker wrote:
>> Just had a lovely one.
>>
>> As I'm sure all of you are aware by now, there's been much fun with
>> some of the NTP Stratum 1 servers not correctly accounting for the
>> leap-second last night.
>>
>> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>>
>> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>>
>> You may wish to turn off ntp on your servers and correct your NTP to
>> trusted servers.
>>
>> A clock skew from ntp.pool.org just took out one of our servers and the
>> node was expelled from the cluster.
> > Hum, not sure I would run my production servers directly off something > from ntp.pool.org, I would at least put a local server in between. > > Not notice any problems here, but then we are running latest RHEL 5.8 > and latest IBM Storage Manager (10.83) :-) > > JAB. > > -- > Jonathan A. Buzzard Tel: +441382-386998 > Storage Administrator, College of Life Sciences > University of Dundee, DD1 5EH > > The University of Dundee is a registered Scottish Charity, No: SC015096 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jez.Tucker at rushes.co.uk Tue Jul 3 11:38:10 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Tue, 3 Jul 2012 10:38:10 +0000 Subject: [gpfsug-discuss] HPC people - interconnects In-Reply-To: <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk> References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com> <4FE8720C.7040007@gmail.com> <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>, <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk> Message-ID: <39571EA9316BE44899D59C7A640C13F5305A93A7@WARVWEXC1.uk.deluxe-eu.com> Here's the stack: https://www.openfabrics.org/resources/ofed-for-linux-ofed-for-windows/ofed-overview.html VERBS is supported over 10GbE. It should work if OFED VERBS == IBM VERBS. 
---
Jez Tucker
Senior SysAdmin
Rushes
www.rushes.co.uk
________________________________
From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Barry Evans [bevans at canditmedia.co.uk]
Sent: 02 July 2012 22:16
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] HPC people - interconnects

It's verbs support you need - not sure where OFED is up to with verbs over 10G, if that's even on the cards

Cheers,
B

On 2 Jul 2012, at 15:05, Jez Tucker wrote:

Theoretically using the OFED stack with a supporting card (E.G. Myrinet 10GbE) then you should be able to leverage RDMA.
That said, I've not tried it yet. It's on my list of things to R&D.

OFED/ROCE/iWARP:

http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html

From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali
Sent: 25 June 2012 15:14
To: gpfsug-discuss at gpfsug.org
Subject: Re: [gpfsug-discuss] HPC people - interconnects

On 25/06/12 15:08, Jez Tucker wrote:

Do you all use IB?

Has anyone tried RDMA over 10G via the OFED stack?

Most of our customers we use RDMA over verbs

Is this the same thing you mentioned a few weeks ago with respect to ROCE. Does gpfs even support this?

--
regards,

Arif
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From j.buzzard at dundee.ac.uk Tue Jul 3 12:01:33 2012
From: j.buzzard at dundee.ac.uk (Jonathan Buzzard)
Date: Tue, 3 Jul 2012 12:01:33 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has
In-Reply-To:
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF1A945.5000100@dundee.ac.uk>
Message-ID: <4FF2D10D.2030701@dundee.ac.uk>

On 02/07/12 22:25, Barry Evans wrote:
> This has so far hit almost all of the places I work with (not so
> much GPFS crashing, but certainly storage manager going berserk) - the
> majority of them do not use public NTP servers. In most cases no one
> actually noticed until it was pointed out, well worth a quick 'top' of
> your storage servers if you're using Engenio/LSI/NetApp based units (ie,
> DS3/4/5000).
>
> The fix is here:
> http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/
>

Or you can upgrade to the latest version of Storage Manager. We are running 10.83 and it sailed through without issue. Now admittedly most people have probably not upgraded as it has only been out for a couple of weeks.

I was very prompt on the upgrade as it allows the one Storage Manager install to manage both IBM DS3/4/5000 and Dell MD3xxx from the same program.

JAB.

--
Jonathan A. Buzzard Tel: +441382-386998
Storage Administrator, College of Life Sciences
University of Dundee, DD1 5EH

The University of Dundee is a registered Scottish Charity, No: SC015096

From sfadden at us.ibm.com Thu Jul 5 17:25:22 2012
From: sfadden at us.ibm.com (Scott Fadden)
Date: Thu, 5 Jul 2012 09:25:22 -0700
Subject: [gpfsug-discuss] GPFS Service Bulletin - July 5 2012
Message-ID:

Date Added: July 5, 2012

Issue: IBM has identified an issue with GPFS file systems at versions 3.4 or 3.5 which were migrated from file systems created with GPFS versions earlier than 3.4. This issue can occur only after using the mmmigratefs command with the [--fastea] option. The issue can result in a loss of data, requiring the restoration of data from a backup source.

GPFS file systems created with versions earlier than 3.4 should not be migrated using the mmmigratefs command with the [--fastea] option until a fix is provided from IBM. IBM plans to make the fix available in GPFS versions 3.5.0.3 (APAR IV24151) and 3.4.0.15 (APAR IV24150). An ifix will also be available from IBM service.

If customers have already migrated file systems from GPFS versions earlier than 3.4, IBM service has a fix. Please follow the steps below to determine if your system may be affected.

To determine if your system may be affected:

1. Ensure your GPFS file systems are mounted.

2. As a user with GPFS administrator privileges on a machine where your GPFS file systems are mounted, issue the command:

   /usr/lpp/mmfs/bin/mmfsadm dump stripe | grep "inode 0"

The command will produce output that identifies locations for the "inode 0" file for all currently mounted GPFS file systems. Example output for a file system configured with two way meta-data replication would be in the form:

   inode 0: 3:4098 1:4098

For a file system with no meta-data replication the output would be in the form:

   inode 0: 3:4098

The relevant information to look for to see if you may experience a problem is the colon-separated pair given for each inode 0 replica (e.g. 3:4098 and 1:4098 in these examples). If the sector field (the value after the colon) of every replica is 4098 then you are not experiencing this problem. If however there is a number other than 4098 in the sector output then you are requested to immediately call IBM service and reference this problem. The IBM service person will walk you through a fix for correcting the issue.

Scott Fadden
GPFS Technical Marketing
Desk: (503) 578-5630
Cell: (503) 880-5833
sfadden at us.ibm.com
http://www.ibm.com/systems/gpfs
-------------- next part --------------
An HTML attachment was scrubbed...
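
The check in step 2 of the bulletin could be scripted roughly as follows. This is only a sketch, not anything supplied by IBM: it assumes the "inode 0" lines look exactly like the two examples in the bulletin (colon-separated pairs with the sector as the second number), and the function name `suspicious_inode0_lines` is made up for illustration. Any line it flags is a prompt to call IBM service as the bulletin asks, not a diagnosis.

```python
import re

EXPECTED_SECTOR = 4098  # value the bulletin says every inode 0 replica should show


def suspicious_inode0_lines(dump_output):
    """Return the 'inode 0' lines that have a replica whose sector is not 4098.

    Assumes each line looks like 'inode 0: 3:4098 1:4098', as in the
    bulletin's examples; real output may carry extra fields.
    """
    flagged = []
    for line in dump_output.splitlines():
        match = re.search(r"inode 0:\s*(.*)", line)
        if not match:
            continue
        # Each replica is a pair of numbers separated by a colon.
        replicas = re.findall(r"(\d+):(\d+)", match.group(1))
        if any(int(sector) != EXPECTED_SECTOR for _, sector in replicas):
            flagged.append(line.strip())
    return flagged


if __name__ == "__main__":
    # Hypothetical sample; the second line is invented to show a flagged case.
    sample = "inode 0: 3:4098 1:4098\ninode 0: 3:4098 1:8200\n"
    for line in suspicious_inode0_lines(sample):
        print("call IBM service:", line)
```

On a real node the input would be the saved output of the command in step 2, fed to the function as a single string.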
URL: From Jez.Tucker at rushes.co.uk Fri Jul 6 11:44:49 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Fri, 6 Jul 2012 10:44:49 +0000 Subject: [gpfsug-discuss] Almaden Labs pNFS attending September 20th User Group Message-ID: <39571EA9316BE44899D59C7A640C13F5305AB3F7@WARVWEXC1.uk.deluxe-eu.com> Hello all I've received confirmation from Dean Hildebrand from the Almaden Research Labs that he will attend the September User Group to present pNFS. Dean is available on the 21st to meet post user group for relevant discussion. He will be based on London throughout his visit, owing to the proximity to AWE. If you would like to meet Dean to discuss pNFS further, please arrange this with myself via email. Cheers Jez --- Jez Tucker Senior Sysadmin Rushes GPFSUG Chairman (chair at gpfsug.org) -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jez.Tucker at rushes.co.uk Fri Jul 20 09:28:19 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Fri, 20 Jul 2012 08:28:19 +0000 Subject: [gpfsug-discuss] Your NSD server loadavg? Message-ID: <39571EA9316BE44899D59C7A640C13F5305ADDFF@WARVWEXC1.uk.deluxe-eu.com> Hello Just curious to see what your NSD server's loadavg is when under a normal job processing load. I.E SGE running tasks over NFS. --- Jez Tucker Senior Sysadmin Rushes DDI: +44 (0) 207 851 6276 http://www.rushes.co.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jez.Tucker at rushes.co.uk Fri Jul 20 18:02:44 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Fri, 20 Jul 2012 17:02:44 +0000 Subject: [gpfsug-discuss] Great perf tool Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com> http://collectl.sourceforge.net/index.html --- Jez Tucker Senior Sysadmin Rushes GPFSUG Chairman (chair at gpfsug.org) -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From janfrode at tanso.net Fri Jul 20 23:23:54 2012
From: janfrode at tanso.net (Jan-Frode Myklebust)
Date: Sat, 21 Jul 2012 00:23:54 +0200
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <20120720222354.GB12126@dibs.tanso.net>

On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
>
> http://collectl.sourceforge.net/index.html

Another great framework for collecting performance data is Performance Co-Pilot:

http://oss.sgi.com/projects/pcp/

It can collect and play live or re-play archived data from several nodes in the same gui (or tui) player.

PCP is finally scheduled for inclusion in RHEL, so it's hopefully no longer only us old SGI IRIX admins that will be using it anymore :-)

http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf

It's already available in EPEL.

-jf

From viccornell at gmail.com Sat Jul 21 10:43:33 2012
From: viccornell at gmail.com (Vic Cornell)
Date: Sat, 21 Jul 2012 10:43:33 +0100
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <20120720222354.GB12126@dibs.tanso.net>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com> <20120720222354.GB12126@dibs.tanso.net>
Message-ID: <6259820223170424573@unknownmsgid>

I second that. The great thing about PCP is that it will monitor down to 1/10 second, which is really useful when you want to see what is really going on.

Good news about its inclusion in RHEL. It also has Mac and Windows versions so that you can instrument an entire setup and monitor it on the box of your choice.

Runs best under IRIX though...
Kind Regards, Vic Vic Cornell Application Support Engineer DataDirect Networks Davidson House Forbury Square Reading RG1 3EU United Kingdom Mobile 07900 660 266 Skype viccornell www.ddn.com This email may contain confidential and privileged material for the sole use of the intended recipient. Any review or distribution by others is strictly prohibited. If you are not the intended recipient please contact the sender and delete all copies On 20 Jul 2012, at 23:24, Jan-Frode Myklebust wrote: > On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote: >> >> http://collectl.sourceforge.net/index.html > > Another great framework for collecting performance data is Performance > CoPilot: > > http://oss.sgi.com/projects/pcp/ > > It can collect and play live or re-play archived data from several nodes > in the same gui (or tui) player. > > PCP is finally scheduled for inclusion in RHEL, so it's hopefully no > longer only us old SGI IRIX admins that will be using it anymore :-) > > http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf > > It's already available in EPEL. > > > -jf > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Jez.Tucker at rushes.co.uk Tue Jul 24 09:22:13 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Tue, 24 Jul 2012 08:22:13 +0000 Subject: [gpfsug-discuss] Great perf tool In-Reply-To: <20120720222354.GB12126@dibs.tanso.net> References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com> <20120720222354.GB12126@dibs.tanso.net> Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE76B@WARVWEXC1.uk.deluxe-eu.com> Perhaps you or Vic could give a quick run through PCP at the next UG meeting? 
> -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > bounces at gpfsug.org] On Behalf Of Jan-Frode Myklebust > Sent: 20 July 2012 23:24 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Great perf tool > > On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote: > > > > http://collectl.sourceforge.net/index.html > > Another great framework for collecting performance data is Performance > CoPilot: > > http://oss.sgi.com/projects/pcp/ > > It can collect and play live or re-play archived data from several nodes in the > same gui (or tui) player. > > PCP is finally scheduled for inclusion in RHEL, so it's hopefully no longer > only us old SGI IRIX admins that will be using it anymore :-) > > http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap > .pdf > > It's already available in EPEL. > > > -jf > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Jez.Tucker at rushes.co.uk Mon Jul 2 14:03:21 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Mon, 2 Jul 2012 13:03:21 +0000 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has Message-ID: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> Just had a lovely one. As I'm, sure all of you are aware by now, there's been much fun with some of the NTP Stratum 1 servers not correctly accounting for the leap-seocnd last night. http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html You may wish to turn off ntp on your servers and correct your NTP to trusted servers. A clock skew from ntp.pool.org just took out one of our servers and the node was expelled from the cluster. 
Jez --- Jez Tucker Senior Sysadmin Rushes DDI: +44 (0) 207 851 6276 http://www.rushes.co.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From orlando.richards at ed.ac.uk Mon Jul 2 14:12:46 2012 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Mon, 02 Jul 2012 14:12:46 +0100 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> Message-ID: <4FF19E4E.3040802@ed.ac.uk> Hi Jez, We've had a few issues with the leap second - but so far it has been isolated to redhat 6.2 systems. What OS are you running on your affected server? Cheers, Orlando. On 02/07/12 14:03, Jez Tucker wrote: > Just had a lovely one. > > As I?m, sure all of you are aware by now, there?s been much fun with > some of the NTP Stratum 1 servers not correctly accounting for the > leap-seocnd last night. > > http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml > > http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html > > You may wish to turn off ntp on your servers and correct your NTP to > trusted servers. > > A clock skew from ntp.pool.org just took out one of our servers and the > node was expelled from the cluster. > > Jez > > --- > > Jez Tucker > > Senior Sysadmin > > Rushes > > DDI: +44 (0) 207 851 6276 > > http://www.rushes.co.uk > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Dr Orlando Richards Information Services IT Infrastructure Division Unix Section Tel: 0131 650 4994 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. 
From bevans at canditmedia.co.uk Mon Jul 2 14:27:52 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 14:27:52 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has
In-Reply-To: <4FF19E4E.3040802@ed.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF19E4E.3040802@ed.ac.uk>
Message-ID:

I've had a growing number of customers (more as the day goes on) with IBM Storage Manager eating loads of CPU cycles and causing slowdowns as a result (SLES 10 SP1, mostly). The common factor, at least at this end, seems to be ntp syncing against public ntp pool servers and Java, but Jez's report... scary stuff...

Cheers,
Barry

On 2 Jul 2012, at 14:12, Orlando Richards wrote:

> Hi Jez,
>
> We've had a few issues with the leap second - but so far it has been isolated to redhat 6.2 systems.
>
> What OS are you running on your affected server?
>
> Cheers,
> Orlando.
>
>
> On 02/07/12 14:03, Jez Tucker wrote:
>> Just had a lovely one.
>>
>> As I'm sure all of you are aware by now, there's been much fun with
>> some of the NTP Stratum 1 servers not correctly accounting for the
>> leap second last night.
>>
>> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>>
>> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>>
>> You may wish to turn off ntp on your servers and correct your NTP to
>> trusted servers.
>>
>> A clock skew from pool.ntp.org just took out one of our servers and the
>> node was expelled from the cluster.
>> >> Jez >> >> --- >> >> Jez Tucker >> >> Senior Sysadmin >> >> Rushes >> >> DDI: +44 (0) 207 851 6276 >> >> http://www.rushes.co.uk >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > -- > Dr Orlando Richards > Information Services > IT Infrastructure Division > Unix Section > Tel: 0131 650 4994 > > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jez.Tucker at rushes.co.uk Mon Jul 2 14:47:21 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Mon, 2 Jul 2012 13:47:21 +0000 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has In-Reply-To: <4FF19E4E.3040802@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF19E4E.3040802@ed.ac.uk> Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9010@WARVWEXC1.uk.deluxe-eu.com> Funnily enough RH Ent 6.2 > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > bounces at gpfsug.org] On Behalf Of Orlando Richards > Sent: 02 July 2012 14:13 > To: gpfsug-discuss at gpfsug.org > Subject: Re: [gpfsug-discuss] NTP leap-second can take out GPFS server - it > just has > > Hi Jez, > > We've had a few issues with the leap second - but so far it has been isolated > to redhat 6.2 systems. > > What OS are you running on your affected server? > > Cheers, > Orlando. > > > On 02/07/12 14:03, Jez Tucker wrote: > > Just had a lovely one. 
> > > > As I'm, sure all of you are aware by now, there's been much fun with > > some of the NTP Stratum 1 servers not correctly accounting for the > > leap-seocnd last night. > > > > http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012- > Websi > > tes-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml > > > > http://www.telegraph.co.uk/technology/news/9369671/Leap-second- > brings- > > down-top-websites.html > > > > You may wish to turn off ntp on your servers and correct your NTP to > > trusted servers. > > > > A clock skew from ntp.pool.org just took out one of our servers and > > the node was expelled from the cluster. > > > > Jez > > > > --- > > > > Jez Tucker > > > > Senior Sysadmin > > > > Rushes > > > > DDI: +44 (0) 207 851 6276 > > > > http://www.rushes.co.uk > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at gpfsug.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > -- > Dr Orlando Richards > Information Services > IT Infrastructure Division > Unix Section > Tel: 0131 650 4994 > > The University of Edinburgh is a charitable body, registered in Scotland, with > registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From j.buzzard at dundee.ac.uk Mon Jul 2 14:59:33 2012 From: j.buzzard at dundee.ac.uk (Jonathan Buzzard) Date: Mon, 2 Jul 2012 14:59:33 +0100 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> Message-ID: <4FF1A945.5000100@dundee.ac.uk> On 02/07/12 14:03, Jez Tucker wrote: > Just had a lovely one. 
>
> As I'm sure all of you are aware by now, there's been much fun with
> some of the NTP Stratum 1 servers not correctly accounting for the
> leap second last night.
>
> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>
> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>
> You may wish to turn off ntp on your servers and correct your NTP to
> trusted servers.
>
> A clock skew from pool.ntp.org just took out one of our servers and the
> node was expelled from the cluster.

Hum, not sure I would run my production servers directly off something from pool.ntp.org; I would at least put a local server in between.

Not noticed any problems here, but then we are running the latest RHEL 5.8 and the latest IBM Storage Manager (10.83) :-)

JAB.

--
Jonathan A. Buzzard Tel: +441382-386998
Storage Administrator, College of Life Sciences
University of Dundee, DD1 5EH

The University of Dundee is a registered Scottish Charity, No: SC015096

From Jez.Tucker at rushes.co.uk Mon Jul 2 14:59:34 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 13:59:34 +0000
Subject: [gpfsug-discuss] Samba mapping of "special" SID entries
In-Reply-To:
References: <4FE486B2.1050501@ed.ac.uk>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9047@WARVWEXC1.uk.deluxe-eu.com>

Now I've located my GPFSUG from within Outlook...

I'm presuming you're creating an ACL with the equivalent of 2775 permissions and the owner file system being 'nfsv4', rather than 'all'?

Do your nfsv3 clients have nfsv4 acl support installed?
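
For reference, the "2775" mentioned here decodes to setgid plus rwxrwxr-x. The following is a small sketch using only Python's standard `stat` module; it assumes the mode is applied to a directory (the message does not say), and `describe_mode` is a hypothetical helper name. It only decodes the classic mode bits and says nothing about how GPFS or Samba translate them into NFSv4 ACL entries.

```python
import stat


def describe_mode(mode):
    """Decode an octal permission mode, assuming it applies to a directory."""
    return {
        # e.g. 'drwxrwsr-x'; the 's' in the group triplet is the setgid bit
        "symbolic": stat.filemode(stat.S_IFDIR | mode),
        "setgid": bool(mode & stat.S_ISGID),
        "group_writable": bool(mode & stat.S_IWGRP),
        "world_writable": bool(mode & stat.S_IWOTH),
    }


info = describe_mode(0o2775)
print(info["symbolic"])
```

On a directory the setgid bit makes new entries inherit the directory's group, which is usually the point of using 2775 on a shared project tree.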
Jez > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > bounces at gpfsug.org] On Behalf Of Luke Raimbach > Sent: 22 June 2012 17:33 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Samba mapping of "special" SID entries > > Hi Orlando, > > I've been having success using Centrify to manage UID/GID mappings for our > very small mixed cluster (7 x Linux, 1 x Windows 2008R2). > > I've created a map for "CREATOR / OWNER", "SYSTEM", "Domain Admins", > etc. group SIDs and use the Windows node to manage ACLs. When the > windows node applies the ACLs, these seem to translate successfully in to > GPFS ACLs and work nicely for the mixed environment allowing users on > both Linux and Windows systems to manipulate each other's files. > > People are mounting the FS via NFS (exported via the NSD Linux servers) > and CIFS (shared from Win2k8R2). The permissions don't look friendly when > you run ls -l on a Linux system over NFS but the ACLs do their job in > preserving inheritable permissions, etc. If people want to see the 'real' ACL, > they need to use mmgetacl on a GPFS attached node (or windows users > simply click on the security tab under properties of a file). > > Drop me a line off-list if you want to take a look at what we've got remotely. > I can run a webex session from the Windows node if you want to have a > good poke around. > > Luke. 
> > -- > > Luke Raimbach > IT Manager > Oxford e-Research Centre > 7 Keble Road, > Oxford, > OX1 3QG > > +44(0)1865 610639 > > > > > > -----Original Message----- > > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > > bounces at gpfsug.org] On Behalf Of Orlando Richards > > Sent: 22 June 2012 15:53 > > To: gpfsug-discuss at gpfsug.org > > Subject: [gpfsug-discuss] Samba mapping of "special" SID entries > > > > Hi all, > > > > Has anyone bumped up against the "nfs4: special" option in GPFS/Samba > > deployments which manipulates how the "owner" and "group owner" > (and > > "everybody") behaviour is mapped to ACLs when accessed via the samba > > stack? > > > > In particular, with the "default" setting (if one blindly follows the > > worked examples on this) of nfs4: special, if a user adds themselves > > specifically to an ACL, this creates an entry: > > > > special:@owner > > > > rather than: > > > > user:username > > > > which has the knock-on effect that if a file/folder is created under > > this ACL by a different owner (or if ownership changes), the person > > who put said ACL on to the file/folder no longer has access. Most > > people find this confusing (which is putting it politely). > > > > To further complicate matters, the "special" windows SID's*[1] - such > > as "CREATOR/OWNER" - don't seem to work properly in the > > ctdb/samba/gpfs stack (I don't know if they do in "normal" samba > > though). IBM don't support CREATOR/OWNER in SONAS*[2] - so it's not > just me! > > > > So my question is - has anyone else been looking into this at all, and > > if so, do you have any sage words of wisdom to offer? > > > > Cheers, > > Orlando. 
> > > > > > *[1] http://support.microsoft.com/kb/163846 > > *[2] > > http://pic.dhe.ibm.com/infocenter/sonasic/sonas1ic/index.jsp?topic=%2F > > c om.ibm.sonas.doc%2Fadm_authorization_limitations.html > > > > > > -- > > -- > > Dr Orlando Richards > > Information Services > > IT Infrastructure Division > > Unix Section > > Tel: 0131 650 4994 > > > > The University of Edinburgh is a charitable body, registered in > > Scotland, with registration number SC005336. > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at gpfsug.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Jez.Tucker at rushes.co.uk Mon Jul 2 15:05:25 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Mon, 2 Jul 2012 14:05:25 +0000 Subject: [gpfsug-discuss] HPC people - interconnects In-Reply-To: <4FE8720C.7040007@gmail.com> References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com> <4FE8720C.7040007@gmail.com> Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com> Theoretically using the OFED stack with a supporting card (E.G. Myrinet 10GbE) then you should be able to leverage RDMA. That said, I've not tried it yet. It's on my list of things to R&D. OFED/ROCE/iWARP: http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali Sent: 25 June 2012 15:14 To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] HPC people - interconnects On 25/06/12 15:08, Jez Tucker wrote: Do you all use IB? Has anyone tried RDMA over 10G via the OFED stack? Most of our customers we use RDMA over verbs Is this the same thing you mentioned a few weeks ago with respect to ROCE. Does gpfs even support this? 
-- regards, Arif -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at canditmedia.co.uk Mon Jul 2 22:16:15 2012 From: bevans at canditmedia.co.uk (Barry Evans) Date: Mon, 2 Jul 2012 22:16:15 +0100 Subject: [gpfsug-discuss] HPC people - interconnects In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com> References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com> <4FE8720C.7040007@gmail.com> <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com> Message-ID: <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk> It's verbs support you need - not sure where OFED is up to with verbs over 10G, if that's even on the cards Cheers, B On 2 Jul 2012, at 15:05, Jez Tucker wrote: > Theoretically using the OFED stack with a supporting card (E.G. Myrinet 10GbE) then you should be able to leverage RDMA. > That said, I?ve not tried it yet. It?s on my list of things to R&D. > > OFED/ROCE/iWARP: > > http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html > > > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali > Sent: 25 June 2012 15:14 > To: gpfsug-discuss at gpfsug.org > Subject: Re: [gpfsug-discuss] HPC people - interconnects > > On 25/06/12 15:08, Jez Tucker wrote: > Do you all use IB? > > Has anyone tried RDMA over 10G via the OFED stack? > > > Most of our customers we use RDMA over verbs > > Is this the same thing you mentioned a few weeks ago with respect to ROCE. Does gpfs even support this? > > -- > regards, > > Arif > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From bevans at canditmedia.co.uk Mon Jul 2 22:25:02 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 22:25:02 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has
In-Reply-To: <4FF1A945.5000100@dundee.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF1A945.5000100@dundee.ac.uk>
Message-ID:

This has so far hit almost all of the places I work with (not so much GPFS crashing, but certainly Storage Manager going berserk) - the majority of them do not use public NTP servers. In most cases no one actually noticed until it was pointed out, so it is well worth a quick 'top' on your storage servers if you're using Engenio/LSI/NetApp based units (i.e. DS3/4/5000).

The fix is here:
http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/

But of course I wouldn't mess with the time in production unless you've got GPFS shut down first.

Cheers,
B

On 2 Jul 2012, at 14:59, Jonathan Buzzard wrote:

> On 02/07/12 14:03, Jez Tucker wrote:
>> Just had a lovely one.
>>
>> As I'm sure all of you are aware by now, there's been much fun with
>> some of the NTP Stratum 1 servers not correctly accounting for the
>> leap second last night.
>>
>> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>>
>> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>>
>> You may wish to turn off ntp on your servers and correct your NTP to
>> trusted servers.
>>
>> A clock skew from pool.ntp.org just took out one of our servers and the
>> node was expelled from the cluster.
>
> Hum, not sure I would run my production servers directly off something
> from pool.ntp.org, I would at least put a local server in between.
>
> Not notice any problems here, but then we are running latest RHEL 5.8
> and latest IBM Storage Manager (10.83) :-)
>
> JAB.
>
> --
> Jonathan A.
Buzzard Tel: +441382-386998
> Storage Administrator, College of Life Sciences
> University of Dundee, DD1 5EH
>
> The University of Dundee is a registered Scottish Charity, No: SC015096
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Jez.Tucker at rushes.co.uk Tue Jul 3 11:38:10 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Tue, 3 Jul 2012 10:38:10 +0000
Subject: [gpfsug-discuss] HPC people - interconnects
In-Reply-To: <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com> <4FE8720C.7040007@gmail.com> <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>, <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A93A7@WARVWEXC1.uk.deluxe-eu.com>

Here's the stack:

https://www.openfabrics.org/resources/ofed-for-linux-ofed-for-windows/ofed-overview.html

VERBS is supported over 10GbE. It should work if OFED VERBS == IBM VERBS.

---
Jez Tucker
Senior SysAdmin
Rushes

www.rushes.co.uk

________________________________
From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Barry Evans [bevans at canditmedia.co.uk]
Sent: 02 July 2012 22:16
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] HPC people - interconnects

It's verbs support you need - not sure where OFED is up to with verbs over 10G, if that's even on the cards

Cheers,
B

On 2 Jul 2012, at 15:05, Jez Tucker wrote:

Theoretically, using the OFED stack with a supporting card (e.g. Myrinet 10GbE), you should be able to leverage RDMA.
That said, I've not tried it yet. It's on my list of things to R&D.
OFED/ROCE/iWARP: http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali Sent: 25 June 2012 15:14 To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] HPC people - interconnects On 25/06/12 15:08, Jez Tucker wrote: Do you all use IB? Has anyone tried RDMA over 10G via the OFED stack? Most of our customers we use RDMA over verbs Is this the same thing you mentioned a few weeks ago with respect to ROCE. Does gpfs even support this? -- regards, Arif _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From j.buzzard at dundee.ac.uk Tue Jul 3 12:01:33 2012 From: j.buzzard at dundee.ac.uk (Jonathan Buzzard) Date: Tue, 3 Jul 2012 12:01:33 +0100 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has In-Reply-To: References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF1A945.5000100@dundee.ac.uk> Message-ID: <4FF2D10D.2030701@dundee.ac.uk> On 02/07/12 22:25, Barry Evans wrote: > This has so far hit all almost all of the places I work with (not so > much GPFS crashing, but certainly storage manager going bezerk) - the > majority of them do not use public NTP servers. In most cases no one > actually noticed until it was pointed out, well worth a quick 'top' of > your storage servers if you're using Engenio/LSI/NetApp based units (ie, > DS3/4/5000). > > The fix is here: > http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/ > Or you can upgrade to the latest version of storage manager. We are running 10.83 and it sailed through without issue. Now admittedly most people have probably not upgraded as it has only been out for a couple of weeks. 
I was very prompt on the upgrade as it allows the one Storage Manager install to manage both IBM DS3/4/5000 and Dell MD3xxx from the same program.

JAB.

--
Jonathan A. Buzzard Tel: +441382-386998
Storage Administrator, College of Life Sciences
University of Dundee, DD1 5EH

The University of Dundee is a registered Scottish Charity, No: SC015096

From sfadden at us.ibm.com Thu Jul 5 17:25:22 2012
From: sfadden at us.ibm.com (Scott Fadden)
Date: Thu, 5 Jul 2012 09:25:22 -0700
Subject: [gpfsug-discuss] GPFS Service Bulletin - July 5 2012
Message-ID:

Date Added: July 5, 2012

Issue:

IBM has identified an issue with GPFS file systems at versions 3.4 or 3.5 which were migrated from file systems created with GPFS versions earlier than 3.4. This issue can occur only after using the mmmigratefs command with the [--fastea] option. The issue can result in a loss of data, requiring the restoration of data from a backup source.

GPFS file systems created with versions earlier than 3.4 should not be migrated using the mmmigratefs command with the [--fastea] option until a fix is provided from IBM. IBM plans to make the fix available in GPFS versions 3.5.0.3 (APAR IV24151) and 3.4.0.15 (APAR IV24150). An ifix will also be available from IBM service.

If customers have already migrated file systems from GPFS versions earlier than 3.4, IBM service has a fix. Please follow the steps below to determine if your system may be affected.

To determine if your system may be affected:

1. Ensure your GPFS file systems are mounted.

2. As a user with GPFS administrator privileges on a machine where your GPFS file systems are mounted, issue the command:

/usr/lpp/mmfs/bin/mmfsadm dump stripe | grep "inode 0"

The command will produce output that identifies locations for the "inode 0" file for all currently mounted GPFS file systems.
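
On a cluster with many file systems the check described in this bulletin is tedious to do by eye, so here is a rough Python sketch that scans saved output from the command above. It assumes the output lines look exactly like the bulletin's own examples (for instance `inode 0: 3:4098 1:4098`); the function name is hypothetical, and calling the first number of each pair "disk" is my guess rather than something the bulletin states. It is a convenience only and no substitute for contacting IBM service.

```python
import re

EXPECTED_SECTOR = 4098  # the value the bulletin says every replica should show


def suspicious_replicas(dump_text, expected=EXPECTED_SECTOR):
    """Return (disk, sector) pairs from 'inode 0:' lines whose sector differs.

    'disk' is an assumed name for the number before the colon; only the
    sector value (after the colon) is compared against the expected one.
    """
    bad = []
    for line in dump_text.splitlines():
        if "inode 0:" not in line:
            continue
        # Keep only the text after the label so the '0:' of the label
        # itself is never mistaken for a replica field.
        _, _, fields = line.partition("inode 0:")
        for disk, sector in re.findall(r"(\d+):(\d+)", fields):
            if int(sector) != expected:
                bad.append((int(disk), int(sector)))
    return bad


sample = "inode 0: 3:4098 1:4098\ninode 0: 3:4098\n"
print(suspicious_replicas(sample))
```

An empty list means every replica in the saved output showed 4098; anything else is a reason to stop and call IBM service as the bulletin asks.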
Example output for a file system configured with two-way metadata replication would be in the form:

inode 0: 3:4098 1:4098

For a file system with no metadata replication the output would be in the form:

inode 0: 3:4098

The relevant information to look for, to see if you may experience a problem, is the colon-separated pair shown for each inode 0 replica (e.g. 3:4098 and 1:4098 in these examples). If every replica shows 4098 in the sector field (the number after the colon) then you are not experiencing this problem.

If however there is a number other than 4098 in the sector output then you are requested to immediately call IBM service and reference this problem. The IBM service person will walk you through a fix for correcting the issue.

Scott Fadden
GPFS Technical Marketing
Desk: (503) 578-5630
Cell: (503) 880-5833
sfadden at us.ibm.com
http://www.ibm.com/systems/gpfs

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Jez.Tucker at rushes.co.uk Fri Jul 6 11:44:49 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 6 Jul 2012 10:44:49 +0000
Subject: [gpfsug-discuss] Almaden Labs pNFS attending September 20th User Group
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AB3F7@WARVWEXC1.uk.deluxe-eu.com>

Hello all

I've received confirmation from Dean Hildebrand from the Almaden Research Labs that he will attend the September User Group to present pNFS.

Dean is available on the 21st to meet post user group for relevant discussion. He will be based in London throughout his visit, owing to the proximity to AWE.

If you would like to meet Dean to discuss pNFS further, please arrange this with me via email.

Cheers

Jez

---
Jez Tucker
Senior Sysadmin
Rushes

GPFSUG Chairman (chair at gpfsug.org)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Jez.Tucker at rushes.co.uk Fri Jul 20 09:28:19 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 20 Jul 2012 08:28:19 +0000
Subject: [gpfsug-discuss] Your NSD server loadavg?
Message-ID: <39571EA9316BE44899D59C7A640C13F5305ADDFF@WARVWEXC1.uk.deluxe-eu.com> Hello Just curious to see what your NSD server's loadavg is when under a normal job processing load. I.E SGE running tasks over NFS. --- Jez Tucker Senior Sysadmin Rushes DDI: +44 (0) 207 851 6276 http://www.rushes.co.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jez.Tucker at rushes.co.uk Fri Jul 20 18:02:44 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Fri, 20 Jul 2012 17:02:44 +0000 Subject: [gpfsug-discuss] Great perf tool Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com> http://collectl.sourceforge.net/index.html --- Jez Tucker Senior Sysadmin Rushes GPFSUG Chairman (chair at gpfsug.org) -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Fri Jul 20 23:23:54 2012 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sat, 21 Jul 2012 00:23:54 +0200 Subject: [gpfsug-discuss] Great perf tool In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com> References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com> Message-ID: <20120720222354.GB12126@dibs.tanso.net> On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote: > > http://collectl.sourceforge.net/index.html Another great framework for collecting performance data is Performance CoPilot: http://oss.sgi.com/projects/pcp/ It can collect and play live or re-play archived data from several nodes in the same gui (or tui) player. PCP is finally scheduled for inclusion in RHEL, so it's hopefully no longer only us old SGI IRIX admins that will be using it anymore :-) http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf It's already available in EPEL. 
-jf

From viccornell at gmail.com Sat Jul 21 10:43:33 2012
From: viccornell at gmail.com (Vic Cornell)
Date: Sat, 21 Jul 2012 10:43:33 +0100
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <20120720222354.GB12126@dibs.tanso.net>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com> <20120720222354.GB12126@dibs.tanso.net>
Message-ID: <6259820223170424573@unknownmsgid>

I second that. The great thing about PCP is that it will monitor down to 1/10 second, which is really useful when you want to see what is really going on.

Good news about its inclusion in RHEL. It also has Mac and Windows versions, so you can instrument an entire setup and monitor it on the box of your choice.

Runs best under IRIX though...

Kind Regards,

Vic

Vic Cornell
Application Support Engineer
DataDirect Networks
Davidson House
Forbury Square
Reading
RG1 3EU
United Kingdom
Mobile 07900 660 266
Skype viccornell
www.ddn.com

This email may contain confidential and privileged material for the sole use of the intended recipient. Any review or distribution by others is strictly prohibited. If you are not the intended recipient please contact the sender and delete all copies

On 20 Jul 2012, at 23:24, Jan-Frode Myklebust wrote:

> On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
>>
>> http://collectl.sourceforge.net/index.html
>
> Another great framework for collecting performance data is Performance
> CoPilot:
>
> http://oss.sgi.com/projects/pcp/
>
> It can collect and play live or re-play archived data from several nodes
> in the same gui (or tui) player.
>
> PCP is finally scheduled for inclusion in RHEL, so it's hopefully no
> longer only us old SGI IRIX admins that will be using it anymore :-)
>
> http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf
>
> It's already available in EPEL.
> > > -jf > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Jez.Tucker at rushes.co.uk Tue Jul 24 09:22:13 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Tue, 24 Jul 2012 08:22:13 +0000 Subject: [gpfsug-discuss] Great perf tool In-Reply-To: <20120720222354.GB12126@dibs.tanso.net> References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com> <20120720222354.GB12126@dibs.tanso.net> Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE76B@WARVWEXC1.uk.deluxe-eu.com> Perhaps you or Vic could give a quick run through PCP at the next UG meeting? > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > bounces at gpfsug.org] On Behalf Of Jan-Frode Myklebust > Sent: 20 July 2012 23:24 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Great perf tool > > On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote: > > > > http://collectl.sourceforge.net/index.html > > Another great framework for collecting performance data is Performance > CoPilot: > > http://oss.sgi.com/projects/pcp/ > > It can collect and play live or re-play archived data from several nodes in the > same gui (or tui) player. > > PCP is finally scheduled for inclusion in RHEL, so it's hopefully no longer > only us old SGI IRIX admins that will be using it anymore :-) > > http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap > .pdf > > It's already available in EPEL. 
> > > -jf > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Jez.Tucker at rushes.co.uk Mon Jul 2 14:03:21 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Mon, 2 Jul 2012 13:03:21 +0000 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has Message-ID: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> Just had a lovely one. As I'm, sure all of you are aware by now, there's been much fun with some of the NTP Stratum 1 servers not correctly accounting for the leap-seocnd last night. http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html You may wish to turn off ntp on your servers and correct your NTP to trusted servers. A clock skew from ntp.pool.org just took out one of our servers and the node was expelled from the cluster. Jez --- Jez Tucker Senior Sysadmin Rushes DDI: +44 (0) 207 851 6276 http://www.rushes.co.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From orlando.richards at ed.ac.uk Mon Jul 2 14:12:46 2012 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Mon, 02 Jul 2012 14:12:46 +0100 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> Message-ID: <4FF19E4E.3040802@ed.ac.uk> Hi Jez, We've had a few issues with the leap second - but so far it has been isolated to redhat 6.2 systems. What OS are you running on your affected server? Cheers, Orlando. On 02/07/12 14:03, Jez Tucker wrote: > Just had a lovely one. 
> > As I'm, sure all of you are aware by now, there's been much fun with > some of the NTP Stratum 1 servers not correctly accounting for the > leap-seocnd last night. > > http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml > > http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html > > You may wish to turn off ntp on your servers and correct your NTP to > trusted servers. > > A clock skew from ntp.pool.org just took out one of our servers and the > node was expelled from the cluster. > > Jez > > --- > > Jez Tucker > > Senior Sysadmin > > Rushes > > DDI: +44 (0) 207 851 6276 > > http://www.rushes.co.uk > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Dr Orlando Richards Information Services IT Infrastructure Division Unix Section Tel: 0131 650 4994 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From bevans at canditmedia.co.uk Mon Jul 2 14:27:52 2012 From: bevans at canditmedia.co.uk (Barry Evans) Date: Mon, 2 Jul 2012 14:27:52 +0100 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has In-Reply-To: <4FF19E4E.3040802@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF19E4E.3040802@ed.ac.uk> Message-ID: I've had a few (this is growing number as the day goes on) number of customers with IBM Storage Manager eating loads of CPU cycles and causing slow downs as a result (SLES 10 SP1, mostly). The common factor, at least at this end, seems to be ntp sycning against public ntp pool servers and Java, but Jez's report...... scary stuff...
Cheers, Barry On 2 Jul 2012, at 14:12, Orlando Richards wrote: > Hi Jez, > > We've had a few issues with the leap second - but so far it has been isolated to redhat 6.2 systems. > > What OS are you running on your affected server? > > Cheers, > Orlando. > > > On 02/07/12 14:03, Jez Tucker wrote: >> Just had a lovely one. >> >> As I'm, sure all of you are aware by now, there's been much fun with >> some of the NTP Stratum 1 servers not correctly accounting for the >> leap-seocnd last night. >> >> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml >> >> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html >> >> You may wish to turn off ntp on your servers and correct your NTP to >> trusted servers. >> >> A clock skew from ntp.pool.org just took out one of our servers and the >> node was expelled from the cluster. >> >> Jez >> >> --- >> >> Jez Tucker >> >> Senior Sysadmin >> >> Rushes >> >> DDI: +44 (0) 207 851 6276 >> >> http://www.rushes.co.uk >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > -- > Dr Orlando Richards > Information Services > IT Infrastructure Division > Unix Section > Tel: 0131 650 4994 > > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed...
URL: From Jez.Tucker at rushes.co.uk Mon Jul 2 14:47:21 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Mon, 2 Jul 2012 13:47:21 +0000 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has In-Reply-To: <4FF19E4E.3040802@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF19E4E.3040802@ed.ac.uk> Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9010@WARVWEXC1.uk.deluxe-eu.com> Funnily enough RH Ent 6.2 > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > bounces at gpfsug.org] On Behalf Of Orlando Richards > Sent: 02 July 2012 14:13 > To: gpfsug-discuss at gpfsug.org > Subject: Re: [gpfsug-discuss] NTP leap-second can take out GPFS server - it > just has > > Hi Jez, > > We've had a few issues with the leap second - but so far it has been isolated > to redhat 6.2 systems. > > What OS are you running on your affected server? > > Cheers, > Orlando. > > > On 02/07/12 14:03, Jez Tucker wrote: > > Just had a lovely one. > > > > As I'm, sure all of you are aware by now, there's been much fun with > > some of the NTP Stratum 1 servers not correctly accounting for the > > leap-seocnd last night. > > > > http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012- > Websi > > tes-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml > > > > http://www.telegraph.co.uk/technology/news/9369671/Leap-second- > brings- > > down-top-websites.html > > > > You may wish to turn off ntp on your servers and correct your NTP to > > trusted servers. > > > > A clock skew from ntp.pool.org just took out one of our servers and > > the node was expelled from the cluster. 
> > > > Jez > > > > --- > > > > Jez Tucker > > > > Senior Sysadmin > > > > Rushes > > > > DDI: +44 (0) 207 851 6276 > > > > http://www.rushes.co.uk > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at gpfsug.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > -- > Dr Orlando Richards > Information Services > IT Infrastructure Division > Unix Section > Tel: 0131 650 4994 > > The University of Edinburgh is a charitable body, registered in Scotland, with > registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From j.buzzard at dundee.ac.uk Mon Jul 2 14:59:33 2012 From: j.buzzard at dundee.ac.uk (Jonathan Buzzard) Date: Mon, 2 Jul 2012 14:59:33 +0100 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> Message-ID: <4FF1A945.5000100@dundee.ac.uk> On 02/07/12 14:03, Jez Tucker wrote: > Just had a lovely one. > > As I'm, sure all of you are aware by now, there's been much fun with > some of the NTP Stratum 1 servers not correctly accounting for the > leap-seocnd last night. > > http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml > > http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html > > You may wish to turn off ntp on your servers and correct your NTP to > trusted servers. > > A clock skew from ntp.pool.org just took out one of our servers and the > node was expelled from the cluster. Hum, not sure I would run my production servers directly off something from ntp.pool.org, I would at least put a local server in between.
Not notice any problems here, but then we are running latest RHEL 5.8 and latest IBM Storage Manager (10.83) :-) JAB. -- Jonathan A. Buzzard Tel: +441382-386998 Storage Administrator, College of Life Sciences University of Dundee, DD1 5EH The University of Dundee is a registered Scottish Charity, No: SC015096 From Jez.Tucker at rushes.co.uk Mon Jul 2 14:59:34 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Mon, 2 Jul 2012 13:59:34 +0000 Subject: [gpfsug-discuss] Samba mapping of "special" SID entries In-Reply-To: References: <4FE486B2.1050501@ed.ac.uk> Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9047@WARVWEXC1.uk.deluxe-eu.com> Now I've located my GPFSUG from within Outlook... I'm presuming you're creating an ACL with the equivalent of 2775 permissions and the owner file system being 'nfsv4', rather than 'all'? Your nfsv3 clients have nfsv4 acl support installed? Jez > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > bounces at gpfsug.org] On Behalf Of Luke Raimbach > Sent: 22 June 2012 17:33 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Samba mapping of "special" SID entries > > Hi Orlando, > > I've been having success using Centrify to manage UID/GID mappings for our > very small mixed cluster (7 x Linux, 1 x Windows 2008R2). > > I've created a map for "CREATOR / OWNER", "SYSTEM", "Domain Admins", > etc. group SIDs and use the Windows node to manage ACLs. When the > windows node applies the ACLs, these seem to translate successfully in to > GPFS ACLs and work nicely for the mixed environment allowing users on > both Linux and Windows systems to manipulate each other's files. > > People are mounting the FS via NFS (exported via the NSD Linux servers) > and CIFS (shared from Win2k8R2). The permissions don't look friendly when > you run ls -l on a Linux system over NFS but the ACLs do their job in > preserving inheritable permissions, etc. 
If people want to see the 'real' ACL, > they need to use mmgetacl on a GPFS attached node (or windows users > simply click on the security tab under properties of a file). > > Drop me a line off-list if you want to take a look at what we've got remotely. > I can run a webex session from the Windows node if you want to have a > good poke around. > > Luke. > > -- > > Luke Raimbach > IT Manager > Oxford e-Research Centre > 7 Keble Road, > Oxford, > OX1 3QG > > +44(0)1865 610639 > > > > > > -----Original Message----- > > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > > bounces at gpfsug.org] On Behalf Of Orlando Richards > > Sent: 22 June 2012 15:53 > > To: gpfsug-discuss at gpfsug.org > > Subject: [gpfsug-discuss] Samba mapping of "special" SID entries > > > > Hi all, > > > > Has anyone bumped up against the "nfs4: special" option in GPFS/Samba > > deployments which manipulates how the "owner" and "group owner" > (and > > "everybody") behaviour is mapped to ACLs when accessed via the samba > > stack? > > > > In particular, with the "default" setting (if one blindly follows the > > worked examples on this) of nfs4: special, if a user adds themselves > > specifically to an ACL, this creates an entry: > > > > special:@owner > > > > rather than: > > > > user:username > > > > which has the knock-on effect that if a file/folder is created under > > this ACL by a different owner (or if ownership changes), the person > > who put said ACL on to the file/folder no longer has access. Most > > people find this confusing (which is putting it politely). > > > > To further complicate matters, the "special" windows SID's*[1] - such > > as "CREATOR/OWNER" - don't seem to work properly in the > > ctdb/samba/gpfs stack (I don't know if they do in "normal" samba > > though). IBM don't support CREATOR/OWNER in SONAS*[2] - so it's not > just me! 
> > > > So my question is - has anyone else been looking into this at all, and > > if so, do you have any sage words of wisdom to offer? > > > > Cheers, > > Orlando. > > > > > > *[1] http://support.microsoft.com/kb/163846 > > *[2] > > http://pic.dhe.ibm.com/infocenter/sonasic/sonas1ic/index.jsp?topic=%2F > > c om.ibm.sonas.doc%2Fadm_authorization_limitations.html > > > > > > -- > > -- > > Dr Orlando Richards > > Information Services > > IT Infrastructure Division > > Unix Section > > Tel: 0131 650 4994 > > > > The University of Edinburgh is a charitable body, registered in > > Scotland, with registration number SC005336. > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at gpfsug.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Jez.Tucker at rushes.co.uk Mon Jul 2 15:05:25 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Mon, 2 Jul 2012 14:05:25 +0000 Subject: [gpfsug-discuss] HPC people - interconnects In-Reply-To: <4FE8720C.7040007@gmail.com> References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com> <4FE8720C.7040007@gmail.com> Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com> Theoretically using the OFED stack with a supporting card (E.G. Myrinet 10GbE) then you should be able to leverage RDMA. That said, I've not tried it yet. It's on my list of things to R&D. OFED/ROCE/iWARP: http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali Sent: 25 June 2012 15:14 To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] HPC people - interconnects On 25/06/12 15:08, Jez Tucker wrote: Do you all use IB? 
Has anyone tried RDMA over 10G via the OFED stack? Most of our customers we use RDMA over verbs Is this the same thing you mentioned a few weeks ago with respect to ROCE. Does gpfs even support this? -- regards, Arif -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at canditmedia.co.uk Mon Jul 2 22:16:15 2012 From: bevans at canditmedia.co.uk (Barry Evans) Date: Mon, 2 Jul 2012 22:16:15 +0100 Subject: [gpfsug-discuss] HPC people - interconnects In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com> References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com> <4FE8720C.7040007@gmail.com> <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com> Message-ID: <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk> It's verbs support you need - not sure where OFED is up to with verbs over 10G, if that's even on the cards Cheers, B On 2 Jul 2012, at 15:05, Jez Tucker wrote: > Theoretically using the OFED stack with a supporting card (E.G. Myrinet 10GbE) then you should be able to leverage RDMA. > That said, I've not tried it yet. It's on my list of things to R&D. > > OFED/ROCE/iWARP: > > http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html > > > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali > Sent: 25 June 2012 15:14 > To: gpfsug-discuss at gpfsug.org > Subject: Re: [gpfsug-discuss] HPC people - interconnects > > On 25/06/12 15:08, Jez Tucker wrote: > Do you all use IB? > > Has anyone tried RDMA over 10G via the OFED stack? > > > Most of our customers we use RDMA over verbs > > Is this the same thing you mentioned a few weeks ago with respect to ROCE. Does gpfs even support this?
> > -- > regards, > > Arif > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at canditmedia.co.uk Mon Jul 2 22:25:02 2012 From: bevans at canditmedia.co.uk (Barry Evans) Date: Mon, 2 Jul 2012 22:25:02 +0100 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has In-Reply-To: <4FF1A945.5000100@dundee.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF1A945.5000100@dundee.ac.uk> Message-ID: This has so far hit all almost all of the places I work with (not so much GPFS crashing, but certainly storage manager going bezerk) - the majority of them do not use public NTP servers. In most cases no one actually noticed until it was pointed out, well worth a quick 'top' of your storage servers if you're using Engenio/LSI/NetApp based units (ie, DS3/4/5000). The fix is here: http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/ But of course I wouldn't mess with the time in production unless you've got GPFS shutdown first. Cheers, B On 2 Jul 2012, at 14:59, Jonathan Buzzard wrote: > On 02/07/12 14:03, Jez Tucker wrote: >> Just had a lovely one. >> >> As I'm, sure all of you are aware by now, there's been much fun with >> some of the NTP Stratum 1 servers not correctly accounting for the >> leap-seocnd last night. >> >> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml >> >> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html >> >> You may wish to turn off ntp on your servers and correct your NTP to >> trusted servers. >> >> A clock skew from ntp.pool.org just took out one of our servers and the >> node was expelled from the cluster.
> > Hum, not sure I would run my production servers directly off something > from ntp.pool.org, I would at least put a local server in between. > > Not notice any problems here, but then we are running latest RHEL 5.8 > and latest IBM Storage Manager (10.83) :-) > > JAB. > > -- > Jonathan A. Buzzard Tel: +441382-386998 > Storage Administrator, College of Life Sciences > University of Dundee, DD1 5EH > > The University of Dundee is a registered Scottish Charity, No: SC015096 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jez.Tucker at rushes.co.uk Tue Jul 3 11:38:10 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Tue, 3 Jul 2012 10:38:10 +0000 Subject: [gpfsug-discuss] HPC people - interconnects In-Reply-To: <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk> References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com> <4FE8720C.7040007@gmail.com> <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>, <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk> Message-ID: <39571EA9316BE44899D59C7A640C13F5305A93A7@WARVWEXC1.uk.deluxe-eu.com> Here's the stack: https://www.openfabrics.org/resources/ofed-for-linux-ofed-for-windows/ofed-overview.html VERBS is supported over 10GbE. It should work if OFED VERBS == IBM VERBS. 
--- Jez Tucker Senior SysAdmin Rushes www.rushes.co.uk ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Barry Evans [bevans at canditmedia.co.uk] Sent: 02 July 2012 22:16 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] HPC people - interconnects It's verbs support you need - not sure where OFED is up to with verbs over 10G, if that's even on the cards Cheers, B On 2 Jul 2012, at 15:05, Jez Tucker wrote: Theoretically using the OFED stack with a supporting card (E.G. Myrinet 10GbE) then you should be able to leverage RDMA. That said, I've not tried it yet. It's on my list of things to R&D. OFED/ROCE/iWARP: http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali Sent: 25 June 2012 15:14 To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] HPC people - interconnects On 25/06/12 15:08, Jez Tucker wrote: Do you all use IB? Has anyone tried RDMA over 10G via the OFED stack? Most of our customers we use RDMA over verbs Is this the same thing you mentioned a few weeks ago with respect to ROCE. Does gpfs even support this? -- regards, Arif _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed...
URL: From j.buzzard at dundee.ac.uk Tue Jul 3 12:01:33 2012 From: j.buzzard at dundee.ac.uk (Jonathan Buzzard) Date: Tue, 3 Jul 2012 12:01:33 +0100 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has In-Reply-To: References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF1A945.5000100@dundee.ac.uk> Message-ID: <4FF2D10D.2030701@dundee.ac.uk> On 02/07/12 22:25, Barry Evans wrote: > This has so far hit all almost all of the places I work with (not so > much GPFS crashing, but certainly storage manager going bezerk) - the > majority of them do not use public NTP servers. In most cases no one > actually noticed until it was pointed out, well worth a quick 'top' of > your storage servers if you're using Engenio/LSI/NetApp based units (ie, > DS3/4/5000). > > The fix is here: > http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/ > Or you can upgrade to the latest version of storage manager. We are running 10.83 and it sailed through without issue. Now admittedly most people have probably not upgraded as it has only been out for a couple of weeks. I was very prompt on the upgrade as it allows the one Storage Manager install to manage both IBM DS3/4/5000 and Dell MD3xxx from the same program. JAB. -- Jonathan A. Buzzard Tel: +441382-386998 Storage Administrator, College of Life Sciences University of Dundee, DD1 5EH The University of Dundee is a registered Scottish Charity, No: SC015096 From sfadden at us.ibm.com Thu Jul 5 17:25:22 2012 From: sfadden at us.ibm.com (Scott Fadden) Date: Thu, 5 Jul 2012 09:25:22 -0700 Subject: [gpfsug-discuss] GPFS Service Bulletin - July 5 2012 Message-ID: Date Added: July 5, 2012 Issue: IBM has identified an issue with GPFS file systems at versions 3.4 or 3.5 which were migrated from file systems created with GPFS versions earlier than 3.4. This issue can occur only after using the mmmigratefs command with the [--fastea]option. 
The issue can result in a loss of data, requiring the restoration of data from a backup source. GPFS file systems created with versions earlier than 3.4 should not be migrated using the mmmigratefs command with the [--fastea] option until a fix is provided from IBM. IBM plans to make the fix available in GPFS versions 3.5.0.3 (APAR IV24151) and 3.4.0.15 (APAR IV24150). An ifix will also be available from IBM service. If customers have already migrated file systems from GPFS versions earlier than 3.4, IBM service has a fix. Please follow the steps below to determine if your system may be affected.

To determine if your system may be affected:

1. Ensure your GPFS file systems are mounted.

2. As a user with GPFS administrator privileges on a machine where your GPFS file systems are mounted, issue the command:

   /usr/lpp/mmfs/bin/mmfsadm dump stripe | grep "inode 0"

The command will produce output that identifies locations for the "inode 0" file for all currently mounted GPFS file systems. Example output for a file system configured with two way meta-data replication would be in the form:

   inode 0: 3:4098 1:4098

For a file system with no meta-data replication the output would be in the form:

   inode 0: 3:4098

The relevant information to look for to see if you may experience a problem are the fields denoting <disk>:<sector> for each inode 0 replica (e.g. 3:4098 and 1:4098 in these examples). If each <disk>:<sector> replica only denotes 4098 for the sector field then you are not experiencing this problem. If however there is a number other than 4098 in the sector output then you are requested to immediately call IBM service and reference this problem. The IBM service person will walk you thru a fix for correcting the issue.

Scott Fadden
GPFS Technical Marketing
Desk: (503) 578-5630
Cell: (503) 880-5833
sfadden at us.ibm.com
http://www.ibm.com/systems/gpfs
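The check described in the bulletin can be scripted when there are many file systems to look at. What follows is a minimal Python sketch, not an IBM tool. It assumes the grep output has exactly the form of the two examples in the bulletin. The function name suspect_replicas is made up for illustration, and reading the first field of each pair as a disk number is an assumption; only the sector field and the value 4098 come from the bulletin. The sector 5120 in the last sample line is invented to show a flagged case.

```python
import re

# Sector value the bulletin describes as unaffected.
UNAFFECTED_SECTOR = 4098


def suspect_replicas(dump_output):
    """Return (disk, sector) pairs from 'inode 0:' lines whose sector is not 4098.

    dump_output is the text produced by:
        /usr/lpp/mmfs/bin/mmfsadm dump stripe | grep "inode 0"
    The first number of each pair is assumed to be a disk number; the
    bulletin itself only names the sector field.
    """
    suspects = []
    for line in dump_output.splitlines():
        # Keep only the replica list that follows the "inode 0:" label.
        _label, found, replicas = line.partition("inode 0:")
        if not found:
            continue
        for disk, sector in re.findall(r"(\d+):(\d+)", replicas):
            if int(sector) != UNAFFECTED_SECTOR:
                suspects.append((int(disk), int(sector)))
    return suspects


# The two example lines from the bulletin, plus one invented bad line.
print(suspect_replicas("inode 0: 3:4098 1:4098"))  # []
print(suspect_replicas("inode 0: 3:4098"))         # []
print(suspect_replicas("inode 0: 3:4098 1:5120"))  # [(1, 5120)]
```

An empty list corresponds to the "not experiencing this problem" case in the bulletin. Anything else means calling IBM service as instructed, not attempting a repair by hand.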
From Jez.Tucker at rushes.co.uk Fri Jul 6 11:44:49 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 6 Jul 2012 10:44:49 +0000
Subject: [gpfsug-discuss] Almaden Labs pNFS attending September 20th User Group
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AB3F7@WARVWEXC1.uk.deluxe-eu.com>

Hello all

I've received confirmation from Dean Hildebrand from the Almaden Research Labs that he will attend the September User Group to present pNFS.

Dean is available on the 21st to meet post user group for relevant discussion. He will be based in London throughout his visit, owing to the proximity to AWE.

If you would like to meet Dean to discuss pNFS further, please arrange this with me via email.

Cheers

Jez

---
Jez Tucker
Senior Sysadmin
Rushes

GPFSUG Chairman (chair at gpfsug.org)

From Jez.Tucker at rushes.co.uk Fri Jul 20 09:28:19 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 20 Jul 2012 08:28:19 +0000
Subject: [gpfsug-discuss] Your NSD server loadavg?
Message-ID: <39571EA9316BE44899D59C7A640C13F5305ADDFF@WARVWEXC1.uk.deluxe-eu.com>

Hello

Just curious to see what your NSD server's loadavg is when under a normal job processing load, i.e. SGE running tasks over NFS.

---
Jez Tucker
Senior Sysadmin
Rushes

DDI: +44 (0) 207 851 6276
http://www.rushes.co.uk

From Jez.Tucker at rushes.co.uk Fri Jul 20 18:02:44 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 20 Jul 2012 17:02:44 +0000
Subject: [gpfsug-discuss] Great perf tool
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>

http://collectl.sourceforge.net/index.html

---
Jez Tucker
Senior Sysadmin
Rushes

GPFSUG Chairman (chair at gpfsug.org)
From janfrode at tanso.net Fri Jul 20 23:23:54 2012
From: janfrode at tanso.net (Jan-Frode Myklebust)
Date: Sat, 21 Jul 2012 00:23:54 +0200
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <20120720222354.GB12126@dibs.tanso.net>

On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
>
> http://collectl.sourceforge.net/index.html

Another great framework for collecting performance data is Performance Co-Pilot:

http://oss.sgi.com/projects/pcp/

It can collect and play live or re-play archived data from several nodes in the same gui (or tui) player.

PCP is finally scheduled for inclusion in RHEL, so it's hopefully no longer only us old SGI IRIX admins that will be using it anymore :-)

http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf

It's already available in EPEL.

-jf

From viccornell at gmail.com Sat Jul 21 10:43:33 2012
From: viccornell at gmail.com (Vic Cornell)
Date: Sat, 21 Jul 2012 10:43:33 +0100
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <20120720222354.GB12126@dibs.tanso.net>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com> <20120720222354.GB12126@dibs.tanso.net>
Message-ID: <6259820223170424573@unknownmsgid>

I second that. The great thing about PCP is that it will monitor down to 1/10 second, which is really useful when you want to see what is really going on.

Good news about its inclusion in RHEL.

It also has Mac and Windows versions, so you can instrument an entire setup and monitor it on the box of your choice.

Runs best under IRIX though. . . .
Kind Regards, Vic Vic Cornell Application Support Engineer DataDirect Networks Davidson House Forbury Square Reading RG1 3EU United Kingdom Mobile 07900 660 266 Skype viccornell www.ddn.com This email may contain confidential and privileged material for the sole use of the intended recipient. Any review or distribution by others is strictly prohibited. If you are not the intended recipient please contact the sender and delete all copies On 20 Jul 2012, at 23:24, Jan-Frode Myklebust wrote: > On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote: >> >> http://collectl.sourceforge.net/index.html > > Another great framework for collecting performance data is Performance > CoPilot: > > http://oss.sgi.com/projects/pcp/ > > It can collect and play live or re-play archived data from several nodes > in the same gui (or tui) player. > > PCP is finally scheduled for inclusion in RHEL, so it's hopefully no > longer only us old SGI IRIX admins that will be using it anymore :-) > > http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf > > It's already available in EPEL. > > > -jf > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Jez.Tucker at rushes.co.uk Tue Jul 24 09:22:13 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Tue, 24 Jul 2012 08:22:13 +0000 Subject: [gpfsug-discuss] Great perf tool In-Reply-To: <20120720222354.GB12126@dibs.tanso.net> References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com> <20120720222354.GB12126@dibs.tanso.net> Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE76B@WARVWEXC1.uk.deluxe-eu.com> Perhaps you or Vic could give a quick run through PCP at the next UG meeting? 
> -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > bounces at gpfsug.org] On Behalf Of Jan-Frode Myklebust > Sent: 20 July 2012 23:24 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Great perf tool > > On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote: > > > > http://collectl.sourceforge.net/index.html > > Another great framework for collecting performance data is Performance > CoPilot: > > http://oss.sgi.com/projects/pcp/ > > It can collect and play live or re-play archived data from several nodes in the > same gui (or tui) player. > > PCP is finally scheduled for inclusion in RHEL, so it's hopefully no longer > only us old SGI IRIX admins that will be using it anymore :-) > > http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap > .pdf > > It's already available in EPEL. > > > -jf > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Jez.Tucker at rushes.co.uk Mon Jul 2 14:03:21 2012 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Mon, 2 Jul 2012 13:03:21 +0000 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has Message-ID: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> Just had a lovely one. As I'm, sure all of you are aware by now, there's been much fun with some of the NTP Stratum 1 servers not correctly accounting for the leap-seocnd last night. http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html You may wish to turn off ntp on your servers and correct your NTP to trusted servers. A clock skew from ntp.pool.org just took out one of our servers and the node was expelled from the cluster. 
Jez --- Jez Tucker Senior Sysadmin Rushes DDI: +44 (0) 207 851 6276 http://www.rushes.co.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From orlando.richards at ed.ac.uk Mon Jul 2 14:12:46 2012 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Mon, 02 Jul 2012 14:12:46 +0100 Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> Message-ID: <4FF19E4E.3040802@ed.ac.uk> Hi Jez, We've had a few issues with the leap second - but so far it has been isolated to redhat 6.2 systems. What OS are you running on your affected server? Cheers, Orlando. On 02/07/12 14:03, Jez Tucker wrote: > Just had a lovely one. > > As I?m, sure all of you are aware by now, there?s been much fun with > some of the NTP Stratum 1 servers not correctly accounting for the > leap-seocnd last night. > > http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml > > http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html > > You may wish to turn off ntp on your servers and correct your NTP to > trusted servers. > > A clock skew from ntp.pool.org just took out one of our servers and the > node was expelled from the cluster. > > Jez > > --- > > Jez Tucker > > Senior Sysadmin > > Rushes > > DDI: +44 (0) 207 851 6276 > > http://www.rushes.co.uk > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Dr Orlando Richards Information Services IT Infrastructure Division Unix Section Tel: 0131 650 4994 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. 
From bevans at canditmedia.co.uk Mon Jul 2 14:27:52 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 14:27:52 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has
In-Reply-To: <4FF19E4E.3040802@ed.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF19E4E.3040802@ed.ac.uk>
Message-ID:

I've had a few customers (a growing number as the day goes on) with IBM Storage Manager eating loads of CPU cycles and causing slowdowns as a result (SLES 10 SP1, mostly). The common factor, at least at this end, seems to be ntp syncing against public ntp pool servers and Java, but Jez's report... scary stuff...

Cheers,
Barry

On 2 Jul 2012, at 14:12, Orlando Richards wrote:
> Hi Jez,
>
> We've had a few issues with the leap second - but so far it has been isolated to redhat 6.2 systems.
>
> What OS are you running on your affected server?
>
> Cheers,
> Orlando.
>
> On 02/07/12 14:03, Jez Tucker wrote:
>> Just had a lovely one.
>>
>> As I'm sure all of you are aware by now, there's been much fun with
>> some of the NTP Stratum 1 servers not correctly accounting for the
>> leap-second last night.
>>
>> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>>
>> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>>
>> You may wish to turn off ntp on your servers and correct your NTP to
>> trusted servers.
>>
>> A clock skew from ntp.pool.org just took out one of our servers and the
>> node was expelled from the cluster.
From Jez.Tucker at rushes.co.uk Mon Jul 2 14:47:21 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 13:47:21 +0000
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has
In-Reply-To: <4FF19E4E.3040802@ed.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF19E4E.3040802@ed.ac.uk>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9010@WARVWEXC1.uk.deluxe-eu.com>

Funnily enough RH Ent 6.2

> -----Original Message-----
> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Orlando Richards
> Sent: 02 July 2012 14:13
> To: gpfsug-discuss at gpfsug.org
> Subject: Re: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has
>
> Hi Jez,
>
> We've had a few issues with the leap second - but so far it has been isolated
> to redhat 6.2 systems.
>
> What OS are you running on your affected server?
>
> Cheers,
> Orlando.
>
> On 02/07/12 14:03, Jez Tucker wrote:
> > Just had a lovely one.
> >
> > As I'm sure all of you are aware by now, there's been much fun with
> > some of the NTP Stratum 1 servers not correctly accounting for the
> > leap-second last night.
> >
> > http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
> >
> > http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
> >
> > You may wish to turn off ntp on your servers and correct your NTP to
> > trusted servers.
> >
> > A clock skew from ntp.pool.org just took out one of our servers and
> > the node was expelled from the cluster.

From j.buzzard at dundee.ac.uk Mon Jul 2 14:59:33 2012
From: j.buzzard at dundee.ac.uk (Jonathan Buzzard)
Date: Mon, 2 Jul 2012 14:59:33 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <4FF1A945.5000100@dundee.ac.uk>

On 02/07/12 14:03, Jez Tucker wrote:
> Just had a lovely one.
>
> As I'm sure all of you are aware by now, there's been much fun with
> some of the NTP Stratum 1 servers not correctly accounting for the
> leap-second last night.
>
> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>
> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>
> You may wish to turn off ntp on your servers and correct your NTP to
> trusted servers.
>
> A clock skew from ntp.pool.org just took out one of our servers and the
> node was expelled from the cluster.

Hum, not sure I would run my production servers directly off something from ntp.pool.org, I would at least put a local server in between.

Not noticed any problems here, but then we are running latest RHEL 5.8 and latest IBM Storage Manager (10.83) :-)

JAB.

--
Jonathan A. Buzzard                 Tel: +441382-386998
Storage Administrator, College of Life Sciences
University of Dundee, DD1 5EH

The University of Dundee is a registered Scottish Charity, No: SC015096

From Jez.Tucker at rushes.co.uk Mon Jul 2 14:59:34 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 13:59:34 +0000
Subject: [gpfsug-discuss] Samba mapping of "special" SID entries
In-Reply-To:
References: <4FE486B2.1050501@ed.ac.uk>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9047@WARVWEXC1.uk.deluxe-eu.com>

Now I've located my GPFSUG from within Outlook...

I'm presuming you're creating an ACL with the equivalent of 2775 permissions and the owner file system being 'nfsv4', rather than 'all'?

Your nfsv3 clients have nfsv4 acl support installed?
Jez

> -----Original Message-----
> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Luke Raimbach
> Sent: 22 June 2012 17:33
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] Samba mapping of "special" SID entries
>
> Hi Orlando,
>
> I've been having success using Centrify to manage UID/GID mappings for our
> very small mixed cluster (7 x Linux, 1 x Windows 2008R2).
>
> I've created a map for "CREATOR / OWNER", "SYSTEM", "Domain Admins",
> etc. group SIDs and use the Windows node to manage ACLs. When the
> windows node applies the ACLs, these seem to translate successfully into
> GPFS ACLs and work nicely for the mixed environment allowing users on
> both Linux and Windows systems to manipulate each other's files.
>
> People are mounting the FS via NFS (exported via the NSD Linux servers)
> and CIFS (shared from Win2k8R2). The permissions don't look friendly when
> you run ls -l on a Linux system over NFS but the ACLs do their job in
> preserving inheritable permissions, etc. If people want to see the 'real' ACL,
> they need to use mmgetacl on a GPFS attached node (or windows users
> simply click on the security tab under properties of a file).
>
> Drop me a line off-list if you want to take a look at what we've got remotely.
> I can run a webex session from the Windows node if you want to have a
> good poke around.
>
> Luke.
>
> --
>
> Luke Raimbach
> IT Manager
> Oxford e-Research Centre
> 7 Keble Road,
> Oxford,
> OX1 3QG
>
> +44(0)1865 610639
>
> > -----Original Message-----
> > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Orlando Richards
> > Sent: 22 June 2012 15:53
> > To: gpfsug-discuss at gpfsug.org
> > Subject: [gpfsug-discuss] Samba mapping of "special" SID entries
> >
> > Hi all,
> >
> > Has anyone bumped up against the "nfs4: special" option in GPFS/Samba
> > deployments which manipulates how the "owner" and "group owner" (and
> > "everybody") behaviour is mapped to ACLs when accessed via the samba
> > stack?
> >
> > In particular, with the "default" setting (if one blindly follows the
> > worked examples on this) of nfs4: special, if a user adds themselves
> > specifically to an ACL, this creates an entry:
> >
> > special:@owner
> >
> > rather than:
> >
> > user:username
> >
> > which has the knock-on effect that if a file/folder is created under
> > this ACL by a different owner (or if ownership changes), the person
> > who put said ACL on to the file/folder no longer has access. Most
> > people find this confusing (which is putting it politely).
> >
> > To further complicate matters, the "special" windows SID's*[1] - such
> > as "CREATOR/OWNER" - don't seem to work properly in the
> > ctdb/samba/gpfs stack (I don't know if they do in "normal" samba
> > though). IBM don't support CREATOR/OWNER in SONAS*[2] - so it's not
> > just me!
> >
> > So my question is - has anyone else been looking into this at all, and
> > if so, do you have any sage words of wisdom to offer?
> >
> > Cheers,
> > Orlando.
> >
> > *[1] http://support.microsoft.com/kb/163846
> > *[2] http://pic.dhe.ibm.com/infocenter/sonasic/sonas1ic/index.jsp?topic=%2Fcom.ibm.sonas.doc%2Fadm_authorization_limitations.html
> >
> > --
> > Dr Orlando Richards
> > Information Services
> > IT Infrastructure Division
> > Unix Section
> > Tel: 0131 650 4994
> >
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.

From Jez.Tucker at rushes.co.uk Mon Jul 2 15:05:25 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Mon, 2 Jul 2012 14:05:25 +0000
Subject: [gpfsug-discuss] HPC people - interconnects
In-Reply-To: <4FE8720C.7040007@gmail.com>
References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com> <4FE8720C.7040007@gmail.com>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>

Theoretically, using the OFED stack with a supporting card (e.g. Myrinet 10GbE), you should be able to leverage RDMA.
That said, I've not tried it yet. It's on my list of things to R&D.

OFED/ROCE/iWARP:

http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html

From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali
Sent: 25 June 2012 15:14
To: gpfsug-discuss at gpfsug.org
Subject: Re: [gpfsug-discuss] HPC people - interconnects

On 25/06/12 15:08, Jez Tucker wrote:
Do you all use IB?

Has anyone tried RDMA over 10G via the OFED stack?

For most of our customers we use RDMA over verbs.

Is this the same thing you mentioned a few weeks ago with respect to ROCE? Does gpfs even support this?
--
regards,
Arif

From bevans at canditmedia.co.uk Mon Jul 2 22:16:15 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 22:16:15 +0100
Subject: [gpfsug-discuss] HPC people - interconnects
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com> <4FE8720C.7040007@gmail.com> <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk>

It's verbs support you need - not sure where OFED is up to with verbs over 10G, if that's even on the cards.

Cheers,
B

On 2 Jul 2012, at 15:05, Jez Tucker wrote:
> Theoretically, using the OFED stack with a supporting card (e.g. Myrinet 10GbE), you should be able to leverage RDMA.
> That said, I've not tried it yet. It's on my list of things to R&D.
>
> OFED/ROCE/iWARP:
>
> http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html
>
> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali
> Sent: 25 June 2012 15:14
> To: gpfsug-discuss at gpfsug.org
> Subject: Re: [gpfsug-discuss] HPC people - interconnects
>
> On 25/06/12 15:08, Jez Tucker wrote:
> Do you all use IB?
>
> Has anyone tried RDMA over 10G via the OFED stack?
>
> For most of our customers we use RDMA over verbs.
>
> Is this the same thing you mentioned a few weeks ago with respect to ROCE? Does gpfs even support this?
>
> --
> regards,
>
> Arif
From bevans at canditmedia.co.uk Mon Jul 2 22:25:02 2012
From: bevans at canditmedia.co.uk (Barry Evans)
Date: Mon, 2 Jul 2012 22:25:02 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has
In-Reply-To: <4FF1A945.5000100@dundee.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF1A945.5000100@dundee.ac.uk>
Message-ID:

This has so far hit almost all of the places I work with (not so much GPFS crashing, but certainly storage manager going berserk) - the majority of them do not use public NTP servers. In most cases no one actually noticed until it was pointed out, so it is well worth a quick 'top' of your storage servers if you're using Engenio/LSI/NetApp based units (i.e. DS3/4/5000).

The fix is here:
http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/

But of course I wouldn't mess with the time in production unless you've got GPFS shut down first.

Cheers,
B

On 2 Jul 2012, at 14:59, Jonathan Buzzard wrote:
> On 02/07/12 14:03, Jez Tucker wrote:
>> Just had a lovely one.
>>
>> As I'm sure all of you are aware by now, there's been much fun with
>> some of the NTP Stratum 1 servers not correctly accounting for the
>> leap-second last night.
>>
>> http://www.dailymail.co.uk/news/article-2167588/Leap-second-2012-Websites-crash-time-hiccup-caused-online-chaos.html?ito=feeds-newsxml
>>
>> http://www.telegraph.co.uk/technology/news/9369671/Leap-second-brings-down-top-websites.html
>>
>> You may wish to turn off ntp on your servers and correct your NTP to
>> trusted servers.
>>
>> A clock skew from ntp.pool.org just took out one of our servers and the
>> node was expelled from the cluster.
>
> Hum, not sure I would run my production servers directly off something
> from ntp.pool.org, I would at least put a local server in between.
>
> Not noticed any problems here, but then we are running latest RHEL 5.8
> and latest IBM Storage Manager (10.83) :-)
>
> JAB.
>
> --
> Jonathan A.
Buzzard                 Tel: +441382-386998
> Storage Administrator, College of Life Sciences
> University of Dundee, DD1 5EH
>
> The University of Dundee is a registered Scottish Charity, No: SC015096

From Jez.Tucker at rushes.co.uk Tue Jul 3 11:38:10 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Tue, 3 Jul 2012 10:38:10 +0000
Subject: [gpfsug-discuss] HPC people - interconnects
In-Reply-To: <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk>
References: <39571EA9316BE44899D59C7A640C13F5305A65AD@WARVWEXC1.uk.deluxe-eu.com> <4FE8720C.7040007@gmail.com> <39571EA9316BE44899D59C7A640C13F5305A9078@WARVWEXC1.uk.deluxe-eu.com>, <697F7062-D779-48F3-AACA-53FCAA68BB66@canditmedia.co.uk>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305A93A7@WARVWEXC1.uk.deluxe-eu.com>

Here's the stack:

https://www.openfabrics.org/resources/ofed-for-linux-ofed-for-windows/ofed-overview.html

VERBS is supported over 10GbE. It should work if OFED VERBS == IBM VERBS.

---
Jez Tucker
Senior SysAdmin
Rushes
www.rushes.co.uk

________________________________
From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Barry Evans [bevans at canditmedia.co.uk]
Sent: 02 July 2012 22:16
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] HPC people - interconnects

It's verbs support you need - not sure where OFED is up to with verbs over 10G, if that's even on the cards

Cheers,
B

On 2 Jul 2012, at 15:05, Jez Tucker wrote:

Theoretically using the OFED stack with a supporting card (E.G. Myrinet 10GbE) then you should be able to leverage RDMA.
That said, I've not tried it yet. It's on my list of things to R&D.
OFED/ROCE/iWARP:

http://voidreflections.blogspot.co.uk/2011/08/soft-roce-alternative-to-soft-iwarp.html

From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Arif Ali
Sent: 25 June 2012 15:14
To: gpfsug-discuss at gpfsug.org
Subject: Re: [gpfsug-discuss] HPC people - interconnects

On 25/06/12 15:08, Jez Tucker wrote:
Do you all use IB?

Has anyone tried RDMA over 10G via the OFED stack?

For most of our customers we use RDMA over verbs.

Is this the same thing you mentioned a few weeks ago with respect to ROCE? Does gpfs even support this?

--
regards,
Arif

From j.buzzard at dundee.ac.uk Tue Jul 3 12:01:33 2012
From: j.buzzard at dundee.ac.uk (Jonathan Buzzard)
Date: Tue, 3 Jul 2012 12:01:33 +0100
Subject: [gpfsug-discuss] NTP leap-second can take out GPFS server - it just has
In-Reply-To:
References: <39571EA9316BE44899D59C7A640C13F5305A8FD2@WARVWEXC1.uk.deluxe-eu.com> <4FF1A945.5000100@dundee.ac.uk>
Message-ID: <4FF2D10D.2030701@dundee.ac.uk>

On 02/07/12 22:25, Barry Evans wrote:
> This has so far hit almost all of the places I work with (not so
> much GPFS crashing, but certainly storage manager going berserk) - the
> majority of them do not use public NTP servers. In most cases no one
> actually noticed until it was pointed out, well worth a quick 'top' of
> your storage servers if you're using Engenio/LSI/NetApp based units (i.e.
> DS3/4/5000).
>
> The fix is here:
> http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/
>

Or you can upgrade to the latest version of storage manager. We are running 10.83 and it sailed through without issue. Now admittedly most people have probably not upgraded as it has only been out for a couple of weeks.
I was very prompt on the upgrade as it allows the one Storage Manager install to manage both IBM DS3/4/5000 and Dell MD3xxx from the same program.

JAB.

--
Jonathan A. Buzzard                 Tel: +441382-386998
Storage Administrator, College of Life Sciences
University of Dundee, DD1 5EH

The University of Dundee is a registered Scottish Charity, No: SC015096

From sfadden at us.ibm.com Thu Jul 5 17:25:22 2012
From: sfadden at us.ibm.com (Scott Fadden)
Date: Thu, 5 Jul 2012 09:25:22 -0700
Subject: [gpfsug-discuss] GPFS Service Bulletin - July 5 2012
Message-ID:

Date Added: July 5, 2012

Issue:

IBM has identified an issue with GPFS file systems at versions 3.4 or 3.5 which were migrated from file systems created with GPFS versions earlier than 3.4. This issue can occur only after using the mmmigratefs command with the [--fastea] option. The issue can result in a loss of data, requiring the restoration of data from a backup source.

GPFS file systems created with versions earlier than 3.4 should not be migrated using the mmmigratefs command with the [--fastea] option until a fix is provided from IBM. IBM plans to make the fix available in GPFS versions 3.5.0.3 (APAR IV24151) and 3.4.0.15 (APAR IV24150). An ifix will also be available from IBM service.

If customers have already migrated file systems from GPFS versions earlier than 3.4, IBM service has a fix. Please follow the steps below to determine if your system may be affected.

To determine if your system may be affected:

1. Ensure your GPFS file systems are mounted.
2. As a user with GPFS administrator privileges on a machine where your GPFS file systems are mounted, issue the command:

   /usr/lpp/mmfs/bin/mmfsadm dump stripe | grep "inode 0"

The command will produce output that identifies locations for the "inode 0" file for all currently mounted GPFS file systems.
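[Editorial sketch, not part of IBM's bulletin.] A minimal way to screen the output of that command automatically, assuming each line has the shape shown in this bulletin's examples (`inode 0: 3:4098 1:4098`), where the number after each colon is the sector and 4098 is the expected value. The function name `suspect_replicas` and the label `disk` for the first number of each pair are illustrative assumptions, not GPFS terminology confirmed by the bulletin.

```python
import re

# Sector value the bulletin says every "inode 0" replica should report.
EXPECTED_SECTOR = 4098

# Matches "<number>:<number>" pairs such as "3:4098".
PAIR = re.compile(r"(\d+):(\d+)")


def suspect_replicas(dump_output):
    """Return (line, disk, sector) for each inode 0 replica whose
    sector differs from EXPECTED_SECTOR.

    `disk` is an assumed name for the first number of each pair.
    """
    suspects = []
    for line in dump_output.splitlines():
        if "inode 0:" not in line:
            continue
        # Only inspect the text that follows the "inode 0:" label.
        replicas = line.split("inode 0:", 1)[1]
        for disk, sector in PAIR.findall(replicas):
            if int(sector) != EXPECTED_SECTOR:
                suspects.append((line.strip(), int(disk), int(sector)))
    return suspects
```

An empty result means every replica reported sector 4098; anything else is a cue to call IBM service as the bulletin instructs. Treat this only as a convenience filter: if the real output carries extra fields, the pattern would need adjusting.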
Example output for a file system configured with two way meta-data replication would be in the form:

   inode 0: 3:4098 1:4098

For a file system with no meta-data replication the output would be in the form:

   inode 0: 3:4098

The relevant information to look for to see if you may experience a problem are the fields denoting : for each inode 0 replica (e.g. 3:4098 and 1:4098 in these examples). If each : replica only denotes 4098 for the sector field then you are not experiencing this problem. If however there is a number other than 4098 in the sector output then you are requested to immediately call IBM service and reference this problem. The IBM service person will walk you through a fix for correcting the issue.

Scott Fadden
GPFS Technical Marketing
Desk: (503) 578-5630
Cell: (503) 880-5833
sfadden at us.ibm.com
http://www.ibm.com/systems/gpfs

From Jez.Tucker at rushes.co.uk Fri Jul 6 11:44:49 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 6 Jul 2012 10:44:49 +0000
Subject: [gpfsug-discuss] Almaden Labs pNFS attending September 20th User Group
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AB3F7@WARVWEXC1.uk.deluxe-eu.com>

Hello all

I've received confirmation from Dean Hildebrand from the Almaden Research Labs that he will attend the September User Group to present pNFS.

Dean is available on the 21st to meet post user group for relevant discussion. He will be based in London throughout his visit, owing to the proximity to AWE.

If you would like to meet Dean to discuss pNFS further, please arrange this with myself via email.

Cheers

Jez

---
Jez Tucker
Senior Sysadmin
Rushes

GPFSUG Chairman (chair at gpfsug.org)

From Jez.Tucker at rushes.co.uk Fri Jul 20 09:28:19 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 20 Jul 2012 08:28:19 +0000
Subject: [gpfsug-discuss] Your NSD server loadavg?
Message-ID: <39571EA9316BE44899D59C7A640C13F5305ADDFF@WARVWEXC1.uk.deluxe-eu.com>

Hello

Just curious to see what your NSD server's loadavg is when under a normal job processing load, i.e. SGE running tasks over NFS.

---
Jez Tucker
Senior Sysadmin
Rushes
DDI: +44 (0) 207 851 6276
http://www.rushes.co.uk

From Jez.Tucker at rushes.co.uk Fri Jul 20 18:02:44 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 20 Jul 2012 17:02:44 +0000
Subject: [gpfsug-discuss] Great perf tool
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>

http://collectl.sourceforge.net/index.html

---
Jez Tucker
Senior Sysadmin
Rushes

GPFSUG Chairman (chair at gpfsug.org)

From janfrode at tanso.net Fri Jul 20 23:23:54 2012
From: janfrode at tanso.net (Jan-Frode Myklebust)
Date: Sat, 21 Jul 2012 00:23:54 +0200
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com>
Message-ID: <20120720222354.GB12126@dibs.tanso.net>

On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
>
> http://collectl.sourceforge.net/index.html

Another great framework for collecting performance data is Performance CoPilot:

http://oss.sgi.com/projects/pcp/

It can collect and play live or re-play archived data from several nodes in the same gui (or tui) player.

PCP is finally scheduled for inclusion in RHEL, so it's hopefully no longer only us old SGI IRIX admins that will be using it anymore :-)

http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf

It's already available in EPEL.
-jf

From viccornell at gmail.com Sat Jul 21 10:43:33 2012
From: viccornell at gmail.com (Vic Cornell)
Date: Sat, 21 Jul 2012 10:43:33 +0100
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <20120720222354.GB12126@dibs.tanso.net>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com> <20120720222354.GB12126@dibs.tanso.net>
Message-ID: <6259820223170424573@unknownmsgid>

I second that. The great thing about PCP is that it will monitor down to 1/10 second, which is really useful when you want to see what is really going on. Good news about its inclusion in RHEL.

It also has a mac and windows version so that you can instrument an entire setup and monitor it on the box of your choice.

Runs best under IRIX though. . . .

Kind Regards,

Vic

Vic Cornell
Application Support Engineer
DataDirect Networks
Davidson House
Forbury Square
Reading
RG1 3EU
United Kingdom
Mobile 07900 660 266
Skype viccornell
www.ddn.com

This email may contain confidential and privileged material for the sole use of the intended recipient. Any review or distribution by others is strictly prohibited. If you are not the intended recipient please contact the sender and delete all copies

On 20 Jul 2012, at 23:24, Jan-Frode Myklebust wrote:
> On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
>>
>> http://collectl.sourceforge.net/index.html
>
> Another great framework for collecting performance data is Performance
> CoPilot:
>
> http://oss.sgi.com/projects/pcp/
>
> It can collect and play live or re-play archived data from several nodes
> in the same gui (or tui) player.
>
> PCP is finally scheduled for inclusion in RHEL, so it's hopefully no
> longer only us old SGI IRIX admins that will be using it anymore :-)
>
> http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf
>
> It's already available in EPEL.
>
> -jf

From Jez.Tucker at rushes.co.uk Tue Jul 24 09:22:13 2012
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Tue, 24 Jul 2012 08:22:13 +0000
Subject: [gpfsug-discuss] Great perf tool
In-Reply-To: <20120720222354.GB12126@dibs.tanso.net>
References: <39571EA9316BE44899D59C7A640C13F5305AE1B0@WARVWEXC1.uk.deluxe-eu.com> <20120720222354.GB12126@dibs.tanso.net>
Message-ID: <39571EA9316BE44899D59C7A640C13F5305AE76B@WARVWEXC1.uk.deluxe-eu.com>

Perhaps you or Vic could give a quick run through PCP at the next UG meeting?

> -----Original Message-----
> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jan-Frode Myklebust
> Sent: 20 July 2012 23:24
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] Great perf tool
>
> On Fri, Jul 20, 2012 at 05:02:44PM +0000, Jez Tucker wrote:
> >
> > http://collectl.sourceforge.net/index.html
>
> Another great framework for collecting performance data is Performance
> CoPilot:
>
> http://oss.sgi.com/projects/pcp/
>
> It can collect and play live or re-play archived data from several nodes in the
> same gui (or tui) player.
>
> PCP is finally scheduled for inclusion in RHEL, so it's hopefully no longer
> only us old SGI IRIX admins that will be using it anymore :-)
>
> http://rhsummit.files.wordpress.com/2012/03/burke_rhel_roadmap.pdf
>
> It's already available in EPEL.
>
> -jf